Built for Children: A Pediatric-Trained Hybrid Autosegmentation Pipeline for Craniospinal Irradiation
Abstract
Purpose
Pediatric craniospinal irradiation (CSI) autosegmentation is challenging due to small, age-variable anatomy. Performance of a pediatric-trained in-house pipeline was compared with atlas-based and commercial AI methods for pediatric CSI organs at risk (OARs) using overlap and distance metrics.
Methods
The dataset comprised 211 pediatric CSI CT scans with clinician-approved contours for 18 OARs (138 training, 73 test; median age 9 years, range 2–25 years). A pediatric-trained hybrid nnU-Net/nnFormer pipeline generating unified multiorgan masks was evaluated against atlas-based and commercial deep learning methods. OARs spanned cranial, thoracic, and abdominal regions (e.g., brain, brainstem, optic chiasm, spinal canal, esophagus, kidneys, lungs). Methods were evaluated using 13 overlap and distance metrics: Dice similarity coefficient (DSC), Jaccard index, true positive rate, true negative rate, positive predictive value, Rand index, six Hausdorff distance summary statistics (minimum, mean, median, maximum, 95th percentile, standard deviation), and mean distance to agreement. Linear mixed-effects models with patient and organ random intercepts were used to estimate adjusted differences between methods and to test age-by-method interactions.
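As an illustration of the overlap metrics named above, the following minimal Python sketch computes DSC, Jaccard index, true positive rate, true negative rate, and positive predictive value from voxel-wise confusion counts. It is not the study's evaluation code: the function name `overlap_metrics`, the flattened 0/1 mask representation, and the toy masks are assumptions made for this example, and the Rand index and distance metrics are omitted.

```python
from typing import Dict, Sequence


def overlap_metrics(reference: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Voxel-wise overlap metrics for two flattened binary masks (hypothetical helper)."""
    if len(reference) != len(predicted):
        raise ValueError("masks must contain the same number of voxels")

    # Confusion counts: the reference mask plays the role of ground truth.
    tp = sum(1 for r, p in zip(reference, predicted) if r and p)
    fp = sum(1 for r, p in zip(reference, predicted) if not r and p)
    fn = sum(1 for r, p in zip(reference, predicted) if r and not p)
    tn = len(reference) - tp - fp - fn

    def ratio(num: int, den: int) -> float:
        # Return NaN when a metric is undefined (e.g. both masks are empty).
        return num / den if den else float("nan")

    return {
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),   # Dice similarity coefficient
        "jaccard": ratio(tp, tp + fp + fn),       # intersection over union
        "tpr": ratio(tp, tp + fn),                # sensitivity
        "tnr": ratio(tn, tn + fp),                # specificity
        "ppv": ratio(tp, tp + fp),                # precision
    }


# Toy example (assumed data): 8 voxels, 3 in the reference contour, 3 predicted.
reference_mask = [1, 1, 1, 0, 0, 0, 0, 0]
predicted_mask = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = overlap_metrics(reference_mask, predicted_mask)
```

In this toy case two voxels overlap, one is missed, and one is spurious, so DSC is 4/6 and the Jaccard index is 2/4. For small structures such as the optic chiasm, a single-voxel disagreement moves these ratios far more than it would for a large organ, which is one reason overlap metrics are reported alongside distance metrics.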
Results
Across 234 organ–metric comparisons (18 OARs × 13 metrics), the in-house method was superior in 42% and not significantly different in 54% (96% at least comparable); commercial AI was superior in 4%. Mean DSC improvements ranged from 0.06 to 0.12 versus atlas-based segmentation and from 0.03 to 0.08 versus commercial AI, with the largest gains for small or anatomically variable OARs (optic chiasm, esophagus, spinal canal). Distance metrics showed reduced error and lower inter-patient variability, with median distance reductions of approximately 1–3 mm relative to atlas-based contours. A significant age-by-method interaction was observed for DSC (p < 0.01): atlas-based performance declined with age (−0.005 DSC/year), whereas the in-house method showed no significant change with age.
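One way to read the age-by-method interaction is through a schematic form of the mixed-effects model. The notation below is an assumption made for illustration (the abstract specifies only patient and organ random intercepts, a method effect, and an age-by-method interaction); the covariate coding and any further terms in the fitted model are not stated here.

```latex
% Schematic model (assumed notation): metric Y for patient p, organ o, method m
\[
Y_{pom} = \beta_0 + \beta_m + \beta_{\mathrm{age}}\, a_p + \gamma_m\, a_p + u_p + v_o + \varepsilon_{pom},
\qquad u_p \sim \mathcal{N}(0, \sigma_u^2),\quad v_o \sim \mathcal{N}(0, \sigma_v^2)
\]
% a_p: patient age; u_p, v_o: patient and organ random intercepts;
% \gamma_m: method-specific age slope (the age-by-method interaction).

% Illustrative arithmetic only: a slope of -0.005 DSC/year, extrapolated
% linearly across the full 2--25 year age range, would correspond to
\[
(25 - 2)\ \text{years} \times 0.005\ \text{DSC/year} = 0.115\ \text{DSC}.
\]
```

Under this reading, the reported atlas-based slope would imply a DSC difference of roughly 0.1 between the youngest and oldest patients if the linear trend held over the whole range, which is comparable in size to the reported mean DSC gains over atlas-based segmentation.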
Conclusion
A pediatric-trained hybrid pipeline achieved broadly superior overlap and distance-based performance and improved robustness across pediatric ages compared with atlas-based and commercial approaches, supporting pediatric-specific CSI autosegmentation for clinical use.