Analysis of Whole-Body CT Segmentation Models for Pediatric Organ Dosimetry in Clinical Workflows
Abstract
Purpose
This study analyzes the reliability of whole-body CT segmentations from a deep-learning model deployed at a large pediatric center, across protocol types and ICRP organ categories, and evaluates age-stratified performance for clinical dosimetry workflows.
Methods
The MONAI CT Whole Body Segmentation model was deployed using the MONAI Deploy platform with automated routing of all clinical CT exams; segmentations for 104 organs/tissues were saved, together with their volumes, in DICOM format. Organs were grouped by ICRP 103 tissue weighting factor: brain (w_T=0.01), esophagus/liver/bladder (0.04), lungs/colon/stomach (0.12), red marrow surrogates (vertebrae/ribs/pelvis, 0.12), and remainder tissues (kidneys/spleen/heart/pancreas/adrenals/bowel, 0.12). Analysis was stratified by age (G1: 0–2 y, G2: 2–5 y, G3: 5–9 y, G4: 9–14 y, G5: 14–18 y, G6: ≥18 y) and by scan region (head/neck/chest/abdomen-pelvis/multi-region). Within each age group independently, component organ volumes were summed to give a total organ-at-risk (OAR) group volume per exam; the coefficient of variation (CoV) and the Tukey-method outlier fraction were then computed on these summed volumes, so that segmentation variance was separated from physiological growth. CoV and outlier fraction were averaged across age groups to assess age-independent reliability. Zero-volume organs were excluded as absent from the field of view.
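The per-group reliability metrics described above can be illustrated with a minimal sketch. It is not the study's code: the function names, the exam record layout (`age_group`, `volumes`), the use of the sample standard deviation, the inclusive quartile method, and the conventional Tukey fence multiplier of 1.5 are all assumptions made for illustration.

```python
from statistics import mean, quantiles, stdev


def coefficient_of_variation(values):
    """CoV as sample standard deviation over mean (ddof=1 is an assumption)."""
    return stdev(values) / mean(values)


def tukey_outlier_fraction(values, k=1.5):
    """Fraction of values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional Tukey fence; the quartile method used in the
    study is not stated, so the 'inclusive' method here is an assumption.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if v < low or v > high) / len(values)


def group_reliability(exams, organs):
    """Compute CoV and outlier fraction of summed OAR volume per age group.

    exams  : list of dicts with hypothetical keys 'age_group' (str) and
             'volumes' (dict mapping organ name -> volume).
    organs : organ names that make up one OAR group.
    Returns (per_group, mean_cov, mean_outlier_fraction).
    """
    totals = {}
    for exam in exams:
        # Zero-volume organs are treated as outside the field of view.
        present = [exam["volumes"].get(o, 0.0) for o in organs]
        present = [v for v in present if v > 0]
        if present:
            totals.setdefault(exam["age_group"], []).append(sum(present))

    # Metrics are computed within each age group, never on pooled ages.
    per_group = {
        group: (coefficient_of_variation(v), tukey_outlier_fraction(v))
        for group, v in totals.items()
        if len(v) >= 2  # both metrics need at least two exams
    }
    if not per_group:
        return per_group, None, None

    # Averaging across age groups gives the age-independent summary.
    mean_cov = mean(c for c, _ in per_group.values())
    mean_outliers = mean(f for _, f in per_group.values())
    return per_group, mean_cov, mean_outliers
```

Computing the metrics within each age group before averaging keeps the growth-related spread in organ size between, say, G1 and G6 from inflating the CoV, which is the stated reason for the stratification.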
Results
Over 4 months, 1,803 exams (age 0–46 y; 64% head, 21% abdomen-pelvis, 15% other) were analyzed. Brain showed the highest reliability (CoV=0.69±0.07, 0% outliers); the red marrow group showed the highest variability (CoV=1.45±0.28, 5.0%±7.2% outliers). CoV decreased roughly 1.7- to 2.1-fold from the youngest to the oldest cohorts for red marrow (1.82→1.09) and bladder (1.44→0.69). Abdomen-pelvis protocols yielded the most reliable remainder tissue segmentations (CoV=0.47±0.11, 2.5%±3.1% outliers).
Conclusion
Age-stratified analysis demonstrates acceptable reliability for ICRP-weighted OARs, with highest confidence for brain, liver, and abdominal organs. Youngest cohorts and small organs show reduced consistency, requiring further refinement for accurate patient-specific dosimetry.