Overcoming Computational Bottlenecks In Large-Scale Medical Image Segmentation Using Optimized U-Net
Abstract
Purpose
Training nnU-Net models for medical image segmentation with large patient samples is computationally expensive, limiting iteration speed in research and clinical translation. We present an optimized training workflow that significantly accelerates nnU-Net training on massive datasets without compromising segmentation performance.
Methods
A cohort of 1,539 CT scans was used to train nnU-Net (v2) for multi-organ segmentation of 117 structures. The baseline followed the standard nnU-Net training pipeline with default preprocessing, sampling, and training schedule. We implemented an accelerated workflow incorporating three key engineering enhancements: (1) automatic mixed precision training, (2) dynamic batch size scaling to maximize GPU memory utilization, and (3) an efficiency-driven training schedule with early convergence monitoring. Both workflows used identical input resolutions and network architectures and ran on a single NVIDIA L40S GPU. Performance was benchmarked using total GPU training time, Dice Similarity Coefficient (DSC), and 95th-percentile Hausdorff Distance (HD95) on a held-out test set of 89 CT scans.
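Of the three enhancements, mixed precision and dynamic batch sizing are handled by standard framework features (e.g., PyTorch's autocast/GradScaler utilities); the convergence-monitoring idea can be sketched independently. The following is a minimal illustration only, not the paper's implementation: the class name, `patience`, and `min_delta` parameters are hypothetical, and the actual stopping criterion used in the study is not specified in this abstract.

```python
class ConvergenceMonitor:
    """Hypothetical early-convergence monitor: request a stop when the
    validation metric (e.g., mean DSC) fails to improve by at least
    `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience: int = 30, min_delta: float = 1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.stale_epochs = 0

    def update(self, metric: float) -> bool:
        """Record one epoch's validation metric; return True to stop."""
        if metric > self.best + self.min_delta:
            # Meaningful improvement: reset the stagnation counter.
            self.best = metric
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


# Usage sketch: a metric that plateaus triggers an early stop.
monitor = ConvergenceMonitor(patience=3, min_delta=0.001)
for epoch, dice in enumerate([0.50, 0.60, 0.60, 0.60, 0.60]):
    if monitor.update(dice):
        print(f"stopping early at epoch {epoch}")  # prints at epoch 4
        break
```

Stopping once the validation curve flattens is what shortens the fixed-length schedule; the savings compound with mixed precision, which reduces per-iteration cost rather than iteration count.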
Results
Baseline training required 97.4 hours of total GPU time; the proposed workflow reduced this to 35.7 hours, a 2.7× speedup (63.3% time reduction). Segmentation accuracy was preserved across all evaluated structures. For instance, mean DSC (baseline vs. efficient) was 0.794 vs. 0.797 (p=0.136) for the prostate, 0.925 vs. 0.923 (p=0.133) for the brain, and 0.951 vs. 0.955 (p=0.013) for the middle lung lobe; the only statistically significant difference favored the efficient workflow. Mean HD95 was 4.857 vs. 5.517 (p=0.136) for the prostate, 2.781 vs. 3.146 (p=0.180) for the brain, and 4.019 vs. 3.658 (p=0.421) for the middle lung lobe.
Conclusion
We demonstrated that targeted optimization of the nnU-Net training workflow substantially reduces training overhead without compromising segmentation fidelity on a large-scale clinical dataset. This streamlined workflow lowers the computational barrier to deep learning in radiation oncology, facilitating rapid model iteration and more efficient deployment of automated segmentation tools in clinical environments.