Teamwork Makes the AI Work: Multi-Institutions Lung SBRT GTV Autosegmentation Using Federated Learning
Abstract
Purpose
Developing robust AI models for radiotherapy requires large, diverse datasets, which are difficult to share between institutions due to patient confidentiality, IT restrictions, and limited local machine learning expertise. Federated learning offers a solution by enabling collaborative model training without exchanging raw data. This work aimed to demonstrate the feasibility of federated training between two institutions by developing a lung GTV segmentation model for SBRT planning.
Methods
We adapted the nnU-Net framework to operate in federation, using a OneDrive repository to exchange weights, because standard federated learning platforms (e.g., Vantage6, Flower, CODA) are challenging to deploy in restrictive healthcare environments. Each institution trained locally on its own dataset (210 and 207 thoracic single-phase CTs extracted from the 4D-CTs of institutions A and B, respectively), using 5-fold cross-validation with an 80:20 random training-to-test split. After each epoch, model weights were exchanged and averaged through OneDrive. Performance was evaluated using the Dice similarity coefficient on each institution's validation and independent test sets, and compared against locally trained models and publicly available lung segmentation models (TotalSegmentator and MedSam2). Statistical comparison used repeated-measures ANOVA followed by post-hoc pairwise t-tests with Holm-Bonferroni correction.
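The per-epoch averaging step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `federated_average`, the flattened-list weight layout, and the optional weighting by local dataset size are all assumptions made for illustration. The abstract states only that weights were averaged, and equal site sizes reduce the code below to a plain mean.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flattened parameter values.
# Real nnU-Net weights are tensors; lists keep the sketch dependency-free.
Weights = Dict[str, List[float]]


def federated_average(site_weights: List[Weights], site_sizes: List[int]) -> Weights:
    """Average per-site weights, weighting each site by its sample count.

    Weighting by dataset size is an assumption; passing equal sizes
    yields an unweighted mean.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one positive sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name, reference in site_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


# Toy weights standing in for the files each site would upload after an epoch.
site_a = {"conv.weight": [0.2, 0.4], "conv.bias": [1.0]}
site_b = {"conv.weight": [0.4, 0.8], "conv.bias": [3.0]}
global_weights = federated_average([site_a, site_b], [1, 1])
```

In a file-sharing setup such as the one described, each site would write its weights to the shared folder, wait for the other site's file, compute the same average locally, and resume training from the averaged weights.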
Results
On the cross-validation dataset, the federated model achieved a mean Dice score of 0.72±0.19, significantly outperforming single-institution training (0.68±0.24, p<0.01) as well as the publicly available TotalSegmentator (0.57±0.24, p<0.01) and MedSam2 (0.33±0.20, p<0.01) models. On the independent test set, using 5-fold ensemble inference, no significant difference was observed between single-institution training (0.74±0.16) and federated training (0.75±0.16). This is likely due to the smaller sample size and to information loss when resampling from the model's 3-mm inference slice thickness to the 2-mm slice thickness of the test set.
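For readers unfamiliar with the metric and with fold ensembling, the sketch below shows one common way to compute both. It assumes that the ensemble averages per-voxel foreground probabilities across folds and thresholds the mean at 0.5, and that masks are flattened binary lists; the function names and toy values are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def ensemble_mask(fold_probs: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average foreground probabilities across folds, then binarize."""
    n_folds = len(fold_probs)
    if n_folds == 0:
        raise ValueError("need at least one fold")
    return [1 if sum(voxel) / n_folds >= threshold else 0 for voxel in zip(*fold_probs)]


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(pred) + sum(truth)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum


# Toy example: three folds, three voxels.
fold_probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.4, 0.3],
    [0.7, 0.1, 0.7],
]
prediction = ensemble_mask(fold_probs)   # mean probabilities: 0.80, 0.23, 0.53
reference = [1, 1, 1]
score = dice(prediction, reference)      # 2 * 2 / (2 + 3) = 0.8
```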
Conclusion
Federated learning enables institutions to collaboratively build higher-performing AI models without compromising patient privacy. Beyond improving segmentation accuracy, this approach fosters knowledge sharing and skill development, paving the way for scalable, community-driven AI solutions in radiotherapy.