Uncertainty-Guided Federated Learning for Robust Multi-Institutional MRI-Based Brain Metastasis Segmentation
Abstract
Purpose
To develop and evaluate a federated learning (FL) framework for brain metastasis (BM) segmentation that integrates an uncertainty score into a novel FL objective, with the aim of improving segmentation robustness, and potentially performance, when training on limited-size datasets in heterogeneous multi-institutional settings.
Methods
Two BM datasets (n=90 and n=459) with post-contrast T1w-MRI and expert BM contours were collected from two institutions, and each was split into training/validation/test sets (7:1:2). At each institution, a customized UNet++ was trained for BM segmentation, with test-time augmentation (TTA) used to generate multiple predictions per case. These predictions were ensembled to produce a final segmentation and a pixel-wise variation map that quantifies prediction dispersion (uncertainty). Using federated averaging, the two client UNet++ models were aggregated into a global model. The FL objective was modified by adding an uncertainty-guided penalty term that suppresses high variation-map intensities within the ground-truth BM regions, thereby encouraging stable and confident segmentations. Three models were compared: two local institutional models (MS, n=90; ML, n=459) and the global FL model (MFL). Performance was assessed using the Dice coefficient, precision, recall (i.e., sensitivity), and F1-score. An uncertainty score (range [0, 1]) was additionally computed to quantify segmentation robustness.
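The pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions made for illustration only: the TTA transforms (axis flips), the use of the standard deviation as the variation map, the soft-Dice base loss, the penalty weight `lam`, and the sample-size weighting in federated averaging are not specified in this abstract. All function names are hypothetical.

```python
import numpy as np

def tta_ensemble(predict, image):
    """Ensemble predictions over flipped copies of a 2D image.

    Assumption: flips stand in for the (unspecified) TTA transforms.
    Returns the mean probability map and a pixel-wise variation map.
    """
    flip_axes = [None, (0,), (1,), (0, 1)]
    probs = []
    for axes in flip_axes:
        aug = image if axes is None else np.flip(image, axis=axes)
        p = predict(aug)
        # Undo the flip so every prediction is in the original orientation.
        probs.append(p if axes is None else np.flip(p, axis=axes))
    probs = np.stack(probs)
    # Assumption: standard deviation is used as the dispersion measure.
    return probs.mean(axis=0), probs.std(axis=0)

def uncertainty_penalty(variation, gt_mask):
    """Mean variation-map intensity inside the ground-truth region."""
    n = gt_mask.sum()
    return float((variation * gt_mask).sum() / n) if n > 0 else 0.0

def dice_loss(prob, gt_mask, eps=1e-6):
    """Soft Dice loss (assumed base segmentation loss)."""
    inter = (prob * gt_mask).sum()
    return float(1.0 - (2.0 * inter + eps) / (prob.sum() + gt_mask.sum() + eps))

def total_loss(prob, variation, gt_mask, lam=0.1):
    """Base loss plus uncertainty-guided penalty; lam is a hypothetical weight."""
    return dice_loss(prob, gt_mask) + lam * uncertainty_penalty(variation, gt_mask)

def fed_avg(client_weights, client_sizes):
    """Federated averaging of per-client parameter dicts.

    Assumption: clients are weighted by their training-set size.
    """
    total = float(sum(client_sizes))
    return {
        name: sum(w[name] * (n / total) for w, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
    }
```

In a two-client setting such as the one described, `fed_avg` would be called once per communication round with the two local parameter dictionaries, and `total_loss` would replace the plain segmentation loss during local training.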
Results
ML, trained on the larger dataset, achieved strong performance on its test set (Dice=0.763, precision=0.737, recall=0.901, F1=0.781). In contrast, MS showed limited performance (Dice=0.123, precision=0.055, recall=0.864, F1=0.094), with increased false positives attributable to the smaller training set. MFL substantially improved performance on the test data from Institution-1’s limited-size cohort (F1=0.612), with fewer false-positive detections. As expected, MFL showed reduced performance on the Institution-2 test set relative to ML. Notably, uncertainty scores for MFL were 0.683 (Institution-1 test) and 0.810 (Institution-2 test), compared with 0.655 for MS and 0.705 for ML.
Conclusion
An uncertainty-guided FL framework was developed to enhance robustness and improve BM segmentation performance in limited-data settings. This scalable, privacy-preserving approach enables multi-institutional collaboration with integrated uncertainty benchmarking.