A Federated Learning Scheme Based on Deep Ensemble Learning for MRI-Based Multi-Institutional Study of Brain Metastasis Segmentation
Abstract
Purpose
To propose a federated learning (FL) framework incorporating a novel deep ensemble strategy for multi-institutional brain metastasis (BM) segmentation, improving performance on limited local datasets while preserving privacy by avoiding large-scale data transfer for retraining.
Methods
Two BM cohorts (n=90 and n=459) with post-contrast T1-weighted MRI and expert BM contours were collected from two institutions, and each cohort was split into training/validation/test sets (7:1:2). A 3D spherical image transformation based on inhomogeneous lattice scaling in spherical coordinates was designed and applied to each MR volume using 27 equally spaced spherical centers across the field of view (FOV), generating locoregional views with varying anatomical detail. A customized UNet++ served as the segmentation backbone and was trained on the transformed images, with the ground-truth contours undergoing the same transforms. For each case, 27 segmentations were generated, inversely mapped back to the original Cartesian grid, and fused via a learned ensemble inference strategy to produce the final segmentation. Four models were compared: (1) a baseline model trained on the original Institution-S data (MB, n=90); (2) a local ensemble model at Institution-S (MS); (3) a local ensemble model at Institution-L (ML, n=459); and (4) the proposed FL model using MS and ML as clients (MFL). Segmentation performance was evaluated using the Dice coefficient, precision, recall (sensitivity), and F1-score.
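The 27-center ensemble described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the 27 centers form a 3x3x3 grid evenly spaced across the FOV, and it stands in for the learned ensemble inference with a simple weighted average of the inverse-mapped probability maps (the function names `spherical_centers` and `fuse_predictions` are hypothetical).

```python
import itertools
import numpy as np

def spherical_centers(shape, n_per_axis=3):
    """Return 27 equally spaced transform centers across the FOV,
    assumed here to be a 3x3x3 grid of interior points (a sketch,
    not the paper's exact placement)."""
    axes = [np.linspace(0, s - 1, n_per_axis + 2)[1:-1] for s in shape]
    return list(itertools.product(*axes))

def fuse_predictions(preds, weights=None, threshold=0.5):
    """Fuse the 27 inverse-mapped probability maps into one segmentation.
    A uniform weighted mean stands in for the learned ensemble strategy."""
    preds = np.stack(preds)                       # (27, D, H, W)
    if weights is None:
        weights = np.full(len(preds), 1.0 / len(preds))
    fused = np.tensordot(weights, preds, axes=1)  # (D, H, W) probability map
    return (fused >= threshold).astype(np.uint8)  # binary BM mask
```

In the full pipeline, each center would define one spherical warp of the volume, the UNet++ would segment each warped view, and each result would be mapped back to the Cartesian grid before fusion.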
Results
On the Institution-S test set, MB achieved limited performance (Dice=0.193, precision=0.134, recall=0.506, F1=0.184). MS noticeably improved performance (Dice=0.583, precision=0.807, recall=0.727, F1=0.714), indicating the added value of ensemble learning. MFL further improved performance (Dice=0.641, precision=0.770, recall=0.806, F1=0.724) with fewer false-positive detections than MS. On the Institution-L test set, MFL achieved performance (Dice=0.778, precision=0.842, recall=0.835, F1=0.813) comparable to that of ML trained directly on the large dataset (Dice=0.768, precision=0.811, recall=0.870, F1=0.820).
Conclusion
The proposed ensemble-enabled FL framework leverages models trained on large datasets to regularize and improve BM segmentation in limited-data settings, providing a scalable, privacy-preserving strategy for multi-institutional collaboration, particularly for sites with limited data availability.