Evaluating Pre-Trained Foundation Models for Glioblastoma Overall Survival Prediction
Abstract
Purpose
Accurate prediction of overall survival (OS) in glioblastoma (GBM) remains challenging in clinical practice. Recently developed foundation models trained on large-scale medical imaging datasets offer a promising strategy to improve downstream clinical prediction tasks. However, systematic evaluation of these models for survival prediction remains limited. This study evaluates multiple medical image analysis foundation models for GBM OS prediction and investigates the benefit of model ensembling.
Methods
We used the publicly available UCSF-PDGM dataset from The Cancer Imaging Archive, which includes multi-parametric brain MRI and corresponding clinical data for 499 patients with GBM. Four state-of-the-art foundation models were evaluated: BrainSegFounder, VoCo, BrainIAC, and SAM-Med3D. These models were pre-trained on large-scale neuroimaging or 3D medical imaging datasets ranging from approximately 22,000 to 160,000 volumes. Imaging features extracted from each foundation model were incorporated into a deep survival prediction framework for OS estimation. For comparison, a ResNet model trained from scratch on the UCSF-PDGM dataset (the self-trained ResNet) was also evaluated. Model performance was assessed using the concordance index (C-index) and compared against a clinical-feature-only baseline without pre-training. An ensemble model combining predictions from all foundation models was additionally evaluated.
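For readers unfamiliar with the C-index, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is an illustration only, not the evaluation code used in this study; the function name `concordance_index` and the toy inputs are assumptions. A pair of patients is treated as comparable when the patient with the shorter follow-up time had an observed event, and the pair is concordant when that patient also received the higher predicted risk.

```python
import numpy as np


def concordance_index(times, events, risks):
    """Harrell's C-index (illustrative sketch, O(n^2)).

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    risks  : predicted risk score (higher means shorter expected survival)
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risks = np.asarray(risks, dtype=float)

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored patient cannot be the earlier member of a comparable pair.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier death, higher risk: concordant
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy data (hypothetical, not from UCSF-PDGM).
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(toy_times, toy_events, toy_risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the C-indices in the Results should be read.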
Results
The self-trained ResNet model achieved a C-index of 0.657, underperforming the clinical-only baseline (C-index = 0.678). In contrast, all foundation model–based approaches outperformed both the self-trained ResNet and the clinical-only model. Individual foundation models achieved C-indices of 0.693 (BrainIAC), 0.696 (BrainSegFounder), 0.697 (SAM-Med3D), and 0.716 (VoCo). The ensemble model demonstrated the best performance, achieving a C-index of 0.736.
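The abstract does not specify how the ensemble combines the individual predictions. As one plausible sketch, assuming each model outputs a scalar risk score per patient, the scores can be standardized per model and then averaged, so that no single model dominates because of its output scale. The function name `ensemble_risk` and the toy scores below are hypothetical and do not reproduce the study's method or data.

```python
import numpy as np


def ensemble_risk(risk_by_model):
    """Average z-scored risk scores across models (illustrative assumption).

    risk_by_model : dict mapping a model name to a sequence of per-patient
                    risk scores, all in the same patient order.
    """
    standardized = []
    for name, scores in risk_by_model.items():
        s = np.asarray(scores, dtype=float)
        std = s.std()
        if std == 0:
            raise ValueError(f"model {name!r} produced constant risk scores")
        # Put every model on a common scale before averaging.
        standardized.append((s - s.mean()) / std)
    return np.mean(standardized, axis=0)


# Toy scores for three patients (hypothetical values, not study outputs).
toy_scores = {
    "model_a": [1.0, 2.0, 3.0],
    "model_b": [10.0, 20.0, 30.0],
}
print(ensemble_risk(toy_scores))
```

The averaged score can then be evaluated with the C-index in the same way as any single model's risk score.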
Conclusion
Foundation models outperformed both a self-trained deep learning model and clinical features alone for GBM overall survival prediction. Ensembling further improved predictive performance, suggesting that diverse pre-training strategies capture complementary prognostic information. These findings support the value of foundation models for data-limited survival prediction tasks in oncology.