A Unified Foundation Model for Medical Image Analysis in Radiotherapy via Multi-Teacher Knowledge Distillation
Abstract
Purpose
Foundation models (FMs) have demonstrated strong performance on challenging radiation therapy tasks such as automatic delineation, deformable image registration, and multimodal visual question answering (VQA). However, they are typically task-specific and require specialized pipelines, limiting cross-task generalization and their utility for end-to-end clinical decision-making. To address this limitation, we introduce a multi-teacher knowledge distillation strategy to learn a unified and efficient backbone that generalizes across tasks without task-specific fine-tuning.
Methods
We propose PRISM-RT, a multi-teacher distillation framework that learns a unified transformer backbone by distilling complementary representations from three FM teachers—MedSAM, DINOv3, and BiomedCLIP—using multiresolution feature translators and unsupervised feature-level alignment on ~1.5M CT images. PRISM-RT was quantitatively evaluated on downstream delineation, registration, and VQA tasks on public datasets using metrics including Accuracy, Dice similarity coefficient (DSC), intersection over union (IoU), and 95th percentile Hausdorff distance (HD95).
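The core training mechanism described above—a shared student backbone aligned to several frozen teacher feature spaces through per-teacher translator heads, with an unsupervised feature-level alignment loss—can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the teacher encoders are random stand-ins for MedSAM, DINOv3, and BiomedCLIP, and all dimensions, module names, and the choice of a cosine-similarity loss are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of multi-teacher feature distillation. A single student
# backbone is trained so that, after passing through a small per-teacher
# "translator", its features match each frozen teacher's features.
# All names and dimensions below are illustrative assumptions.

STUDENT_DIM = 256
TEACHER_DIMS = {"medsam": 768, "dinov3": 1024, "biomedclip": 512}


class Translator(nn.Module):
    """Maps student features into one teacher's feature space."""

    def __init__(self, d_student: int, d_teacher: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_student, d_teacher),
            nn.GELU(),
            nn.Linear(d_teacher, d_teacher),
        )

    def forward(self, x):
        return self.proj(x)


class MultiTeacherDistiller(nn.Module):
    def __init__(self, student: nn.Module, teacher_dims: dict):
        super().__init__()
        self.student = student
        self.translators = nn.ModuleDict(
            {name: Translator(STUDENT_DIM, d) for name, d in teacher_dims.items()}
        )

    def forward(self, images, teacher_feats: dict):
        # Student features for a batch of images.
        s = self.student(images)
        # Unsupervised feature-level alignment: no labels, only teacher
        # features. Here, one cosine-distance term per teacher, averaged.
        loss = 0.0
        for name, t in teacher_feats.items():
            s_hat = self.translators[name](s)
            loss = loss + (1 - nn.functional.cosine_similarity(s_hat, t, dim=-1)).mean()
        return loss / len(teacher_feats)


torch.manual_seed(0)
student = nn.Linear(196, STUDENT_DIM)  # stand-in for the transformer backbone
distiller = MultiTeacherDistiller(student, TEACHER_DIMS)
images = torch.randn(4, 196)  # 4 flattened 14x14 "images"
# In practice these would come from the frozen teacher encoders.
teacher_feats = {n: torch.randn(4, d) for n, d in TEACHER_DIMS.items()}
loss = distiller(images, teacher_feats)
```

Minimizing such a loss over a large unlabeled CT corpus would pull the student's representation toward all three teacher spaces at once, which is the sense in which the distilled backbone inherits complementary knowledge without task-specific supervision.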
Results
For gross tumor volume (GTV) delineation, evaluated on a public dataset of 140 head-and-neck cancer patients who underwent radiation therapy, PRISM-RT achieved DSC 0.778 ± 0.108 vs 0.738 ± 0.041, IoU 0.652 ± 0.130 vs 0.596 ± 0.048, and HD95 10.19 ± 3.92 mm vs 17.00 ± 3.32 mm compared with MedSAM (p<0.01). For deformable registration on a public dataset of 50 cardiac MR–MR pairs, PRISM-RT improved post-warp overlap (Dice 0.770 ± 0.104 vs 0.668 ± 0.137) and reduced HD95 to 4.98 ± 2.52 mm vs 6.37 ± 2.61 mm compared with DINOv3+T3 (p<0.01). For VQA evaluation on a public dataset of 2,500 CT image–question pairs, PRISM-RT achieved 87.5% accuracy vs 86.3% for BiomedCLIP (p<0.01).
Conclusion
PRISM-RT unifies complementary FM knowledge into a single backbone that generalizes across delineation, registration, and VQA. By eliminating the need for task-specific foundation models and fine-tuning, PRISM-RT enables efficient, scalable, and reliable deployment of AI across radiotherapy workflows.