Cross-Modality Transfer of a CT–Fine-Tuned Foundation Model for Prostate and Lesion Segmentation on MRI
Abstract
Purpose
Prostate MRI is increasingly used in modern radiotherapy, but compared with CT, large-scale MRI datasets remain limited for fine-tuning foundation models. This study investigates the cross-modality transferability of a CT–fine-tuned foundation model to prostate MRI segmentation and evaluates whether CT domain alignment improves downstream performance for prostate and dominant intraprostatic lesion (DIL) segmentation.
Methods
We adapted a CT–fine-tuned DINOv3 foundation model to MRI segmentation through domain-specific fine-tuning (MR-DINOv3). MR-DINOv3 was compared with two nnU-Net–based baselines that differ only in encoder initialization: (1) nnU-Netv2 and (2) base DINOv3. Two public datasets were used: TCIA (n=1,017 T2-weighted MRI) for prostate segmentation and PI-CAI (n=1,500 T2-weighted MRI), including 425 positive cases (PI-RADS > 2) with DIL annotations, for DIL segmentation. To isolate the effect of initialization and transfer, all models were trained with a frozen encoder during supervised learning. Performance was evaluated using the mean and median Dice similarity coefficient (DSC). DIL performance was additionally stratified by lesion volume (small, medium, and large, the last defined as >2 cm³), and centroid distance was also assessed.
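The evaluation metrics named above can be illustrated with a minimal sketch. It is not the study's evaluation code: it assumes binary masks represented as sets of (z, y, x) voxel indices, a hypothetical `spacing` argument giving voxel size in millimetres, and a convention (our assumption) that two empty masks score a DSC of 1. All function names are illustrative.

```python
import math

def dice(pred, truth):
    """Dice similarity coefficient, 2|A∩B| / (|A| + |B|), for two voxel sets."""
    if not pred and not truth:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))

def centroid(mask, spacing=(1.0, 1.0, 1.0)):
    """Mean voxel position of a non-empty mask, scaled to millimetres."""
    n = len(mask)
    return tuple(s * sum(v[i] for v in mask) / n for i, s in enumerate(spacing))

def centroid_distance(pred, truth, spacing=(1.0, 1.0, 1.0)):
    """Euclidean distance between mask centroids; None if either mask is empty."""
    if not pred or not truth:
        return None
    return math.dist(centroid(pred, spacing), centroid(truth, spacing))

def volume_cm3(mask, spacing):
    """Mask volume in cm³ (voxel count × voxel volume in mm³ / 1000)."""
    return len(mask) * spacing[0] * spacing[1] * spacing[2] / 1000.0

# Toy example: two 4-voxel masks that overlap in 2 voxels.
truth = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
pred = {(0, 0, 1), (0, 1, 1), (0, 0, 2), (0, 1, 2)}

dsc = dice(pred, truth)                # 2*2 / (4+4) = 0.5
dist = centroid_distance(pred, truth)  # centroids differ by 1 voxel along x
vol = volume_cm3(truth, (3.0, 0.5, 0.5))
is_large = vol > 2.0                   # only the >2 cm³ cutoff is stated in the text
```

Volume-based stratification would then group lesions by `volume_cm3` of the reference mask; only the large-lesion cutoff (>2 cm³) is given here, so the small and medium boundaries are left unspecified in the sketch.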
Results
For prostate segmentation on TCIA, MR-DINOv3 performed comparably to base DINOv3 (mean/median DSC 0.8707/0.904 vs 0.8713/0.906) and slightly higher than nnU-Netv2 (0.8645/0.902). For DIL segmentation on PI-CAI, nnU-Netv2 achieved the highest overall DSC; however, MR-DINOv3 achieved the best performance for large lesions (>2 cm³) and performed comparably for medium and small lesions. MR-DINOv3 also yielded the lowest overall centroid distance and maintained stable precision across lesion-volume groups.
Conclusion
A CT–fine-tuned foundation model transferred effectively to MRI for prostate and lesion segmentation, achieving performance comparable to MRI-specific baselines (e.g., nnU-Netv2 trained on MRI). These results support cross-modality foundation model transfer as a practical and data-efficient strategy for segmentation in data-limited modalities such as MRI, and potentially PET, by leveraging the abundant CT data routinely available in radiotherapy workflows.