Poster Program · Therapy Physics

BLUE RIBBON POSTER MULTI-DISCIPLINARY: Improving Survival Prediction of Head-and-Neck Cancer with Medical Image Foundation Models and Multi-Modal Fusion

Abstract

Purpose

Accurate survival prediction in head and neck squamous cell carcinoma (HNSCC) is clinically important for risk stratification but remains challenging due to small cohort sizes and the difficulty of integrating high-dimensional PET/CT imaging with low-dimensional clinical covariates. We investigate whether pretrained, modality-specific medical image foundation models (FMs) can improve multimodal survival prediction performance and robustness under limited data.

Methods

We build on a strong PET/CT survival baseline: a DeepMTS-style multi-task architecture with a Dynamic Affine Feature Map Transform (DAFT) block for image–tabular fusion. Into this framework we incorporate pretrained CT and PET foundation model encoders, which are kept frozen and used to extract fixed image embeddings without task-specific fine-tuning. The resulting embeddings (512-D for CT and 768-D for PET) are optionally compressed by lightweight linear projection layers and then combined by late fusion via concatenation immediately before the survival prediction head. Experiments use the public HECKTOR 2022 training split (524 PET/CT cases from 7 centers) with recurrence-free survival labels and 9 clinical covariates, under 5-fold stratified cross-validation. Performance is reported as the mean ± SD across folds of the concordance index (C-index) and the 1-year time-dependent AUROC.
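As a rough illustration of the late-fusion step described above, the following NumPy sketch projects each frozen embedding with a linear layer and concatenates the results just before a risk head. It is a minimal sketch under stated assumptions, not the authors' implementation: the random arrays stand in for the outputs of the frozen CT and PET encoders, `BACKBONE_DIM` and the helper names `init_linear` and `late_fuse` are hypothetical, the linear risk head is a placeholder for the actual survival head, and the 256-D projection is assumed to apply to each modality separately.

```python
import numpy as np

CT_DIM, PET_DIM = 512, 768  # embedding sizes stated in the abstract
PROJ_DIM = 256              # assumed per-modality projection size
BACKBONE_DIM = 64           # hypothetical size of the DeepMTS+DAFT feature vector

rng = np.random.default_rng(0)


def init_linear(in_dim, out_dim):
    """Return (weight, bias) for a linear layer; these would be trained in practice."""
    weight = rng.normal(0.0, in_dim ** -0.5, size=(in_dim, out_dim))
    bias = np.zeros(out_dim)
    return weight, bias


def late_fuse(backbone_feat, ct_emb, pet_emb, ct_proj, pet_proj):
    """Project each frozen FM embedding, then concatenate with backbone features."""
    ct_z = ct_emb @ ct_proj[0] + ct_proj[1]
    pet_z = pet_emb @ pet_proj[0] + pet_proj[1]
    return np.concatenate([backbone_feat, ct_z, pet_z], axis=1)


# Stand-ins for one mini-batch: backbone features and precomputed frozen embeddings.
batch = 4
backbone_feat = rng.normal(size=(batch, BACKBONE_DIM))
ct_emb = rng.normal(size=(batch, CT_DIM))
pet_emb = rng.normal(size=(batch, PET_DIM))

ct_proj = init_linear(CT_DIM, PROJ_DIM)
pet_proj = init_linear(PET_DIM, PROJ_DIM)
fused = late_fuse(backbone_feat, ct_emb, pet_emb, ct_proj, pet_proj)

# Placeholder linear head mapping the fused vector to one risk score per patient.
head_w, head_b = init_linear(fused.shape[1], 1)
risk = (fused @ head_w + head_b).squeeze(-1)

print(fused.shape, risk.shape)
```

Because the encoders are frozen, the embeddings can be computed once and cached, so only the projection layers and the head add trainable parameters in this arrangement.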

Results

Relative to the baseline DeepMTS+DAFT model (C-index 0.616±0.082; 1-year AUROC 0.671±0.158), adding frozen FM embeddings consistently improved outcomes and reduced fold-to-fold variance. CT-FM achieved C-index 0.715±0.075 and 1-year AUROC 0.710±0.083, while PET-FM (best with 128-D projection) achieved C-index 0.736±0.060 and 1-year AUROC 0.712±0.100. Late fusion of CT+PET FM embeddings (best with 256-D projection) yielded the best performance (C-index 0.743±0.067; 1-year AUROC 0.744±0.094).
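For readers less familiar with the C-index values reported above, the sketch below shows how a concordance index can be computed from follow-up times, event indicators, and predicted risk scores. The abstract does not state which C-index variant was used; this assumes Harrell's definition, ignores ties in follow-up time, and uses made-up toy values purely for illustration. The function name `concordance_index` is hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still under follow-up at a strictly later time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event received the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (not from the study): follow-up times, event flags (1 = recurrence).
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]

print(concordance_index(times, events, [0.9, 0.4, 0.6, 0.1]))  # 1.0
print(concordance_index(times, events, [0.9, 0.4, 0.1, 0.6]))  # 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the reported fold-averaged values.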

Conclusion

Frozen, modality-specific PET and CT foundation model embeddings provide stable and complementary prognostic signals for HNSCC survival modeling. Lightweight late fusion with simple projection heads improves both predictive performance and robustness without increasing training complexity, supporting the potential utility of FM-augmented multimodal models for clinically relevant risk stratification in limited-data settings.

People

Lise Wei · Presenting Author · University of Michigan

Haotian Zhang · Author · University of Michigan, Ann Arbor

Yue Cao, PhD · Author · University of Michigan

Liyue Shen, PhD · Author · University of Michigan

Michelle L. Mierzwa, MD · Author · University of Michigan
