Using a Foundation Model-Based Feature Extractor for Distant Metastasis Prediction in Head and Neck Cancer
Abstract
Purpose
Early prediction of distant metastasis (DM) risk in head and neck cancer (HNC) can enable timely interventions that may improve treatment outcomes. While machine learning approaches using medical imaging have been widely explored for this task, many current methods rely on prior knowledge of the region of interest, such as tumor segmentations, which require expert knowledge, are time-consuming to produce, and introduce user-dependent variability. Foundation models for medical imaging have recently been developed for specific imaging modalities to streamline downstream prediction tasks by extracting modality-relevant features. In this study, we evaluate the effectiveness of using a foundation model as a feature extractor on minimally processed image volumes to predict DM risk in HNC patients, and we compare its performance with traditional approaches that require prior knowledge of the regions of interest.
Methods
Preoperative CT images of 2327 patients from the RADCURE dataset were used. Three feature sets were created: radiomics features, deep learning-based features, and features derived from the third-party CT Foundation model. Each feature set was used individually as input to a multi-layer perceptron (MLP) to predict DM risk. The final model for each feature set was chosen using 5-fold cross-validation, and its performance was evaluated on a hold-out test set.
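The evaluation protocol described above (an MLP per feature set, model selection by 5-fold cross-validation, a final AUC on a hold-out set) can be illustrated with a minimal sketch. It is not the study's implementation: the synthetic features and labels, the 80/20 hold-out split, the candidate hidden-layer sizes, the learning rate, and the number of epochs are all assumptions made for illustration, and the tiny pure-Python MLP is a dependency-free stand-in for whatever framework was actually used.

```python
import math
import random

def auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, rng):
    """Split indices into k folds with roughly the same class ratio in each fold."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def train_mlp(X, y, hidden, rng, epochs=20, lr=0.05):
    """Train a one-hidden-layer MLP (tanh units, sigmoid output) with plain SGD.
    Returns a function mapping a feature vector to a predicted risk in (0, 1)."""
    d = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(W1, b1)]
        z = sum(w * v for w, v in zip(W2, h)) + b2
        return h, 1.0 / (1.0 + math.exp(-z))

    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            h, p = forward(X[i])
            g = p - y[i]  # gradient of the log loss w.r.t. the output logit
            for j in range(hidden):
                gh = g * W2[j] * (1.0 - h[j] ** 2)  # backpropagate through tanh
                W2[j] -= lr * g * h[j]
                for m in range(d):
                    W1[j][m] -= lr * gh * X[i][m]
                b1[j] -= lr * gh
            b2 -= lr * g
    return lambda x: forward(x)[1]

def cv_auc(X, y, hidden, rng, k=5):
    """Mean validation AUC over k stratified folds for one candidate hidden size."""
    folds = stratified_folds(y, k, rng)
    scores = []
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        model = train_mlp([X[i] for i in train], [y[i] for i in train], hidden, rng)
        scores.append(auc([y[i] for i in folds[f]], [model(X[i]) for i in folds[f]]))
    return sum(scores) / k

# Synthetic stand-in for one feature set (assumption: 200 patients, 5 features).
rng = random.Random(0)
n, d = 200, 5
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
# Hypothetical outcome driven by the first two features plus noise.
y = [1 if x[0] + 0.5 * x[1] + rng.gauss(0, 0.5) > 0.8 else 0 for x in X]

# Hold out one stratified fifth as the test set (assumed ratio); the rest is for development.
outer = stratified_folds(y, 5, rng)
test_idx = outer[0]
dev_idx = [i for fold in outer[1:] for i in fold]
dev_X, dev_y = [X[i] for i in dev_idx], [y[i] for i in dev_idx]

# Choose the hidden size by 5-fold cross-validation on the development set only.
candidates = [2, 8]  # hypothetical hyperparameter grid
cv_results = {h: cv_auc(dev_X, dev_y, h, random.Random(1)) for h in candidates}
best_hidden = max(cv_results, key=cv_results.get)

# Refit on all development data, then score the untouched hold-out set once.
final_model = train_mlp(dev_X, dev_y, best_hidden, random.Random(2))
test_auc = auc([y[i] for i in test_idx], [final_model(X[i]) for i in test_idx])
print("CV AUC per hidden size:", cv_results)
print("Selected hidden size:", best_hidden, "hold-out AUC:", round(test_auc, 3))
```

The point of the design is that the hold-out set plays no part in model selection: every hyperparameter choice is made from cross-validated scores on the development data, so the final AUC is an estimate on patients the model has not seen. Running the same procedure once per feature set would yield AUCs that are comparable in the way the reported ones are.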
Results
The model using CT Foundation embeddings outperformed the radiomics and deep learning-based models, achieving an area under the receiver operating characteristic curve (AUC) of 0.791, compared with AUCs of 0.772 and 0.753 for the radiomics and deep learning-based models, respectively. The CT Foundation-based model performed similarly to a model combining radiomics and deep learning-based features, which achieved an AUC of 0.794.
Conclusion
Features based on foundation models offer a promising alternative to traditional radiomics while reducing the need for domain expertise and extensively annotated datasets. Their minimal preprocessing requirements also make them a more accessible and scalable option.