Cross-Attention Based Multi-Modal Deep Learning Models for Predicting Local Recurrence in Non-Small Cell Lung Cancer
Abstract
Purpose
To develop cross-attention-based multi-modal deep learning models and to preliminarily validate their performance for predicting local recurrence (LR) in patients with non-small cell lung cancer (NSCLC).
Methods
A cohort of 100 patients with NSCLC was used to train and validate deep learning (DL) models with five-fold cross-validation. Multi-modal inputs consisted of four-dimensional computed tomography (4DCT), region of interest (ROI) masks, and dose distributions. The models’ reliance on each input was tested by omitting each input modality independently. Otherwise identical models using three-dimensional computed tomography (3DCT) instead of 4DCT were also developed for comparison.
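The evaluation protocol described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the random seed, and the plain shuffled (non-stratified) split are hypothetical and are not taken from the study, and only the full and leave-one-modality-out configurations are enumerated.

```python
import random

# Input names follow the abstract; everything else here is an illustrative assumption.
MODALITIES = ("4DCT", "ROI", "dose")

def ablation_configs(modalities=MODALITIES):
    """Return the full configuration plus one configuration per omitted modality."""
    configs = [tuple(modalities)]
    for omitted in modalities:
        configs.append(tuple(m for m in modalities if m != omitted))
    return configs

def five_fold_splits(n_patients=100, n_folds=5, seed=0):
    """Yield (train, val) index lists; each patient is validated exactly once.

    Assumption: a simple shuffled split. The abstract does not say whether
    the folds were stratified by recurrence status.
    """
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        val = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, val

configs = ablation_configs()
splits = list(five_fold_splits())
```

With 100 patients and five folds, each split trains on 80 patients and validates on the remaining 20; every configuration in `configs` would be trained and evaluated on the same splits so that the comparisons are like-for-like.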
Results
The best area under the receiver operating characteristic curve (AUC) and area under the precision-recall curve (AUPRC) were observed in the comprehensive model integrating 4DCT, ROI masks, and dose maps (mean AUC = 0.86, mean AUPRC = 0.71). Models incorporating 4DCT consistently outperformed those using 3DCT (AUC: 0.79-0.86 vs. 0.78-0.79; AUPRC: 0.63-0.71 vs. 0.56-0.61). The dose-only model yielded a mean AUC of 0.78 and a mean AUPRC of 0.54.
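For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them from predicted scores. It is not the authors' evaluation code: AUPRC is approximated here by average precision (a step-wise area), which is an assumption, and the toy labels and scores are invented purely for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Assumption: tied scores are ordered arbitrarily by the sort.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos

# Toy data (hypothetical, not from the study): 1 = local recurrence.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

A reported mean AUC or mean AUPRC would then typically be the average of these per-fold values across the five validation folds. Unlike AUC, AUPRC depends on the proportion of positive cases, so it is best read against the recurrence rate in the cohort.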
Conclusion
Multi-modal integration enhances model performance compared with single-modal approaches for predicting LR in patients with NSCLC. Incorporating 4DCT data further improves predictive performance compared with conventional 3DCT.