Poster GPD-T-155 Poster Program Therapy Physics

Multimodal Cross-Attention-Based Predictive Model with Self-Attention Refinement for Lymphedema Risk Following Breast 3D-CRT Using 3D CT and Voxel-Wise Dose Distributions

Abstract

Purpose

To mitigate the loss of spatial information inherent to DVH-based and coarse categorical descriptors used for breast cancer–related lymphedema prediction, we present a multimodal cross-attention predictive model with self-attention refinement that uses patient-specific 3D CT anatomy and dose distributions.

Methods

We developed a voxel-level multimodal architecture comprising modality-specific 3D DenseNet encoders for CT and dose, an intra-modality self-attention module for representation refinement, and a bidirectional cross-attention fusion module that explicitly learns anatomical–dosimetric interactions. To isolate architectural contributions, we benchmarked intra-modality self-attention variants under identical fusion settings: bottleneck attention module (BAM), convolutional block attention module (CBAM), multi-head self-attention (MHSA), and a vision transformer based on SwinUNETR. Inputs were standardized via axillary lymph node–centered alignment to ensure consistent inclusion of axillary Levels I–III within a fixed-depth volume, producing 512 × 512 × 64 3D volumes. The framework was evaluated on 119 right-sided breast cancer patients treated with 3D conformal radiotherapy using five repeated patient-level splits. Ablation analyses quantified the contribution of the best-performing self-attention variant and assessed the necessity of interaction modeling by removing cross-attention.
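To make the bidirectional cross-attention fusion step concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each encoder output has already been flattened into a sequence of feature tokens, and it uses a single attention head, random projection weights, a residual connection, and mean pooling. The abstract specifies none of these details, so every name and dimension here (`cross_attention`, `bidirectional_fusion`, `n_tokens`, `dim`) is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention.

    Queries come from one modality; keys and values come from the other.
    """
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights


def bidirectional_fusion(ct_tokens, dose_tokens, params):
    """Run cross-attention in both directions and concatenate pooled outputs."""
    # CT tokens query the dose tokens.
    ct_attended, ct_to_dose = cross_attention(
        ct_tokens, dose_tokens, *params["ct_queries_dose"]
    )
    # Dose tokens query the CT tokens.
    dose_attended, dose_to_ct = cross_attention(
        dose_tokens, ct_tokens, *params["dose_queries_ct"]
    )
    # Assumed design: residual connection, then mean pooling over tokens.
    ct_vec = (ct_tokens + ct_attended).mean(axis=0)
    dose_vec = (dose_tokens + dose_attended).mean(axis=0)
    return np.concatenate([ct_vec, dose_vec]), ct_to_dose, dose_to_ct


# Hypothetical sizes: 8 tokens per modality, 16 features per token.
rng = np.random.default_rng(0)
n_tokens, dim = 8, 16
ct_tokens = rng.normal(size=(n_tokens, dim))
dose_tokens = rng.normal(size=(n_tokens, dim))
params = {
    name: tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
    for name in ("ct_queries_dose", "dose_queries_ct")
}

fused, ct_to_dose, dose_to_ct = bidirectional_fusion(ct_tokens, dose_tokens, params)
print(fused.shape)       # (32,)
print(ct_to_dose.shape)  # (8, 8)
```

In a trained model the projection matrices would be learned and the fused vector would feed a classification head. The attention weight matrices are the kind of quantity that cross-attention visualizations are typically derived from.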

Results

Across controlled comparisons, BAM provided the most stable and discriminative performance. The full configuration (DenseNet3D + BAM + bidirectional cross-attention) achieved a mean AUC of 0.770 and a mean accuracy of 0.712. Relative to the DenseNet3D baseline with cross-attention (mean AUC 0.636; mean accuracy 0.578), adding BAM improved mean AUC by 0.134 and mean accuracy by 0.134. In contrast, removing cross-attention reduced mean AUC to near chance level (mean AUC 0.518; mean accuracy 0.524), underscoring the importance of explicit CT–dose interaction learning. Grad-CAM and cross-attention visualizations consistently highlighted axillary lymph node Levels I–III and adjacent soft-tissue regions and dose patterns, supporting a clinically plausible spatial focus.
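As an illustration of how a mean AUC over repeated patient-level splits can be computed, here is a small standard-library Python sketch. It is not the authors' evaluation code. The helper names (`roc_auc`, `patient_level_splits`), the 20% test fraction, and the seed are assumptions, since the abstract does not report the split ratio. The toy labels and scores are placeholders, not study data.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC requires at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def patient_level_splits(patient_ids, n_repeats=5, test_fraction=0.2, seed=0):
    """Yield (train_ids, test_ids) pairs, splitting by patient, not by sample.

    Each patient appears in exactly one side of a given split.
    """
    rng = random.Random(seed)
    ids = sorted(set(patient_ids))
    n_test = max(1, round(len(ids) * test_fraction))
    for _ in range(n_repeats):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]


# Toy check with placeholder values: 3 of 4 positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Five repeated splits over 119 hypothetical patient IDs.
for train_ids, test_ids in patient_level_splits(range(119)):
    assert not set(train_ids) & set(test_ids)
    print(len(train_ids), len(test_ids))
```

The reported mean AUC would then be the average of the per-split AUC values computed on each held-out test set.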

Conclusion

Voxel-level integration of CT anatomy and 3D dose, coupled with systematic benchmarking of intra-modality self-attention and bidirectional cross-attention fusion, enables more accurate and interpretable lymphedema risk prediction beyond DVH-style summaries.

People

Hojin Kim, PhD (Corresponding Author) · Department of Radiation Oncology, Yonsei Cancer Center, Heavy Ion Therapy Research Institute, Yonsei University College of Medicine

Jaehyun Seok (Presenting Author) · Department of Integrative Medicine

Jee Suk Chang, MD PhD (Author) · Department of Radiation Oncology, Yonsei Cancer Center

Jin Sung Kim, PhD (Author) · Department of Radiation Oncology, Yonsei Cancer Center, Heavy Ion Therapy Research Institute, Yonsei University College of Medicine