Poster · Poster Program · Therapy Physics

Multimodal Cross-Attention-Based Predictive Model with Self-Attention Refinement for Lymphedema Risk Following Breast 3D-CRT Using 3D CT and Voxel-Wise Dose Distributions

Abstract
Purpose

To mitigate the loss of spatial information inherent to dose-volume histogram (DVH)-based and coarse categorical descriptors used for breast cancer–related lymphedema prediction, we present a multimodal cross-attention predictive model with self-attention refinement that uses patient-specific 3D CT anatomy and voxel-wise dose distributions.

Methods

We developed a voxel-level multimodal architecture comprising modality-specific 3D DenseNet encoders for CT and dose, an intra-modality self-attention module for representation refinement, and a bidirectional cross-attention fusion module that explicitly learns anatomical–dosimetric interactions. To isolate architectural contributions, we benchmarked four intra-modality self-attention variants under identical fusion settings: bottleneck attention module (BAM), convolutional block attention module (CBAM), multi-head self-attention (MHSA), and a SwinUNETR-based vision transformer. Inputs were standardized via axillary lymph node–centered alignment to ensure consistent inclusion of axillary Levels I–III within a fixed-depth volume, producing 512 × 512 × 64 3D volumes. The framework was evaluated on 119 right-sided breast cancer patients treated with 3D conformal radiotherapy, using five repeated patient-level splits. Ablation analyses quantified the contribution of the best-performing self-attention variant and assessed the necessity of interaction modeling by removing cross-attention.
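The bidirectional cross-attention idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes single-head scaled dot-product attention with no learned projections, toy 2-dimensional tokens standing in for encoder features, and fusion by concatenating mean-pooled outputs. The names `cross_attention`, `mean_pool`, and `bidirectional_fusion` are hypothetical.

```python
import math


def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    # Each query token (one modality) attends over the tokens of the
    # other modality, which serve as both keys and values here.
    # Assumption: no learned Q/K/V projections, single head.
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)]
        )
    return out


def mean_pool(tokens):
    # Average the tokens into one feature vector.
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def bidirectional_fusion(ct_tokens, dose_tokens):
    # Direction 1: CT queries attend to dose; direction 2: dose queries attend to CT.
    ct_attends_dose = cross_attention(ct_tokens, dose_tokens)
    dose_attends_ct = cross_attention(dose_tokens, ct_tokens)
    # Assumed fusion rule: concatenate the two pooled vectors.
    return mean_pool(ct_attends_dose) + mean_pool(dose_attends_ct)


# Toy tokens (hypothetical values, not study data).
ct_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dose_tokens = [[0.5, 0.5], [2.0, 0.0]]
fused = bidirectional_fusion(ct_tokens, dose_tokens)
print(len(fused))
```

In a real model the tokens would come from the 3D DenseNet feature maps and the attention would use learned projections and possibly multiple heads; the sketch only shows that each modality queries the other and that both directions contribute to the fused representation.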

Results

Across controlled comparisons, BAM provided the most stable and discriminative performance. The full configuration (DenseNet3D+BAM+bidirectional cross-attention) achieved a mean AUC of 0.770 and a mean accuracy of 0.712. Relative to the DenseNet3D baseline with cross-attention (mean AUC 0.636; mean accuracy 0.578), BAM substantially improved discrimination. In contrast, removing cross-attention markedly degraded performance (mean AUC 0.518; mean accuracy 0.524), underscoring the importance of explicit CT–dose interaction learning. Grad-CAM and cross-attention visualizations consistently highlighted axillary lymph node Levels I–III and adjacent soft-tissue regions/dose patterns, supporting clinically plausible spatial focus.
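As a rough illustration of how a mean AUC and mean accuracy over repeated splits can be computed, the sketch below uses the pairwise (Mann-Whitney) definition of AUC and a 0.5 decision threshold. The abstract does not state the authors' evaluation code or threshold, so both are assumptions, and the labels and scores are hypothetical toy values rather than study data.

```python
def roc_auc(labels, scores):
    # AUC as the fraction of (positive, negative) pairs ranked correctly,
    # counting ties as half a correct pair.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    # Assumed decision rule: predict positive when score >= threshold.
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def mean(values):
    return sum(values) / len(values)


# Two toy test splits (hypothetical labels and predicted risks).
toy_splits = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
]

mean_auc = mean([roc_auc(y, s) for y, s in toy_splits])
mean_acc = mean([accuracy(y, s) for y, s in toy_splits])
print(mean_auc, mean_acc)
```

Averaging per-split metrics, rather than pooling predictions across splits, keeps each test set's ranking separate, which matters when score calibration differs between splits.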

Conclusion

Voxel-level integration of CT anatomy and 3D dose, coupled with systematic benchmarking of intra-modality self-attention and bidirectional cross-attention fusion, enables more accurate and interpretable lymphedema risk prediction beyond DVH-style summaries.
