Poster Poster Program Therapy Physics

A Hybrid Prompt–Guided Multimodal Framework for Automatic Clinical Target Volume Delineation In Esophageal Cancer: A Multicenter Study

Abstract
Purpose

Accurate clinical target volume (CTV) delineation is essential for radiotherapy in esophageal cancer but remains highly subjective and variable due to its reliance on physician experience. Most automated approaches focus on gross tumor volume (GTV) and do not adequately address CTV-related clinical complexity. This study aimed to develop and validate a hybrid prompt–guided multimodal framework for automatic CTV delineation in esophageal cancer.

Methods

This retrospective multicenter study included patients with esophageal cancer who underwent PET/CT, drawn from an internal center (n = 262) and an independent external center (n = 49). A hybrid prompt–guided ContextUNETR+ framework was proposed, integrating three complementary prompts: PET-derived image prompts, structural prompts from the primary and nodal GTVs (GTVp + GTVn), and textual prompts from clinical baseline information. CT images, image prompts, and structural prompts were concatenated at the channel level and fed into the visual encoder, while textual features were extracted using a LLaMA3 large language model. Multimodal feature alignment was achieved via a Two-Way Transformer, followed by decoder-based CTV prediction. Model performance was evaluated using five-fold cross-validation in the internal cohort, and the best-performing model was then assessed in the external validation cohort. The Dice similarity coefficient (DSC) and the 95% Hausdorff distance (HD95) were used as evaluation metrics.
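For readers unfamiliar with the two evaluation metrics, the following is a minimal sketch of how DSC and HD95 can be computed. It is not the study's evaluation code. It assumes each contour is represented as a set of (z, y, x) voxel indices, takes surface voxels to be those with a 6-connected neighbor outside the mask, and uses one common HD95 convention (the larger of the two directed 95th-percentile surface distances). The function names and the `spacing` parameter are illustrative.

```python
import math

def dice(a, b):
    """Dice similarity coefficient between two sets of voxel indices."""
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))

def surface(mask):
    """Voxels that have at least one 6-connected neighbor outside the mask."""
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return {
        v for v in mask
        if any((v[0] + dz, v[1] + dy, v[2] + dx) not in mask for dz, dy, dx in offsets)
    }

def percentile(values, q):
    """Linearly interpolated q-th percentile, with q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def directed_distances(src, dst, spacing):
    """Distance (in physical units) from each src voxel to its nearest dst voxel."""
    def to_mm(v):
        return tuple(c * s for c, s in zip(v, spacing))
    dst_mm = [to_mm(q) for q in dst]
    return [min(math.dist(to_mm(p), q) for q in dst_mm) for p in src]

def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95% Hausdorff distance between the surfaces of two non-empty masks."""
    sa, sb = surface(a), surface(b)
    if not sa or not sb:
        raise ValueError("HD95 is undefined when either mask is empty")
    d_ab = directed_distances(sa, sb, spacing)
    d_ba = directed_distances(sb, sa, spacing)
    return max(percentile(d_ab, 95), percentile(d_ba, 95))

# Toy example: two 4x4x4 cubes, the second shifted by one voxel along x.
reference = {(z, y, x) for z in range(4) for y in range(4) for x in range(4)}
predicted = {(z, y, x + 1) for (z, y, x) in reference}
print(dice(reference, predicted))   # 0.75
print(hd95(reference, predicted))   # 1.0
```

The brute-force nearest-neighbor search is quadratic in the number of surface voxels, so it is suitable only for small illustrations. On clinical volumes a distance transform or a spatial index would normally be used instead.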

Results

Across five-fold cross-validation in the internal cohort, the proposed method achieved mean DSC values ranging from 75.34% to 78.58% and HD95 values from 19.06 to 27.82 mm. External validation demonstrated robust generalization, with a DSC of 71.10% and an HD95 of 25.42 mm. The proposed framework consistently outperformed comparison models across all evaluations.

Conclusion

The hybrid prompt–guided ContextUNETR+ framework enables effective integration of multimodal imaging, structural priors, and clinical textual information for automated CTV delineation in esophageal cancer. This approach demonstrates stable multicenter performance and has the potential to reduce inter-observer variability and support standardized target delineation in clinical radiotherapy practice.
