Poster GPD-T-602 · Poster Program · Therapy Physics

A Hybrid Prompt–Guided Multimodal Framework for Automatic Clinical Target Volume Delineation In Esophageal Cancer: A Multicenter Study

Abstract

Purpose

Accurate clinical target volume (CTV) delineation is essential for radiotherapy in esophageal cancer but remains highly subjective and variable due to its reliance on physician experience. Most automated approaches focus on gross tumor volume (GTV) and do not adequately address CTV-related clinical complexity. This study aimed to develop and validate a hybrid prompt–guided multimodal framework for automatic CTV delineation in esophageal cancer.

Methods

This retrospective multicenter study included esophageal cancer patients with PET/CT imaging from an internal center (n = 262) and an independent external center (n = 49). A hybrid prompt–guided ContextUNETR+ framework was proposed, integrating three complementary prompts: PET-derived image prompts, structural prompts from primary and nodal GTVs (GTVp + GTVn), and textual prompts from clinical baseline information. CT images, image prompts, and structural prompts were concatenated at the channel level and fed into the visual encoder, while textual features were extracted using a LLaMA3 large language model. Multimodal feature alignment was achieved via a Two-Way Transformer, followed by decoder-based CTV prediction. Model performance was evaluated using five-fold cross-validation in the internal cohort, and the best-performing model was further assessed in the external validation cohort. Dice similarity coefficient (DSC) and 95% Hausdorff distance (HD95) were used for evaluation.
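For readers unfamiliar with the two evaluation metrics, the following is a minimal sketch of how DSC and HD95 can be computed. It is not the authors' evaluation code. It assumes each contour is given as a set of 3D points already expressed in millimetres, and it uses one common HD95 convention (the larger of the two directed 95th-percentile nearest-neighbor distances); other conventions exist, and the abstract does not state which one was used. The function names `dice` and `hd95` and the toy point sets are illustrative only.

```python
import math


def dice(a, b):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    if not a and not b:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * len(a & b) / (len(a) + len(b))


def _percentile(values, q):
    """q-th percentile with linear interpolation between sorted samples."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def _directed_distances(src, dst):
    """Distance from every point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hd95(a, b):
    """95% Hausdorff distance: max of the two directed 95th percentiles."""
    if not a or not b:
        raise ValueError("HD95 is undefined when either point set is empty")
    return max(
        _percentile(_directed_distances(a, b), 95),
        _percentile(_directed_distances(b, a), 95),
    )


# Toy example (hypothetical data): two contours offset by 1 mm along x.
reference = {(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)}
predicted = {(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)}

print(f"DSC  = {dice(reference, predicted):.4f}")
print(f"HD95 = {hd95(reference, predicted):.4f} mm")
```

The brute-force nearest-neighbor search is quadratic in the number of points, so it is suitable only for illustration. In practice HD95 is usually computed on surface voxels with a distance transform or a spatial index.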

Results

Across five-fold cross-validation in the internal cohort, the proposed method achieved mean DSC values ranging from 75.34% to 78.58% and HD95 values from 19.06 to 27.82 mm. External validation demonstrated robust generalization, with a DSC of 71.10% and an HD95 of 25.42 mm. The proposed framework consistently outperformed comparison models across all evaluations.

Conclusion

The hybrid prompt–guided ContextUNETR+ framework enables effective integration of multimodal imaging, structural priors, and clinical textual information for automated CTV delineation in esophageal cancer. This approach demonstrates stable multicenter performance and has the potential to reduce inter-observer variability and support standardized target delineation in clinical radiotherapy practice.

People

Ziqi An · Presenting Author · Xijing Hospital
Hongfei Sun · Author · Xijing Hospital
Lina Zhao · Corresponding Author · Department of Radiation Oncology, Xijing Hospital, Air Force Medical University
