Automatic Segmentation of Post-Operative Head and Neck Primary CTV in a Multimodal Framework
Abstract
Purpose
Automating post-operative primary clinical target volume (CTV) segmentation in head and neck (H&N) cancers is challenging because the primary tumor has been surgically removed and because of anatomical heterogeneity. Without the distinct radiographic boundaries of a gross tumor, CT imaging alone is often insufficient. To mimic expert decision-making, we propose a multimodal framework that incorporates clinical metadata into post-operative H&N CTV segmentation, and we evaluate the feasibility and value of integrating this clinical context.
Methods
Our framework used a two-stage approach. In the first stage, a text-image model used tumor site and laterality to identify the broad anatomical region of interest. In the second stage, we fine-tuned the model with additional inputs: dose level, CTV names, five organs at risk (OARs; mandible, constrictor muscle, pharynx, parotid glands, submandibular glands), and pre-operative gross tumor volume (GTV) masks. To mitigate data scarcity, we trained the model on both pre- and post-operative H&N datasets to leverage shared anatomical features. The dataset included multiple subsites (tonsil, base of tongue, and parotid) with varying laterality. We collected 115 cases (85/10/20 for training/validation/testing) to develop the first-stage model and another 123 cases (98/7/18) to develop the second-stage tonsil model.
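As a rough illustration of how the inputs listed above could be assembled, the sketch below renders the clinical metadata as a text prompt (site and laterality for the first stage, plus dose level and CTV name for the second) and stacks the CT volume with the five OAR masks and the pre-operative GTV mask as image channels. The abstract does not specify how the framework encodes these inputs, so the prompt template, the channel order, the all-zero convention for a missing OAR, and the helper names `build_text_prompt` and `stack_image_channels` are hypothetical assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical fixed channel order for the five organ-at-risk masks.
OAR_NAMES = [
    "mandible",
    "constrictor_muscle",
    "pharynx",
    "parotid_glands",
    "submandibular_glands",
]


def build_text_prompt(site, laterality, dose_level=None, ctv_name=None):
    """Render clinical metadata as a text prompt (hypothetical template).

    Stage 1 uses only site and laterality; stage 2 also appends the
    dose level and CTV name when they are provided.
    """
    parts = [f"site: {site}", f"laterality: {laterality}"]
    if dose_level is not None:
        parts.append(f"dose level: {dose_level}")
    if ctv_name is not None:
        parts.append(f"ctv: {ctv_name}")
    return "; ".join(parts)


def stack_image_channels(ct, oar_masks, gtv_mask):
    """Stack CT, OAR masks, and pre-op GTV mask into a (C, D, H, W) array.

    Channel 0 is the CT, channels 1-5 are the OARs in OAR_NAMES order,
    and the last channel is the pre-operative GTV mask.
    """
    channels = [ct.astype(np.float32)]
    for name in OAR_NAMES:
        # Assumption: an OAR that is absent is encoded as an all-zero mask.
        mask = oar_masks.get(name, np.zeros_like(ct))
        channels.append((mask > 0).astype(np.float32))
    channels.append((gtv_mask > 0).astype(np.float32))
    return np.stack(channels, axis=0)
```

For example, `build_text_prompt("tonsil", "left")` yields `"site: tonsil; laterality: left"` for the first stage, while the second stage would pass illustrative values such as `dose_level="60 Gy"` and `ctv_name="CTV_60"`. Keeping masks as separate binary channels lets the image encoder see each structure independently, which is one common way to feed anatomical priors to a segmentation network.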
Results
We compared our method with a baseline model that took CT images, OARs, and pre-operative GTV masks as input but no disease site or dose level information. On tonsil CTVs, the proposed framework achieved a Dice Similarity Coefficient (DSC) of 60.7%, compared with only 41.7% for the baseline, an absolute improvement of 19.0 percentage points.
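The Dice Similarity Coefficient reported above measures volumetric overlap between a predicted mask P and a reference mask G as DSC = 2|P ∩ G| / (|P| + |G|), ranging from 0 (no overlap) to 1 (identical masks). A minimal NumPy sketch for binary masks follows. Returning 1.0 when both masks are empty is an assumed convention, and the abstract does not state whether DSC was averaged per case or pooled over the test set.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice Similarity Coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Both masks empty: treat as perfect agreement (assumed convention).
        return 1.0
    # Twice the intersection volume divided by the sum of the two volumes.
    return 2.0 * np.logical_and(pred, ref).sum() / denom
```

For instance, `dice_coefficient([1, 1, 1, 0], [0, 1, 1, 1])` has an intersection of 2 voxels and volumes of 3 and 3, giving 4/6 ≈ 0.667.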
Conclusion
Integrating clinical context yielded a marked improvement over the image-only baseline, demonstrating the feasibility and necessity of metadata for post-operative H&N segmentation. Current performance is not yet clinically ready; possible reasons include the limited size of the training data, incomplete information on microscopic tumor spread, and inter-physician variability in contouring. Nevertheless, this work validates the direction of richer context integration for future developments.