Head and Neck Gross Tumor Volume Deep-Learning Autocontouring Model Using BigROC Collaborative Data
Abstract
Purpose
Accurate delineation of Head and Neck (H&N) Gross Tumor Volumes (GTVs) is a prerequisite for effective radiotherapy but represents a significant cognitive challenge, and the difficulty of this task may contribute to the outcomes deficit observed between high-volume and low-volume H&N radiotherapy providers. Deep learning offers a potential solution, but models require diverse, multi-institutional datasets to achieve clinical generalizability. This study evaluates the initial implementation of the BigROC collaboration dataset, a cloud-based repository designed to overcome the limitations of single-institution cohorts.
Methods
We employed the MedNeXt architecture with multi-modal inputs (CT, T1- and T2-weighted MRI, and PET). Model training was performed on Amazon Web Services (AWS), where the collaboration data is centralized. We compared a local model, trained on the first institution's baseline cohort, against a multi-institutional model. At this stage, the dataset includes two institutions: the first provided the baseline cohort (137 training, 16 validation, and 18 test cases), and the second contributed additional cases, bringing the multi-institutional totals to 218 training, 25 validation, and 28 test cases.
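A common way to feed several imaging modalities to a 3D segmentation network is early fusion: each co-registered volume becomes one input channel. The sketch below illustrates that idea only; it is not the authors' pipeline. It assumes all four modalities are available for the case and have already been resampled onto a common voxel grid, and the channel order and per-channel z-score normalization are hypothetical choices made for illustration.

```python
import numpy as np

def stack_modalities(volumes: dict[str, np.ndarray],
                     order=("CT", "T1", "T2", "PET")) -> np.ndarray:
    """Stack co-registered volumes into a (C, D, H, W) float32 array.

    Assumption: every volume has already been resampled onto the same grid.
    """
    shapes = {name: volumes[name].shape for name in order}
    if len(set(shapes.values())) != 1:
        raise ValueError(f"modalities are not on a common grid: {shapes}")
    channels = []
    for name in order:
        vol = volumes[name].astype(np.float32)
        # Hypothetical normalization: z-score each modality independently so
        # that CT Hounsfield units and PET uptake values share a similar scale.
        std = vol.std()
        channels.append((vol - vol.mean()) / (std if std > 0 else 1.0))
    # Channel-first layout, one channel per modality (early fusion).
    return np.stack(channels, axis=0)

# Usage with synthetic data standing in for one patient's scans.
rng = np.random.default_rng(0)
case = {m: rng.normal(size=(8, 16, 16)) for m in ("CT", "T1", "T2", "PET")}
x = stack_modalities(case)
print(x.shape)  # (4, 8, 16, 16)
```

Raising an error on mismatched shapes, rather than silently cropping or padding, keeps registration problems from reaching the model unnoticed.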
Results
The local model exhibited severe overfitting, with a mean Dice Similarity Coefficient (DSC) of 0.70 in training that dropped to 0.14 in validation and 0.05 in testing. The multi-institutional model performed modestly better on held-out data, with mean DSCs of 0.52 for training, 0.30 for validation, and 0.19 for testing. Although these initial figures underscore the inherent difficulty of H&N GTV segmentation, the higher test DSC and the narrower gap between training and test DSC (0.33 versus 0.65) suggest improved stability and generalizability. A substantial number of cases had a DSC of 0.0, suggesting that the model sometimes failed to localize the tumor at all. Identifying the features these cases share is a primary focus for improving future model performance.
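The DSC between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|), so a prediction with no overlap scores exactly 0.0 however close it lies to the tumor. The minimal sketch below computes per-case DSC on binary masks and reports the count of zero-DSC cases alongside the mean, since each localization failure pulls the mean down sharply. The function names and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not the study's evaluation code.

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty; a convention chosen for this sketch
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)

def summarize(scores) -> dict:
    """Mean DSC plus the number of cases with no overlap at all."""
    scores = np.asarray(scores, dtype=float)
    return {"mean_dsc": float(scores.mean()),
            "n_zero": int((scores == 0.0).sum()),
            "n_cases": int(scores.size)}

# Toy 3D example: an 8-voxel reference cube and three predictions.
truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:3] = True
shifted = np.zeros_like(truth)
shifted[2:4, 1:3, 1:3] = True   # half of the voxels overlap
miss = np.zeros_like(truth)
miss[0, 0, 0] = True            # prediction lands outside the tumor

scores = [dice(truth, truth), dice(shifted, truth), dice(miss, truth)]
print(scores)             # [1.0, 0.5, 0.0]
print(summarize(scores))  # {'mean_dsc': 0.5, 'n_zero': 1, 'n_cases': 3}
```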
Conclusion
This study demonstrated the first successful implementation of the BigROC dataset for H&N GTV autocontouring. We anticipate that continued data accrual from additional institutions and further pipeline optimization will yield substantial gains in segmentation accuracy.