Automated Dose-Based Anatomic Region Classification of Radiotherapy Treatment for Big Data Applications
Abstract
Purpose
Aggregating large radiotherapy datasets for predictive modeling is impeded by inconsistent nomenclature. To address this, we developed a fully automated, metadata-independent framework to classify treatment plans into six Anatomic Regions and identify specific Target Organs (e.g., Brain, Prostate) using deep-learning autosegmentation and geometric dose-volume metrics.
Methods
The workflow operates independently of plan names. It uses TotalSegmentator and an in-house breast model to autosegment 118 structures on the planning CT. A hierarchical decision tree assigns a primary Anatomic Region based on the volumetric intersection of these structures with the high-dose (V85%) and mid-dose (V50%) volumes. The framework then predicts specific Target Organs by comparing dose overlap against a dictionary of site-specific rules. The Anatomic Region algorithm was validated on an independent 100-case test cohort, while the experimental Target Organ classifier was evaluated on the 109-case development set.
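The overlap step described above can be illustrated with a minimal sketch. It assumes the dose grid and the autosegmented structure masks have already been resampled onto a common voxel grid as NumPy arrays. The function names, the `region_of` mapping, the `min_fraction` cutoff, and the rule of falling back from V85% to V50% are hypothetical choices made for illustration; the actual decision tree and its thresholds are not specified here.

```python
import numpy as np


def dose_overlap_fractions(dose, prescription, structures, level):
    """Fraction of the isodose volume (dose >= level * prescription)
    that falls inside each structure mask."""
    iso = dose >= level * prescription
    iso_voxels = int(iso.sum())
    if iso_voxels == 0:
        return {name: 0.0 for name in structures}
    return {
        name: float(np.logical_and(iso, mask).sum()) / iso_voxels
        for name, mask in structures.items()
    }


def classify_region(dose, prescription, structures, region_of, min_fraction=0.05):
    """Hypothetical two-level rule: rank regions by overlap with V85%,
    and fall back to V50% if no region reaches min_fraction."""
    for level in (0.85, 0.50):
        fractions = dose_overlap_fractions(dose, prescription, structures, level)
        totals = {}
        for name, frac in fractions.items():
            region = region_of[name]  # assumed structure-to-region lookup
            totals[region] = totals.get(region, 0.0) + frac
        ranked = sorted(
            ((frac, region) for region, frac in totals.items() if frac >= min_fraction),
            reverse=True,
        )
        if ranked:
            # Primary region first, then any secondary regions.
            return [region for _, region in ranked]
    return []


# Toy example: a 10x10x10 grid with a small high-dose cube inside the "brain" mask.
dose = np.zeros((10, 10, 10))
dose[2:5, 2:5, 2:5] = 60.0
brain = np.zeros((10, 10, 10), dtype=bool)
brain[:5] = True
lung = np.zeros((10, 10, 10), dtype=bool)
lung[5:] = True
labels = classify_region(
    dose,
    prescription=60.0,
    structures={"brain": brain, "lung_left": lung},
    region_of={"brain": "Cranial", "lung_left": "Thorax"},
)
print(labels)
```

Summing overlap fractions per region before ranking is one simple way to let several small structures in the same region outweigh a single large structure elsewhere; a production rule set would likely need site-specific exceptions.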
Results
In the test cohort (n=100), the algorithm achieved a Top-1 Accuracy of 95.0% for Anatomic Region classification, with perfect sensitivity (1.0) for the Abdomen, Extremity, Thorax, Cranial, and Head-and-Neck regions. Exact Accuracy, which requires matching all primary and secondary labels in the correct order, was 91.0%, while Order-Independent Accuracy was 94.0%. Benchmarking of the Target Organ classifier on the development cohort (n=109) established a baseline accuracy of 65.1%, with distinct organs identified more reliably (Brain: 1.0) than targets in complex, overlapping regions. The optimized pipeline achieved a processing time of 181 seconds per plan on a single GPU-accelerated workstation and scales linearly across multiple nodes.
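The three regional accuracy figures can be related to one another with a short sketch. Exact Accuracy follows the definition given above. The definitions used here for Top-1 (first predicted label equals the first reference label) and Order-Independent Accuracy (the same set of labels in any order) are assumptions inferred from the metric names, and `region_accuracies` is a hypothetical helper.

```python
def region_accuracies(predicted, reference):
    """Compute three accuracy variants over ordered label lists.

    predicted, reference: lists of label lists, primary region first.
    """
    n = len(reference)
    pairs = list(zip(predicted, reference))
    # Assumed definition: only the primary (first) label must match.
    top1 = sum(p[:1] == r[:1] for p, r in pairs) / n
    # All primary and secondary labels must match in the same order.
    exact = sum(p == r for p, r in pairs) / n
    # Assumed definition: same labels, order ignored.
    order_independent = sum(set(p) == set(r) for p, r in pairs) / n
    return {"top1": top1, "exact": exact, "order_independent": order_independent}


# Toy data (not from the study): four plans with ordered region labels.
predicted = [["Thorax", "Abdomen"], ["Abdomen", "Thorax"], ["Cranial"], ["Thorax"]]
reference = [["Thorax", "Abdomen"], ["Thorax", "Abdomen"], ["Cranial"], ["Abdomen"]]
print(region_accuracies(predicted, reference))
```

Under these definitions every exact match is also an order-independent match, so Exact Accuracy can never exceed Order-Independent Accuracy, which is consistent with the reported 91.0% and 94.0%.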
Conclusion
This automated system classifies Anatomic Regions with high accuracy and establishes a promising baseline for granular target identification. By bypassing text-based metadata, the workflow provides a scalable solution for curating multi-institutional datasets. Future work will focus on refining the Target Organ logic to match the accuracy of the regional labels.