Large Scale Analysis of Practice Patterns Emerging from an Automated Real-Time Evaluation of a Commercial Artificial Intelligence Auto-Contouring Solution
Abstract
Purpose
To evaluate the real-time, longitudinal performance of a commercial auto-segmentation model across a broad radiotherapy cohort, and to examine the influence of contour editing on planning decisions and organ-at-risk (OAR) doses, with attention to practice patterns, disease-site variation, and clinician adoption over time.
Methods
AI-Evaluator was developed as a C#/.NET-based automation tool integrating Varian’s Eclipse Scripting API, clinical SQL queries, and DICOM protocols. Incorporated into the clinical workflow, the tool automatically compares AI-generated contours with clinician contours across geometric, dosimetric, plan-level, and patient-level parameters. The study analyzed 36,332 records from 2,018 unique plans. A breast-focused subset comprised 2,939 structures, 69% of which came from two site specialists; a relative mean clinical dose (rMCD) > 80% was used to infer planning intent and identify planning target contours (PTCs) for analysis.
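The rMCD rule for inferring planning intent can be sketched as a simple classifier. This is a minimal Python sketch under stated assumptions, not AI-Evaluator's implementation: the abstract does not define rMCD, so it is taken here as the structure's mean dose expressed as a percentage of the plan prescription. The `StructureRecord` type, its fields, and the structure names are hypothetical, and the dose values are illustrative rather than study data.

```python
from dataclasses import dataclass

# Hypothetical record layout; field names are illustrative, not AI-Evaluator's schema.
@dataclass
class StructureRecord:
    name: str
    mean_dose_gy: float     # mean dose to the structure, in Gy
    prescription_gy: float  # plan prescription dose, in Gy

RMCD_THRESHOLD_PCT = 80.0  # threshold stated in the abstract

def rmcd_percent(rec: StructureRecord) -> float:
    """Assumed definition: structure mean dose as a percentage of the prescription."""
    return 100.0 * rec.mean_dose_gy / rec.prescription_gy

def is_ptc(rec: StructureRecord) -> bool:
    """Infer planning intent: classify as a planning target contour if rMCD > 80%."""
    return rmcd_percent(rec) > RMCD_THRESHOLD_PCT

# Illustrative values only (not study data); structure names are hypothetical.
records = [
    StructureRecord("LN_Axilla_L", mean_dose_gy=42.0, prescription_gy=50.0),
    StructureRecord("Heart", mean_dose_gy=2.5, prescription_gy=50.0),
]
for r in records:
    print(r.name, rmcd_percent(r), is_ptc(r))
# LN_Axilla_L 84.0 True
# Heart 5.0 False
```

The sketch uses a strict `>` comparison, so a structure at exactly 80% is not counted as a PTC. The abstract does not say how that boundary is handled.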
Results
Overall, 15% of all structures were modified; modification correlated with higher dose and with inclusion in optimization (p < 0.0001). The average Dice score increased from 0.881 to 0.902 after a software upgrade (p < 0.001). Clinician adoption rose from 20% at implementation to ~80% over 22 months. In the breast subset, PTCs were edited more often than non-PTCs (46% vs. 8%, p < 0.0001), and 60% of PTC lymph nodes (LNs) were modified. Edits to left-sided PTC LNs and to right-breast contours reduced heart and left anterior descending artery (ALAD) mean doses by ~1% (p < 0.05), whereas left-breast edits increased heart and ALAD mean doses (p < 0.0001). Experts edited PTCs more often than non-experts (52% vs. 31%, p = 0.0005) and, on average, edited smaller structures (p < 0.0001).
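The Dice score reported here measures the overlap between an AI contour and the corresponding clinician contour, 2|A ∩ B| / (|A| + |B|). The following is a minimal Python sketch. It assumes each contour has been rasterized to a set of voxel indices, because the abstract does not describe how AI-Evaluator computes overlap. The helper names `dice` and `was_modified` are hypothetical, and treating any voxel-level difference as an edit is also an assumption.

```python
from statistics import mean
from typing import Set, Tuple

# Hypothetical representation: a contour rasterized to (slice, row, col) voxel indices.
Voxel = Tuple[int, int, int]

def dice(ai: Set[Voxel], clinician: Set[Voxel]) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(ai) + len(clinician)
    if total == 0:
        return 1.0  # convention for two empty contours (not from the source)
    return 2.0 * len(ai & clinician) / total

def was_modified(ai: Set[Voxel], clinician: Set[Voxel]) -> bool:
    """Assumption: any voxel-level difference counts as a clinician edit."""
    return ai != clinician

# Illustrative contours (not study data): the clinician swaps one voxel.
ai_contour = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
edited = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
unedited = set(ai_contour)

scores = [dice(ai_contour, edited), dice(ai_contour, unedited)]
print(scores)        # [0.75, 1.0]
print(mean(scores))  # 0.875
print(was_modified(ai_contour, edited), was_modified(ai_contour, unedited))  # True False
```

An average such as the 0.881 or 0.902 reported above would come from taking the mean of per-structure scores like these over a cohort. Whether the study averaged all structures or only a subset is not stated in the abstract.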
Conclusion
Structures receiving a higher dose or involved in inverse optimization received more scrutiny during treatment planning, and contour edits were linked to notable, sometimes counterintuitive, dose changes. The results indicate that physicians generally agree with AI-generated contours, suggesting strong potential for advancing machine-learning-based treatment planning. This analysis also highlights the need for standardized practices across disease sites.