Geometric Accuracy Is a Poor Predictor of Dosimetric Reliability: A Multi-Modality Analysis of Inter-Observer Variability in Pelvic Radiotherapy
Abstract
Purpose
To determine whether geometric metrics, specifically the Dice Similarity Coefficient (DSC), are reliable predictors of dosimetric consistency.
Methods
Five pelvic radiotherapy cases, selected to represent the anatomical diversity of a 750-patient cohort, were analyzed, each with contours generated by multiple expert observers. For SBRT (CyberKnife), a gold-standard plan was optimized on a consensus contour and evaluated on every observer's contours to assess plan robustness. For VMAT, a cross-validation matrix approach was employed: the plan optimized on each observer's contours was recalculated on every other observer's delineation. Dosimetric variation was analyzed for the target (CTV D95%, D98%) and the organs at risk (OARs; bladder and rectum V40Gy, Dmax). The correlation between geometric agreement (DSC) and dosimetric deviation was then evaluated.
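As a minimal sketch of the quantities named above, the following Python computes the DSC between two binary contour masks and fills a cross-validation matrix in which entry (i, j) is a dose-volume metric of plan i evaluated on observer j's contour. The function names (`dice`, `dose_at_volume`, `cross_validation_matrix`), the equal-voxel-volume assumption, and the toy arrays are illustrative assumptions, not the study's actual pipeline or data.

```python
import numpy as np


def dice(mask_a, mask_b):
    """Dice Similarity Coefficient, 2|A∩B| / (|A| + |B|), for two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / total)


def dose_at_volume(dose, mask, volume_pct):
    """D_x%: dose received by at least x% of the structure.

    Assumes equal voxel volumes, so D95% is the 5th percentile of voxel doses.
    """
    voxels = np.asarray(dose, dtype=float)[np.asarray(mask, dtype=bool)]
    return float(np.percentile(voxels, 100.0 - volume_pct))


def cross_validation_matrix(doses, masks, volume_pct=95.0):
    """Entry (i, j): D_x% of dose grid i evaluated on contour mask j."""
    matrix = np.empty((len(doses), len(masks)))
    for i, dose in enumerate(doses):
        for j, mask in enumerate(masks):
            matrix[i, j] = dose_at_volume(dose, mask, volume_pct)
    return matrix


# Toy 2D example (hypothetical values): observer B contours one extra column.
mask_a = np.zeros((4, 4), dtype=bool)
mask_a[1:3, 1:3] = True  # 4 voxels
mask_b = np.zeros((4, 4), dtype=bool)
mask_b[1:3, 1:4] = True  # 6 voxels, 4 shared with observer A

# Idealized "plans": 36 Gy inside the planning contour, 20 Gy outside.
dose_a = np.where(mask_a, 36.0, 20.0)
dose_b = np.where(mask_b, 36.0, 20.0)

dsc_ab = dice(mask_a, mask_b)
d95 = cross_validation_matrix([dose_a, dose_b], [mask_a, mask_b])
```

In this toy case the two contours overlap with a DSC of 0.8, yet the D95% of plan A evaluated on observer B's contour falls from 36 Gy to 20 Gy, while plan B evaluated on observer A's contour is unaffected. The off-diagonal entries of the matrix are where such asymmetries show up.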
Results
Geometric agreement was generally high, yet this did not translate into dosimetric stability. In the SBRT cohort, CTV D95% averaged 34.2 ± 1.8 Gy across observers, while D98% showed greater variability (31.9 ± 2.44 Gy). Although the bladder demonstrated excellent geometric agreement (DSC 0.95 ± 0.03), its D2% varied considerably (35.72 ± 1.25 Gy), and the correlation between DSC and high-dose volume (V40Gy) was weak (R² = 0.11). In the VMAT cross-validation analysis, geometric overlap was moderate to high (CTV DSC 0.84 ± 0.05; bladder DSC 0.93 ± 0.02). The CTV D95% had a standard deviation of 0.3 Gy across observers. Crucially, the correlation between geometric accuracy and target coverage was negligible (CTV DSC vs. D95%, R² = 0.05). Similarly, the rectum (DSC 0.80 ± 0.10) showed substantial dosimetric spread, with a V40Gy standard deviation of 5.9%.
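The abstract does not state how the R² values were obtained. One common choice, assumed here, is the coefficient of determination of a simple linear regression, which equals the squared Pearson correlation. The sketch below shows that calculation; the helper name `r_squared` and the four data points are hypothetical and are not study results.

```python
import numpy as np


def r_squared(x, y):
    """R^2 of a simple linear fit of y on x (squared Pearson correlation)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    return float(r ** 2)


# Hypothetical per-observer values, for illustration only (not study data):
# geometric agreement with the consensus contour and dose deviation in Gy.
dsc_values = [0.80, 0.85, 0.90, 0.95]
dose_deviation_gy = [1.0, 3.0, 1.0, 3.0]

r2 = r_squared(dsc_values, dose_deviation_gy)  # 0.2 for these toy numbers
```

An R² near zero means the DSC explains almost none of the variance in the dose metric, which is the situation reported for both modalities.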
Conclusion
Current geometric QA metrics such as the DSC are insufficient proxies for dosimetric quality. The observed standard deviation in OAR dose (e.g., 5.9% for the VMAT rectum V40Gy) is comparable in magnitude to clinical dose-toxicity gradients. These findings suggest that inter-observer variability is sufficient to "wash out" dose-effect relationships in predictive modelling, highlighting the need for robust, probabilistic planning strategies.