Optimal Gamma Criteria for Clinically Relevant Error Detection in Lung Radiotherapy with EPID‑Based In Vivo Dosimetry
Abstract
Purpose
The gamma index is widely used to compare 2D and 3D dose distributions in pre‑treatment and in vivo patient‑specific quality assurance (PSQA). However, commonly applied criteria (e.g., 3%/3 mm, 3%/2 mm) often lack sensitivity to clinically relevant delivery errors. The challenge is greater still for in vivo dosimetry, where no universally accepted gamma criteria exist and thresholds adopted from pre‑treatment QA have shown poor sensitivity and specificity. Although statistical process control can derive detector‑ and site‑specific thresholds, guidance on the optimal distance‑to‑agreement (DTA) and dose‑difference (DD) parameters remains limited. This study aims to identify the gamma criteria that maximize sensitivity and specificity for in vivo error detection using receiver operating characteristic (ROC) analysis, with clinically relevant errors defined from treatment planning system (TPS) simulations as deviations in the planning target volume (PTV) dose‑volume histogram (DVH).
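The gamma index combines the DD and DTA criteria into a single per‑point score, and a point passes when gamma ≤ 1. The Python sketch below illustrates a global 2D gamma computation and the resulting pass rate. It is a minimal sketch under stated assumptions, not the software used in this study: both distributions share one isotropic grid, DD is normalized to the reference maximum (global gamma), points below 10% of that maximum are excluded, and no sub‑pixel interpolation is performed. The names `gamma_2d` and `pass_rate` are hypothetical.

```python
import numpy as np


def gamma_2d(ref, evl, spacing_mm, dd_percent, dta_mm, cutoff=0.10):
    """Global 2D gamma index by brute-force search on a common grid (sketch).

    ref, evl   : reference and evaluated dose arrays (same shape and units)
    spacing_mm : isotropic pixel spacing in mm (assumption)
    dd_percent : dose-difference criterion, % of the reference maximum
    dta_mm     : distance-to-agreement criterion in mm
    cutoff     : low-dose threshold as a fraction of the reference maximum
    Returns (gamma, mask); mask marks the points counted in the pass rate.
    """
    ref = np.asarray(ref, dtype=float)
    evl = np.asarray(evl, dtype=float)
    dd_abs = dd_percent / 100.0 * ref.max()  # global DD in dose units
    # Search radius of 3 x DTA: any point farther away contributes gamma > 3,
    # so gamma values <= 3 are exact on this grid.
    radius_mm = 3.0 * dta_mm
    r_px = int(np.ceil(radius_mm / spacing_mm))
    # NaN padding so that shifts past the image edge never win the minimum.
    padded = np.pad(evl, r_px, mode="constant", constant_values=np.nan)
    ny, nx = ref.shape
    gamma_sq = np.full(ref.shape, np.inf)
    for dy in range(-r_px, r_px + 1):
        for dx in range(-r_px, r_px + 1):
            dist_sq = (dy * spacing_mm) ** 2 + (dx * spacing_mm) ** 2
            if dist_sq > radius_mm ** 2:
                continue
            # shifted[i, j] == evl[i + dy, j + dx] (NaN outside the image)
            shifted = padded[r_px + dy:r_px + dy + ny, r_px + dx:r_px + dx + nx]
            candidate = dist_sq / dta_mm ** 2 + (shifted - ref) ** 2 / dd_abs ** 2
            gamma_sq = np.fmin(gamma_sq, candidate)  # fmin ignores NaN padding
    mask = ref >= cutoff * ref.max()
    return np.sqrt(gamma_sq), mask


def pass_rate(gamma, mask):
    """Percentage of points above the low-dose cutoff with gamma <= 1."""
    return 100.0 * float(np.mean(gamma[mask] <= 1.0))


if __name__ == "__main__":
    ref = np.full((20, 20), 2.0)  # toy uniform 2 Gy field, not study data
    evl = 1.04 * ref              # toy 4% global overdose
    for dd in (2.0, 3.0, 5.0):
        g, m = gamma_2d(ref, evl, spacing_mm=1.0, dd_percent=dd, dta_mm=2.0)
        print(f"{dd:.0f}%/2 mm pass rate: {pass_rate(g, m):.1f}%")
```

On this toy field, a uniform 4% overdose fails at 2%/2 mm and 3%/2 mm but passes at 5%/2 mm. This shows how loosening DD trades sensitivity for specificity, which is the trade‑off the ROC analysis quantifies. A 3D version would add a third offset loop over the slice axis.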
Methods
Nine lung plans (six three‑dimensional conformal radiotherapy [3DCRT], three volumetric modulated arc therapy [VMAT]) were generated on a thoracic phantom. Delivery errors, setup misalignments, and anatomical variations were introduced at multiple magnitudes. Each scenario was simulated in the TPS and classified as clinically relevant if the PTV DVH deviated by more than 10% in D₂% or D₉₈% (ΔD₂% or ΔD₉₈% > 10%), or by more than 5% in D₅₀% (ΔD₅₀% > 5%). All plans were delivered, and doses were reconstructed using EPID‑based in vivo dosimetry. Two‑ and three‑dimensional gamma analyses were performed using 2%/2 mm, 3%/2 mm, and 5%/2 mm criteria. ROC curves were generated using the TPS‑based classification as ground truth, with 3DCRT and VMAT analyzed separately. The area under the ROC curve (AUC) quantified diagnostic performance.
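The TPS‑based labelling and the ROC/AUC step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: DVH deviations are given in percent relative to the nominal plan, their absolute values are compared with the thresholds, and a lower gamma pass rate is treated as the error signal. The function names and the toy scenario values are hypothetical and are not data from this study.

```python
import numpy as np


def is_clinically_relevant(d_d2, d_d98, d_d50):
    """TPS ground truth for one error scenario (sketch).

    Arguments are relative PTV DVH deviations in percent (error plan vs.
    nominal plan). Comparing absolute values is an assumption; the abstract
    describes the criterion only as dose degradation.
    """
    return abs(d_d2) > 10.0 or abs(d_d98) > 10.0 or abs(d_d50) > 5.0


def roc_curve(pass_rates, relevant):
    """ROC points (FPR, TPR) when a scenario is flagged if pass rate <= t."""
    pass_rates = np.asarray(pass_rates, dtype=float)
    relevant = np.asarray(relevant, dtype=bool)
    fpr, tpr = [0.0], [0.0]
    for t in np.unique(pass_rates):  # ascending thresholds flag more scenarios
        flagged = pass_rates <= t
        tpr.append(float(flagged[relevant].mean()))
        fpr.append(float(flagged[~relevant].mean()))
    return np.array(fpr), np.array(tpr)


def roc_auc(pass_rates, relevant):
    """Empirical AUC in Mann-Whitney form: probability that a relevant
    scenario has a lower pass rate than a non-relevant one (ties count 1/2)."""
    pass_rates = np.asarray(pass_rates, dtype=float)
    relevant = np.asarray(relevant, dtype=bool)
    pos, neg = pass_rates[relevant], pass_rates[~relevant]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need both relevant and non-relevant scenarios")
    lower = (pos[:, None] < neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((lower + 0.5 * ties) / (pos.size * neg.size))


if __name__ == "__main__":
    # Hypothetical scenarios: ((dD2%, dD98%, dD50%) from the TPS, pass rate %).
    scenarios = [((1.0, -2.0, 0.5), 99.1), ((3.0, -12.0, 2.0), 88.4),
                 ((0.5, -1.0, -6.0), 93.0), ((2.0, -4.0, 1.0), 96.5)]
    labels = [is_clinically_relevant(*dvh) for dvh, _ in scenarios]
    rates = [rate for _, rate in scenarios]
    print(roc_curve(rates, labels))
    print(roc_auc(rates, labels))
```

Analyzing 3DCRT and VMAT separately would correspond here to calling `roc_auc` once per technique and per gamma criterion. The pairwise form is exact and is adequate for modest numbers of scenarios. The trapezoidal area under the `roc_curve` points gives the same value.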
Results
For 3DCRT, AUC values showed minimal dependence on the gamma criteria (2D: 0.84–0.86; 3D: 0.85–0.88). For VMAT, the dependence was stronger: the AUC was 0.62 for both 2D and 3D analyses at 2%/2 mm, increased modestly at 3%/2 mm, and reached 0.82 at 5%/2 mm.
Conclusion
The performance of EPID‑based in vivo gamma analysis depends strongly on treatment technique. In 3DCRT, all criteria performed similarly, whereas in VMAT the 5%/2 mm criterion provided superior error discrimination. Across all conditions, 2D and 3D gamma analyses showed comparable performance.