Consensus-Based Guardrails Improve Robustness of Deep Learning–Driven Respiratory Phase Prediction for Gated Radiotherapy
Abstract
Purpose
To evaluate consensus-based guardrails for improving the robustness of an in-house deep learning Long Short-Term Memory (LSTM) respiratory phase prediction model used for respiratory-gated radiotherapy. Consensus filter configurations were compared based on their impact on gate signal ringing, beam-on/off timing errors, and treatment delivery efficiency.
Methods
Ten anonymized patient respiratory traces were analyzed. Ground-truth respiratory phases were determined via expert retrospective peak identification and compared with predictions from an in-house LSTM model generating phase forecasts spanning 33–333 ms. Two consensus-based guardrails were implemented, each requiring 2-of-3 agreement to change gate state. The Staggered method formed consensus across sequential predictions, requiring new amplitude data between votes, while the Convergent method combined three predictions converging to a single future time point (99 ms ahead). Each method was evaluated independently and in combination with a 200 ms dead-time filter applied to the gate signal. Simulated gating signals derived from ground truth were compared with consensus-filtered signals to quantify ringing prevalence, gate timing errors, and treatment delivery efficiency, assessed via duty cycle for a 20–80% phase-gating window.
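The 2-of-3 voting rule with an optional dead-time hold-off can be illustrated with a minimal Python sketch. It is not the in-house implementation: the function names, the sliding-window form of the vote (closest in spirit to the Staggered method), the hold-off interpretation of dead-time, and the 33 ms prediction interval used to convert 200 ms into samples are all assumptions made for illustration.

```python
from collections import deque


def phase_to_gate(phase_pct, lo=20.0, hi=80.0):
    """Raw gate decision: ON when the predicted phase is inside the window."""
    return lo <= phase_pct <= hi


def consensus_gate(predicted_phases, votes=3, needed=2, dead_time_samples=0):
    """Filter a stream of predicted phases (0-100 %) into a gate signal.

    The gate changes state only when at least `needed` of the last `votes`
    raw decisions disagree with the current state (hypothetical sliding-window
    vote), and only if `dead_time_samples` samples have elapsed since the
    previous change (hypothetical hold-off form of the dead-time filter).
    """
    window = deque(maxlen=votes)
    state = False                      # gate starts OFF
    since_change = dead_time_samples   # permit the first change immediately
    filtered = []
    for phase in predicted_phases:
        window.append(phase_to_gate(phase))
        proposed = not state
        agree = sum(1 for vote in window if vote == proposed)
        if agree >= needed and since_change >= dead_time_samples:
            state = proposed
            since_change = 0
        else:
            since_change += 1
        filtered.append(state)
    return filtered


def count_transitions(gate):
    """Number of ON/OFF state changes in a gate signal."""
    return sum(1 for a, b in zip(gate, gate[1:]) if a != b)


# Toy phase stream with isolated single-sample flips near both window edges.
phases = [10, 10, 50, 10, 50, 50, 50, 90, 50, 90, 90]
raw = [phase_to_gate(p) for p in phases]
voted = consensus_gate(phases)
# Assumption: one prediction every 33 ms, so 200 ms is roughly 6 samples.
held = consensus_gate(phases, dead_time_samples=6)
print(count_transitions(raw), count_transitions(voted), count_transitions(held))
```

In this toy stream the isolated flips are absorbed by the vote, at the cost of a short delay before each accepted transition, which mirrors the trade-off between ringing and gate timing error examined in this work.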
Results
Both consensus methods significantly reduced gate signal ringing. Compared with 2.2% ringing without guardrails, ringing decreased to 1.2% with the Convergent method and 0.4% with the Staggered method, with further reductions to 0.2% and 0.1% following dead-time hybridization. Treatment delivery efficiency was minimally affected, with duty cycles of 59.6% (Convergent) and 60.5% (Staggered), compared to the expected 60%. Mean gate-ON timing errors increased modestly to approximately 10–11 ms, and gate-OFF timing errors increased to approximately 51 ms; all delays were clinically insignificant.
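The expected 60% duty cycle follows from the width of the 20–80% gating window, under the assumption that respiratory phase advances uniformly in time over each cycle. A minimal Python sketch of that arithmetic, with hypothetical helper names, is:

```python
def expected_duty_cycle(lo=20.0, hi=80.0):
    """Ideal duty cycle for a phase window, assuming phase is uniform in time."""
    return (hi - lo) / 100.0


def duty_cycle(gate):
    """Measured duty cycle: fraction of samples during which the gate is ON."""
    gate = list(gate)
    if not gate:
        return 0.0
    return sum(1 for g in gate if g) / len(gate)


# One idealized breathing cycle sampled at 1 % phase steps (0, 1, ..., 99).
ideal_gate = [20 <= phase < 80 for phase in range(100)]
print(expected_duty_cycle(), duty_cycle(ideal_gate))
```

A filtered gate signal whose measured duty cycle stays close to this value, as reported above, indicates that the guardrail delays shift the beam-on interval without appreciably shortening it.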
Conclusion
Consensus guardrails reduce gate signal ringing in respiratory phase prediction with negligible impact on delivery efficiency or timing accuracy. Adding a dead-time filter provides further robustness. Together, these guardrails offer a practical and generalizable strategy to improve the clinical reliability of deep learning–based respiratory gating systems.