Volumetric Modulated Arc Therapy Machine Parameter Optimization for Localized Prostate Cancer Using Tandem Reinforcement Learning Networks
Abstract
Purpose
Volumetric modulated arc therapy (VMAT) machine parameter optimization (MPO) is a high-dimensional problem traditionally reliant on computationally expensive inverse planning. While machine learning techniques have been developed to aid this process, they typically still rely on an optimization process and are severely limited by the scope of the training dataset. Reinforcement learning (RL) offers a mechanism to discover novel strategies through trial and error, providing consistently high-quality plans. In this work, we develop and validate an RL-based VMAT MPO algorithm capable of generating clinically comparable prostate cancer treatment plans that satisfy machine constraints independent of a treatment planning system (TPS) optimizer.
Methods
Using a dataset of 100 prostate stereotactic body radiation therapy (SBRT) patients (PACE-B criteria), we developed an RL framework based on the Proximal Policy Optimization (PPO) algorithm to generate VMAT plans for the Unity MR-Linac. Two tandem convolutional neural networks were trained to sequentially optimize multi-leaf collimator (MLC) positions and monitor units (MUs). The networks took the current dose, contoured structure masks, and current machine parameters as inputs and were trained to maximize a dose-volume histogram (DVH)-based reward function. The final model was evaluated on a 20-patient test set and compared against reference plans from a commercial TPS.
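To make the idea of a DVH-based reward concrete, the following is a minimal sketch and not the reward used in this work. It assumes a single target with a D95 coverage goal and a single organ at risk with one dose-volume limit. The function names, the choice of metrics, and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def dose_at_volume(doses: Sequence[float], volume_pct: float) -> float:
    """Return D_x: the minimum dose received by the hottest volume_pct% of voxels."""
    if not doses:
        raise ValueError("structure has no voxels")
    if not 0 < volume_pct <= 100:
        raise ValueError("volume_pct must be in (0, 100]")
    ranked = sorted(doses, reverse=True)
    # Number of voxels in the hottest volume_pct percent (at least one voxel).
    k = max(1, math.ceil(round(len(ranked) * volume_pct / 100.0, 9)))
    return ranked[k - 1]


def dvh_reward(
    target_doses: Sequence[float],
    oar_doses: Sequence[float],
    prescription_gy: float,
    oar_limit_gy: float,
    oar_volume_pct: float = 50.0,
) -> float:
    """Hypothetical reward: 0 when both goals are met, negative otherwise.

    Assumed goals (illustrative only):
      target D95 >= prescription_gy
      OAR D_{oar_volume_pct} <= oar_limit_gy
    """
    d95 = dose_at_volume(target_doses, 95.0)
    coverage_shortfall = max(0.0, prescription_gy - d95) / prescription_gy
    oar_dose = dose_at_volume(oar_doses, oar_volume_pct)
    oar_excess = max(0.0, oar_dose - oar_limit_gy) / oar_limit_gy
    # Penalize relative violations so the agent maximizes toward zero.
    return -(coverage_shortfall + oar_excess)


if __name__ == "__main__":
    target = [40.0] * 10                 # toy voxel doses in Gy
    rectum = [10.0, 20.0, 30.0, 40.0]    # toy voxel doses in Gy
    print(dvh_reward(target, rectum, prescription_gy=36.0, oar_limit_gy=15.0))
```

In an RL setting, a scalar of this kind could be evaluated after each action, for example after each MLC or MU update, so that the policy is driven toward plans that meet the dose-volume goals. A clinical reward would combine many more structures and metrics.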
Results
The RL algorithm generated plans in an average of 6.3 ± 4.7 seconds. Compared with TPS-generated reference plans, the RL-generated plans showed statistically significant increases in planning target volume (PTV) hotspot and right femoral head dose, and a significant reduction in mean rectal dose. All RL-generated plans satisfied the clinical objectives of the reference plans and met the MLC leaf motion and dose rate constraints.
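As an illustration of what checking MLC leaf motion and dose rate constraints can involve, the sketch below tests a sequence of control points against two limits. It is a simplified assumption and not the validation procedure used here: every transition between control points is assumed to take the same fixed time, and the function name, the data layout, and the limit values are hypothetical.

```python
from typing import Sequence


def satisfies_machine_limits(
    leaf_positions_mm: Sequence[Sequence[float]],
    cumulative_mu: Sequence[float],
    segment_time_s: float,
    max_leaf_speed_mm_s: float,
    max_dose_rate_mu_s: float,
) -> bool:
    """Check leaf travel and MU delivery between consecutive control points.

    leaf_positions_mm[i][j] is the position of leaf j at control point i.
    cumulative_mu[i] is the total MU delivered up to control point i.
    Assumes (for illustration) a constant time per transition.
    """
    if len(leaf_positions_mm) != len(cumulative_mu):
        raise ValueError("one MU value is required per control point")
    max_travel_mm = max_leaf_speed_mm_s * segment_time_s
    max_segment_mu = max_dose_rate_mu_s * segment_time_s
    for i in range(1, len(cumulative_mu)):
        previous, current = leaf_positions_mm[i - 1], leaf_positions_mm[i]
        # Largest distance any single leaf must move in this transition.
        travel = max(abs(b - a) for a, b in zip(previous, current))
        delivered = cumulative_mu[i] - cumulative_mu[i - 1]
        if delivered < 0:
            return False  # cumulative MU must never decrease
        if travel > max_travel_mm or delivered > max_segment_mu:
            return False
    return True


if __name__ == "__main__":
    leaves = [[0.0, 0.0], [5.0, -5.0], [10.0, -5.0]]  # toy two-leaf example
    mu = [0.0, 10.0, 20.0]
    print(satisfies_machine_limits(leaves, mu, 1.0, 6.0, 12.0))
```

A real delivery model would also account for variable gantry speed, opposing-leaf collision limits, and minimum dose rate, none of which are modeled in this sketch.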
Conclusion
We demonstrated the feasibility of an RL framework to automate VMAT MPO. The system rapidly produces prostate treatment plans that are dosimetrically comparable to those from manual optimization without requiring a commercial TPS, and it offers a path toward reduced planning times and enhanced plan quality.