Automating Trial-and-Error in Prostate VMAT Planning with DreamerV3
Abstract
Purpose
Radiotherapy treatment plan optimization depends on planner experience and a trial-and-error process. Supervised deep learning can improve efficiency, but when trained on historical clinical plans, its performance is limited by their quality. Deep reinforcement learning (DRL) improves decisions through exploration and reward feedback, enabling strategies that balance PTV coverage and OAR sparing without relying on prior clinical plans. We aimed to develop an automated framework for treatment planning optimization using DreamerV3, a model-based DRL algorithm that predicts future rewards through a learned world model.
Methods
We targeted prostate VMAT in Eclipse. Using the Eclipse API (PyESAPI), we generated training data by creating multiple plans per patient, modifying treatment planning parameters (TPPs) and dose constraints, running optimization, and computing dose. Dosimetric indices were extracted and converted to loss functions and rewards to construct state–action–reward tuples for constraint updates. DreamerV3 was trained to learn a world model that predicts rewards for sequential TPP updates and a policy that selects parameter changes to maximize the predicted return. After training, the policy inferred optimized constraints from initial TPP values for held-out test patients; the inferred constraints were imported back into Eclipse to generate and evaluate new plans.
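The conversion from dosimetric indices to a scalar reward might be sketched as follows. This is an illustrative example only: the index names, weighting, and tolerance (107% of prescription for the PTV hot-spot term) are assumptions, not the reward function used in this work.

```python
def plan_reward(indices: dict, rx_dose: float = 78.0) -> float:
    """Toy reward from dosimetric indices (all doses in Gy).

    Higher is better: penalizes PTV mean-dose deviation from the
    prescription, PTV hot spots above an assumed 107% tolerance,
    and mean doses to bladder and rectum (excluding PTV).
    """
    ptv_cov = -abs(indices["ptv_dmean"] - rx_dose) / rx_dose
    ptv_hot = -max(0.0, indices["ptv_dmax"] - 1.07 * rx_dose) / rx_dose
    oar = -(indices["bladder_dmean"] + indices["rectum_dmean"]) / (2 * rx_dose)
    return ptv_cov + ptv_hot + oar
```

In the pipeline described above, a reward of this kind would be paired with the current TPP values (state) and the constraint change applied (action) to form the state–action–reward tuples used to train the world model.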
Results
For one representative case among the 10 prostate test patients (prescribed dose, 78 Gy), inference executed up to 30 sequential constraint updates in approximately 15 s. After importing the predicted constraint set into Eclipse and recalculating dose, PTV Dmax, Dmin, and Dmean were 81.9, 66.7, and 76.0 Gy, respectively. For the bladder excluding PTV, Dmax and Dmean were 51.7 and 13.9 Gy, respectively; for the rectum excluding PTV, Dmax and Dmean were 51.2 and 20.1 Gy, respectively.
Conclusion
Our system incorporates the model-based DRL algorithm DreamerV3 and integrates with Eclipse via PyESAPI to continuously explore dose constraint settings and generate VMAT plans without relying on historical clinical plans.