Feasibility of a Reinforcement Learning-Based Framework for Patient-Specific Needle Placement and Dwell Time Optimization in Gynecologic HDR Brachytherapy
Abstract
Purpose
While reinforcement learning (RL)-based needle placement optimization has been explored in prostate brachytherapy, its application to gynecologic high-dose-rate (HDR) brachytherapy remains limited. This work presents an RL-based framework for patient-specific interstitial needle placement and dwell time optimization in gynecologic brachytherapy.
Methods
A two-stage RL framework was implemented in Python using an OpenAI Gym-compatible environment and the Proximal Policy Optimization (PPO) algorithm from Stable-Baselines3. In the first stage, an agent predicts anatomically feasible tandem and needle trajectories using patient-specific geometry and organ-at-risk (OAR) avoidance constraints. In the second stage, dwell times are optimized using a continuous-action policy guided by clinically relevant dose–volume objectives. Dose deposition was calculated using the TG-43 formalism with an Ir-192 HDR source. Reward functions incorporated coverage metrics for the high-risk clinical target volume (CTVHR; D90 and D98), OAR sparing metrics (D2cc for rectum, bladder, sigmoid, bowel, and vagina), high-dose control (V200), and dwell time modulation constraints to discourage undesirable dwell distributions. The algorithm was trained on 4 patient datasets and evaluated on 3 additional cases planned for a 550 cGy fractional dose.
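As an illustration of how the reward terms listed above could be combined, the following is a minimal sketch and not the reward used in this work. The additive penalty form, the weights, the D98 goal, and the names PlanMetrics and plan_reward are assumptions introduced here; OAR limits are passed in by the caller because the abstract does not state them.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class PlanMetrics:
    """Metrics for one candidate plan (doses in cGy, V200 in percent)."""
    d90: float
    d98: float
    d2cc: Dict[str, float]      # OAR name -> D2cc
    v200: float
    dwell_times: List[float]    # seconds


def shortfall(value: float, goal: float) -> float:
    """Relative amount by which a value falls below its goal (0 if met)."""
    return max(0.0, goal - value) / goal


def excess(value: float, limit: float) -> float:
    """Relative amount by which a value exceeds its limit (0 if within)."""
    return max(0.0, value - limit) / limit


def plan_reward(m: PlanMetrics, rx: float, d98_goal: float,
                oar_limits: Dict[str, float], v200_goal: float = 35.0,
                w_cov: float = 1.0, w_oar: float = 1.0,
                w_hot: float = 0.5, w_mod: float = 0.1) -> float:
    """Hypothetical scalar reward: 0 when every goal is met, negative otherwise.

    All weights are illustrative assumptions, not values from the study.
    """
    # CTVHR coverage: penalize D90 below prescription and D98 below its goal.
    coverage = shortfall(m.d90, rx) + shortfall(m.d98, d98_goal)
    # OAR sparing: penalize each D2cc that exceeds its caller-supplied limit.
    oar = sum(excess(m.d2cc[name], limit) for name, limit in oar_limits.items())
    # High-dose control: penalize V200 above the goal (35% in the abstract).
    hot = excess(m.v200, v200_goal)
    # Dwell modulation: coefficient of variation of dwell times (assumed proxy).
    if m.dwell_times and mean(m.dwell_times) > 0:
        modulation = pstdev(m.dwell_times) / mean(m.dwell_times)
    else:
        modulation = 0.0
    return -(w_cov * coverage + w_oar * oar + w_hot * hot + w_mod * modulation)
```

With this shape, a plan that meets every goal with uniform dwell times scores zero, and each violated objective lowers the reward in proportion to its relative deviation, which keeps terms with different units on a comparable scale.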
Results
The framework generated anatomically feasible tandem/needle trajectories and corresponding dwell time distributions for all evaluated cases. Across cases, the agent achieved moderate CTVHR coverage, with D90 values ranging from 457.5 to 740.1 cGy (approximately 83% to 135% of the 550 cGy fractional dose). High-dose regions remained controlled, with V200 values below 21%, within the goal of less than 35%. OAR sparing was consistently achieved across all patients. Dose–volume histograms (DVHs) showed smooth dose falloff and no extreme hotspots.
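For reference, the reported metrics (D90, V200, D2cc) can be derived from sampled structure doses roughly as follows. This is a sketch under stated assumptions: dose is sampled on uniform voxels, each structure's doses are given as a flat NumPy array in cGy, and the function names d_percent, v_percent, and d_cc are hypothetical rather than taken from the authors' code.

```python
import numpy as np


def d_percent(dose: np.ndarray, pct: float) -> float:
    """Dx: minimum dose received by the hottest pct% of the structure.

    For D90 this is the 10th percentile of the voxel doses.
    """
    return float(np.percentile(dose, 100.0 - pct))


def v_percent(dose: np.ndarray, rx: float, level_pct: float) -> float:
    """Vx: percent of the structure receiving at least level_pct% of rx."""
    threshold = rx * level_pct / 100.0
    return float(100.0 * np.mean(dose >= threshold))


def d_cc(dose: np.ndarray, voxel_cc: float, volume_cc: float = 2.0) -> float:
    """Dcc: minimum dose to the hottest volume_cc of the structure.

    Assumes every voxel has the same volume voxel_cc (in cm^3).
    """
    n = max(1, int(round(volume_cc / voxel_cc)))
    n = min(n, dose.size)
    hottest_first = np.sort(dose)[::-1]
    return float(hottest_first[n - 1])
```

For example, d_percent(ctv_dose, 90) gives D90, v_percent(ctv_dose, 550.0, 200) gives V200 for a 550 cGy fraction, and d_cc(rectum_dose, voxel_cc) gives the rectal D2cc, where ctv_dose, rectum_dose, and voxel_cc are placeholders for the caller's data.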
Conclusion
This work establishes a reinforcement learning framework for gynecologic HDR brachytherapy treatment planning and provides insight into optimization tradeoffs encountered during early-stage learning. Ongoing work focuses on reward refinement and policy stabilization to improve CTVHR coverage while maintaining acceptable OAR doses, supporting future clinical translation.