BLUE RIBBON POSTER THERAPY: Simultaneous Radiobiological Response-Guided Automatic Treatment Planning and Physician Preference Quantification In High-Dose-Rate Brachytherapy of Cervical Cancer Via Inverse Deep Reinforcement Learning
Abstract
Purpose
Automatic treatment planning and quantification of the physician’s plan preferences are both critical tasks in high-dose-rate brachytherapy (HDRBT) of cervical cancer, where high-quality plans that meet the physician’s intent must be generated rapidly. This study employs inverse deep reinforcement learning (iDRL) to address both tasks simultaneously: it learns interpretable physician preferences from radiobiological responses and uses the learned preference model to guide automatic treatment planning.
Methods
The iDRL framework integrated DRL-based auto-planning with active reward learning. A planning agent trained with a DQN-based scheme iteratively adjusted the organ weighting factors in an optimization engine, based on the observed plan, to improve plan quality. Tumor control probability (TCP) and normal tissue complication probabilities (NTCPs) for bladder, rectum, sigmoid, and small bowel were computed using established radiobiological models. A logistic regression model was used to distinguish clinically approved plans from agent-generated plans in the TCP/NTCP domain; its output, a plan approval probability, served as the reward function for the DRL agent. The reward model and planning agent were jointly trained. The framework was trained on 60 patients and evaluated on an independent cohort of 15 patients treated by the same physician.
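As a rough illustration of the reward described above, the following sketch maps a plan's TCP/NTCP values to an approval probability through a logistic function. It is not the authors' implementation: the function name, the zero intercept, and the use of raw (unscaled) probabilities as features are assumptions made here, since the abstract does not state the intercept or any feature scaling. The coefficients are the weighting factors reported in the Results.

```python
import math

# Coefficient order: TCP, then NTCP for bladder, rectum, sigmoid, small bowel.
# Values are the weighting factors reported in the Results section.
WEIGHTS = (0.41, -0.11, -0.27, -0.30, -0.12)

# The abstract does not report an intercept; 0.0 is a placeholder assumption.
INTERCEPT = 0.0


def approval_probability(features, weights=WEIGHTS, intercept=INTERCEPT):
    """Return the logistic of a linear score over TCP/NTCP features.

    `features` is a sequence (TCP, NTCP_bladder, NTCP_rectum,
    NTCP_sigmoid, NTCP_bowel). Feature scaling is assumed, not known.
    """
    if len(features) != len(weights):
        raise ValueError("expected one feature per weight")
    score = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical usage: the reward for an agent-generated plan is its
# approval probability under the current reward model.
plan = (0.92, 0.05, 0.09, 0.10, 0.06)
reward = approval_probability(plan)
```

Because the intercept and scaling are placeholders, only relative comparisons between plans are meaningful in this sketch: raising TCP increases the reward, and raising any NTCP decreases it.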
Results
The iDRL framework was successfully trained. The logistic regression weighting factors for TCP and for the NTCPs of bladder, rectum, sigmoid, and small bowel were 0.41, −0.11, −0.27, −0.30, and −0.12, respectively. Odds-ratio analysis identified TCP as the dominant driver of plan acceptance and, among the organs at risk, showed the greatest sensitivity to sigmoid and rectum toxicity. On the independent test cohort, plans generated by the trained agent achieved radiobiological quality comparable to clinical plans (clinical values in parentheses): TCP 0.92±0.02 (0.93±0.01), bladder NTCP 0.05±0.01 (0.05±0.01), rectum NTCP 0.09±0.01 (0.11±0.02), sigmoid NTCP 0.10±0.03 (0.11±0.03), and small bowel NTCP 0.06±0.03 (0.07±0.02).
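The odds-ratio reading of these coefficients can be reproduced with a short calculation. The sketch below assumes the standard interpretation for logistic regression, in which exp(coefficient) is the multiplicative change in the odds of approval per unit increase of the corresponding feature. The abstract does not specify the feature units or scaling, so the absolute values are illustrative; the dictionary keys are labels chosen here.

```python
import math

# Weighting factors as reported in the Results section.
coefficients = {
    "TCP": 0.41,
    "NTCP bladder": -0.11,
    "NTCP rectum": -0.27,
    "NTCP sigmoid": -0.30,
    "NTCP small bowel": -0.12,
}

# Odds ratio per unit increase of each feature (unit is an assumption).
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

# Rank features by the magnitude of their effect on the log-odds.
ranked = sorted(coefficients, key=lambda name: abs(coefficients[name]), reverse=True)

for name in ranked:
    print(f"{name}: coefficient {coefficients[name]:+.2f}, "
          f"odds ratio {odds_ratios[name]:.3f}")
```

Ranking by absolute coefficient places TCP first, followed by sigmoid and rectum NTCP, which matches the ordering stated in the Results.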
Conclusion
The iDRL framework for HDRBT treatment planning in cervical cancer effectively bridges physician preference modeling and automatic plan generation, enabling these two tasks to be addressed simultaneously within a clinically meaningful framework.