Reinforcement Learning for Automated IMRT Treatment Planning: Mathematical Optimization of the Reward Function Design
Abstract
Purpose
Reinforcement learning (RL) constitutes a strong candidate for AI-guided treatment planning for two distinguishing reasons: it differs from greedy algorithms by optimizing its strategy over the full horizon of a Markov decision process, and it contrasts with supervised learning by relying entirely on explorative interactions without a priori knowledge of the environment. The reward signal is the sole human input driving the agent and is therefore critical to its outcomes; it must be meticulously designed to instill clinical preferences and considerations into the derived treatment plan. We aim to optimize the reward function and improve RL-generated plan quality.
Methods
We implemented a custom soft actor-critic (SAC) RL framework on an in-house treatment planning system. We designed mathematical reward functions that coordinated the agent's focus among organs-at-risk (OARs) of variable management difficulty. Thirty-eight head-and-neck IMRT cases (Rx: 44 Gy) were randomized into training and testing sets (n=19 each). Agents were trained to conduct plan optimization on the training set—directed by the novel reward models—by making informed adjustments to the planning objectives. Performance was evaluated based on DVH metrics and 3D dose distributions of RL-generated plans on the testing set. Results were benchmarked against a re-implemented, previously published piecewise-linear model.
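The reward models contrasted here can be illustrated with a minimal sketch. The function names, the normalization by the clinical goal, and the steepness parameter k are illustrative assumptions, not the paper's actual formulation; the sketch only shows how a quadratic penalty grows polynomially with OAR overdose while an exponential penalty grows much more sharply, steering the agent toward hard-to-spare OARs.

```python
import math

def quadratic_reward(dose, goal, scale=1.0):
    """Hypothetical quadratic reward: penalty grows with the square of the
    fractional overdose beyond the clinical goal; no penalty if the goal is met."""
    overdose = max(dose - goal, 0.0)
    return -scale * (overdose / goal) ** 2

def exponential_reward(dose, goal, scale=1.0, k=5.0):
    """Hypothetical exponential reward: penalty grows exponentially with the
    fractional overdose, punishing goal violations far more aggressively."""
    overdose = max(dose - goal, 0.0)
    return -scale * (math.exp(k * overdose / goal) - 1.0)

# Example: a parotid median-dose goal of 10 Gy, achieved dose 12 Gy.
# The exponential model penalizes the 20% overshoot much more heavily,
# which is one way a reward can concentrate the agent's effort on
# OARs that are difficult to spare.
print(quadratic_reward(12.0, 10.0))    # small negative penalty
print(exponential_reward(12.0, 10.0))  # much larger negative penalty
```

Both functions return zero when the goal is met, so the agent is rewarded (less penalized) only for genuine dosimetric improvement rather than for overshooting already-satisfied objectives.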
Results
Agents trained with quadratic and exponential rewards outperformed the piecewise-linear baseline. Average maximum PTV doses were 114.4% (quadratic) and 114.7% (exponential), comparable to the piecewise-linear baseline (114.2%). Average median dose to the parotids decreased to 8.6 Gy (quadratic) and 9.0 Gy (exponential) from a baseline of 10.2 Gy, and to 30.4 Gy for the pharynx under both models (baseline: 33.1 Gy). Doses to other OARs remained comparable to the baseline.
Conclusion
RL agents trained with the novel reward function designs achieved a 10% reduction in average median dose to select OARs, with no loss of PTV coverage or uniformity. These results demonstrate that quadratic and exponential functions are superior reward models for RL-based head-and-neck treatment planning.