Large-Scale Automatic Carbon Ion Treatment Planning for Head and Neck Cancers Via Parallel Multi-Agent Reinforcement Learning
Abstract
Purpose
Head-and-neck cancer (HNC) treatment planning is challenging because multiple critical organs-at-risk (OARs) lie close to complex target volumes. Intensity-modulated carbon-ion therapy (IMCT) is attractive for HNC because of its superior dose conformity and OAR sparing, but its planning process is slow owing to additional modeling requirements such as relative biological effectiveness (RBE). Recent studies have applied deep learning (DL) and reinforcement learning (RL) to automate treatment planning. DL-based methods often struggle with plan feasibility and optimality because of training-data bias, while RL-based methods struggle to explore efficiently the search space of treatment planning parameters (TPPs), which grows exponentially with the number of parameters.
Methods
We propose a scalable multi-agent reinforcement learning (MARL) framework that directly addresses these bottlenecks and enables parallel tuning of 45 TPPs for IMCT. We adopt a centralized-training decentralized-execution (CTDE) QMIX backbone to stabilize learning in a high-dimensional, non-stationary environment. To improve practicality, we (1) use compact historical dose-volume histogram (DVH) vectors as state inputs, (2) introduce a linear action-to-value transformation that maps a small set of discrete actions to uniformly distributed parameter adjustments, and (3) design an absolute, clinically informed piecewise reward aligned with a comprehensive plan scoring system. To improve sample efficiency, a synchronous multi-process data-worker architecture interfaces with the treatment planning system (TPS) for parallel plan optimization and accelerated data collection.
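As a minimal illustration of components (2) and (3), the Python sketch below shows one way a linear action-to-value mapping and an absolute piecewise score could be written. It is not the implementation used in this work: the number of actions, the step size, the additive update, the goal/limit thresholds, and all function names are hypothetical choices made only for this example.

```python
# Illustrative sketch only; all constants and names are assumptions.

N_ACTIONS = 5    # assumed size of each agent's discrete action set
MAX_STEP = 0.2   # assumed largest adjustment per tuning step


def action_to_delta(action, n_actions=N_ACTIONS, max_step=MAX_STEP):
    """Map a discrete action index linearly onto [-max_step, +max_step].

    The resulting adjustments are evenly spaced, so the middle action
    leaves the parameter unchanged.
    """
    if not 0 <= action < n_actions:
        raise ValueError("action index out of range")
    return max_step * (2.0 * action / (n_actions - 1) - 1.0)


def apply_action(param, action, low=0.0, high=1.0):
    """Apply the adjustment additively (an assumption) and clip to bounds."""
    return min(high, max(low, param + action_to_delta(action)))


def criterion_score(value, goal, limit):
    """Absolute piecewise score for one lower-is-better dose metric.

    Full credit at or below the goal, zero at or above the limit,
    and linear interpolation in between.
    """
    if value <= goal:
        return 1.0
    if value >= limit:
        return 0.0
    return (limit - value) / (limit - goal)


def plan_reward(metrics):
    """Average the per-criterion scores of (value, goal, limit) triples."""
    scores = [criterion_score(v, g, l) for v, g, l in metrics]
    return sum(scores) / len(scores)
```

Because the score depends only on the current plan rather than on the change from the previous step, the reward is absolute in the sense described above; the thresholds here stand in for clinically derived values.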
Results
On a head-and-neck dataset (10 training, 10 testing), the method tuned 45 parameters simultaneously and yielded plans comparable to or better than expert manual plans (relative plan score: RL 85.93±7.85% vs. manual 85.02±6.92%), with statistically significant (p < 0.05) improvements for five OARs.
Conclusion
The results demonstrate that the framework can efficiently search a high-dimensional TPP space and produce clinically competitive plans through direct TPS interaction, with the clearest gains in OAR sparing.