A Generalized Radiotherapy Auto-Planner Based on Machine-to-Machine In-Context Learning
Abstract
Purpose
Inverse radiotherapy treatment planning involves solving a highly non-convex optimization problem, for which large language models (LLMs) show promise owing to their strong interactive reasoning capabilities. We propose a machine-to-machine in-context learning framework in which LLMs learn directly from a self-play system without human interaction, improving transferability across planning scenarios.
Methods
OpenAI’s GPT-4.0 iteratively received the current plan’s dose–volume histogram (DVH), plan evaluation criteria, and treatment planning parameter (TPP) settings, and proposed updated TPPs to an in-house treatment planning system (TPS). Planning knowledge was provided in the form of a final TPP solution space derived from a pre-trained reinforcement learning (RL) agent tested on 39 prostate cancer cases planned with 7-beam intensity-modulated radiation therapy (IMRT), each involving one planning target volume (PTV) and two organs at risk (OARs). The LLM planner’s plan scores and iteration counts were recorded with and without this guidance in three scenarios: (1) 7-beam prostate IMRT with 1 PTV and 2 OARs for 10 cases; (2) 180-beam prostate planning with 1 PTV and 4 OARs for one case; and (3) 7-beam liver planning with 1 PTV and 2 OARs for one case using 10 initial TPP settings.
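The iterative exchange described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the names `auto_plan`, `build_prompt`, `toy_tps`, and `toy_planner` are hypothetical, the TPS and the LLM are replaced by toy stand-ins, the guidance is represented as per-parameter value ranges, and the 20-step budget is an assumed default.

```python
from typing import Callable, Dict, Optional, Tuple

TPP = Dict[str, float]                       # hypothetical: parameter name -> value
Guidance = Dict[str, Tuple[float, float]]    # hypothetical: parameter -> (low, high)
Evaluate = Callable[[TPP], Tuple[Dict[str, float], float]]  # TPPs -> (DVH summary, score)
Propose = Callable[[str, TPP], TPP]          # (prompt, current TPPs) -> new TPPs


def build_prompt(dvh: Dict[str, float], criteria: str, tpps: TPP,
                 guidance: Optional[Guidance] = None) -> str:
    """Assemble the text context handed to the planner at each step."""
    parts = [f"DVH: {dvh}", f"Criteria: {criteria}", f"Current TPPs: {tpps}"]
    if guidance is not None:
        # Assumed encoding of the RL-derived final TPP solution space.
        parts.append(f"Guidance (final TPP ranges): {guidance}")
    return "\n".join(parts)


def auto_plan(evaluate: Evaluate, propose: Propose, criteria: str,
              initial_tpps: TPP, max_score: float, max_steps: int = 20,
              guidance: Optional[Guidance] = None) -> Tuple[TPP, float, int]:
    """Loop: evaluate the plan, ask the planner for new TPPs, re-evaluate."""
    tpps = dict(initial_tpps)
    dvh, score = evaluate(tpps)
    steps = 0
    # Stop when the maximum score is reached or the step budget is spent.
    while score < max_score and steps < max_steps:
        prompt = build_prompt(dvh, criteria, tpps, guidance)
        tpps = propose(prompt, tpps)
        dvh, score = evaluate(tpps)
        steps += 1
    return tpps, score, steps


def toy_tps(tpps: TPP) -> Tuple[Dict[str, float], float]:
    """Toy stand-in for the TPS: score peaks when ptv_weight equals 5.0."""
    score = max(0.0, 9.0 - abs(tpps["ptv_weight"] - 5.0))
    return {"placeholder_metric": score}, score  # not a real DVH


def toy_planner(prompt: str, tpps: TPP) -> TPP:
    """Toy stand-in for the LLM: raise ptv_weight by 1.0 per step."""
    return {**tpps, "ptv_weight": tpps["ptv_weight"] + 1.0}


if __name__ == "__main__":
    result = auto_plan(toy_tps, toy_planner, "toy criteria",
                       {"ptv_weight": 1.0}, max_score=9.0)
    print(result)
```

In a real setting, `propose` would wrap a call to the language model and parse its reply into TPP values, and `evaluate` would run the optimizer in the TPS and score the resulting plan against the evaluation criteria. Passing both as callables keeps the loop itself independent of either system.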
Results
For Scenario 1, the plan score improved from 5.8 ± 1.7 to 7.9 ± 1.0 (maximum 9) in 18.2 ± 4.4 planning steps without guidance, and to 9.0 ± 0.0 in 3.5 ± 4.4 steps with guidance. In Scenario 2, the plan score improved from 10 to the maximum of 15, requiring 12 and 5 steps without and with guidance, respectively. For Scenario 3, plan scores improved from 2.5 ± 0.4 to 3.1 ± 0.2 in 20.0 ± 0.0 steps without guidance, and to 3.7 ± 0.35 (maximum 4) in 12.8 ± 8.3 steps with guidance.
Conclusion
Planning knowledge learned from a single scenario can effectively guide GPT-based planners across diverse sites, beam configurations, and protocols, improving both planning quality and efficiency.