Automated Prescription and Dose Goal Generation in a Clinical Treatment Planning System from Free-Text Physician Orders Using Frontier Large Language Models
Abstract
Purpose
To develop and evaluate an automated approach that intelligently interprets free-text, or natural language (NL), physician treatment directives and generates patient-specific prescriptions and clinical goals within a commercial treatment planning system (TPS) using large language models (LLMs), and to compare accuracy, time, and cost among major frontier LLMs, including ChatGPT and Claude.
Methods
We developed a fully automated multi-agent LLM framework that accepted NL directives as input, extracted prescriptions and clinical goals through multiple interpretation steps, and automatically entered them into our TPS (RayStation 2023B). The approach was evaluated in 5 genitourinary (GU) and central nervous system (CNS) patients who had previously received intensity-modulated proton therapy, each with a unique archived NL physician directive. An experienced planner manually re-generated the clinical goal list for each directive, serving as ground truth. The LLM-generated and clinical, human-generated goal lists were each evaluated by their agreement with this ground truth. The accuracy, time, and inference cost of multiple frontier LLMs were compared, including ChatGPT-4o-mini, ChatGPT-4o, and Claude Opus 4, and the accuracy of each LLM was compared to that of the clinical plans using paired t-tests.
Results
The mean±standard deviation (SD) percent of correct goals for clinical plans was 82.2±14.4%, attributed to a combination of human error and intentional modification of goals during clinical planning. The mean±SD (p-value) percent of correct goals for 4o-mini, 4o, and Claude was 47.6±26.5% (0.039), 87.5±8.9% (0.236), and 96.0±4.4% (0.043), respectively. Mean±SD inference time per patient was 23.1±12.1 s, 24.2±12.1 s, and 224.7±148.5 s, and mean±SD inference cost per patient was $0.0042±0.0021, $0.059±0.031, and $0.658±0.415, respectively.
Conclusion
An LLM approach for interpreting NL physician treatment directives was developed and demonstrated better agreement with the initial directives than the final clinical plans did. Among the LLMs evaluated, the most costly reasoning model achieved the highest accuracy. This approach could enable intuitive and efficient auto-planning for complex and unique patients.