Paper Proffered Program Therapy Physics

Augmenting Patient Safety Surveillance in Radiation Oncology with Large Language Model-Based Root Cause Analysis

Abstract

Purpose

To evaluate the reasoning capabilities of large language models (LLMs) in performing root cause analysis (RCA) of radiation oncology incidents using narrative reports from the Radiation Oncology Incident Learning System (RO-ILS), and to assess their potential utility in supporting patient safety efforts.

Methods

We prompted four state-of-the-art LLMs (Gemini 2.5 Pro, GPT-4o, o3, and Grok 3) with the “Background and Incident Overview” sections of 19 publicly available RO-ILS cases. Using a standardized prompt based on AAPM RCA guidelines, each model was instructed to perform RCA and generate root causes, lessons learned, and suggested actions. Model outputs were evaluated using a combination of objective semantic similarity metrics (cosine similarity via Sentence Transformer), semi-subjective assessments (precision, recall, F1-score, accuracy, hallucination rate, and performance criteria including relevance, comprehensiveness, quality of justification, and quality of solution), and subjective ratings (reasoning quality and overall performance) by five board-certified medical physicists.
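As an illustration of the objective and semi-subjective metrics named above, the following is a minimal sketch, not the authors' evaluation code. It assumes that sentence embeddings for a model output and for the reference RO-ILS analysis have already been produced by some sentence-embedding model, and that a reviewer has already counted matched and unmatched root causes. The function names `cosine_similarity` and `root_cause_scores` and the count names `tp`, `fp`, and `fn` are hypothetical. Accuracy and hallucination rate are left out because the abstract does not state how they were defined.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def root_cause_scores(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall, and F1 from reviewer counts (assumed definitions).

    tp: model root causes that match a reference root cause
    fp: model root causes with no match in the reference
    fn: reference root causes the model did not produce
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy vectors standing in for real sentence embeddings.
    print(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.5]))
    # Hypothetical counts for one case: 3 matched, 1 spurious, 1 missed.
    print(root_cause_scores(tp=3, fp=1, fn=1))
```

In practice the vectors would come from the embedding model rather than being typed by hand, and the counts would be aggregated over all 19 cases before any between-model comparison.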

Results

LLMs demonstrated satisfactory performance across evaluation metrics. GPT-4o achieved the highest cosine similarity (0.831), and Gemini 2.5 Pro had the highest recall (0.799) and accuracy (0.918). All models hallucinated to some degree, with hallucination rates ranging from 11% to 61%. Gemini 2.5 Pro, which outperformed all other models across the performance evaluation criteria, received an overall performance rating of 4.8 out of 5 from expert reviewers. Statistically significant differences were observed among models in accuracy, hallucination rate, and subjective ratings (p < 0.05).

Conclusion

LLMs delivered promising results as assistive tools for RCA in radiation oncology, with the ability to generate relevant and accurate analyses aligned with expert expectations. LLMs may support incident analysis and contribute to quality improvement efforts to advance patient safety in clinical radiation oncology practice.

People

Yuntao Wang · Presenting Author · Department of Radiation Oncology, University of Miami

Yunze Yang · Corresponding Author · Department of Radiation Oncology, University of Miami

Matthew T. Studenski, PhD · Author · Department of Radiation Oncology, University of Miami

Siamak P. Nejad-Davarani · Author · Department of Radiation Oncology, University of Miami

Maria de la Luz De Ornelas, PhD, MS · Author · Department of Radiation Oncology, University of Miami

Elizabeth L. Bossart, PhD · Author · Department of Radiation Oncology, University of Miami
