Paper Proffered Program Therapy Physics

Augmenting Patient Safety Surveillance in Radiation Oncology with Large Language Model-Based Root Cause Analysis

Abstract
Purpose

To evaluate the reasoning capabilities of large language models (LLMs) in performing root cause analysis (RCA) of radiation oncology incidents using narrative reports from the Radiation Oncology Incident Learning System (RO-ILS), and to assess their potential utility in supporting patient safety efforts.

Methods

We prompted four state-of-the-art LLMs (Gemini 2.5 Pro, GPT-4o, o3, and Grok 3) with the “Background and Incident Overview” sections from 19 publicly available RO-ILS cases. Each model was instructed to perform RCA and generate root causes, lessons learned, and suggested actions using a standardized prompt based on AAPM RCA guidelines. Model outputs were evaluated using three kinds of measures: objective semantic similarity (cosine similarity computed with a Sentence Transformer); semi-subjective assessments (precision, recall, F1-score, accuracy, hallucination rate, and performance criteria covering relevance, comprehensiveness, quality of justification, and quality of solution); and subjective ratings (reasoning quality and overall performance) by five board-certified medical physicists.
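As a rough illustration of two of the metrics named above, the sketch below computes cosine similarity between two embedding vectors and derives precision, recall, and F1-score from match counts. It is not the authors' evaluation code. The function names, the toy vectors, and the counts are all hypothetical; in the study the vectors would come from a Sentence Transformer model, and the abstract does not state how matches between model-generated and reference root causes were counted.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from counts of matched and unmatched items.

    Assumed (hypothetical) meaning of the counts:
      tp: model root causes that match a reference root cause
      fp: model root causes with no matching reference root cause
      fn: reference root causes the model did not produce
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy vectors standing in for sentence embeddings (illustrative values only).
model_vec = [0.2, 0.7, 0.1]
reference_vec = [0.25, 0.65, 0.05]
similarity = cosine_similarity(model_vec, reference_vec)

# Illustrative counts only; these are not results from the study.
precision, recall, f1 = precision_recall_f1(tp=3, fp=1, fn=1)
```

Cosine similarity depends only on the direction of the two vectors, so it is insensitive to embedding magnitude. Precision and recall, by contrast, depend on a judgment about which generated items count as matches, which is presumably why the abstract describes them as semi-subjective.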

Results

The LLMs performed satisfactorily across the evaluation metrics. GPT-4o achieved the highest cosine similarity (0.831), and Gemini 2.5 Pro had the highest recall (0.799) and accuracy (0.918). All models hallucinated to some degree, with rates ranging from 11% to 61%. Gemini 2.5 Pro outperformed all other models across the performance evaluation criteria and received an overall performance rating of 4.8 out of 5 from expert reviewers. Differences among models in accuracy, hallucination rate, and subjective ratings were statistically significant (p < 0.05).

Conclusion

LLMs delivered promising results as assistive tools for RCA in radiation oncology, with the ability to generate relevant and accurate analyses aligned with expert expectations. LLMs may support incident analysis and contribute to quality improvement efforts to advance patient safety in clinical radiation oncology practice.
