Poster Program · Therapy Physics

Large Language Model-Guided Triage of Incident Learning System Forms In Radiation Oncology

Abstract
Purpose

To evaluate agreement between large language models (LLMs) and expert reviewers in triaging radiation oncology incident learning system (ILS) forms along three clinically relevant dimensions (workflow process step, severity, and dosimetric impact), with the goal of making routine ILS review and follow-up more efficient and effective at larger institutions.

Methods

Thirty-nine free-text ILS forms from a single institution (2024-2025) were independently reviewed by three expert observers and three open-source LLMs (GPT-oss120b, Llama3.1:70b, Qwen2.5:32b). Each reviewer classified every form by: (1) process step (nine categories: Simulation, Contouring, Treatment Planning, Patient Setup, Imaging, Treatment Delivery, Physics Check, Physician Review, Therapy Check), (2) severity (Low/Medium/High), and (3) dosimetric impact (Low/Medium/High). Inter-rater reliability was assessed using Fleiss' kappa for nominal data (process step) and weighted Fleiss' kappa with quadratic weights for ordinal data (severity and dosimetric impact). LLM classifications were compared with each expert's classifications using Cohen's kappa, and the mean LLM-expert agreement was compared with the mean expert-expert agreement. Overall accuracy was also computed against the majority expert consensus.
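The agreement statistics named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names, the toy labels, and the strict-majority consensus rule are assumptions made for the example. It implements standard Fleiss' kappa for nominal labels and pairwise Cohen's kappa with optional quadratic weights; the weighted multi-rater variant used in the study is not reproduced here.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for nominal labels.

    ratings: one list per form, holding one label per rater
    (every form must have the same number of raters).
    """
    n_forms = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    p_bar = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        # Share of rater pairs that agree on this form.
        agree = sum(c * c for c in counts.values()) - n_raters
        p_bar += agree / (n_raters * (n_raters - 1))
    p_bar /= n_forms
    # Chance agreement from the pooled category proportions.
    p_e = sum((totals[c] / (n_forms * n_raters)) ** 2 for c in categories)
    return (p_bar - p_e) / (1 - p_e)


def cohen_kappa(a, b, levels, quadratic=False):
    """Cohen's kappa between two raters.

    levels lists the categories (in rank order when quadratic=True).
    quadratic=True applies disagreement weights (i - j)^2 / (k - 1)^2.
    """
    idx = {lvl: i for i, lvl in enumerate(levels)}
    k = len(levels)
    n = len(a)
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1 / n
    row = [sum(r) for r in obs]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            if quadratic:
                w = (i - j) ** 2 / (k - 1) ** 2
            else:
                w = 0.0 if i == j else 1.0
            num += w * obs[i][j]          # observed weighted disagreement
            den += w * row[i] * col[j]    # disagreement expected by chance
    return 1 - num / den


def majority_vote(labels):
    """Return the label chosen by a strict majority of raters, else None."""
    top, count = Counter(labels).most_common(1)[0]
    return top if count > len(labels) / 2 else None


# Hypothetical toy labels, not data from the study.
LEVELS = ["Low", "Medium", "High"]
expert = ["Low", "Medium", "High", "Low"]
model = ["Low", "High", "High", "Medium"]
print(cohen_kappa(expert, model, LEVELS, quadratic=True))
```

With three experts, a strict majority exists whenever at least two agree; forms on which all three disagree return None here and would need a separate tie-breaking rule, which the abstract does not specify.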

Results

Inter-expert reliability was as follows: process step Fleiss' κ = 0.28 (fair), severity weighted κ = 0.34 (fair), and dosimetric impact weighted κ = 0.51 (moderate). LLM-expert agreement was comparable to expert-expert agreement: process step mean LLM-expert κ = 0.40 vs. mean expert-expert κ = 0.30; severity mean LLM-expert κ = 0.33 vs. mean expert-expert κ = 0.37; dosimetric impact mean LLM-expert κ = 0.34 vs. mean expert-expert κ = 0.52. GPT-oss120b was the best-performing model, with weighted κ against the expert majority vote of 0.403 for process step, 0.448 for severity, and 0.435 for dosimetric impact.
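The qualitative labels attached to the kappa values above (fair, moderate) match the commonly used Landis and Koch bands. The abstract does not name its interpretation scale, so the sketch below assumes that one. It labels the reported mean kappas and computes the gap between LLM-expert and expert-expert agreement for each dimension; the function and variable names are illustrative.

```python
def interpret_kappa(kappa):
    """Map a kappa value to a Landis and Koch band (assumed scale)."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Mean pairwise kappas as reported in the Results:
# dimension -> (mean LLM-expert, mean expert-expert)
reported = {
    "process step": (0.40, 0.30),
    "severity": (0.33, 0.37),
    "dosimetric impact": (0.34, 0.52),
}

# Difference between LLM-expert and expert-expert agreement per dimension.
gaps = {dim: round(llm - exp, 2) for dim, (llm, exp) in reported.items()}

for dim, (llm, exp) in reported.items():
    print(f"{dim}: LLM-expert {llm:.2f} ({interpret_kappa(llm)}), "
          f"expert-expert {exp:.2f} ({interpret_kappa(exp)}), "
          f"gap {gaps[dim]:+.2f}")
```

A positive gap means the models agreed with the experts more than the experts agreed with each other on that dimension; a negative gap means less.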

Conclusion

LLMs demonstrated moderate agreement with expert reviewers across multiple clinically relevant dimensions, with performance within the range of inter-expert variability. These findings suggest that LLM-assisted triage is a viable approach for scaling ILS form review and providing consistent initial screening in radiation oncology quality improvement programs, pending validation on larger datasets.
