Poster Poster Program Therapy Physics

LLM-Based Clinical Note Summarization Feasibility Study

Abstract
Purpose

Cancer treatment planning requires clinicians to rapidly synthesize complex clinical information from detailed patient notes, a process that is time-consuming and cognitively demanding, particularly in multidisciplinary workflows involving non-physician clinicians. As clinical documentation grows in volume and complexity, this challenge is expected to intensify. Large language model (LLM)–based clinical text summarization offers a promising approach to streamline information extraction. However, multi-agent LLM performance depends strongly on system design choices, including reflective reasoning and prompt configuration, and hallucinations remain unacceptable in safety-critical clinical environments. This feasibility study evaluates whether multi-agent LLMs can practically generate accurate, evidence-grounded clinical summaries for cancer treatment planning, using adaptive radiotherapy (ART) clinical notes as a representative use case.

Methods

Twenty-one ART clinical notes authored by radiation oncologists were collected. Initial LLM-generated summaries were edited by a human reviewer to form the ground-truth dataset. Each summary included diagnosis, prior irradiation, treatment intent, and Eastern Cooperative Oncology Group (ECOG) performance status. A four-agent pipeline was implemented: Agent 1 generated an initial summary; Agent 2 evaluated that summary against the ground truth; Agent 3 revised the summary using reflective reasoning; and Agent 4 evaluated the revised output. Both 0-shot and 3-shot prompting were tested; combined with the pre- and post-reflection outputs, this yielded four configurations per clinical note. The evaluation agents classified each summary sentence into a confusion-matrix category based on its agreement with the ground truth, and precision, recall, and F1-score were computed from the resulting counts.
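The metric computation described above can be illustrated with a minimal sketch. It assumes each summary sentence has already been assigned one of the labels "TP", "FP", "FN", or "TN" by an evaluation agent; the label names, the function `summary_metrics`, and the example labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def summary_metrics(labels):
    """Return (precision, recall, F1) from per-sentence labels.

    `labels` is a list of strings drawn from {"TP", "FP", "FN", "TN"}.
    This labeling scheme is an assumption made for illustration.
    """
    counts = Counter(labels)
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]

    # Guard against empty denominators (e.g. a summary with no positive sentences).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Hypothetical labels for one summary (illustrative only, not study data).
example_labels = ["TP", "TP", "TP", "FP", "FN"]
precision, recall, f1 = summary_metrics(example_labels)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this sketch, a per-note F1 would be computed for each of the four configurations and then averaged across notes; whether the study averaged per-note scores or pooled counts across notes is not stated in the abstract.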

Results

For 0-shot and 3-shot prompting, mean F1-scores before reflection were 85.3% and 86.5%, respectively. After reflection, mean F1-scores were 85.1% and 86.9%, respectively.

Conclusion

For structured clinical summaries, multi-agent LLM performance improved modestly with 3-shot prompting, and reflective reasoning added a further small gain in the 3-shot configuration (86.5% to 86.9%) but not in the 0-shot configuration (85.3% to 85.1%). As a feasibility study, these findings demonstrate the practical potential of LLM-generated summaries for supporting cancer treatment planning and motivate further investigation across broader clinical contexts.
