Poster Program · Therapy Physics

Novel LLM-Based Cardiotoxicity Extraction from Electronic Health Records

Abstract
Purpose

Cardiotoxicity assessment following lung cancer radiotherapy is limited by the difficulty of extracting outcomes from unstructured Electronic Health Records (EHRs). Large Language Models (LLMs) offer a scalable solution, but privacy constraints necessitate locally hosted, open-source deployments. This work evaluated whether integrating structured screening with optimized prompting enables accurate cardiotoxicity surveillance using locally hosted open-source LLMs. We compared strategies prioritizing evidence-based interpretability against those optimized for operational robustness.

Methods

Epic EHR data from 92 lung cancer patients were analyzed, including structured problem lists and unstructured physician notes. Six open-source LLMs were deployed locally via Ollama. A structured problem-list screen provided initial cardiotoxicity triage; the remaining patients underwent specialty- and cardiac keyword–based note prefiltering to reduce input length. Two prompting strategies were compared. Algorithm A used explicit chain-of-thought (CoT) prompting with structured evidence extraction, exposing step-by-step reasoning to support interpretability. Algorithm B applied context-specific prompts with implicit CoT reasoning, performing the reasoning internally without intermediate outputs and generating binary classifications to enhance robustness. Performance was evaluated using the F1 score and the parsing success rate.
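As a rough illustration of the two-stage screen described above, the logic can be sketched as follows. This is not the study's implementation: the keyword set, record fields, and example patient are hypothetical.

```python
# Illustrative two-stage cardiotoxicity screen: a structured problem-list
# check, then cardiac keyword prefiltering of unstructured notes to cut
# the text volume sent to the locally hosted LLM. The keyword set and
# record layout are hypothetical, not the study's actual configuration.

CARDIAC_KEYWORDS = {
    "pericarditis", "heart failure", "arrhythmia",
    "myocardial infarction", "atrial fibrillation",
}

def problem_list_positive(problem_list):
    """Stage 1: flag patients whose structured problem list already
    records a cardiac diagnosis (screen-positive triage)."""
    text = " ".join(problem_list).lower()
    return any(kw in text for kw in CARDIAC_KEYWORDS)

def prefilter_notes(notes):
    """Stage 2: keep only physician notes mentioning a cardiac keyword,
    reducing input length before LLM classification."""
    return [note for note in notes
            if any(kw in note.lower() for kw in CARDIAC_KEYWORDS)]

# Example patient record (fabricated for illustration).
patient = {
    "problem_list": ["COPD", "hypertension"],
    "notes": [
        "Routine follow-up for cough; no cardiac complaints.",
        "New-onset atrial fibrillation noted after radiotherapy.",
    ],
}

if problem_list_positive(patient["problem_list"]):
    label = "cardiotoxicity (problem-list screen)"
else:
    # Only the prefiltered notes would be passed to the LLM prompt.
    kept_notes = prefilter_notes(patient["notes"])
```

In this sketch, a problem-list hit resolves the patient without any LLM call; otherwise only keyword-matched notes reach the model, which is what keeps the input length manageable for local deployment.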

Results

Of 92 patients, 45 had physician-confirmed cardiotoxicity. Problem-list screening identified 22 cases (23.9%); the remaining patients required note-based classification. Under Algorithm A, GPT-OSS achieved the highest performance (F1 = 0.916) with a parsing success rate of 97.8%, followed by DeepSeek-R1 (F1 = 0.828; parsing success = 96.7%). Across all six models, parsing reliability ranged from 93.5% to 97.8%. Algorithm B showed slightly lower peak performance (GPT-OSS, F1 = 0.889) but superior robustness, achieving 100% parsing success in five of the six models. Larger general-purpose LLMs consistently outperformed domain-specific medical models (all F1 < 0.75).
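The two reported metrics can be computed as follows; the confusion and parsing counts below are fabricated for illustration, not study data.

```python
# Illustrative computation of the two evaluation metrics: F1 score over
# binary cardiotoxicity labels, and parsing success rate over model
# responses. All counts here are fabricated examples.

def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def parsing_success_rate(n_parsed, n_total):
    """Fraction of model responses yielding a machine-readable label."""
    return n_parsed / n_total

# Hypothetical counts for one model on 92 patients.
f1 = f1_score(tp=40, fp=3, fn=5)       # about 0.909
rate = parsing_success_rate(90, 92)    # about 0.978
```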

Conclusion

Explicit CoT prompting improves interpretability but reduces robustness, whereas implicit CoT prompting preserves both performance and parsing reliability. Open-source LLMs enable reasonably accurate cardiotoxicity extraction, with clinical deployability driven more by prompt architecture and output constraints than by model specialization alone.
