Enabling Patient-Specific and Population-Level Insights in Radiation Oncology via Large Language Model Agents
Abstract
Purpose
To develop an AI agent framework leveraging large language models (LLMs) for intelligent data extraction and reasoning over radiation oncology data from oncology information systems (OIS) and electronic medical records (EMR), enabling patient-specific queries and population-level cohort identification for clinical trial matching and treatment eligibility assessment.
Methods
We designed a modular AI agent architecture with three components: (1) Skills: structured knowledge definitions containing database schemas and clinical logic for OIS (Mosaiq, Aria) and EMR (Epic Clarity); (2) Model Context Protocol (MCP): standardized function calls for secure database connectivity; and (3) an LLM-based coding agent for autonomous SQL generation and multi-step reasoning. The agent interprets natural language queries, generates SQL from the loaded skills, executes the queries against relational databases (or runs semantic search against per-patient vector databases), and synthesizes the results through clinical reasoning. Treatment criteria (e.g., PULSAR protocols) and trial requirements are encoded as reasoning skills. Results are presented via an interactive web interface.
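The flow described above (load a skill, turn a question into SQL, execute it, return the matching patients) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Skill` class, `build_prompt`, `stub_llm`, `run_readonly`, the `courses` table, and the "lung, 5 fractions or fewer" rule are all hypothetical, the LLM call is replaced by a stub that returns a fixed query, and an in-memory SQLite database stands in for the OIS/EMR connection that MCP would provide.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Skill:
    """Hypothetical skill: a schema description plus clinical logic as text."""
    name: str
    schema: str
    clinical_logic: str


def build_prompt(skill: Skill, question: str) -> str:
    """Combine the loaded skill and the user question into one LLM prompt."""
    return (
        f"Skill: {skill.name}\n"
        f"Schema:\n{skill.schema}\n"
        f"Clinical logic:\n{skill.clinical_logic}\n"
        f"Question: {question}\n"
        "Return one read-only SQL query."
    )


def stub_llm(prompt: str) -> str:
    """Stand-in for the LLM call (assumption): always returns a fixed query."""
    return (
        "SELECT patient_id FROM courses "
        "WHERE site = 'lung' AND fractions <= 5 "
        "ORDER BY patient_id"
    )


def run_readonly(conn: sqlite3.Connection, sql: str) -> list:
    """Naive guard: reject anything that does not start with SELECT."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql).fetchall()


def make_toy_db() -> sqlite3.Connection:
    """Build a toy in-memory table; the rows are made-up illustration data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE courses (patient_id INTEGER, site TEXT, fractions INTEGER)"
    )
    conn.executemany(
        "INSERT INTO courses VALUES (?, ?, ?)",
        [(1, "lung", 5), (2, "breast", 15), (3, "lung", 3), (4, "lung", 30)],
    )
    return conn


def answer(question: str, skill: Skill, conn: sqlite3.Connection) -> list:
    """One agent step: prompt -> SQL -> execution -> list of patient ids."""
    sql = stub_llm(build_prompt(skill, question))
    return [row[0] for row in run_readonly(conn, sql)]


toy_skill = Skill(
    name="toy-eligibility",
    schema="courses(patient_id INTEGER, site TEXT, fractions INTEGER)",
    clinical_logic="Toy rule (not a real protocol): lung site, 5 fractions or fewer.",
)
candidates = answer("Which patients meet the toy rule?", toy_skill, make_toy_db())
print(candidates)
```

In a real deployment the stub would be replaced by a model call and the prefix check by proper read-only database permissions; the sketch only shows how a skill supplies the schema and rules that ground the generated SQL.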
Results
The agent achieved ~94% SQL generation accuracy and ~91% patient eligibility classification accuracy, both measured against manual chart review. PULSAR candidate identification averaged 2.3 minutes per query on a 1,037-patient database, versus 4-6 hours when performed manually. Clinical trial matching time was reduced by ~85% (from ~45 to ~7 minutes per protocol). Semantic search over the vector databases achieved 89% recall. Projected annual savings for a mid-size department screening 50 patients/month are ~320-400 physician hours.
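As a rough consistency check on the timing figures above, the arithmetic can be reproduced directly. The helper names below are illustrative, and the per-patient figure is only what the projected savings imply when divided evenly over 50 patients/month for 12 months; it is not a number reported in the study.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


def speedup(manual_hours: float, agent_minutes: float) -> float:
    """How many times faster the agent is than the manual process."""
    return manual_hours * 60.0 / agent_minutes


def minutes_saved_per_patient(annual_hours: float, patients_per_month: int) -> float:
    """Annual hours saved, spread evenly over a year of screened patients."""
    return annual_hours * 60.0 / (patients_per_month * 12)


# Trial matching: ~45 -> ~7 minutes per protocol
print(round(percent_reduction(45, 7), 1))            # 84.4, consistent with ~85%

# PULSAR screening: 4-6 hours manually vs 2.3 minutes per query
print(round(speedup(4, 2.3)), round(speedup(6, 2.3)))  # 104 157

# Projected savings: 320-400 hours/year at 50 patients/month
print(minutes_saved_per_patient(320, 50), minutes_saved_per_patient(400, 50))  # 32.0 40.0
```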
Conclusion
LLM-based coding agents with structured clinical skills offer a promising approach for intelligent data retrieval in radiation oncology, addressing the challenge of extracting actionable insights from complex multi-source clinical systems and potentially accelerating clinical trial enrollment and personalized treatment selection.