Multimodal Large Language Models for Patient-Specific Radiotherapy Objective Constraint Generation
Abstract
Purpose
This work assesses a multimodal large language model (MedGemma) for generating planning objectives in standard-fractionation (60 Gy in 30 fractions) lung cancer radiotherapy. Unlike static vendor templates, the proposed system adapts objectives to each patient's Electronic Health Record (EHR), including comorbidities and re-irradiation, and to physical constraints. The model fuses EHR text with distance-based geometric features of the segmented organ structures to inform objective generation.
Methods
We implemented a Retrieval-Augmented Generation (RAG) pipeline. Inputs included (1) anonymized EHR text; (2) precomputed minimum boundary-to-boundary and center-to-center distances for all segmented organs in the CT/RTSTRUCT data; and (3) a database of clinical trials (e.g., RTOG 0617). The model generated planning objectives (soft constraints) for four endpoints: Lung V20Gy, Mean Lung Dose, Spinal Cord Max, and Esophagus Mean. Validation used a cohort of 50 patients: 10 real cases with objectives extracted from delivered clinical plans (ground truth), and 40 synthetic cases generated by geometric variation of trial protocols. We performed a pooled analysis comparing model outputs to physician ground truth using Spearman's rank correlation and endpoint-specific Mean Absolute Error (MAE). Doses were converted to equivalent dose in 2 Gy fractions (EQD2) using structure-specific Linear-Quadratic (LQ) modeling.
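The EQD2 conversion mentioned above follows the standard LQ relation EQD2 = D (d + α/β) / (2 + α/β), where D is the total dose and d the dose per fraction. The sketch below is illustrative only: the α/β values, the function names, and the cumulative-allowance helper are assumptions made for this example and are not taken from the study.

```python
# Illustrative alpha/beta ratios in Gy. These are ASSUMED values for the
# example, not the structure-specific parameters used in the study.
ALPHA_BETA_GY = {
    "spinal_cord": 2.0,
    "lung": 3.0,
    "esophagus": 3.0,
}


def eqd2(total_dose_gy: float, n_fractions: int, alpha_beta_gy: float) -> float:
    """Equivalent dose in 2 Gy fractions under the linear-quadratic model."""
    if n_fractions <= 0:
        raise ValueError("n_fractions must be positive")
    dose_per_fraction = total_dose_gy / n_fractions
    return total_dose_gy * (dose_per_fraction + alpha_beta_gy) / (2.0 + alpha_beta_gy)


def remaining_eqd2_allowance(limit_eqd2_gy: float, prior_eqd2_gy: float) -> float:
    """Hypothetical helper: EQD2 still available after a prior course.

    Assumes simple summation with no tissue recovery between courses.
    """
    return max(0.0, limit_eqd2_gy - prior_eqd2_gy)


# 60 Gy in 30 fractions is already delivered at 2 Gy per fraction.
standard = eqd2(60.0, 30, ALPHA_BETA_GY["lung"])

# A hypothetical prior course of 30 Gy in 10 fractions seen by the cord.
prior_cord = eqd2(30.0, 10, ALPHA_BETA_GY["spinal_cord"])

# Allowance left against a hypothetical 50 Gy EQD2 cumulative cord limit.
cord_left = remaining_eqd2_allowance(50.0, prior_cord)
```

With these assumed inputs, a course delivered at 2 Gy per fraction maps to an unchanged EQD2, while a hypofractionated prior course counts for more than its physical dose in late-responding tissue.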
Results
In the ground truth comparison (n = 40 endpoints from the 10 real cases), RAG+MedGemma achieved a pooled Spearman correlation of ρ = 0.98 (p < 0.001). Endpoint-specific MAEs were Lung V20Gy: 1.6 percentage points; Mean Lung Dose: 0.9 Gy; Spinal Cord Max: 1.2 Gy; Esophagus Mean: 1.8 Gy. For patients with re-irradiation or comorbidities, the model recommended objectives tightened by a mean of 22% relative to standard guidelines. The safety audit found 0% numeric hallucination (no objectives attributed to non-existent guidelines), and 98% of objectives were rated as logically accurate based on the EQD2 summation.
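The two agreement metrics reported above can be computed as in the following minimal sketch. It uses only the Python standard library; the function names and the sample numbers are hypothetical and do not reproduce the study data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def mean_absolute_error(predicted, truth):
    """Mean absolute difference between paired values."""
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(predicted)


# Hypothetical Spinal Cord Max objectives in Gy (model vs. physician).
model_cord = [44.0, 40.0, 45.0, 36.0]
truth_cord = [45.0, 41.0, 45.5, 38.0]

rho = spearman_rho(model_cord, truth_cord)
cord_mae = mean_absolute_error(model_cord, truth_cord)
```

MAE is kept per endpoint because the endpoints carry different units (percentage points for Lung V20Gy, Gy for the others), so averaging errors across them would not be meaningful.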
Conclusion
Although tested on a limited cohort, MedGemma demonstrated high agreement with the ground truth in adapting planning objectives based on multimodal data. Integration of spatial attributes and LQ-model logic anchored the model, mitigating hallucinations. Together, these results suggest potential for a risk-adaptive decision-support framework.