Toxicity Prediction in Prostate Radiotherapy Using Multimodal Features with 3D MedNeXt and Large Language Models
Abstract
Purpose
Accurate prediction of radiotherapy toxicity requires integrating heterogeneous data, including 3D dose distributions, patient anatomy, and unstructured clinical text. We developed a multimodal pipeline that couples deep learning–based dose prediction (3D MedNeXt) with a local large language model (LLM) to predict toxicity with interpretable rationales.
Methods
A cohort of 117 prostate cancer patients (Tox, n=45; Non-Tox, n=72) was retrospectively analyzed. Clinically significant toxicity was defined as ToxicityGradeCode ≥ 2 (Tox) and grades 0–1 as Non-Tox, yielding a binary endpoint for model development. A 3D MedNeXt model was trained to predict dose distributions from planning CT and RT structures. Predicted doses were converted into interpretable textual descriptors, including DVH metrics and spatial hotspot features (e.g., high-dose proximity to the rectal wall). Clinical modifiers (SpaceOAR use and physician notes) and anatomical features (bladder volume and PTV–rectum overlap) were extracted through automated parsing of prescription PDFs. These multimodal features were synthesized into standardized patient reports. A local LLM (Llama 3.1) processed each report to predict binary toxicity (Tox vs. Non-Tox) and generate a concise clinical rationale. Model performance was evaluated on a held-out test set (n=20; toxicity prevalence=0.40) and benchmarked against a text-only baseline using ROC and precision–recall analyses.
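The endpoint definition and the dose-to-text step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names (`toxicity_label`, `dose_at_volume`, `volume_at_dose`, `rectum_descriptor`, `build_report`), the 50/60/70 Gy thresholds, and the report layout are hypothetical. Voxels are assumed to have equal volume, and the spatial hotspot feature (high-dose proximity to the rectal wall) is not modeled here.

```python
from typing import Sequence


def toxicity_label(grade: int) -> str:
    """Binary endpoint as defined in the abstract: grade >= 2 is Tox, 0-1 is Non-Tox."""
    if grade < 0:
        raise ValueError("toxicity grade must be non-negative")
    return "Tox" if grade >= 2 else "Non-Tox"


def dose_at_volume(doses_gy: Sequence[float], percent: float) -> float:
    """D_x%: minimum dose received by the hottest x% of the structure.

    Assumes equal-volume voxels; doses_gy holds voxel doses inside the structure mask.
    """
    ranked = sorted(doses_gy, reverse=True)
    n_hot = max(1, round(len(ranked) * percent / 100.0))
    return ranked[n_hot - 1]


def volume_at_dose(doses_gy: Sequence[float], threshold_gy: float) -> float:
    """V_xGy: percentage of the structure receiving at least threshold_gy."""
    return 100.0 * sum(d >= threshold_gy for d in doses_gy) / len(doses_gy)


def rectum_descriptor(doses_gy: Sequence[float],
                      thresholds: Sequence[float] = (50.0, 60.0, 70.0)) -> str:
    """Render rectal DVH metrics as text (thresholds are hypothetical examples)."""
    mean = sum(doses_gy) / len(doses_gy)
    parts = [f"Rectum Dmean {mean:.1f} Gy",
             f"D2% {dose_at_volume(doses_gy, 2):.1f} Gy"]
    parts += [f"V{t:g}Gy {volume_at_dose(doses_gy, t):.1f}%" for t in thresholds]
    return ", ".join(parts)


def build_report(rectum_doses: Sequence[float], spaceoar: bool,
                 bladder_volume_cc: float, ptv_rectum_overlap_cc: float,
                 note: str) -> str:
    """Hypothetical standardized report: dose, clinical modifiers, anatomy."""
    lines = [
        "DOSE: " + rectum_descriptor(rectum_doses),
        f"MODIFIERS: SpaceOAR={'yes' if spaceoar else 'no'}; physician note: {note}",
        f"ANATOMY: bladder volume {bladder_volume_cc:.0f} cc; "
        f"PTV-rectum overlap {ptv_rectum_overlap_cc:.1f} cc",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy voxel doses (Gy) for the rectum; illustrative values only.
    doses = [72, 68, 65, 55, 40, 30, 20, 10, 5, 2]
    print(toxicity_label(2))
    print(build_report(doses, True, 250, 3.2, "none"))
```

Keeping the report as plain labeled lines makes the LLM input auditable: a reviewer can see exactly which dose, modifier, and anatomy values the model's rationale refers to.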
Results
The multimodal model achieved ROC-AUC=0.844 and PR-AUC (AP)=0.827, outperforming the text-only baseline (AUC=0.771, AP=0.745). At the selected operating point, accuracy increased from 0.55 to 0.65, F1 score from 0.53 to 0.59, and precision from 0.45 to 0.56, while recall was maintained at 0.62. Calibration also improved, with the Brier score decreasing from 0.198 to 0.163. Model rationales frequently highlighted dose hotspots in conjunction with protective modifiers, supporting transparent risk interpretation.
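The operating-point metrics follow directly from a 2×2 confusion matrix, and the Brier score is the mean squared difference between predicted Tox probability and observed outcome. The abstract does not report the confusion matrices; the counts below are an inference, assuming 8 Tox patients in the test set (0.40 × 20). Under that assumption, TP=5, FP=4, FN=3, TN=8 (multimodal) and TP=5, FP=6, FN=3, TN=6 (text-only) reproduce the reported rounded values. The function names are hypothetical.

```python
from typing import Dict, Sequence


def operating_point_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Threshold-dependent metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


if __name__ == "__main__":
    # Inferred (not reported) counts for a 20-patient test set with 8 Tox cases.
    multimodal = operating_point_metrics(tp=5, fp=4, fn=3, tn=8)
    text_only = operating_point_metrics(tp=5, fp=6, fn=3, tn=6)
    for name, m in (("multimodal", multimodal), ("text-only", text_only)):
        print(name, {k: round(v, 3) for k, v in m.items()})
```

Note that recall is identical (5/8) in both cases; the gains in precision, F1, and accuracy come entirely from two fewer false positives.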
Conclusion
Integrating MedNeXt-predicted 3D dose with automatically mined clinical and anatomical context enables an explainable LLM-based toxicity classifier. Ongoing work will expand the cohort and extend the framework to adaptive radiotherapy workflows, including particle therapy.