Report-Conditioned Synthetic Abdominal Tumors in CT Using Large Language Models
Abstract
Purpose
To determine whether tumor descriptions in routine radiology reports can be converted into controllable priors for synthetic tumor generation in CT, and whether these report-conditioned synthetic tumors improve robustness of tumor detection and segmentation relevant to radiotherapy-oriented imaging workflows.
Methods
We developed TextoMorph, a report-conditioned tumor synthesis pipeline. A large language model extracts structured tumor descriptors from radiology reports (e.g., organ location, size/shape cues, attenuation class, and margin sharpness). These descriptors, together with the input CT and a tumor mask, condition a denoising diffusion probabilistic model (DDPM) that generates diverse tumors while preserving local anatomy. At inference, TextoMorph synthesizes tumors with controlled appearance (hypodense/hyperdense, well-defined/ill-defined margins) to augment training data for organ-specific detection and segmentation. To quantify diversity, we performed a radiomics-based heterogeneity analysis by extracting 102 texture features from synthetic tumors and measuring pairwise cosine similarity distributions. Realism was assessed using a radiologist Visual Turing Test. Augmented models were evaluated on internal and external cohorts for liver, pancreas, and kidney tumor detection, classification, and segmentation.
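The pairwise cosine similarity step in the heterogeneity analysis can be illustrated with a minimal sketch. It assumes each synthetic tumor has already been reduced to a fixed-length vector of texture features (102 values in this study). The function names and the toy vectors are hypothetical and are not taken from the TextoMorph code, and any feature normalization the authors may apply is not shown.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero vector is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_cosine_similarities(feature_vectors):
    """Similarity for every unordered pair of tumors, in combinations() order."""
    return [cosine_similarity(u, v) for u, v in combinations(feature_vectors, 2)]


# Toy 2-feature vectors standing in for 102-dimensional radiomics vectors.
toy_features = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
toy_similarities = pairwise_cosine_similarities(toy_features)
```

A distribution of these values concentrated near 1 would indicate that the tumors have similar texture profiles, while a broader or lower distribution would indicate greater heterogeneity.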
Results
Synthetic augmentation improved early tumor detection sensitivity by 8.5% (liver), 4.2% (pancreas), and 3.9% (kidney). Segmentation Dice improved by 6.3% (liver) and 9.0% (pancreas). In the Visual Turing Test, radiologists mislabeled synthetic tumors as real in up to 45.0% of cases, supporting visual realism. Radiomics analysis showed increased heterogeneity (e.g., mean variance 1.14 for liver tumors). Report-conditioned synthesis improved classification sensitivity from 61.9% to 79.0% for malignant tumors and from 50.7% to 70.8% for cystic tumors. For large kidney tumors, targeted synthesis increased sensitivity by 9.1% and Dice by 4.7%.
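For reference, the two metrics reported above follow standard definitions, sketched below for flattened binary masks and raw detection counts. The helper names are hypothetical, and the empty-mask convention is an assumption. The abstract does not state whether the reported gains are absolute or relative, so the sketch covers only the metrics themselves.

```python
def dice_coefficient(predicted_mask, reference_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flattened binary masks."""
    predicted = [bool(p) for p in predicted_mask]
    reference = [bool(r) for r in reference_mask]
    overlap = sum(p and r for p, r in zip(predicted, reference))
    total = sum(predicted) + sum(reference)
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * overlap / total


def sensitivity(true_positives, false_negatives):
    """Fraction of real tumors that were detected: TP / (TP + FN)."""
    positives = true_positives + false_negatives
    if positives == 0:
        raise ValueError("sensitivity is undefined when there are no positive cases")
    return true_positives / positives
```

For example, a prediction that covers two voxels, one of which lies inside a one-voxel reference tumor, gives Dice = 2 * 1 / (2 + 1), or about 0.667.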
Conclusion
Radiology report text can be translated into controllable, quantitative priors for CT tumor synthesis. Report-conditioned synthetic tumors increase appearance diversity and improve detection, segmentation, and classification performance, supporting more robust training and stress-testing of abdominal tumor models for clinical imaging applications.