Weakly Supervised Radiotherapy Segmentation from CT Reports: Reducing Voxel-Wise Labeling for Target Tumors and Organs-at-Risk
Abstract
Purpose
To reduce reliance on labor-intensive voxel-wise tumor masks by training CT segmentation models directly from routine radiology and pathology reports, enabling scalable detection and localization of tumors relevant to radiotherapy planning and incidental findings.
Methods
We propose R-Super, a report-supervised training strategy that converts report-derived tumor attributes into differentiable supervision on segmentation outputs. From each report, a zero-shot LLM extracts the tumor's organ location, count, diameter(s) (converted to volume), attenuation class (hypo-, hyper-, or iso-attenuating), and malignancy (when pathology is available). These attributes drive four losses: a Volume Loss (matches total segmented tumor volume per organ to the reported volume), a Ball Loss (localizes tumors using fixed spherical “ball convolutions” consistent with reported size and location), an Attenuation Loss (enforces intensity consistency between segmented tumors and reported attenuation), and a Pathology Loss (separates benign and malignant tumor channels when pathology labels exist). Training used 101,654 CT–report pairs (UCSF and Stanford Merlin), emphasizing seven understudied tumor types (spleen, gallbladder, prostate, bladder, uterus, esophagus, adrenal). A subset of 723 CT–mask pairs was created via report-guided active learning, enabling mixed supervision and quantitative segmentation evaluation where masks exist.
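To make the report-to-loss mapping concrete, the Volume Loss can be sketched as below, assuming a spherical diameter-to-volume conversion and a relative-L1 mismatch; the function names and exact loss form are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def diameter_to_volume(diameter_mm: float) -> float:
    """Approximate tumor volume (mm^3) from a reported diameter,
    assuming a spherical tumor (a common simplification; the paper's
    exact conversion may differ)."""
    r = diameter_mm / 2.0
    return (4.0 / 3.0) * np.pi * r ** 3

def volume_loss(probs: np.ndarray,
                reported_diameters_mm,
                voxel_volume_mm3: float) -> float:
    """Relative L1 mismatch between the soft-predicted tumor volume in
    one organ and the total volume implied by the report.

    `probs` is the model's per-voxel tumor probability map restricted
    to that organ; summing it (times the voxel volume) gives a
    differentiable volume estimate."""
    predicted = float(probs.sum()) * voxel_volume_mm3
    target = sum(diameter_to_volume(d) for d in reported_diameters_mm)
    return abs(predicted - target) / (target + 1e-6)
```

Because the predicted volume is a plain sum over soft probabilities, the same expression is differentiable when written with a deep-learning framework's tensors, so it can supervise a segmentation head without any voxel-wise mask.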
Results
On internal testing (UCSF, n=1,220), R-Super trained with reports only achieved strong tumor detection across all seven types (per-type sensitivity 66.7–88.4%, specificity 66.9–92.8%), substantially outperforming public vision–language models and a universal lesion segmentation baseline. Adding a small number of masks improved detection further (average sensitivity 81.7%, specificity 84.4%). In external validation (Stanford Merlin, n=1,133), R-Super outperformed standard mask-only segmentation, achieving an average sensitivity/specificity of 83%/80%. Report-guided active learning reduced mask-creation time from roughly 30 to roughly 5 minutes per case.
Conclusion
Report supervision provides a scalable path to train segmentation models for tumors lacking public masks, improves generalization across hospitals, and reduces voxel-wise labeling burden while preserving interpretable localization needed for radiotherapy workflows.