Weakly Supervised Radiotherapy Segmentation from CT Reports: Reducing Voxel-Wise Labeling for Target Tumors and Organs-at-Risk
Abstract
Purpose
To reduce reliance on labor-intensive voxel-wise tumor masks by training CT segmentation models directly from routine radiology and pathology reports, enabling scalable detection and localization of tumors relevant to radiotherapy planning and incidental findings.
Methods
We propose R-Super, a report-supervised training strategy that converts report-derived tumor attributes into differentiable supervision on segmentation outputs. From each report, a zero-shot LLM extracts the tumor's organ location, count, diameter(s) (converted to volume), attenuation class (hypo-, hyper-, or iso-attenuating), and malignancy (when pathology is available). These attributes drive four losses: a Volume Loss (matches total segmented tumor volume per organ to the reported volume), a Ball Loss (localizes tumors using fixed spherical “ball convolutions” consistent with reported size and location), an Attenuation Loss (enforces intensity consistency between segmented tumors and reported attenuation), and a Pathology Loss (separates benign and malignant tumor channels when pathology labels exist). Training used 101,654 CT–report pairs (UCSF and Stanford Merlin), emphasizing seven understudied tumor types (spleen, gallbladder, prostate, bladder, uterus, esophagus, adrenal). A subset of 723 CT–mask pairs was created via report-guided active learning, enabling mixed supervision and quantitative segmentation evaluation where masks exist.
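To make the report-to-loss mapping concrete, the Volume Loss can be sketched as below, assuming a spherical diameter-to-volume conversion and a relative-L1 mismatch; the function names and exact loss form are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def diameter_to_volume(diameter_mm: float) -> float:
    """Approximate tumor volume (mm^3) from a reported diameter,
    assuming a spherical tumor (a common simplification; the paper's
    exact conversion may differ)."""
    r = diameter_mm / 2.0
    return (4.0 / 3.0) * np.pi * r ** 3

def volume_loss(probs: np.ndarray,
                reported_diameters_mm,
                voxel_volume_mm3: float) -> float:
    """Relative L1 mismatch between the soft-predicted tumor volume in
    one organ and the total volume implied by the report.

    `probs` is the model's per-voxel tumor probability map restricted
    to that organ; summing it (times the voxel volume) gives a
    differentiable volume estimate."""
    predicted = float(probs.sum()) * voxel_volume_mm3
    target = sum(diameter_to_volume(d) for d in reported_diameters_mm)
    return abs(predicted - target) / (target + 1e-6)
```

Because the predicted volume is a plain sum over soft probabilities, the same expression is differentiable when written with a deep-learning framework's tensors, so it can supervise a segmentation head without any voxel-wise mask.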
Results
On internal testing (UCSF, n=1,220), R-Super trained with reports only achieved strong tumor detection across all seven types (per-type sensitivity 66.7–88.4%, specificity 66.9–92.8%), substantially outperforming public vision–language models and a universal lesion segmentation baseline. Adding a small number of masks improved detection further (average sensitivity 81.7%, specificity 84.4%). In external validation (Stanford Merlin, n=1,133), R-Super outperformed standard mask-only segmentation, achieving an average sensitivity/specificity of 83%/80%. Report-guided active learning reduced mask-creation time from roughly 30 to roughly 5 minutes per case.
Conclusion
Report supervision provides a scalable path to train segmentation models for tumors lacking public masks, improves generalization across hospitals, and reduces voxel-wise labeling burden while preserving interpretable localization needed for radiotherapy workflows.