CancerVerse: Multicenter CT Segmentation of 16 Cancers for Radiotherapy Targets and OARs
Abstract
Purpose
To provide a large, multicenter, longitudinal CT dataset with voxel-wise tumor annotations across multiple cancer sites to support development, benchmarking, and validation of AI models for radiotherapy target and organ-at-risk (OAR) segmentation under real-world clinical variability.
Methods
We present CancerVerse, an open, longitudinal, multimodal CT dataset of 6,000 patients, each with multiple visits, that includes every CT scan acquired at those visits. In total, CancerVerse contains more than 50,000 CT scans covering the chest, abdomen, and pelvis. Expert radiologists annotated voxel-wise tumor masks across 16 organs, spanning a wide range of cancer types and anatomical contexts relevant to radiotherapy planning. To capture realistic deployment conditions, the dataset includes diverse scanner vendors, acquisition protocols, and patient demographics. In addition to imaging, CancerVerse links each CT scan with radiology reports, clinical variables, and laboratory results, enabling context-aware analysis. A large cohort of control patients confirmed to be cancer-free for at least three years is included to support evaluation of false positives in screening-style tasks. Data are released in a standardized format with baseline tasks and documentation.
Results
CancerVerse substantially exceeds existing public resources in scale, anatomical coverage, and longitudinal depth. The repeated scans enable evaluation of segmentation consistency over time and under disease progression or treatment effects. The cancer-free controls allow quantitative assessment of specificity and of failure modes that cannot be observed in tumor-only datasets. In baseline experiments, models trained and evaluated on CancerVerse exhibit larger performance variance across organs and sites than models trained and evaluated on single-center datasets, highlighting the importance of multicenter evaluation for radiotherapy applications.
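As an illustration of the kinds of measurements described in these results, the following minimal Python sketch computes a Dice overlap for a tumor mask, a scan-level specificity on cancer-free controls, and a per-organ mean and spread of Dice scores. It is not the CancerVerse baseline code: the function names, the `min_voxels` threshold, and the convention that two empty masks score 1.0 are all assumptions made for this example, and masks are assumed to be already loaded as NumPy arrays.

```python
import numpy as np


def dice_score(pred, truth):
    """Dice overlap between two binary masks.

    Assumption for this sketch: two empty masks count as perfect agreement.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def control_specificity(control_preds, min_voxels=1):
    """Fraction of cancer-free scans on which the model predicts no tumor.

    A scan counts as a false positive when at least `min_voxels` voxels are
    predicted as tumor (hypothetical threshold, not taken from the paper).
    """
    flagged = [np.asarray(p, dtype=bool).sum() >= min_voxels for p in control_preds]
    return 1.0 - float(np.mean(flagged))


def per_organ_summary(scores_by_organ):
    """Mean and population standard deviation of Dice scores for each organ."""
    return {
        organ: (float(np.mean(scores)), float(np.std(scores)))
        for organ, scores in scores_by_organ.items()
    }
```

Applied to repeated scans of the same patient, the same `dice_score` values can be compared across visits to examine consistency over time, while `control_specificity` uses only the cancer-free cohort, which is why it cannot be computed on tumor-only datasets.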
Conclusion
CancerVerse provides a comprehensive, longitudinal CT resource for multicancer tumor segmentation with voxel-wise labels and rich clinical context. It enables robust benchmarking of radiotherapy target and OAR segmentation models under realistic multicenter conditions and supports more reliable translation of AI into clinical practice.