A Multicenter Pancreatic Target Segmentation Dataset for Radiotherapy and Imaging AI Benchmarking
Abstract
Purpose
To provide a large, diverse, and quality-controlled abdominal CT dataset with pancreas- and tumor-centric voxel-wise annotations to support benchmarking and development of AI models for pancreatic target segmentation and anatomy-aware evaluation relevant to radiotherapy and diagnostic imaging.
Methods
PanTS contains 36,390 CT scans from 145 medical centers with expert-validated voxel-wise annotations of pancreatic tumors, the pancreas subregions (head, body, and tail), and 24 surrounding anatomical structures (e.g., major vessels, ducts, abdominal and thoracic organs, and skeletal landmarks), totaling more than 993,000 labeled structures. The dataset includes metadata on age, sex, contrast phase, voxel spacing, slice thickness, and diagnosis. PanTS is split into a public training set (n=9,901; CC BY-NC-SA license) and a held-out test set (n=26,489). The test set is reserved for third-party evaluation and drawn from centers not represented in the training set, which enables out-of-distribution (OOD) assessment. Tumors were annotated slice by slice and reviewed through a multi-reader process. An inter-annotator study on 300 re-annotated scans quantified agreement using the Dice similarity coefficient (DSC). Baseline benchmarking with nnU-Net evaluated two effects. The first was scaling tumor annotations, tested by training on datasets of increasing size and testing on the PanTS OOD test set. The second was adding surrounding-structure labels, tested by comparing 2-class with 28-class training.
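For two binary masks A and B, the DSC used in the inter-annotator study is 2|A∩B| / (|A| + |B|). The following minimal Python sketch computes it. It then summarizes per-scan DSC values as a median and interquartile range and flags scans below the 20% review threshold reported in the Results. Several choices are assumptions made for illustration and do not come from the PanTS pipeline: the function names, the flat 0/1 mask representation (for example, a label volume flattened with NumPy's `ravel()`), treating two empty masks as perfect agreement, and the quartile method (`statistics.quantiles` with its default setting).

```python
from statistics import median, quantiles


def dice(mask_a, mask_b):
    """Dice similarity coefficient between two flat binary masks of equal length.

    Assumption: two empty masks count as perfect agreement (1.0).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def agreement_summary(dsc_values, flag_threshold=0.20):
    """Median and IQR of per-scan DSC values, plus indices of low-agreement scans.

    The 0.20 default mirrors the DSC < 20% senior-review threshold.
    Quartiles use statistics.quantiles with its default ('exclusive') method,
    an assumption; other quartile conventions give slightly different IQRs.
    """
    q1, _, q3 = quantiles(dsc_values, n=4)
    flagged = [i for i, d in enumerate(dsc_values) if d < flag_threshold]
    return {"median": median(dsc_values), "iqr": q3 - q1, "flagged": flagged}


# Toy example with flat 0/1 masks from two hypothetical annotators.
print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
print(agreement_summary([0.9, 0.1, 0.8, 0.85])["flagged"])  # [1]
```

Reporting the median and IQR, rather than the mean, keeps a few near-zero DSC cases from dominating the summary. The explicit flag list is what would feed a senior-review queue.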
Results
PanTS includes 1,077 tumor-positive training scans (10.9%) and 2,829 tumor-positive test scans (10.7%). Contrast phase and image resolution differ substantially in distribution between the training and test sets. Median inter-annotator agreement for tumor masks was a DSC of 86.1% (IQR 19.6%), and low-agreement cases (DSC < 20%) were flagged for senior review. Scaling tumor annotations improved the OOD tumor detection area under the ROC curve (AUC). The AUC was 0.810 when training on MSD-Pancreas (n=281) and 0.819 when training on PANORAMA (n=2,238), compared with 0.959 when training on PanTS (n=9,901). Compared with 2-class training, adding the 24 surrounding structures (28-class training) improved tumor segmentation by 10.3 percentage points in DSC (57.4%→67.7%). It also improved normalized surface distance (NSD) by 9.7 percentage points (56.8%→66.5%).
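Detection AUC is a scan-level measure. It equals the probability that a randomly chosen tumor-positive scan receives a higher score than a randomly chosen tumor-negative scan. The abstract does not state how nnU-Net segmentation output is converted into a scan-level score. The sketch below therefore assumes a hypothetical `scan_level_score` equal to the maximum voxel-wise tumor probability. It computes AUC from ranks (the Mann-Whitney U formulation), so tied scores count as one half. This is an illustration under those assumptions, not the PanTS evaluation code.

```python
def scan_level_score(tumor_probs):
    """Hypothetical scan-level score: the maximum voxel-wise tumor probability.

    This choice is an assumption; the abstract does not specify how
    segmentation output is turned into a detection score.
    """
    return max(tumor_probs)


def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    labels: 1 for tumor-positive scans, 0 for tumor-negative scans.
    scores: higher values mean the scan is more likely tumor-positive.
    Tied scores receive their average rank, so ties count as one half.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if any(y not in (0, 1) for y in labels):
        raise ValueError("labels must be 0 or 1")
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative scan")

    # Assign 1-based ranks in ascending score order, averaging over ties.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1

    # U statistic for the positive class, normalized to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)


# Toy example: four scans, two of them tumor-positive.
labels = [0, 0, 1, 1]
probs = ([0.05, 0.1], [0.4, 0.2], [0.35], [0.8, 0.6])
scores = [scan_level_score(p) for p in probs]
print(roc_auc(labels, scores))  # 0.75
```

The rank-based form runs in O(n log n). This matters at the scale of the PanTS test set, where a pairwise comparison of roughly 2,800 positive and 23,700 negative scans would need tens of millions of comparisons.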
Conclusion
PanTS enables robust, multicenter benchmarking of pancreatic tumor and pancreas subregion segmentation with anatomically rich context and realistic OOD evaluation, supporting development of more generalizable target segmentation models for radiotherapy and imaging AI.