AI-Assisted Pancreatic Target Delineation on CT: Multicenter Validation and Contouring QA
Abstract
Purpose
To develop and validate an AI system that supports radiotherapy-relevant pancreatic target delineation by localizing and segmenting small pancreatic ductal adenocarcinoma (PDAC) and related anatomy on routine contrast-enhanced CT, and to benchmark performance against expert readers.
Methods
We developed ePAI, a three-stage AI pipeline for CT-based pancreas and lesion delineation. Stage 1 uses nnU-Net to segment pancreas subregions (head/body/tail), adjacent vasculature, and dilated ducts, providing anatomic context for target definition. Stage 2 fine-tunes the same architecture on real and synthetic lesions plus a large normal cohort to generate PDAC candidate regions while suppressing false positives. Stage 3 applies a transformer classifier that jointly analyzes candidate appearance, location, and pancreatic morphology to label each candidate as PDAC, non-PDAC, or normal. The system outputs voxel-wise masks of the lesion and relevant structures to enable contour review and quality assurance. Training used 1,598 patients (arterial and portal venous phases) from one institution. Evaluation included an internal cohort of 1,012 patients and an external multicenter cohort of 43,038 patients from 140 centers. In a reader comparison, 40 board-certified radiologists interpreted the same scans, which included cases with small PDAC (<2 cm) and prediagnostic scans previously missed by radiologists.
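The control flow of the three stages can be sketched as follows. This is a minimal illustration, not the ePAI implementation: the function name run_three_stage, the Candidate container, the set-of-voxel-indices mask representation, and the three callables standing in for the trained models are all hypothetical. The rule that the lesion mask is the union of candidates labeled PDAC is likewise an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]  # (z, y, x) index into the CT volume
Mask = Set[Voxel]             # toy mask representation: a set of voxel indices
LABELS = ("PDAC", "non-PDAC", "normal")


@dataclass
class Candidate:
    """One stage-2 candidate region with its stage-3 label (hypothetical)."""
    voxels: Mask
    label: str


def run_three_stage(
    ct: Any,
    segment_anatomy: Callable[[Any], Dict[str, Mask]],
    propose_candidates: Callable[[Any, Dict[str, Mask]], List[Mask]],
    classify: Callable[[Any, Dict[str, Mask], Mask], str],
) -> Dict[str, Any]:
    # Stage 1: anatomy masks (pancreas subregions, vessels, ducts).
    anatomy = segment_anatomy(ct)
    # Stage 2: candidate lesion regions, proposed with anatomic context.
    regions = propose_candidates(ct, anatomy)
    # Stage 3: label each candidate as PDAC, non-PDAC, or normal.
    candidates: List[Candidate] = []
    for region in regions:
        label = classify(ct, anatomy, region)
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        candidates.append(Candidate(voxels=set(region), label=label))
    # Assumption: the reviewable lesion mask is the union of PDAC candidates.
    lesion_mask: Mask = set()
    for cand in candidates:
        if cand.label == "PDAC":
            lesion_mask |= cand.voxels
    return {"anatomy": anatomy, "candidates": candidates, "lesion_mask": lesion_mask}


# Toy stand-ins for the trained models, for demonstration only.
def _toy_segment(ct: Any) -> Dict[str, Mask]:
    return {"pancreas_head": {(0, 0, 0), (0, 0, 1)}, "pancreas_body": {(0, 1, 0)}}


def _toy_propose(ct: Any, anatomy: Dict[str, Mask]) -> List[Mask]:
    return [{(0, 0, 0)}, {(0, 1, 0)}]


def _toy_classify(ct: Any, anatomy: Dict[str, Mask], region: Mask) -> str:
    return "PDAC" if region & anatomy["pancreas_head"] else "normal"


result = run_three_stage(None, _toy_segment, _toy_propose, _toy_classify)
```

Passing the stages in as callables keeps the sketch independent of any particular model framework; in practice each callable would wrap a trained network and operate on array-valued masks rather than sets.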
Results
For early-stage PDAC (<2 cm), ePAI achieved an AUC of 0.975 (95% CI: 0.941–1.000; n=345) internally and 0.941 (95% CI: 0.922–0.958; n=3,421) externally, and detected lesions as small as 2–5 mm. Compared with radiologists, ePAI improved sensitivity by 34.1% and specificity by 6.5%. In 75 of 159 radiologist-missed cases, ePAI identified PDAC a median of 424 days before clinical diagnosis.
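For readers who want to reproduce this style of reporting, an AUC with a 95% confidence interval can be computed along the following lines. The abstract does not state how its intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names auc and bootstrap_auc_ci and the toy data are illustrative only.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC, resampling patients."""
    auc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, so AUC is undefined; redraw
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data: 1 = PDAC, 0 = no PDAC; scores are model outputs.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
point_estimate = auc(toy_labels, toy_scores)
ci_low, ci_high = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200)
```

The pairwise AUC is quadratic in cohort size, which is fine for illustration; for a cohort the size of the external set, a rank-based implementation would be the practical choice.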
Conclusion
ePAI provides CT-based, interpretable lesion-and-anatomy masks that can support pancreatic target delineation and contour QA across multicenter data, with strong generalizability and earlier identification of subtle disease.