Geometric Comparison of Auto-Segmentation Accuracy of Web-Based AI Model for Efficient Pelvic Radiotherapy
Abstract
Purpose
This study evaluates a commercial web-based AI model for auto-segmentation of pelvic sites by comparing its contours with expert ground-truth contours and with those of an alternative AI model, emphasizing its accuracy and clinical relevance for radiotherapy planning.
Methods
The ART-Plan™ (version 2.3.1, TheraPanacea Inc., France) web-based software, implemented in Brainlab Elements (version 4.0, Brainlab AG, Germany), was used for auto-segmentation validation. A retrospective study was conducted on simulation CT and radiotherapy (RT) structure datasets from 10 pelvic cancer plans. The prostate and the main organs at risk (OARs), including the bladder, bowel bag, rectum, and femoral heads, were considered. The auto-segmented contours were compared with the MD-approved (ground truth) contours and with the auto-segmented contours generated by MIM Contour ProtégéAI v1.3.1 (MIM Software, OH). Segmentation performance was quantified using standard geometric metrics: Dice Similarity Coefficient (DSC), Mean Distance to Agreement (MDA), and Hausdorff distance (HD).
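For readers unfamiliar with the three geometric metrics, the following is a minimal illustrative sketch of how DSC, MDA, and HD can be computed from two binary masks. It is not the implementation used in this study or in either commercial product. It assumes each structure is represented as a set of (x, y, z) voxel indices, that MDA is the symmetric mean of surface-to-surface nearest-neighbour distances, and that HD is the symmetric maximum of those distances (some tools report a 95th-percentile variant instead). The function names, the `cube` helper, and the `spacing` parameter are hypothetical.

```python
import math

# 6-connected neighbourhood used to decide whether a voxel lies on the surface.
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(a, b):
    """Dice similarity coefficient between two sets of voxel indices."""
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


def surface(mask):
    """Voxels of the mask that have at least one 6-connected neighbour outside it."""
    return {
        v for v in mask
        if any((v[0] + dx, v[1] + dy, v[2] + dz) not in mask for dx, dy, dz in OFFSETS)
    }


def directed_distances(src, dst, spacing):
    """Distance in mm from every point in src to its nearest point in dst."""
    def to_mm(p):
        return tuple(c * s for c, s in zip(p, spacing))

    dst_mm = [to_mm(p) for p in dst]
    return [min(math.dist(to_mm(p), q) for q in dst_mm) for p in src]


def mda_and_hd(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric mean surface distance (MDA) and maximum Hausdorff distance (HD)."""
    if not a or not b:
        raise ValueError("both masks must be non-empty")
    sa, sb = surface(a), surface(b)
    d_ab = directed_distances(sa, sb, spacing)
    d_ba = directed_distances(sb, sa, spacing)
    mda = (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))
    hd = max(max(d_ab), max(d_ba))
    return mda, hd


def cube(origin, size):
    """Hypothetical helper: a solid cube of voxels, used only for the toy example."""
    ox, oy, oz = origin
    return {
        (x, y, z)
        for x in range(ox, ox + size)
        for y in range(oy, oy + size)
        for z in range(oz, oz + size)
    }


if __name__ == "__main__":
    reference = cube((0, 0, 0), 4)   # stand-in for a ground-truth contour
    candidate = cube((1, 0, 0), 4)   # same shape shifted by one voxel in x
    print("DSC:", dice(reference, candidate))  # 0.75 for this toy case
    mda, hd = mda_and_hd(reference, candidate)
    print("MDA (mm):", mda, "HD (mm):", hd)
```

The brute-force nearest-neighbour search is quadratic in the number of surface voxels, so it is suitable only for small illustrative masks. A clinical-scale comparison would typically use a distance transform or a spatial index.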
Results
TP ART-Plan took 3-5 minutes on average, while ProtégéAI took 5-8 minutes. Robust auto-segmentation was observed for the bladder and femoral heads, with a mean DSC >0.8 and MDA <3 mm compared with ground-truth contours. Agreement was lower for the bowel (DSC: 0.44±0.09, MDA: 33.29±10.08 mm) and rectum (DSC: 0.74±0.14, MDA: 4.35±3.35 mm). The substantial differences in the bowel and rectum were attributed mainly to interobserver and institutional margin variability for these structures, with image quality also likely playing a role. Auto-segmentation performance of ART-Plan and ProtégéAI was comparable across pelvic structures, with TP ART-Plan outperforming ProtégéAI for the rectum (p-value <0.05) and ProtégéAI slightly outperforming TP ART-Plan for the bowel region (p-value <0.07).
Conclusion
Pelvic contours generated by the AI-based TP ART-Plan and MIM ProtégéAI were evaluated against the physician-approved ground truth. Geometric evaluation of the auto-segmented structures against ground-truth contours revealed clinically relevant differences in fidelity, which may impact treatment outcomes.