Poster Program: Therapy Physics

Beyond Global Metrics: Unmasking Clinically Meaningful Deviations Using a Novel Local Metric In Regression Testing of Deployed Auto-Segmentation Models

Abstract

Purpose

Deep-learning (DL) auto-segmentation is routinely used clinically, yet domain shift and vendor model upgrades can change behavior in non-intuitive ways. We present a regression-testing framework to commission upgraded commercial auto-contouring models and to reveal spatially localized contour differences that may be obscured by global metrics.

Methods

We compared an initial and an upgraded commercial DL model across pelvis and head-and-neck sites (n=36, OARs=21) against reference contours. Performance was first quantified using standard global metrics: Dice Similarity Coefficient (DSC), 95th-percentile Hausdorff Distance (HD95), and Mean Surface Distance (MSD). To unmask localized discrepancies hidden by these aggregates, we applied Deformable Point Cloud Registration–Based Bidirectional Local Distance (DPCR-BLD). Unlike global summaries, this method maps distances point by point over the contour surface, so region-specific contouring shifts can be identified explicitly.
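For readers who want to reproduce the global metrics named above, the following is a minimal sketch assuming binary 3D masks on a common grid and a known voxel spacing in mm. The function names (`dice`, `surface_points`, `hd95`, `msd`) are illustrative and not taken from the study. HD95 and MSD have several definitions in the literature; this sketch uses the maximum of the two directed 95th percentiles and the mean of the two directed averages, which may differ from the implementation used in this work.

```python
import numpy as np
from scipy.ndimage import binary_erosion
from scipy.spatial import cKDTree


def dice(a, b):
    """Dice Similarity Coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    # Two empty masks are treated as a perfect match (an assumption).
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0


def surface_points(mask, spacing):
    """Physical coordinates (mm) of the voxels on the mask boundary."""
    mask = mask.astype(bool)
    border = mask & ~binary_erosion(mask)
    return np.argwhere(border) * np.asarray(spacing, dtype=float)


def surface_distances(a, b, spacing):
    """Nearest-neighbour surface distances in both directions (a->b, b->a)."""
    pa, pb = surface_points(a, spacing), surface_points(b, spacing)
    d_ab = cKDTree(pb).query(pa)[0]
    d_ba = cKDTree(pa).query(pb)[0]
    return d_ab, d_ba


def hd95(a, b, spacing):
    """95th-percentile Hausdorff distance (max of the two directed values)."""
    d_ab, d_ba = surface_distances(a, b, spacing)
    return float(max(np.percentile(d_ab, 95), np.percentile(d_ba, 95)))


def msd(a, b, spacing):
    """Mean surface distance (average of the two directed means)."""
    d_ab, d_ba = surface_distances(a, b, spacing)
    return float((d_ab.mean() + d_ba.mean()) / 2.0)
```

Each function returns a single number per structure, which is why these metrics can report similar values for two contours that disagree in different places.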

Results

Global metrics showed no statistically significant differences for most ROIs. For example, for the prostate, the upgraded model improved mean DSC from 0.77 to 0.80 (p=0.06) and reduced HD95 from 10.43 mm to 7.84 mm. Global metric values did not indicate where discrepancies occurred. DPCR-BLD localized the prostate differences: the initial model over-contoured the posterior prostate adjacent to the rectal wall, whereas the upgraded model under-contoured peripheral regions in contact with the rectum. Despite unchanged brainstem DSC (0.89), DPCR-BLD indicated slight over-contouring in the superior region in the upgraded model. For the rectum, global metrics suggested no change, but the initial model failed on an outlier case with a rectal balloon; DPCR-BLD clearly highlighted the localized miss.
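The idea of localizing over- and under-contouring can be illustrated with a much simpler stand-in than DPCR-BLD. The sketch below is not the DPCR-BLD method: it omits the deformable point cloud registration and the bidirectional matching, and only samples a signed distance to the reference surface at each surface voxel of the test contour (positive outside the reference, negative inside). All names (`signed_distance_to_surface`, `local_signed_distances`, `summarize_by_half`) are hypothetical, and splitting the structure into two halves along one axis is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt


def signed_distance_to_surface(ref, spacing):
    """Distance (mm) from every voxel to the reference surface.

    Sign convention (an assumption of this sketch): negative inside the
    reference mask, positive outside, zero on the surface voxels.
    """
    ref = ref.astype(bool)
    ref_surface = ref & ~binary_erosion(ref)
    dist = distance_transform_edt(~ref_surface, sampling=spacing)
    return np.where(ref, -dist, dist)


def local_signed_distances(test, ref, spacing):
    """Signed distance to the reference at each test-surface voxel."""
    test = test.astype(bool)
    surface = test & ~binary_erosion(test)
    idx = np.argwhere(surface)
    sdf = signed_distance_to_surface(ref, spacing)
    return idx, sdf[tuple(idx.T)]


def summarize_by_half(idx, dist, ref, axis):
    """Mean signed distance in the two halves split at the reference centroid."""
    centre = np.argwhere(ref).mean(axis=0)[axis]
    low = idx[:, axis] < centre
    return {"low": float(dist[low].mean()), "high": float(dist[~low].mean())}
```

On a toy case where the test contour extends past the reference on one side only, the per-half summary is zero on the matching side and positive on the extended side, while a single global number would blend the two. Which array axis corresponds to an anatomical direction such as posterior or superior depends on image orientation and is not assumed here.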

Conclusion

Combining conventional global metrics with DPCR-BLD provides actionable, spatially resolved regression testing for upgraded DL auto-segmentation models, helping prevent inappropriate transfer of expectations between versions and supporting continuous clinical monitoring and QA.

People

Jingwei Duan, PhD (Presenting Author) · The University of Texas MD Anderson Cancer Center
Roya Barati (Author) · The University of Texas MD Anderson Cancer Center
Yao Zhao (Author) · The University of Texas MD Anderson Cancer Center
Song Gao, PhD (Author) · MD Anderson Cancer Ctr.
Jared D. Ohrt, MS (Author) · The University of Texas MD Anderson Cancer Center
Peter Balter, PhD (Author) · The University of Texas MD Anderson Cancer Center
Jinzhong Yang, PhD (Author) · The University of Texas MD Anderson Cancer Center
Laurence Edward Court, PhD (Author) · Department of Radiation Physics, The University of Texas MD Anderson Cancer Center
Libing Zhu (Author) · Mayo Clinic Arizona
Quan Chen, PhD (Author) · Mayo Clinic Arizona
Yi Rong, PhD (Author) · Mayo Clinic Arizona, Department of Radiation Oncology