Poster Program: Therapy Physics

Beyond Global Metrics: Unmasking Clinically Meaningful Deviations Using a Novel Local Metric In Regression Testing of Deployed Auto-Segmentation Models

Abstract
Purpose

Deep-learning (DL) auto-segmentation is routinely used clinically, yet domain shift and vendor model upgrades can change behavior in non-intuitive ways. We present a regression-testing framework to commission upgraded commercial auto-contouring models and to reveal spatially localized contour differences that may be obscured by global metrics.

Methods

We compared an initial and an upgraded commercial DL model against reference contours across pelvis and head-and-neck sites (n = 36, 21 OARs). Performance was first quantified using standard global metrics: Dice Similarity Coefficient (DSC), 95th-percentile Hausdorff Distance (HD95), and Mean Surface Distance (MSD). To unmask localized discrepancies hidden by these aggregates, we applied Deformable Point Cloud Registration–Based Bidirectional Local Distance (DPCR-BLD). Unlike global summaries, this method visualizes the spatial relationship between contours, explicitly identifying region-specific contouring shifts.
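The three global metrics named above can be sketched for binary masks as follows. This is a minimal illustration, not the commercial system's or the authors' code, and it assumes NumPy and SciPy, non-empty masks on a shared voxel grid, surfaces taken as boundary voxels, and HD95 and MSD computed over the pooled distances in both directions. HD95 and MSD conventions differ between tools, and all function names here are hypothetical.

```python
import numpy as np
from scipy import ndimage


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice Similarity Coefficient: 2|A n B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return float(2.0 * np.logical_and(a, b).sum() / denom) if denom else 1.0


def surface(mask: np.ndarray) -> np.ndarray:
    """Boundary voxels: voxels of the mask that one binary erosion removes."""
    mask = mask.astype(bool)
    return mask & ~ndimage.binary_erosion(mask)


def surface_distances(a, b, spacing):
    """Directed distances (mm) from a's surface voxels to b's surface, and from b's to a's."""
    sa, sb = surface(a), surface(b)
    # EDT of the complement of a surface gives, at every voxel,
    # the physical distance to the nearest voxel of that surface.
    dist_to_b = ndimage.distance_transform_edt(~sb, sampling=spacing)
    dist_to_a = ndimage.distance_transform_edt(~sa, sampling=spacing)
    return dist_to_b[sa], dist_to_a[sb]


def hd95_and_msd(a, b, spacing=(1.0, 1.0, 1.0)):
    """HD95 and MSD (mm) over the pooled bidirectional surface distances."""
    d_ab, d_ba = surface_distances(a, b, spacing)
    pooled = np.concatenate([d_ab, d_ba])
    return float(np.percentile(pooled, 95)), float(pooled.mean())


# Usage: a 10-voxel cube versus the same cube shifted 2 voxels along axis 0.
ref = np.zeros((20, 20, 20), dtype=bool)
ref[5:15, 5:15, 5:15] = True
test = np.zeros_like(ref)
test[7:17, 5:15, 5:15] = True

print(dice(ref, test))  # 0.8
print(hd95_and_msd(ref, test, spacing=(1.0, 1.0, 1.0)))
```

Each function returns a single number per structure, which is exactly why such summaries cannot say where on the surface two contours disagree.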

Results

Global metrics showed no statistically significant differences for most ROIs. For the prostate, for example, the upgraded model increased mean DSC from 0.77 to 0.80 (p = 0.06) and reduced HD95 from 10.43 mm to 7.84 mm. These global values did not indicate where the discrepancies occurred. DPCR-BLD localized the prostate differences: the initial model over-contoured the posterior prostate adjacent to the rectal wall, whereas the upgraded model under-contoured peripheral regions in contact with the rectum. Despite an unchanged brainstem DSC (0.89), DPCR-BLD indicated slight over-contouring of the superior region by the upgraded model. For the rectum, global metrics suggested no change, but the initial model failed on an outlier case with a rectal balloon; DPCR-BLD clearly highlighted this localized miss.
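DPCR-BLD establishes correspondences between contour surfaces through deformable point cloud registration, whose details are not given in this abstract. As a much simpler stand-in that conveys the idea of a bidirectional, per-point, signed distance, the sketch below replaces the registration step with nearest-neighbour correspondence (`scipy.spatial.cKDTree`) and signs each distance by whether a surface point lies outside (+, over-contouring) or inside (-, under-contouring) the other contour. It is not the authors' DPCR-BLD; all function names and the 1.5 mm tolerance are hypothetical, and masks are assumed non-empty and on a shared voxel grid.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree


def surface_points(mask: np.ndarray) -> np.ndarray:
    """Integer (i, j, k) indices of the boundary voxels of a binary mask."""
    mask = mask.astype(bool)
    return np.argwhere(mask & ~ndimage.binary_erosion(mask))


def signed_nn_distance(src_pts, dst_pts, dst_mask, spacing):
    """Distance (mm) from each src point to the nearest dst surface point,
    negative where the src point lies inside dst_mask, positive outside."""
    sp = np.asarray(spacing, dtype=float)
    d, _ = cKDTree(dst_pts * sp).query(src_pts * sp)
    inside = dst_mask.astype(bool)[tuple(src_pts.T)]
    return np.where(inside, -d, d)


def bidirectional_local_distance(ref, test, spacing=(1.0, 1.0, 1.0)):
    """Per-point signed distances on both surfaces: + over-contouring, - under-contouring."""
    ref_pts, test_pts = surface_points(ref), surface_points(test)
    # Test surface outside the reference means the test contour extends too far (+).
    test_side = signed_nn_distance(test_pts, ref_pts, ref, spacing)
    # Reference surface left outside the test contour means a miss (-), so flip the sign.
    ref_side = -signed_nn_distance(ref_pts, test_pts, test, spacing)
    return ref_pts, ref_side, test_pts, test_side


# Usage: the test contour extends 2 voxels beyond the reference along +axis 0.
ref = np.zeros((20, 20, 20), dtype=bool)
ref[5:15, 5:15, 5:15] = True
test = np.zeros_like(ref)
test[5:17, 5:15, 5:15] = True

ref_pts, ref_side, test_pts, test_side = bidirectional_local_distance(ref, test)
flagged = test_pts[np.abs(test_side) > 1.5]  # hypothetical 1.5 mm tolerance
print(flagged[:, 0].min(), flagged[:, 0].max())  # 16 16
```

Only the voxels on the protruding face exceed the tolerance, so the deviation is localized to one region of the surface rather than folded into a single score. Computing these maps for the initial and upgraded models against the same reference would show where the two versions differ.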

Conclusion

Combining conventional global metrics with DPCR-BLD provides actionable, spatially resolved regression testing for upgraded DL auto-segmentation models, helping prevent inappropriate transfer of expectations between versions and supporting continuous clinical monitoring and QA.
