Paper Proffered Program Therapy Physics

Inter-Observer Variability Informed Weighted Anisotropic Surface DSC for Segmentation Evaluation

Abstract
Purpose

Common contour evaluation metrics, such as the Dice similarity coefficient (DSC) or Hausdorff distance (HD), provide global summaries that can miss the localized, anisotropic disagreements that drive clinically meaningful edits. Although surface DSC (SDSC) better reflects boundary discrepancies, it typically applies a single, subjectively chosen isotropic tolerance (e.g., 5 mm) uniformly over the whole surface. We assume that contour acceptability can be judged by agreement with inter-observer variability (IOV) and introduce an IOV-informed, weighted anisotropic surface DSC (WeightedAniSDSC).

Methods

Using the CURVAS dataset (pancreas, kidneys, liver; three independent annotations per case), cases were split into training (n=20), validation (n=5), and testing (n=65) subsets. In training, local IOV was quantified across annotators using DPCR-BLD. Assuming a Gaussian distribution, a 2σ surface margin (µ±2σ) was computed at each surface location and mapped to new cases using deformable point-cloud registration (DPCR). AniSDSC replaces the uniform SDSC tolerance with this organ- and region-specific margin; WeightedAniSDSC further applies an outlier-distance penalty to reduce sensitivity to stray voxels and small islands. On the validation cases, 60 pseudo-error contours (5 cases × 12 error types) were generated. ROC analyses were conducted with DSC, SDSC5mm, SDSC8mm, SDSC10mm, and WeightedAniSDSC to select an operating threshold for each metric. The validation-derived threshold was then applied to the testing subset using all pairwise annotator comparisons.
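The core idea of replacing the uniform SDSC tolerance with a location-specific margin can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes surfaces are given as point clouds, that the mapped IOV margin has already been reduced to one unsigned tolerance per surface point, and that all points carry equal weight (the standard SDSC weights by surface area). The function names `nearest_distances` and `anisotropic_surface_dsc` are hypothetical, and the outlier-distance penalty of WeightedAniSDSC is not shown because the abstract does not specify it.

```python
import numpy as np

def nearest_distances(a, b):
    """Distance from each point in `a` to its nearest point in `b` (brute force)."""
    diff = a[:, None, :] - b[None, :, :]          # shape (len(a), len(b), 3)
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

def anisotropic_surface_dsc(surf_a, surf_b, tol_a, tol_b):
    """Fraction of surface points lying within their own local tolerance.

    surf_a, surf_b : (N, 3) and (M, 3) arrays of surface points in mm.
    tol_a, tol_b   : per-point tolerances in mm (assumed to come from the
                     mapped IOV margin); a scalar reproduces uniform SDSC.
    """
    d_ab = nearest_distances(surf_a, surf_b)
    d_ba = nearest_distances(surf_b, surf_a)
    within_a = np.count_nonzero(d_ab <= tol_a)
    within_b = np.count_nonzero(d_ba <= tol_b)
    return (within_a + within_b) / (len(surf_a) + len(surf_b))

# Toy example: the second point pair is 4 mm apart.
surf_a = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
surf_b = np.array([[0.0, 1.0, 0.0], [10.0, 4.0, 0.0]])

uniform = anisotropic_surface_dsc(surf_a, surf_b, 2.0, 2.0)
local = anisotropic_surface_dsc(
    surf_a, surf_b, np.array([2.0, 5.0]), np.array([2.0, 5.0])
)
```

In the toy example, a uniform 2 mm tolerance rejects the second point pair, whereas a local tolerance of 5 mm at that location (a region assumed to have high IOV) accepts it.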

Results

WeightedAniSDSC improved discrimination of introduced errors (AUC 0.89–0.98) versus DSC (AUC 0.86–0.91) and consistently outperformed SDSC across fixed tolerances. The optimal threshold was less organ-dependent for WeightedAniSDSC (0.85–0.89) than for DSC (0.86–0.91). In testing, WeightedAniSDSC uniquely flagged 20 pancreas, 6 kidney, and 10 liver contours missed by DSC and SDSC5mm. Review identified stray-voxel artifacts and localized boundary deviations beyond expected IOV, and highlighted surface regions exceeding the mapped tolerance.
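The operating thresholds above were selected by ROC analysis on the validation contours, but the abstract does not state which selection criterion was used. As one possible illustration, the sketch below assumes Youden's J statistic (sensitivity + specificity − 1) as the criterion; the function name `youden_threshold` and the example scores are hypothetical and are not data from the study.

```python
import numpy as np

def youden_threshold(scores, is_acceptable):
    """Return the score threshold that maximizes Youden's J = TPR - FPR.

    A contour is predicted acceptable when its metric score >= threshold.
    scores        : metric values (e.g., an agreement score in [0, 1]).
    is_acceptable : True for reference contours, False for introduced errors.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_acceptable, dtype=bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):                   # candidate thresholds, ascending
        pred_ok = scores >= t
        tpr = np.count_nonzero(pred_ok & labels) / np.count_nonzero(labels)
        fpr = np.count_nonzero(pred_ok & ~labels) / np.count_nonzero(~labels)
        j = tpr - fpr
        if j > best_j:                            # keep the first (lowest) maximizer
            best_t, best_j = float(t), j
    return best_t, best_j

# Hypothetical scores: three acceptable contours, three with introduced errors.
scores = [0.95, 0.92, 0.90, 0.80, 0.70, 0.88]
labels = [True, True, True, False, False, False]
threshold, j_stat = youden_threshold(scores, labels)
```

A threshold chosen this way on validation data can then be applied unchanged to held-out comparisons, which mirrors the validation-to-testing workflow described in the Methods.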

Conclusion

In this pilot evaluation, incorporating spatially varying, IOV-derived tolerances and outlier weighting into a surface-based agreement metric improved detection of realistic local contour discrepancies compared with conventional global metrics. WeightedAniSDSC may therefore be a useful complement for segmentation evaluation, and it warrants broader validation.
