Evaluation of a Deep Learning-Based Auto-Contouring Tool for Brain Metastases and Organs-at-Risk on MRI
Abstract
Purpose
To evaluate the clinical accuracy and utility of a commercial deep learning-based auto-segmentation tool from Siemens for delineating brain metastases (gross tumor volume, GTV) and organs-at-risk (OARs) on contrast-enhanced MRI.
Methods
A retrospective analysis was conducted using a large clinical dataset of 786 structure sets. For the brain metastasis analysis, a filtered dataset of 436 patients (520 structure set comparisons) containing 1,933 clinical metastases was used. Ground truth was defined by clinical manual contours. Performance metrics included sensitivity, specificity, false positive (FP) rate, Dice similarity coefficient (DSC), and 95th percentile Hausdorff distance (HD95). GTV analysis was stratified by lesion volume at a 0.1 cc threshold (<0.1 cc vs. >0.1 cc) to assess detection limits. OAR segmentation performance was evaluated against manual contours using volume difference, DSC, and HD95.
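As an illustration of the two overlap and distance metrics named above, the following is a minimal sketch of DSC and HD95 for a pair of binary masks. It is not the evaluation code used in this study. The function names, the 6-connected surface definition, the convention of taking the larger of the two directed 95th-percentile distances, and the handling of empty masks are all assumptions made for this example.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

def surface_points(mask, spacing):
    """Physical coordinates of mask voxels that touch background (6-connectivity)."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    interior = np.ones_like(mask)
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            # A voxel is interior only if every face neighbor is foreground.
            interior &= np.roll(padded, shift, axis=axis)[core]
    surface = mask & ~interior
    return np.argwhere(surface) * np.asarray(spacing, dtype=float)

def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th percentile Hausdorff distance, in the units of `spacing`.

    Assumption: the larger of the two directed 95th-percentile
    surface distances is reported; other conventions exist.
    """
    pa = surface_points(a, spacing)
    pb = surface_points(b, spacing)
    if len(pa) == 0 or len(pb) == 0:
        return float("nan")  # undefined when either contour is empty
    # Brute-force pairwise distances; adequate for small structures only.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return float(max(np.percentile(d.min(axis=1), 95),
                     np.percentile(d.min(axis=0), 95)))

# Toy example: a 4x4x4 cube and the same cube shifted by one voxel.
manual = np.zeros((10, 10, 10), dtype=bool)
manual[2:6, 2:6, 2:6] = True
auto = np.zeros((10, 10, 10), dtype=bool)
auto[3:7, 2:6, 2:6] = True
print(dice(manual, auto), hd95(manual, auto))
```

The brute-force distance matrix keeps the sketch dependency-free; for full-resolution structures a distance-transform or KD-tree approach would be the more practical choice.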
Results
For GTV detection, the overall sensitivity was 89.6%. Performance was strongly volume-dependent: sensitivity was 93.3% for lesions >0.1 cc but decreased to 61.5% for lesions <0.1 cc. Specificity followed a similar trend (92.3% for lesions >0.1 cc vs. 58.5% for lesions <0.1 cc). The median false positive rate was 1 per case, with 59.4% of cases exhibiting at least one FP. The median DSC for detected GTVs was 0.79. For OARs, the algorithm demonstrated robust performance (high DSC, low HD95) for the brainstem, eyes, and hippocampus. However, discrepancies were noted in the optic apparatus, where the tool sometimes produced disconnected contours for the optic nerves and chiasm.
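The volume-stratified sensitivity reported above is a per-lesion tally, which can be sketched as follows. This is an illustrative example only: the function name, the input format (one volume and one detected flag per ground-truth lesion), and the choice to place a lesion of exactly 0.1 cc in the larger stratum are assumptions, and the toy values are invented for demonstration and are not study data. How a lesion is judged "detected" is not specified here and is taken as given.

```python
def stratified_sensitivity(lesions, threshold_cc=0.1):
    """Per-lesion sensitivity overall and on either side of a volume threshold.

    `lesions` is an iterable of (volume_cc, detected) pairs, one per
    ground-truth lesion. Returns a dict of fractions in [0, 1].
    """
    counts = {"small": [0, 0], "large": [0, 0], "all": [0, 0]}  # [detected, total]
    for volume_cc, detected in lesions:
        # Assumption: a lesion of exactly threshold_cc falls in "large".
        stratum = "small" if volume_cc < threshold_cc else "large"
        for key in (stratum, "all"):
            counts[key][0] += int(bool(detected))
            counts[key][1] += 1
    return {
        key: (hit / total if total else float("nan"))
        for key, (hit, total) in counts.items()
    }

# Invented toy lesions (volume in cc, detected flag); not study data.
toy_lesions = [(0.05, True), (0.05, False), (0.5, True), (1.2, True), (0.3, False)]
result = stratified_sensitivity(toy_lesions)
print(result)
```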
Conclusion
The AI auto-contouring tool demonstrates high sensitivity for detecting brain metastases larger than 0.1 cc and provides accurate segmentation for major intracranial OARs, supporting its use to improve efficiency in clinical workflows. The reported metrics likely underestimate true performance, especially for lesions <0.1 cc, because the clinical ground truth was not curated for this retrospective analysis. Based on these validation results, the tool is currently being implemented and evaluated as a quality assurance measure in a prospective same-day, MRI-only stereotactic radiosurgery trial.