Poster Program · Diagnostic and Interventional Radiology Physics

AI Methods for the Diagnosis of Pneumonia on Chest Radiographs: Evaluation of Clinical Interpretability

Abstract

Purpose

This study investigated which performance assessment metrics, when applied to the outputs of artificial intelligence (AI) models designed to identify pneumonia on chest radiographs, best align with the subjective clinical opinion of radiologists.

Methods

A subset of 100 chest radiographs was collected from the test set of the Medical Imaging and Data Resource Center (MIDRC) XAI Challenge: Decoding AI Decisions for Pneumonia on Chest Radiographs. Each radiograph had an associated reference probability map generated from manual segmentations of pneumonia by three radiologists. The radiographs were processed through four of the top-performing models from the XAI Challenge (Models A, B, C, and D from best to worst). The model outputs were compared with the reference maps using three metrics: weighted log-loss (WLL) (the metric used in the XAI Challenge), Dice similarity coefficient (DSC), and weighted Dice similarity coefficient (wDSC). To account for continuous pixel-value ranges within model outputs, pixel values were binned into quartiles to produce weighted outputs for DSC-based calculations. Each model was ranked from best to worst according to each metric separately, and average metric values with 95% confidence intervals (95% CIs) were calculated across all 100 cases. To establish an assessment of clinical utility, five radiologists evaluated the outputs from each model for each case and ranked them based on their perceived clinical utility for pneumonia localization. Metric-based rankings were then compared with the radiologists’ rankings.
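
The DSC-based metrics and the per-case averaging described above can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption: maps are flattened lists of pixel values in [0, 1], the binary DSC thresholds both maps at 0.5, the quartile bins map to the levels 0, 1/3, 2/3, and 1, the weighted overlap uses the pixelwise minimum, and the 95% CI uses a normal approximation. All function names are hypothetical.

```python
import math
import statistics


def to_quartile_level(p):
    """Bin a pixel value in [0, 1] into one of four levels (assumed: 0, 1/3, 2/3, 1)."""
    return min(int(p * 4), 3) / 3


def dice(pred, ref, threshold=0.5):
    """Binary DSC after thresholding both maps (the 0.5 threshold is an assumption)."""
    a = [p >= threshold for p in pred]
    b = [r >= threshold for r in ref]
    overlap = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    # Two empty masks are treated as perfect agreement.
    return 2 * overlap / total if total else 1.0


def weighted_dice(pred, ref):
    """Assumed wDSC: Dice on quartile-binned maps, overlap taken as the pixelwise minimum."""
    a = [to_quartile_level(p) for p in pred]
    b = [to_quartile_level(r) for r in ref]
    overlap = sum(min(x, y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * overlap / total if total else 1.0


def mean_with_ci95(values):
    """Mean and normal-approximation 95% CI across cases (CI method is an assumption)."""
    m = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


# Toy four-pixel example, not data from the study.
pred = [0.9, 0.6, 0.3, 0.0]
ref = [1.0, 1.0, 0.0, 0.0]
print(dice(pred, ref), weighted_dice(pred, ref))
```

In this toy case the thresholded masks coincide, so the binary DSC cannot distinguish the model output from the reference, while the weighted variant penalizes the lower-confidence pixels. That difference is the kind of behavior the quartile weighting is meant to capture.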

Results

Radiologists ranked the models from best to worst as C, B, D, A. The WLL produced the ranking A, B, C, D, and the DSC produced the ranking B, C, A, D. The wDSC produced the ranking C, B, A, D.
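
One way to quantify how closely each metric's ranking matches the radiologists' ranking is to count the model pairs that the two rankings order differently (the Kendall tau distance). The abstract does not state how the rankings were compared, so this measure is an assumption chosen for illustration; the rankings themselves are taken from the results above.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Count pairs of items that appear in opposite order in the two rankings."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    return sum(
        1
        for x, y in combinations(ranking_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


# Rankings reported in the Results, best to worst.
radiologists = ["C", "B", "D", "A"]
metric_rankings = {
    "WLL": ["A", "B", "C", "D"],
    "DSC": ["B", "C", "A", "D"],
    "wDSC": ["C", "B", "A", "D"],
}

distances = {
    name: kendall_tau_distance(radiologists, ranking)
    for name, ranking in metric_rankings.items()
}
print(distances)  # {'WLL': 4, 'DSC': 2, 'wDSC': 1}
```

Out of six possible model pairs, the wDSC ranking disagrees with the radiologists on one pair, the DSC on two, and the WLL on four.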

Conclusion

The wDSC aligned most closely with the rankings of the radiologists, differing only in the order of the two lowest-ranked models (A and D). Future studies will explore alternative weighting schemes for the WLL and wDSC and investigate alternative metrics.

People

Christopher L. Valdes, MSc · Presenting Author · University of Chicago

Karen Drukker, PhD · Author · The University of Chicago

Samuel G. Armato, PhD · Author · The University of Chicago

Lubomir Hadjiyski, PhD · Author · University of Michigan

Carol C. Wu, MD · Author · MD Anderson Cancer Center
