Poster Program: Diagnostic and Interventional Radiology Physics

AI Methods for the Diagnosis of Pneumonia on Chest Radiographs: Evaluation of Clinical Interpretability

Abstract
Purpose

This study investigated which performance assessment metrics, when applied to the outputs of artificial intelligence (AI) models designed to identify pneumonia on chest radiographs, best align with the subjective clinical opinion of radiologists.

Methods

A subset of 100 chest radiographs was collected from the test set of the Medical Imaging and Data Resource Center (MIDRC) XAI Challenge: Decoding AI Decisions for Pneumonia on Chest Radiographs. Each radiograph had an associated reference probability map generated from manual segmentations of pneumonia by three radiologists. The radiographs were processed through four of the top-performing models from the XAI Challenge (designated Models A, B, C, and D, from best to worst). The model outputs were compared with the reference maps using three metrics: weighted log-loss (WLL), which was the metric used in the XAI Challenge; Dice similarity coefficient (DSC); and weighted Dice similarity coefficient (wDSC). Because model outputs span a continuous range of pixel values, pixel values were binned into quartiles to produce weighted outputs for the DSC-based calculations. The models were ranked from best to worst according to each metric separately, and average metric values with 95% confidence intervals (95% CIs) were calculated across all 100 cases. To assess clinical utility, five radiologists evaluated the outputs from each model for each case and ranked them based on their perceived clinical utility for pneumonia localization. The metric-based rankings were then compared with the radiologists’ rankings.
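To make the two DSC-based metrics concrete, the following is a minimal sketch and not the study's implementation. The abstract does not give the exact binning or weighting, so several choices here are assumptions: four equal-width bins over [0, 1] mapped to the weights 0, 1/3, 2/3, and 1; a fuzzy (minimum-based) overlap for the weighted variant; and a 0.5 threshold for the plain DSC. The function names are hypothetical, and maps are represented as flat lists of pixel values. The WLL is omitted because its weights are not stated in the abstract.

```python
def quantize_quartiles(values):
    """Bin values in [0, 1] into four equal-width bins (assumed scheme)
    and map the bins to the weights 0, 1/3, 2/3, and 1."""
    weights = []
    for v in values:
        v = min(max(v, 0.0), 1.0)       # clamp to [0, 1]
        bin_index = min(int(v * 4), 3)  # 0..3; v == 1.0 falls in the top bin
        weights.append(bin_index / 3)
    return weights


def dice(pred, ref, threshold=0.5):
    """Plain DSC on maps binarized at `threshold` (assumed value)."""
    p = [v >= threshold for v in pred]
    r = [v >= threshold for v in ref]
    overlap = sum(a and b for a, b in zip(p, r))
    total = sum(p) + sum(r)
    return 1.0 if total == 0 else 2.0 * overlap / total


def weighted_dice(pred, ref):
    """Weighted DSC sketch: quantize the model output, then use the
    pixel-wise minimum as a fuzzy intersection (assumed formulation)."""
    p = quantize_quartiles(pred)
    overlap = sum(min(a, b) for a, b in zip(p, ref))
    total = sum(p) + sum(ref)
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy example: four pixels. The reference values mimic agreement among
# three readers (3/3, 2/3, 1/3, and 0/3 of readers marking the pixel).
model_output = [0.9, 0.6, 0.1, 0.0]
reference = [1.0, 2 / 3, 1 / 3, 0.0]

dsc = dice(model_output, reference)
wdsc = weighted_dice(model_output, reference)
print(dsc)             # 1.0
print(round(wdsc, 3))  # 0.909
```

In this toy case the plain DSC is perfect because thresholding discards the partial reader agreement on the third pixel, whereas the weighted variant penalizes the model for missing it. Per-case values computed this way could then be averaged across cases to rank models.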

Results

Radiologists ranked the models from best to worst as C, B, D, A. The WLL produced the ranking A, B, C, D, and the DSC produced the ranking B, C, A, D. The wDSC produced the ranking C, B, A, D.
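One way to quantify how closely each metric's ranking matches the radiologists' ranking is the Kendall tau distance, which counts the model pairs that two rankings order differently. The abstract does not say how alignment was measured, so this choice of measure is an assumption made for illustration; the function and variable names are hypothetical, and only the rankings themselves come from the results above.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Count the pairs of items that the two rankings order differently."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    return sum(
        1
        for x, y in combinations(ranking_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


# Rankings reported in the Results, best to worst.
radiologists = ["C", "B", "D", "A"]
metric_rankings = {
    "WLL": ["A", "B", "C", "D"],
    "DSC": ["B", "C", "A", "D"],
    "wDSC": ["C", "B", "A", "D"],
}

distances = {
    name: kendall_tau_distance(radiologists, ranking)
    for name, ranking in metric_rankings.items()
}
print(distances)  # {'WLL': 4, 'DSC': 2, 'wDSC': 1}
```

With four models there are six pairs, so a distance of 0 means identical rankings and 6 means fully reversed. Under this measure the wDSC ranking differs from the radiologists' ranking in one pair (A and D), the DSC ranking in two pairs, and the WLL ranking in four.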

Conclusion

The wDSC aligned most closely with the rankings of the radiologists. Future studies will explore alternative weighting schemes for the WLL and wDSC and investigate alternative metrics.
