Proffered Paper Program · Diagnostic and Interventional Radiology Physics

Can VLMs Read Glioma MRIs? Building a Diagnostic Benchmark Using a Public Multi-Parametric MRI and EHR Dataset

Abstract
Purpose

The rapid advancement of Vision-Language Models (VLMs) offers transformative potential for automated radiology; however, their ability to interpret 3D brain tumor imaging remains underexplored. This study introduces a rigorous benchmark to evaluate the diagnostic proficiency of both general-purpose and medically specialized VLMs in preoperative glioma assessment. Using the high-fidelity UCSF-PDGM dataset, we aim to quantify the gap between current AI capabilities and the 3D spatial reasoning required for clinical glioma assessment.

Methods

We utilized 480 multi-parametric MRI volumes and corresponding expert radiology reports as ground truth. An automated NLP pipeline decomposed these reports into modality-specific observations for T1-weighted Contrast-Enhanced (T1CE) and Fluid-Attenuated Inversion Recovery (FLAIR) sequences, generating 7,657 multiple-choice question (MCQ) pairs. To bridge the dimensionality gap, each 3D volume (1 mm isotropic) was uniformly sampled into 2D axial grids (four 6×6 grids per volume). We evaluated GPT-5, Qwen3-VL-30B, and MedGemma (27B/4B) using Zero-Shot Chain-of-Thought (CoT) prompting. Each question was paired with its corresponding single-modality images to isolate perceptual accuracy for specific radiological features.
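The volume-to-grid step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline code: it assumes uniform sampling of axial slices along the depth axis and row-major tiling into 6×6 montages; the helper name `volume_to_grids` and the synthetic volume shape are hypothetical.

```python
import numpy as np

def volume_to_grids(volume, n_grids=4, grid_size=6):
    """Uniformly sample axial slices from a 3D volume (depth, H, W)
    and tile them into n_grids montages of grid_size x grid_size slices."""
    depth, h, w = volume.shape
    n_slices = n_grids * grid_size * grid_size
    # Uniform sampling of slice indices along the axial (depth) axis.
    idx = np.linspace(0, depth - 1, n_slices).round().astype(int)
    slices = volume[idx]  # (n_slices, H, W)
    grids = []
    for g in range(n_grids):
        block = slices[g * grid_size**2:(g + 1) * grid_size**2]
        # Tile row-major: hstack each row of 6 slices, then vstack the rows.
        rows = [np.hstack(block[r * grid_size:(r + 1) * grid_size])
                for r in range(grid_size)]
        grids.append(np.vstack(rows))
    return grids  # list of (grid_size*H, grid_size*W) arrays

# Example with a synthetic 240-slice volume (values only illustrative)
vol = np.zeros((240, 64, 64), dtype=np.float32)
grids = volume_to_grids(vol)
```

Each resulting montage would then be rendered as a single 2D image for the VLM, with one question paired against the grids of a single modality.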

Results

GPT-5 achieved the highest overall accuracy (67.04%), followed by MedGemma-27B (61.28%), Qwen3-VL-30B (60.05%), and MedGemma-4B (52.35%). While these results indicate that VLMs can identify macroscopic features, performance varied significantly across question categories. Notably, Lesion Characteristics, the most clinically critical category, proved most challenging for all models. Despite uniform 1mm sampling, all models struggled to synthesize cross-slice features, failing to characterize internal tumor morphology and border irregularities readily apparent to radiologists.

Conclusion

This benchmark reveals a critical dimensionality gap: current VLMs lack both 3D spatial reasoning in brain MRI and sufficient brain tumor pre-training to synthesize complex glioma phenotypes. Expert-level glioma interpretation will require native 3D architectures and large-scale VLM pre-training with brain cancer imaging and reports.
