Paper Proffered Program Diagnostic and Interventional Radiology Physics

Automated Vision-Language Model-Derived Clinical Descriptors Enhance Radiomic Profiling for Robust Breast Malignancy Prediction

Abstract

Purpose

To enhance breast malignancy prediction, this study develops a multimodal framework that integrates automated, Vision-Language Model (VLM)-derived BI-RADS lexicons with quantitative radiomic features.

Methods

This multi-center study included 889 patients from two institutions, partitioned into training (80%) and independent testing (20%) cohorts. A VLM-driven workflow utilizing Gemini 3 Pro was developed to simulate expert-level observation. In place of traditional manual annotation, the VLM analyzed standard dual-view (CC/MLO) mammograms according to the BI-RADS 5th Edition guidelines and automatically generated qualitative descriptors covering calcification morphology (e.g., fine pleomorphic, amorphous), distribution patterns (e.g., linear, segmental), and architectural distortion. These "digitized clinical observations" were integrated with quantitative radiomic features (shape, first-order, and texture matrices) through a multimodal early-fusion strategy. Following LASSO-based feature selection, an ensemble of ten machine learning classifiers (including Random Forest, XGBoost, and SVM) was trained. Performance was quantified via AUC with 95% confidence intervals (CI) estimated from 1,000 bootstrap resampling iterations in the test cohort.
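As a rough illustration of what an early-fusion step of this kind could look like, the sketch below one-hot encodes categorical descriptors and concatenates them with numeric radiomic values into a single feature vector. It is not the authors' implementation: the vocabularies contain only the example values named in the abstract, and the function names, dictionary keys, and radiomic feature names (`fuse`, `architectural_distortion`, `glcm_contrast`, and so on) are hypothetical.

```python
# Minimal early-fusion sketch (illustrative only, not the study's pipeline).
# ASSUMPTION: vocabularies hold only the example values named in the abstract.
MORPHOLOGY = ["fine pleomorphic", "amorphous"]
DISTRIBUTION = ["linear", "segmental"]


def one_hot(value, vocabulary):
    """Return a 0/1 indicator vector; a missing or unseen value maps to all zeros."""
    return [1.0 if value == term else 0.0 for term in vocabulary]


def fuse(lexicon, radiomics, radiomic_names):
    """Early fusion: concatenate encoded descriptors and radiomic values."""
    vector = one_hot(lexicon.get("morphology"), MORPHOLOGY)
    vector += one_hot(lexicon.get("distribution"), DISTRIBUTION)
    vector.append(1.0 if lexicon.get("architectural_distortion") else 0.0)
    # A fixed feature order keeps every case's vector aligned column by column.
    vector += [float(radiomics[name]) for name in radiomic_names]
    return vector


# Hypothetical case: descriptor values and radiomic numbers are made up.
case_lexicon = {
    "morphology": "amorphous",
    "distribution": "segmental",
    "architectural_distortion": False,
}
case_radiomics = {
    "shape_sphericity": 0.71,
    "firstorder_entropy": 4.2,
    "glcm_contrast": 12.5,
}
fused = fuse(case_lexicon, case_radiomics, sorted(case_radiomics))
print(fused)
```

In a design like this, the fused vector would then go to feature selection and a classifier in the same way as any purely numeric feature table, which is what distinguishes early fusion from combining the outputs of separate models.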

Results

The VLM-augmented fusion framework demonstrated superior robustness and accuracy compared to unimodal baselines. The Random Forest classifier achieved the highest performance with an AUC of 0.865 (95% CI: 0.802–0.920), significantly outperforming both the radiomics-only model (AUC 0.847) and the lexicon-only model (AUC 0.758). This trend was consistent across other ensemble architectures such as XGBoost and LightGBM (AUCs > 0.84). The integration of VLM-derived lexicons also raised the lower bound of the 95% CI from 0.777 (radiomics-only) to 0.802 (fusion).
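For readers unfamiliar with how an AUC confidence interval of this kind can be obtained, the following is a minimal sketch of a percentile bootstrap on a test set. The abstract does not state which bootstrap variant was used, so the percentile method, the function names (`auc`, `bootstrap_auc_ci`), the seed, and the toy data are all assumptions made for illustration.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (ASSUMPTION: simple percentile method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined without both classes; redraw
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return lower, upper


# Toy data, made up for illustration only.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90, 0.60, 0.30]
low, high = bootstrap_auc_ci(toy_labels, toy_scores)
print(auc(toy_labels, toy_scores), low, high)
```

Because each resample draws whole cases, the interval reflects test-set sampling variability only. It does not account for variability from retraining the model.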

Conclusion

This study validates the novel application of VLMs as automated clinical observers in medical imaging. By effectively fusing VLM-derived semantic logic with micro-structural radiomics, the proposed pipeline offers an accurate decision-support tool for multimodal breast cancer diagnosis.

People

Zhenyu Yang, Corresponding Author · Duke Kunshan University
Feiyang Du, Presenting Author · Duke Kunshan University
Pulin Sun, Author · Duke Kunshan University
Chulong Zhang, Author · Medical Physics Graduate Program, Duke Kunshan University
Xiaoyi Dai, Author · Duke Kunshan University
Rihui Zhang, Author · Duke Kunshan University
Fang-Fang Yin, PhD, Author · Duke Kunshan University
