Physics-Guided Transfer Learning and Explainable Attention U-Net++ for Breast Lesion Detection
Abstract
Purpose
Breast cancer screening reduces mortality through early detection, and mammography is the most widely used imaging modality. However, radiologist interpretation is time-consuming, and existing AI tools still miss subtle lesions and produce high false-positive rates. In this study, we present a physics-guided, explainable deep learning framework that uses transfer learning from simulated virtual clinical trial (VCT) data and risk-map guidance to improve lesion localization.
Methods
A VCT dataset of 960 Monte Carlo-simulated mammograms was generated to pretrain the proposed models for transfer learning. The framework was trained and evaluated on the CBIS-DDSM clinical dataset (1,490 mammograms) using four-fold cross-validation. The proposed physics-guided framework consists of three stages: (1) a U-Net trained to predict physics-guided risk maps that localize suspicious lesion regions; (2) an Attention U-Net++ model pretrained on VCT data and fine-tuned on clinical data, in which the risk maps are incorporated as external attention gating signals to guide feature selection, improve interpretability, and enhance lesion detection; and (3) a U-Net-based classifier trained to further reduce false-positive cases.
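As an illustration of stage (2), the following is a minimal NumPy sketch of an additive attention gate in which a single-channel risk map plays the role of the external gating signal. It is not the authors' implementation: the function name `risk_gated_attention`, the weight shapes, and the use of plain matrices in place of learned 1x1 convolutions are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def risk_gated_attention(features, risk_map, w_x, w_g, psi, bias=0.0):
    """Additive attention gate driven by an external risk map (hypothetical sketch).

    features: (C, H, W) feature map from a skip connection
    risk_map: (H, W) risk values, assumed to lie in [0, 1]
    w_x:      (F, C) weights acting as a 1x1 convolution on the features
    w_g:      (F,)   weights acting as a 1x1 convolution on the risk map
    psi:      (F,)   weights collapsing the hidden channels to one coefficient
    Returns the gated features (C, H, W) and the attention map (H, W).
    """
    theta = np.einsum("fc,chw->fhw", w_x, features)   # project features
    phi = w_g[:, None, None] * risk_map[None, :, :]   # project risk map
    hidden = np.maximum(theta + phi, 0.0)             # ReLU on the sum
    alpha = sigmoid(np.einsum("f,fhw->hw", psi, hidden) + bias)
    return features * alpha[None, :, :], alpha


# Toy example with random features and hand-picked weights (assumed values).
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8, 8))
risk = np.zeros((8, 8))
risk[2:5, 2:5] = 1.0                                  # one high-risk patch
gated, attn = risk_gated_attention(
    feats, risk, w_x=np.zeros((3, 4)), w_g=np.ones(3), psi=np.ones(3)
)
```

With the feature projection zeroed out as in the toy call, the attention coefficients depend only on the risk map, so the high-risk patch receives larger weights than the background. In a trained model all three weight sets would be learned, and the gate would combine image features with the risk signal.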
Results
The risk map generator achieved a tumor-level coverage of 97.2%, providing sufficiently comprehensive guidance for subsequent lesion detection. Adding the classifier improved precision by 7.0%, recall by 26.8%, and F1-score by 18.5% on CBIS-DDSM. The physics-guided risk map further increased precision by 3.0%, recall by 6.1%, and F1-score by 4.0%, achieving a recall of 82.8% and an F1-score of 67.4%. Our model achieved higher recall and F1-score than state-of-the-art YOLO-based methods.
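The reported recall and F1-score can be related through the standard definition of F1 as the harmonic mean of precision and recall. The sketch below assumes that definition and assumes both figures refer to the same operating point; if the metrics were averaged separately across folds, the relation holds only approximately. The implied precision is a back-calculation for orientation, not a figure reported in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


def precision_from_f1(f1, recall):
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    return f1 * recall / (2.0 * recall - f1)


# Values taken from the Results paragraph: recall 82.8%, F1-score 67.4%.
implied_precision = precision_from_f1(f1=0.674, recall=0.828)
print(f"implied precision: {implied_precision:.3f}")
```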
Conclusion
The proposed framework integrates physics-guided risk maps into the attention modules, which enhances feature selection and improves detection performance. Because the attention is driven by these risk maps, the resulting interpretability is physically meaningful, supporting clinical decision-making and broader adoption.