Investigating the Potential Ability of AI to Recognize Patient Ethnicity in Mammograms
Abstract
Purpose
To determine whether artificial intelligence (AI) deep learning models can infer patient ethnicity from screening mammograms, and to identify the image features that enable this inference.
Methods
Approximately 287,000 mammographic studies (2013–2019) from the BC Cancer Screening program were available; a class-balanced subset of 18,000 patients (White, East/Southeast Asian, South Asian) was used for training. EfficientNetB3 (a convolutional neural network, CNN) and DINOv2 (a transformer with a frozen backbone and a linear probe) were trained on images at several input sizes. Feature contribution was assessed through ablation studies on 512x512 images, comparing full images, tissue-only patches, and segmentation masks. Performance was compared with a regression model using volumetric breast density, breast volume, and fibroglandular volume. External validation used the EMBED dataset. Metrics included accuracy, F1-score, and area under the receiver operating characteristic curve (AUC-ROC).
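The three ablation inputs can be illustrated with a minimal NumPy sketch. It assumes a 2-D image and a boolean breast segmentation mask of the same shape. The function name `make_ablation_inputs`, the 64-pixel patch size, and the choice to centre the patch on the mask centroid are illustrative assumptions, not the study's actual preprocessing. A centroid-centred crop only approximates a tissue-only patch and is not guaranteed to lie fully inside the breast for irregular masks.

```python
import numpy as np


def make_ablation_inputs(image, breast_mask, patch_size=64):
    """Build full-image, mask-only, and tissue-patch variants of one mammogram.

    image: 2-D array of pixel intensities.
    breast_mask: 2-D boolean array, True inside the breast.
    patch_size: side length of the square patch (hypothetical default).
    """
    image = np.asarray(image)
    breast_mask = np.asarray(breast_mask, dtype=bool)
    if image.shape != breast_mask.shape:
        raise ValueError("image and mask must have the same shape")
    if not breast_mask.any():
        raise ValueError("mask contains no breast pixels")
    if patch_size > min(image.shape):
        raise ValueError("patch_size exceeds the image dimensions")

    # Variant 1: the full image, unchanged apart from the dtype.
    full = image.astype(np.float32)

    # Variant 2: the mask alone; keeps breast shape and size, drops all intensity.
    mask_only = breast_mask.astype(np.float32)

    # Variant 3: a square crop centred on the mask centroid; keeps local
    # texture but hides the breast outline (assumed patch placement).
    rows, cols = np.nonzero(breast_mask)
    r0 = int(rows.mean()) - patch_size // 2
    c0 = int(cols.mean()) - patch_size // 2
    # Clamp the crop so that it stays inside the image bounds.
    r0 = min(max(r0, 0), image.shape[0] - patch_size)
    c0 = min(max(c0, 0), image.shape[1] - patch_size)
    patch = full[r0:r0 + patch_size, c0:c0 + patch_size]

    return full, mask_only, patch
```

Training the same classifier on each variant in turn then indicates how much of the signal survives when only shape (mask) or only texture (patch) is available.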
Results
The baseline model, EfficientNetB3 (512x512), achieved an overall accuracy of 0.71 and a macro AUC of 0.84. Class-specific F1-scores were 0.78 for East/Southeast Asian, 0.61 for South Asian, and 0.72 for White patients, indicating predictive signal in all three groups, strongest for East/Southeast Asian and weakest for South Asian patients. A transformer-based model (DINOv2, 512x512) yielded slightly lower overall accuracy (0.66) and F1-scores (East/Southeast Asian: 0.73), suggesting that the ethnicity signal is not architecture-specific. In the ablation studies, segmentation masks alone retained substantial predictive ability for East/Southeast Asian patients (F1: 0.68), whereas tissue-only patches performed poorly (F1: 0.41). This suggests that breast shape and size are highly informative, though internal texture also contributes. A regression model using standard volumetric features achieved lower performance (F1: 0.61), suggesting that deep learning models extract richer visual cues than conventional anatomical measurements capture. Performance generalized to external data (Asian F1: 0.67, White F1: 0.76) and remained robust to manufacturer exclusion.
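For readers unfamiliar with the reported metrics, the sketch below computes per-class F1-scores and a macro-averaged AUC from class labels and predicted probabilities. It assumes a one-vs-rest formulation of the macro AUC, which the abstract does not specify. The function names are illustrative, and every class is assumed to appear in the evaluation set.

```python
import numpy as np


def per_class_f1(y_true, y_pred, n_classes):
    """F1-score for each class, treating that class as the positive label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for k in range(n_classes):
        tp = np.sum((y_pred == k) & (y_true == k))
        fp = np.sum((y_pred == k) & (y_true != k))
        fn = np.sum((y_pred != k) & (y_true == k))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return np.array(scores)


def binary_auc(labels, scores):
    """Rank-based (Mann-Whitney) AUC; tied scores receive their average rank."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores), dtype=float)
    i = 0
    while i < len(scores):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average 1-based rank
        i = j + 1
    n_pos = int(np.sum(labels == 1))
    n_neg = len(labels) - n_pos
    rank_sum = ranks[labels == 1].sum()
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def macro_ovr_auc(y_true, probs):
    """Unweighted mean of one-vs-rest AUCs (assumed definition of macro AUC)."""
    y_true = np.asarray(y_true)
    probs = np.asarray(probs, dtype=float)
    aucs = [
        binary_auc((y_true == k).astype(int), probs[:, k])
        for k in range(probs.shape[1])
    ]
    return float(np.mean(aucs))
```

Because the macro average weights each class equally, a weaker class such as South Asian (F1: 0.61) pulls the summary figure down as much as a stronger class raises it.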
Conclusion
AI models can reliably infer patient ethnicity from mammograms by identifying complex visual patterns.