Poster Program · Diagnostic and Interventional Radiology Physics

Multi-Modal Self-Supervised Contrastive Learning for a Foundational Model for Brachytherapy

Abstract

Purpose

This project uses a multi-modal approach to train a foundational model on medical images of cervical cancer patients. Several imaging modalities are acquired over the course of treatment: multiparametric MRI, CT, cone-beam CT, and, when indicated, PET. This motivates a paired deep learning approach that leverages the information shared across modalities to learn image features relevant to downstream clinical tasks.

Methods

Contrastive learning (CL) is a self-supervised learning method (requiring no labeled data) that pulls ‘positive’ pairs together and pushes ‘negative’ pairs apart in the model’s deep feature space. This project uses multi-modal CL, treating scans of different modalities from the same patient as positive pairs. CL typically treats all negative pairs equally, but different patients may have similar anatomy and disease progression. To address this, a soft contrastive loss was developed that uses clinical information to define a similarity metric for negative pairs. As a preliminary experiment, a vision transformer (ViT) pre-trained on 3D medical images was fine-tuned on an in-house MRI and CT dataset.
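
The abstract does not give the exact form of the soft contrastive loss, so the following is only a minimal sketch of one way such a loss could look, assuming a CLIP-style symmetric InfoNCE loss between MRI and CT embeddings of the same patients. The soft targets, the Gaussian kernel on clinical feature vectors, and the parameters `alpha`, `sigma` and `tau` are illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row (one embedding per patient) to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def soft_targets(clinical, alpha=0.8, sigma=1.0):
    """Hypothetical soft target matrix built from clinical feature vectors.

    Row i is a distribution over the patients in the batch: `alpha` goes to
    patient i itself (the positive pair) and the remaining 1 - alpha is spread
    over the other patients in proportion to a Gaussian-kernel similarity of
    their (assumed standardized) clinical features.
    """
    d2 = ((clinical[:, None, :] - clinical[None, :, :]) ** 2).sum(axis=-1)
    sim = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(sim, 0.0)               # the positive is handled by alpha
    row = sim.sum(axis=1, keepdims=True)
    neg = np.divide(sim, row, out=np.zeros_like(sim), where=row > 0)
    t = alpha * np.eye(len(clinical)) + (1.0 - alpha) * neg
    return t / t.sum(axis=1, keepdims=True)  # rows sum to 1 even with no negatives


def soft_contrastive_loss(z_mri, z_ct, clinical, tau=0.1, alpha=0.8, sigma=1.0):
    """Symmetric cross-modal contrastive loss with clinically softened targets.

    z_mri[i] and z_ct[i] embed the same patient i (the positive pair).
    With alpha=1 this reduces to the standard symmetric InfoNCE loss.
    """
    # (n, n) cosine similarities scaled by the temperature tau
    logits = l2_normalize(z_mri) @ l2_normalize(z_ct).T / tau
    t = soft_targets(clinical, alpha, sigma)
    loss_mri_to_ct = -(t * log_softmax(logits)).sum(axis=1).mean()
    loss_ct_to_mri = -(t * log_softmax(logits.T)).sum(axis=1).mean()
    return 0.5 * (loss_mri_to_ct + loss_ct_to_mri)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_mri, z_ct = rng.normal(size=(8, 32)), rng.normal(size=(8, 32))
    clinical = rng.normal(size=(8, 4))  # placeholder standardized clinical variables
    print(soft_contrastive_loss(z_mri, z_ct, clinical))
```

In this sketch, a negative patient who is clinically similar to the anchor receives a small nonzero target probability instead of being pushed away as hard as every other negative. Setting `alpha=1` recovers ordinary contrastive learning.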

Results

The training progress of the ViT was evaluated using attention rollout, which visualizes the attention flow for a given input and highlights the regions most crucial for extracting deep features. Early epochs emphasized structures with high voxel intensity, while later epochs produced a more balanced attention map covering the full pelvic region. At 50 epochs, attention began leaking into the image background, which is indicative of overfitting.
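
A minimal NumPy sketch of attention rollout (Abnar & Zuidema, 2020) for a 3D ViT follows. It assumes the per-layer attention matrices have already been captured from the model (how depends on the ViT implementation and is not shown), that token 0 is a [CLS] token, and that the patch tokens follow a raster-ordered 3D grid. The `background_fraction` helper is a hypothetical way to quantify the "leaking into the background" effect given a body mask on the patch grid; it is not part of the reported method.

```python
import numpy as np


def attention_rollout(attentions, grid, head_fusion="mean"):
    """Attention rollout for a 3D ViT.

    attentions: list of per-layer arrays, each (heads, tokens, tokens), ordered
        from the first to the last transformer block; token 0 is assumed to be
        a [CLS] token and tokens 1..N the patches in raster order.
    grid: patch grid shape, e.g. (D, H, W) with D * H * W == N.
    Returns a (D, H, W) relevance map scaled so its maximum is 1.
    """
    n = attentions[0].shape[-1]
    eye = np.eye(n)
    rollout = eye.copy()
    for attn in attentions:
        if head_fusion == "mean":
            a = attn.mean(axis=0)
        elif head_fusion == "max":
            a = attn.max(axis=0)
        else:
            raise ValueError(f"unknown head_fusion: {head_fusion}")
        a = 0.5 * a + 0.5 * eye                # account for the residual connection
        a = a / a.sum(axis=-1, keepdims=True)  # keep each row a distribution
        rollout = a @ rollout                  # propagate through this layer
    cls_to_patches = rollout[0, 1:]
    if cls_to_patches.size != int(np.prod(grid)):
        raise ValueError("grid does not match the number of patch tokens")
    heat = cls_to_patches.reshape(grid)
    peak = heat.max()
    return heat / peak if peak > 0 else heat


def background_fraction(heat, body_mask):
    """Hypothetical metric: share of rollout mass outside a boolean body mask."""
    total = heat.sum()
    return float(heat[~body_mask].sum() / total) if total > 0 else 0.0
```

Tracking `background_fraction` per epoch would turn the visual observation into a number; a rising value late in training would be consistent with the overfitting described above.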

Conclusion

This work shows the potential of multi-modal CL with clinical information for training foundational ViTs. The attention maps showed that the model learned to draw on relevant anatomical regions when extracting deep features, although overfitting eventually set in. The model will next be evaluated on downstream tasks to determine whether these learned features improve performance.

People

Ryan Yan · Presenting Author · University of Toronto
Alexandra Rink, PhD · Author · Medical Physics Department, Princess Margaret Cancer Centre, University Health Network
Benjamin Haibe-Kains, PhD · Author · University Health Network
Ruiyan Ni · Author · Department of Medical Biophysics, University of Toronto
Tony Tadic, PhD · Author · Princess Margaret Cancer Centre
