Paper Proffered Program Diagnostic and Interventional Radiology Physics

Anatomic Localization of Fluoroscopic Images: A Deep Learning Approach Using Swin Transformer

Abstract

Purpose

Efforts to improve fluoroscopically guided interventions (FGIs) and their outcomes are partially hindered by the intrinsic variability of these procedures. Anatomic localization of fluoroscopic images can begin to address this problem, but manual image labeling is time-consuming and impractical because complex FGIs may involve thousands of acquired images. This study aims to develop and validate a computer vision model that automatically classifies low-dose fluoroscopic images into distinct anatomic grid locations to facilitate further analysis.

Methods

A dataset of de-identified clinical fluoroscopy images was curated and mapped to 29 grid locations organized by anatomy and clinical relevance in procedures. A Swin Transformer (Swin-Tiny) architecture, pre-trained on ImageNet, was employed to leverage its hierarchical vision processing capabilities. A random rotation augmentation strategy (±20 degrees) was applied during training to account for the variable orientation of patients and of the x-ray source/detector. The model was trained on 1477 images using a cosine annealing learning rate schedule on an NVIDIA A100 GPU. Performance was evaluated on a held-out validation set of 453 images, none of which were used for training, using accuracy, precision, and recall across all anatomic grid locations.
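The two training choices named above, random rotation within ±20 degrees and a cosine annealing learning rate, can be illustrated with a minimal sketch. This is not the authors' training code: the base learning rate, the number of epochs, and the function names below are hypothetical values chosen only for illustration, and the schedule shown is the standard single-cycle cosine form.

```python
import math
import random

MAX_ROTATION_DEG = 20.0  # from the abstract: random rotation of +/- 20 degrees


def sample_rotation_angle(rng, max_deg=MAX_ROTATION_DEG):
    """Draw one rotation angle (degrees) uniformly from [-max_deg, +max_deg]."""
    return rng.uniform(-max_deg, max_deg)


def cosine_annealing_lr(step, total_steps, base_lr, min_lr=0.0):
    """Learning rate at `step` for a single-cycle cosine annealing schedule.

    lr(step) = min_lr + 0.5 * (base_lr - min_lr) * (1 + cos(pi * step / total_steps))
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    step = min(max(step, 0), total_steps)  # clamp to the schedule's range
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    # ASSUMPTION: 50 epochs and a base learning rate of 1e-4 are placeholders,
    # not values reported in the abstract.
    for epoch in (0, 25, 50):
        angle = sample_rotation_angle(rng)
        lr = cosine_annealing_lr(epoch, total_steps=50, base_lr=1e-4)
        print(f"epoch {epoch:2d}: rotation {angle:+.1f} deg, lr {lr:.2e}")
```

The schedule starts at the base learning rate, passes through half of it at the midpoint, and decays smoothly to the minimum at the final step. In a deep learning framework the same behavior would normally come from the framework's built-in scheduler and image transform rather than hand-written functions.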

Results

The model achieved an overall validation accuracy of 95.6%, outperforming the initial baseline (92%). Confusion matrix analysis showed high robustness for grid locations with sufficient training samples; the grids with the highest relative contribution achieved near-perfect precision (>0.97). Misclassifications were primarily associated with central/lateral grid boundaries or with rare grid locations lacking sufficient training samples.
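For readers less familiar with how per-grid precision and recall are read off a confusion matrix, the sketch below shows the calculation on a toy 3-class matrix. The matrix values are made up for illustration and are not the study's data; the function names are likewise hypothetical. Rows are taken to be true classes and columns predicted classes.

```python
def overall_accuracy(confusion):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else None


def per_class_precision_recall(confusion):
    """Return a list of (precision, recall) tuples, one per class.

    confusion[i][j] = number of samples of true class i predicted as class j.
    A metric is None when its denominator is zero (class never predicted,
    or class absent from the evaluation set).
    """
    n = len(confusion)
    results = []
    for k in range(n):
        true_pos = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = true_pos / predicted_k if predicted_k else None
        recall = true_pos / actual_k if actual_k else None
        results.append((precision, recall))
    return results


if __name__ == "__main__":
    # TOY DATA (hypothetical): 3 classes, 20 samples; class 2 is rare.
    toy = [
        [8, 1, 0],
        [2, 7, 1],
        [0, 0, 1],
    ]
    print("accuracy:", overall_accuracy(toy))
    for k, (p, r) in enumerate(per_class_precision_recall(toy)):
        print(f"class {k}: precision={p}, recall={r}")
```

In the toy matrix the rare class has perfect recall but a precision of only 0.5, because a single misassigned sample from a neighboring class weighs heavily when few samples exist. This is the kind of sensitivity that makes metrics for sparsely sampled grid locations less stable.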

Conclusion

We successfully employed a fine-tuned Swin Transformer to classify anatomic locations of fluoroscopic images with accuracies sufficient for clinical applications. This tool can be used to automate metadata enrichment in large-scale clinical repositories, paving the way for detailed analysis of FGIs such as automatic patient dose calculations and procedure segmentation into specific task workflows and timelines.

People

James R Duncan, MD/PhD (Author) · Mallinckrodt Institute of Radiology, Washington University School of Medicine

Allan Thomas, PhD (Corresponding Author) · Mallinckrodt Institute of Radiology, Washington University School of Medicine

Juntian Xu, BS (Presenting Author) · Institute for Informatics, Data Science & Biostatistics, Washington University School of Medicine

Yiren Wang, BS (Author) · Institute for Informatics, Data Science & Biostatistics, Washington University School of Medicine