Paper · Proffered Program · Diagnostic and Interventional Radiology Physics

A Self-Supervised Transformer Framework for Interpretable Extranodal Extension Detection in Head and Neck Cancer

Abstract

Purpose

Extranodal extension (also called extracapsular extension, ECE) of metastatic lymph nodes is a major adverse prognostic factor in head and neck cancer and strongly influences treatment intensity. However, pre-operative identification of ECE on CT is challenging, subjective, and inconsistent across observers. This study aims to develop an interpretable deep-learning framework that automatically detects ECE from routine CT scans and provides localized visual evidence to support clinical decision-making.

Methods

We curated a dataset of 3D head-and-neck CT scans from 150 patients with pathology-confirmed ECE status. Clinical radiotherapy planning contours (RTSTRUCT) were exported and converted into CT–mask volumes defining nodal regions of interest. We designed a 3D dual-output SwinUNETR architecture that jointly performs voxel-level lymph-node region-of-interest (ROI) segmentation and case-level ECE classification, with the segmentation output used to constrain and interpret the classification. To address data scarcity and heterogeneity, we applied masked-autoencoder (MAE) pretraining on 215 unlabeled CT volumes before fine-tuning on labeled ECE data. Additional design elements included anatomically informed soft-tissue mask dilation to reduce sensitivity to contour variability, class-imbalance–aware optimization, and mask-guided pooling that explicitly links classification decisions to predicted nodal regions. Performance was evaluated using the Dice score for segmentation and AUC, accuracy, sensitivity, specificity, and F1-score for classification.
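The abstract does not give a formula for mask-guided pooling. One common reading is a mask-weighted average of the feature map, so that only voxels inside the predicted nodal ROI contribute to the case-level feature vector. The sketch below illustrates that reading in plain Python on flattened voxel lists; the function name `mask_guided_pool`, the `eps` stabilizer, and the toy values are assumptions for illustration, not the authors' implementation, which would operate on 3D tensors.

```python
def mask_guided_pool(features, mask, eps=1e-6):
    """Mask-weighted average of each feature channel (illustrative sketch).

    features: list of channels, each a flat list of per-voxel feature values
    mask:     flat list of per-voxel ROI probabilities in [0, 1]
    eps:      small constant that avoids division by zero for an empty mask
    """
    mask_sum = sum(mask)
    pooled = []
    for channel in features:
        if len(channel) != len(mask):
            raise ValueError("each channel must have one value per mask voxel")
        # Voxels outside the predicted ROI (mask near 0) contribute almost nothing.
        weighted = sum(value * weight for value, weight in zip(channel, mask))
        pooled.append(weighted / (mask_sum + eps))
    return pooled


# Toy example (hypothetical values): 2 channels, 4 voxels, ROI covers the middle two.
features = [[1.0, 2.0, 3.0, 4.0], [10.0, 0.0, 0.0, 10.0]]
mask = [0.0, 1.0, 1.0, 0.0]
pooled = mask_guided_pool(features, mask)
```

In this toy case the second channel pools to zero because all of its activation lies outside the ROI, which is the sense in which such pooling ties the classification input to the predicted nodal region.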

Results

Without pretraining, the dual-output SwinUNETR achieved a Dice score of 0.808 for nodal ROI segmentation and a classification AUC of 0.681. Incorporating MAE pretraining improved performance to Dice = 0.822 and AUC = 0.886, with accuracy of 0.782, sensitivity of 0.833, specificity of 0.727, and F1-score of 0.800, demonstrating gains in ECE classification while maintaining localization within nodal regions.
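For readers less familiar with how the reported classification metrics relate to one another, the following sketch computes them from confusion-matrix counts. The abstract does not report the test-set size or the confusion matrix. The counts used here are hypothetical, chosen only because they approximately reproduce the reported accuracy, sensitivity, specificity, and F1-score.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is nonzero (at least one positive case, one
    negative case, and one positive prediction).
    """
    sensitivity = tp / (tp + fn)            # true-positive rate (recall)
    specificity = tn / (tn + fp)            # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)              # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, NOT taken from the abstract: 12 ECE-positive and
# 11 ECE-negative cases, picked to illustrate values near those reported.
metrics = classification_metrics(tp=10, fn=2, tn=8, fp=3)
```

AUC is not included because it depends on the ranking of predicted probabilities across all thresholds rather than on a single confusion matrix.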

Conclusion

A self-supervised dual-output SwinUNETR enables accurate and interpretable CT-based detection of ECE under limited labeled data. By providing nodal-region localization alongside case-level ECE risk estimates, this framework has the potential to improve pre-operative staging, reduce unnecessary neck dissections, and support imaging-driven treatment personalization in head-and-neck cancer.

People

William N. Duggar, PhD · Corresponding Author · University of Mississippi Med. Center
Amirhossein Eskorouchi, PhD · Author · Mississippi State University
Haifeng Wang, PhD · Author · Mississippi State University
Li Yuan · Presenting Author · University of Mississippi Med. Center
