Large-Scale Vision Model-Enabled Data-Efficient Image Registration for Upright Radiation Therapy
Abstract
Purpose
Upright radiotherapy offers physiological and geometric advantages over conventional supine delivery. However, posture-induced anatomical deformation complicates the alignment of supine diagnostic images with upright simulation and treatment images, limiting the effectiveness of standard deformable image registration (DIR) methods. This work presents a data-efficient, AI-enabled DIR approach designed to achieve robust supine-to-upright alignment without requiring posture-matched training data.
Methods
The proposed DIR framework embeds pre-trained large-scale vision transformer foundation encoders within a correlation-aware, multi-scale architecture. Deep feature representations from the moving and fixed CT images are fused using convolutional layers, and deformation vector fields (DVFs) are predicted directly by a multilayer perceptron decoder in a single inference step. Because paired supine-upright datasets are scarce, the model was developed in a zero-shot setting, without supervision from paired supine-upright images. Training relied exclusively on a large cohort of 1856 supine-supine image pairs, leveraging the relative abundance of supine imaging data in current clinical practice. Performance was evaluated on an independent dataset of supine-upright CT scans from six patients. Registration performance was assessed qualitatively and quantitatively using root-mean-square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM).
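As an illustration of the three image-similarity metrics named above, the following is a minimal NumPy sketch and not the evaluation code used in this work. It assumes intensities normalized to [0, 1] (the `data_range` default), which the abstract does not state, and it computes SSIM once over the whole image rather than with the sliding local windows that standard SSIM implementations use. The function names are hypothetical.

```python
import numpy as np


def rmse(fixed, warped):
    """Root-mean-square error between the fixed image and the warped moving image."""
    diff = np.asarray(fixed, dtype=np.float64) - np.asarray(warped, dtype=np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))


def psnr(fixed, warped, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is an assumed intensity span."""
    err = rmse(fixed, warped)
    if err == 0.0:
        return float("inf")  # identical images
    return float(20.0 * np.log10(data_range / err))


def global_ssim(fixed, warped, data_range=1.0):
    """Simplified SSIM using whole-image statistics (no local windows)."""
    x = np.asarray(fixed, dtype=np.float64)
    y = np.asarray(warped, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from the usual SSIM definition
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)
```

Under the assumed unit data range, PSNR and RMSE are tied by PSNR = 20 log10(1 / RMSE), so an RMSE of 0.052 corresponds to roughly 25.68 dB. Per-case averaging can shift this relationship slightly, so it is only a rough consistency check and not a statement about how the reported values were computed.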
Results
The proposed method achieved accurate, topology-preserving registrations despite substantial posture-induced anatomical changes. Supine diagnostic CT images registered using the proposed DIR workflow demonstrated strong correspondence with upright simulation and treatment images. Target volumes delineated on supine scans were accurately mapped to the upright geometry, supporting image-guided radiotherapy. Compared with state-of-the-art DIR methods, the proposed method achieved the lowest RMSE (0.052), the highest PSNR (25.70), and the highest SSIM (0.806).
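Mapping a target volume from supine to upright geometry amounts to resampling the supine contour mask through the predicted DVF. The sketch below shows one common way to do this, under stated assumptions: the DVF is defined on the fixed (upright) grid, is expressed in voxel units, and points from each upright voxel to its source location in the supine image (pull-back warping). Nearest-neighbor lookup keeps the mask binary. The function name and the DVF layout `(3, D, H, W)` are hypothetical and are not taken from this work.

```python
import numpy as np


def warp_mask_nearest(mask, dvf):
    """Warp a 3D label mask with a dense displacement field.

    mask: array of shape (D, H, W) defined on the moving (supine) image.
    dvf:  array of shape (3, D, H, W), displacement in voxels on the fixed grid
          (assumed convention: output[p] = mask[p + dvf[:, p]]).
    """
    grid = np.indices(mask.shape)                # identity coordinates, shape (3, D, H, W)
    src = np.rint(grid + dvf).astype(np.int64)   # nearest source voxel for each output voxel
    for axis, size in enumerate(mask.shape):
        # Clamp out-of-bounds lookups to the image border.
        np.clip(src[axis], 0, size - 1, out=src[axis])
    return mask[src[0], src[1], src[2]]


# Usage: a single labeled voxel and a uniform one-voxel shift along the last axis.
mask = np.zeros((5, 5, 5), dtype=np.uint8)
mask[2, 2, 2] = 1
dvf = np.zeros((3, 5, 5, 5))
dvf[2] = 1.0  # every output voxel samples from its +1 neighbor along the last axis
warped = warp_mask_nearest(mask, dvf)
```

With this pull-back convention the labeled voxel appears one position earlier along the last axis in the output. A field defined in the opposite direction, or in millimetres rather than voxels, would need to be inverted or rescaled first.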
Conclusion
A large-scale vision foundation model enables accurate supine-to-upright DIR without requiring posture-matched supervision. These findings demonstrate the feasibility of a data-efficient, zero-shot, posture-aware DIR strategy and highlight its potential utility for image guidance and adaptive treatment planning in upright radiotherapy.