Latent Accelerated Diffusion-Based Deformation Estimation for Real-Time Volumetric Imaging
Abstract
Purpose
The goal of this study is to accurately estimate 3D deformation and generate real-time volumetric images using only one or two X-ray projections, overcoming the limitations that conventional deformable image registration (DIR) and volumetric image reconstruction methods face under ultra-sparse conditions.
Methods
We propose LADDER, a Latent Accelerated Diffusion framework for Deformation Estimation enabling Real-time volumetric imaging, which combines (1) a deformation network (VoxelMorph) that generates a pre-treatment, patient-specific baseline deformation vector field (DVF), and (2) a latent diffusion model (LDM)-based DIR module that estimates real-time, low-dimensional DVF scaling and residual maps from sparse X-ray projections via cross-attention. The LDM process compresses the baseline DVF into a low-dimensional manifold, enabling computationally efficient refinement conditioned on projection-derived anatomical cues. The model adopts a physics-informed loss to enforce anatomical and projection consistency. LADDER was trained on 500 Learn2Reg datasets and evaluated on 10 DIR-Lab datasets across compression rates, down-sampling rates, projection configurations, and baseline DVF spans.
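As a rough illustration of how scaling and residual maps could refine a baseline DVF, the sketch below assumes an elementwise form, refined = scale * baseline + residual, with the low-resolution maps brought to full resolution by nearest-neighbor repetition. The combination rule, the upsampling choice, and the names `upsample_nearest` and `refine_dvf` are assumptions made for illustration and are not taken from the LADDER implementation.

```python
import numpy as np

def upsample_nearest(field, factor):
    """Nearest-neighbor upsampling of a (3, D, H, W) field along its spatial axes."""
    for axis in (1, 2, 3):
        field = np.repeat(field, factor, axis=axis)
    return field

def refine_dvf(baseline_dvf, scale_lowres, residual_lowres, factor):
    """Hypothetical refinement: scale the baseline DVF and add a residual.

    baseline_dvf:    (3, D, H, W) displacement components
    scale_lowres:    (3, D/factor, H/factor, W/factor) scaling map
    residual_lowres: (3, D/factor, H/factor, W/factor) residual map
    """
    scale = upsample_nearest(scale_lowres, factor)
    residual = upsample_nearest(residual_lowres, factor)
    return scale * baseline_dvf + residual

# Toy example: a uniform 1-voxel baseline displacement, halved and shifted by 0.1.
baseline = np.ones((3, 8, 8, 8))
scale = np.full((3, 2, 2, 2), 0.5)
residual = np.full((3, 2, 2, 2), 0.1)
refined = refine_dvf(baseline, scale, residual, factor=4)
```

In this toy case every component of the refined field equals 0.6, which shows how a small number of low-resolution parameters can modulate a dense baseline field.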
Results
With dual-projection input, an extreme inhale–exhale baseline DVF, and compression and down-sampling rates of 8, LADDER achieves a mean target registration error (TRE) of mm across 10 testing cases, with high volumetric structural similarity (3D SSIM > 0.95) and low volumetric reconstruction error (3D NMSE < 0.006), while maintaining real-time inference speed on the order of 0.11–0.12 s. Further analysis shows that the baseline DVF span affects deformation accuracy and that, compared with single-projection input, dual-projection input improves deformation fidelity and reduces variability across breathing phases.
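For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute TRE and NMSE. It assumes TRE is the mean Euclidean distance in millimetres between corresponding landmarks, and NMSE is the squared error normalized by the energy of the reference volume; the exact definitions used in this study may differ, and the function names are illustrative.

```python
import numpy as np

def target_registration_error(pred_mm, ref_mm):
    """Mean Euclidean distance (mm) between predicted and reference landmarks, each (N, 3)."""
    pred = np.asarray(pred_mm, dtype=float)
    ref = np.asarray(ref_mm, dtype=float)
    return float(np.linalg.norm(pred - ref, axis=1).mean())

def nmse(estimate, reference):
    """Normalized mean squared error: sum of squared error over reference energy."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sum((estimate - reference) ** 2) / np.sum(reference ** 2))

# Toy landmarks: the two errors are 5 mm and 2 mm.
ref_landmarks = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
pred_landmarks = [[3.0, 4.0, 0.0], [10.0, 10.0, 12.0]]
tre = target_registration_error(pred_landmarks, ref_landmarks)

# Toy volumes: a uniform 10% intensity overestimate.
reference_volume = np.ones((4, 4, 4))
estimated_volume = 1.1 * reference_volume
err = nmse(estimated_volume, reference_volume)
```

Here `tre` is the average of the two landmark errors (3.5 mm), and `err` is approximately 0.01 for the 10% overestimate.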
Conclusion
LADDER introduces a new paradigm for real-time deformation estimation and volumetric image reconstruction using projection-conditioned latent diffusion of patient-specific DVFs. The demonstrated accuracy and real-time speed position LADDER as a promising solution for next-generation motion management, with potential to reduce margins, improve target localization, and enhance the safety and efficacy of lung SBRT and SBPT.