Generating Physically Plausible 4D Images from Sparse Supervision with Neural ODEs and Respiratory Signals
Abstract
Purpose
To generate physiologically plausible 4D medical image sequences from extremely sparse inputs by integrating external breathing signals, overcoming the limitation of requiring dense supervision in current spatiotemporal interpolation methods.
Methods
A novel Neural ODE-based framework for deformation field generation is proposed. It takes only the start- and end-phase 2D images as input. A continuous-time network predicts the intermediate deformation fields and is optimized with a respiratory curve loss. This loss enforces synchronization between the derived motion trajectories and the external breathing signal in amplitude, phase, and rate. It is combined with smoothness and topology-preserving constraints to ensure physically plausible dynamics.
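As an illustration of how a loss can tie a motion trajectory to a breathing signal in amplitude, phase, and rate, the following is a minimal NumPy sketch. It is not the formulation used in this work: the function name `respiratory_curve_loss`, the three terms, and the weights `w_amp`, `w_phase`, and `w_rate` are assumptions made for the example. It assumes the motion trajectory has already been reduced to a 1D curve (for instance, mean displacement per time point) and that both curves are sampled at the same times and expressed in the same normalized units.

```python
import numpy as np


def respiratory_curve_loss(motion, signal, dt=1.0,
                           w_amp=1.0, w_phase=1.0, w_rate=1.0):
    """Hypothetical loss comparing a 1D motion curve with a breathing signal.

    motion, signal: 1D arrays sampled at the same time points (assumed to be
    in the same normalized units). dt is the sampling interval.
    """
    m = np.asarray(motion, dtype=float)
    s = np.asarray(signal, dtype=float)
    if m.ndim != 1 or m.shape != s.shape:
        raise ValueError("motion and signal must be 1D arrays of equal length")

    # Amplitude term: pointwise mean squared difference between the curves.
    amp = np.mean((m - s) ** 2)

    # Phase term: 1 minus the Pearson correlation, which penalizes a time lag
    # between the two curves regardless of their overall scale.
    mc, sc = m - m.mean(), s - s.mean()
    corr = np.sum(mc * sc) / (np.linalg.norm(mc) * np.linalg.norm(sc) + 1e-8)
    phase = 1.0 - corr

    # Rate term: mean squared difference of finite-difference time derivatives.
    rate = np.mean((np.gradient(m, dt) - np.gradient(s, dt)) ** 2)

    return w_amp * amp + w_phase * phase + w_rate * rate


# Usage: one synthetic breathing cycle and a lagged motion curve.
t = np.linspace(0.0, 1.0, 50)
breathing = 0.5 * (1.0 - np.cos(2.0 * np.pi * t))
lagged_motion = 0.5 * (1.0 - np.cos(2.0 * np.pi * (t - 0.2)))
loss_matched = respiratory_curve_loss(breathing, breathing, dt=t[1] - t[0])
loss_lagged = respiratory_curve_loss(lagged_motion, breathing, dt=t[1] - t[0])
```

In this sketch a motion curve that matches the breathing signal gives a loss close to zero, while a lagged curve is penalized by all three terms. In a training setting the same terms would be written in an automatic-differentiation framework so that gradients can flow back to the deformation network.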
Results
The framework generated 4D sequences with better perceptual quality and structural fidelity than the state-of-the-art UVINET. Our method achieved a lower (better) LPIPS score (2.203 vs. 6.210) and a higher SSIM (0.976 vs. 0.966), indicating more natural visual results and better preservation of anatomical structure. These results were obtained with endpoint supervision only, demonstrating effective learning from sparse data.
Conclusion
This work demonstrates a practical framework for high-quality, physiologically credible 4D image synthesis under sparse supervision. By directly incorporating breathing signals via a novel loss, it ensures realistic motion dynamics. The superior perceptual results and efficient data usage offer significant potential for clinical applications like radiotherapy planning.