Multimodal Respiratory Surrogates-Guided Simultaneous, Global-to-Local Anatomical Deformation Reconstruction
Abstract
Purpose
To propose a multimodal surrogates-guided, multi-task respiratory-modeling framework for simultaneous anatomical motion reconstruction and tumor-tracking.
Methods
Driven by complementary external and internal motion information captured by respiration-synchronized in-room surrogates, namely optical surface images (OSI) and sparse-view X-ray projections, the framework delivers a unified global-and-local anatomical tracking pipeline through an interrelated multi-task learning (MTL) architecture. The key components include: i) hybrid attention mechanisms that integrate convolutional block attention modules (CBAM) and Swin transformers to enhance deformation feature encoding, ii) an improved back-projection (iBP) algorithm to accelerate sparse-projection propagation, and iii) task-specific prior enhancements to provide anatomical characterization and clinical delineation. A multi-center, prospective-and-retrospective database of 432 patients (272 thoracic and 160 abdominal) was established for the development and validation pipelines.
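As an illustration of the attention component named above, the following is a minimal sketch of the channel-attention branch of a CBAM-style module, written in plain Python. It is not the authors' implementation: the function names, the tiny weight matrices `w1` and `w2`, and the toy feature map are all hypothetical, and the spatial-attention branch and the Swin-transformer stage are omitted.

```python
import math


def relu(vec):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in vec]


def sigmoid(x):
    """Logistic function mapping a logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def channel_attention(feature, w1, w2):
    """CBAM-style channel attention (hypothetical helper).

    feature: list of C channels, each a flat list of spatial values.
    w1: hidden x C matrix, w2: C x hidden matrix of a shared two-layer MLP.
    Returns the per-channel weights and the re-weighted feature map.
    """
    # Squeeze each channel with global average pooling and global max pooling.
    avg_pool = [sum(ch) / len(ch) for ch in feature]
    max_pool = [max(ch) for ch in feature]

    def shared_mlp(vec):
        return matvec(w2, relu(matvec(w1, vec)))

    # Sum the two MLP outputs, then squash to (0, 1) with a sigmoid.
    logits = [a + m for a, m in zip(shared_mlp(avg_pool), shared_mlp(max_pool))]
    weights = [sigmoid(z) for z in logits]

    # Scale every spatial value of a channel by that channel's weight.
    scaled = [[wt * x for x in ch] for wt, ch in zip(weights, feature)]
    return weights, scaled


# Toy example: two channels, one hidden unit (all values are illustrative).
toy_feature = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]]
toy_w1 = [[1.0, 1.0]]
toy_w2 = [[1.0], [-1.0]]
toy_weights, toy_scaled = channel_attention(toy_feature, toy_w1, toy_w2)
```

In this toy case the first channel, which carries all the signal, receives a weight close to 1, while the empty second channel is suppressed. In a trained network the MLP weights would be learned rather than set by hand.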
Results
i) Multi-indicator analyses. We achieved high-fidelity CT reconstruction, with RMSE of 1.49/1.52 m⁻¹ (thoracic/abdominal), PSNR of 30.56/30.45 dB, and SSIM of 0.94/0.93. For tumor tracking, the centroid deviation amplitude (DCAM) was 0.47/0.51 mm (thoracic/abdominal), DSC was 0.96/0.94, and HD95 was 6.36/6.34 mm. No statistically significant differences (two-sided t-test) were observed between any validation fold and the testing results. ii) Full-cycle tracking. The predicted tumor motion correlated strongly with the ground truth, with Pearson correlation coefficients of 0.98±0.02 (SI), 0.97±0.02 (AP), and 0.93±0.08 (LR) for thoracic cases and 0.98±0.01 (SI), 0.96±0.02 (AP), and 0.90±0.03 (LR) for abdominal cases. iii) Impact of respiratory surrogate configuration. Single-modal surrogates consistently underperformed the multimodal configuration across all indicators (P<0.05). Improvements in RMSE (P=0.04) and PSNR (P=0.04) became significant when the number of projections reached five. Different orthogonal-view combinations yielded no significant differences. iv) Computational efficiency. The total latency was 159.4 ms, well below the 500 ms temporal threshold recommended by AAPM TG-75.
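For readers less familiar with the indicators reported above, the sketch below shows how RMSE, PSNR, the Dice similarity coefficient (DSC), and the Pearson correlation coefficient are conventionally computed. It uses plain Python on flattened lists; the function names and the `data_range` parameter are assumptions made for illustration. It does not reproduce the study's evaluation code, and SSIM, HD95, and DCAM are left out.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between two equal-length intensity lists."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def psnr(pred, ref, data_range):
    """Peak signal-to-noise ratio in dB; data_range is the assumed intensity span."""
    err = rmse(pred, ref)
    if err == 0:
        return math.inf
    return 20.0 * math.log10(data_range / err)


def dice(mask_a, mask_b):
    """Dice similarity coefficient of two binary masks given as flat 0/1 lists."""
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks are treated as a perfect match by convention.
    return 2.0 * overlap / total if total else 1.0


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length motion traces."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Applied per direction (SI, AP, LR), `pearson` would take the predicted and reference tumor-centroid traces over a breathing cycle, while `dice` would compare the predicted and reference tumor masks voxel by voxel.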
Conclusion
Extensive multi-center evaluations demonstrated the framework's superior accuracy in anatomical reconstruction and tumor tracking, highlighting its conceptual soundness and promising clinical feasibility for respiratory tracking and intra-fractional adaptive therapy.