Synthetic CT Generation from Cone-Beam CT for Online Adaptive Radiotherapy Using Flow Matching and Diffusion Schrödinger Models
Abstract
Purpose
Synthetic CT (sCT) generation from cone-beam CT (CBCT) enables accurate online adaptive radiotherapy, which accounts for daily anatomical and positioning variability. We introduce two models for CBCT-to-sCT generation: a flow-matching framework (Flow), which is, to the best of our knowledge, the first of its kind for this task, and a diffusion Schrödinger bridge model (DSBM).
Methods
We use 67 paired head-and-neck CT–CBCT scans, split at the patient level into 52 training, 3 validation, and 12 test cases. Each pair is deformably registered, with a mean normalized cross-correlation (NCC) of 0.9828±0.0071 across all pairs. The flow-matching model learns a continuous transformation field that maps the CBCT distribution directly to the CT distribution. The DSBM is trained with 250 discrete timesteps to learn a stochastic transport between the CBCT and CT distributions. Both approaches employ 3D U-Nets with transformer blocks and are trained on [4, 128, 128] patches to limit GPU memory usage. Evaluation uses mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM), alongside inference time and spatial error analysis.
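To make the flow-matching idea concrete, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes the common straight-line (linear interpolation) path between a paired CBCT patch and its CT patch, for which the regression target is the constant velocity CT minus CBCT; the actual path and loss may differ. The names `flow_matching_pair`, `flow_matching_loss`, `sample_euler`, and `velocity_fn` are hypothetical, and `velocity_fn` stands in for the 3D U-Net.

```python
import numpy as np

def flow_matching_pair(cbct, ct, t):
    """Build one training example on the assumed straight path from CBCT to CT."""
    x_t = (1.0 - t) * cbct + t * ct   # point on the path at time t in [0, 1]
    v_target = ct - cbct              # constant velocity along a straight path
    return x_t, v_target

def flow_matching_loss(velocity_fn, cbct, ct, t):
    """Mean squared error between predicted and target velocity."""
    x_t, v_target = flow_matching_pair(cbct, ct, t)
    return float(np.mean((velocity_fn(x_t, t) - v_target) ** 2))

def sample_euler(velocity_fn, cbct, n_steps=1):
    """Integrate the learned velocity field from t=0 (CBCT) to t=1 (sCT)."""
    x = np.asarray(cbct, dtype=np.float64)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

# Toy usage on one [4, 128, 128] patch with a hypothetical "oracle" velocity
# field (a constant offset), standing in for a trained network.
rng = np.random.default_rng(0)
cbct_patch = rng.normal(size=(4, 128, 128))
ct_patch = cbct_patch + 0.5
oracle = lambda x, t: np.full_like(x, 0.5)

loss = flow_matching_loss(oracle, cbct_patch, ct_patch, t=0.3)
sct_patch = sample_euler(oracle, cbct_patch, n_steps=1)
```

With `n_steps=1` the sampler reduces to a single Euler update, which is one way to read "a single sampling step": if the learned field is close to constant along the path, one step already lands near the CT estimate.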
Results
The flow-matching model achieves an MAE of 56.74±11.34 HU and a PSNR of 34.18±1.92 dB within the body mask with a single sampling step. The single-step DSBM achieves an MAE of 54.89±9.89 HU and a PSNR of 33.71±1.81 dB, with inference time reduced to 1 minute and 12 seconds per volume. Flow is more efficient still, with an inference time of 45 seconds per volume, comparable to nnU-Net, while maintaining competitive image quality. Error analysis reveals systematic discrepancies attributable to residual misalignment from deformable registration rather than model-induced artifacts.
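For readers who want to reproduce metrics restricted to the body mask, the sketch below shows one plausible way to compute masked MAE and PSNR with NumPy. It is an illustration under stated assumptions, not the evaluation code behind the numbers above: the function names are hypothetical, and the PSNR `data_range` (the intensity span used as the peak value) is an assumption, since its value is not given here.

```python
import numpy as np

def masked_mae(sct, ct, mask):
    """Mean absolute error in HU over voxels where mask is True."""
    return float(np.mean(np.abs(sct[mask] - ct[mask])))

def masked_psnr(sct, ct, mask, data_range):
    """PSNR in dB over masked voxels; data_range is an assumed peak span."""
    mse = np.mean((sct[mask] - ct[mask]) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Toy volumes: the sCT is off by 10 HU inside the body and 500 HU outside,
# so the mask determines which error is reported.
ct = np.zeros((2, 4, 4), dtype=np.float64)
body_mask = np.zeros(ct.shape, dtype=bool)
body_mask[:, :2, :] = True
sct = np.where(body_mask, ct + 10.0, ct + 500.0)

mae_hu = masked_mae(sct, ct, body_mask)                        # 10.0
psnr_db = masked_psnr(sct, ct, body_mask, data_range=1000.0)   # 40.0
```

Restricting the computation to the mask keeps air and couch voxels outside the patient from dominating or diluting the error, which is why masked and unmasked values are not directly comparable.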
Conclusion
Flow-matching and DSBM models show strong potential for sCT generation from CBCT: both transform the input modality directly, which preserves anatomical structure. Notably, the DSBM achieves its best performance with a single sampling step, substantially mitigating the inference-time limitations typical of diffusion-based approaches. Part of the measured error stems from deformable registration errors, underscoring the need for improved reference alignment in future studies.