A Conditional Wavelet Diffusion Framework with Depthwise Convolutions for 3D Medical Image Synthesis
Abstract
Purpose
To introduce a conditional wavelet diffusion framework incorporating depthwise convolutions for efficient high-resolution 3D medical image synthesis, and to evaluate its performance on MRI cross-modality translation and CT texture generation for realistic digital phantoms.
Methods
We formulated MRI modality translation and CT phantom texture synthesis as paired conditional image-to-image translation and modeled the conditional distribution using a conditional diffusion framework. Inputs were represented in the wavelet domain using a 3D Haar discrete wavelet transform (DWT). To enable efficient learning on high-resolution 3D volumes while preserving strong representational capacity, we employed a ConvNeXt-style depthwise separable 3D denoising architecture, combining large-kernel depthwise convolutions, pointwise feature mixing, and timestep-dependent modulation. MRI experiments used the BraTS 2024 dataset (1,350 training and 188 validation subjects) with four co-registered MRI modalities at 1 mm³ resolution. For CT synthesis, the model was trained on 317 clinically collected volumetric cases and evaluated on 49 held-out test cases, with all volumes resized to 192³ and normalized to [−1, 1]. Performance was assessed using quantitative image-similarity metrics (SSIM, PSNR, and MAE) and qualitative visual inspection.
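As a minimal illustration of the wavelet-domain representation described above, the sketch below implements a one-level 3D Haar DWT and its inverse in NumPy. This is a simplified, hypothetical rendering (function names `haar_dwt3d`/`haar_idwt3d` are ours, not the paper's): it keeps the 8 half-resolution subbands as octants of a single array rather than stacking them as channels, and omits the diffusion model itself.

```python
import numpy as np

def _split(x, axis):
    # Split an array into its even- and odd-indexed samples along `axis`.
    sl_e = [slice(None)] * x.ndim; sl_e[axis] = slice(0, None, 2)
    sl_o = [slice(None)] * x.ndim; sl_o[axis] = slice(1, None, 2)
    return x[tuple(sl_e)], x[tuple(sl_o)]

def haar_dwt3d(x):
    # One-level orthonormal 3D Haar DWT. Averaging/differencing is applied
    # along each axis in turn; afterwards the volume is partitioned into
    # 8 half-resolution subbands (LLL ... HHH octants), so the total voxel
    # count is unchanged and energy is preserved.
    for axis in range(3):
        even, odd = _split(x, axis)
        lo = (even + odd) / np.sqrt(2)   # approximation coefficients
        hi = (even - odd) / np.sqrt(2)   # detail coefficients
        x = np.concatenate([lo, hi], axis=axis)
    return x

def haar_idwt3d(w):
    # Exact inverse: undo the averaging/differencing along each axis.
    for axis in range(3):
        n = w.shape[axis]
        idx = [slice(None)] * w.ndim
        idx[axis] = slice(0, n // 2)
        lo = w[tuple(idx)]
        idx[axis] = slice(n // 2, n)
        hi = w[tuple(idx)]
        out = np.empty_like(w)
        idx[axis] = slice(0, None, 2)
        out[tuple(idx)] = (lo + hi) / np.sqrt(2)  # even samples
        idx[axis] = slice(1, None, 2)
        out[tuple(idx)] = (lo - hi) / np.sqrt(2)  # odd samples
        w = out
    return w
```

Because the transform is orthonormal and invertible, diffusion can be run entirely on the wavelet coefficients and the synthesized volume recovered exactly by the inverse DWT, while each subband has half the spatial resolution of the input. The depthwise separable design serves a complementary efficiency goal: for C channels and kernel size k, a dense 3D convolution costs C²k³ weights, whereas a depthwise convolution plus pointwise mixing costs roughly Ck³ + O(C²).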
Results
Qualitative evaluation demonstrated anatomically consistent 3D synthesis, preserving multi-planar structural continuity in MRI modality translation and realistic intra-organ textures with accurate boundaries in CT phantom synthesis. Quantitative results on held-out test cases showed high structural fidelity on the MRI BraTS dataset (SSIM = 0.94 ± 0.04) and a masked PSNR of 25.7 dB with a soft-tissue MAE of 58 HU on the CT synthesis dataset.
Conclusion
This work establishes a generalized conditional wavelet diffusion framework for high-resolution 3D medical image synthesis across CT and MRI applications. The incorporation of depthwise separable convolutions enables efficient diffusion modeling while maintaining expressive capacity for large 3D volumes. The unified formulation supports both intra- and cross-modality image synthesis, with applications in diagnostic imaging and radiotherapy.