Development of a Label-to-Image Latent Diffusion Model for Patient-Derived Digital Twin Generation Using Pelvis Radiotherapy CT Simulation Images.
Abstract
Purpose
Anatomic motion and deformation modeling using digital twins (DTs) holds much promise for radiotherapy applications. However, clinical implementation is limited by the scarcity of high-quality data and by anatomical simulations that are computationally intensive and unphysical. To address these limitations, we propose a modality-agnostic latent diffusion model for synthetic DT image generation, conditioned on semantic label maps to preserve geometric consistency. We validate the model by generating CT-like coronal pelvic slices from bone masks and assessing their geometric consistency.
Methods
A publicly available dataset of 87 patient-derived CT scans (76/11 split) was processed using a fused window approach. Coronal slices were extracted from the region defined by the pelvic bone mask, for both the CT and the mask volumes. Preprocessing included CT image resampling and renormalization. CT-mask pairs were used to fine-tune a hybrid Stable Diffusion v1.5-ControlNet model for 16 epochs. ControlNet was initialized from a pre-trained Canny edge model and fully fine-tuned on bone masks to enforce geometric constraints. Cross-attention layers and UNet up-blocks were adjusted to improve texture fidelity. Forty-four synthetic images were generated from different slices and positions. Bones in the generated images were manually segmented without knowledge of the conditioning mask. Geometric consistency was evaluated using the Dice coefficient between the ground-truth bone masks and the manually segmented ones.
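The Dice evaluation described above can be illustrated with a minimal sketch. It assumes that both masks are binary arrays of identical shape and that the ground-truth mask is the one used for conditioning. The function name, the empty-mask convention, and the toy arrays are hypothetical and are not taken from the study.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


# Toy 4x4 example (hypothetical data, not from the study).
conditioning = np.zeros((4, 4), dtype=bool)
conditioning[1:3, 1:3] = True   # 4 foreground pixels

segmented = np.zeros((4, 4), dtype=bool)
segmented[1:3, 1:4] = True      # 6 foreground pixels, 4 of them overlapping

score = dice_coefficient(conditioning, segmented)  # 2 * 4 / (4 + 6) = 0.8
```

In this form the coefficient ranges from 0 (no overlap) to 1 (identical masks), which is the scale on which the per-slice values are reported.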
Results
The mean Dice coefficient was 0.78 ± 0.06 across the 44 generated coronal slices, with a maximum of 0.90. Generating one slice took 253.4 seconds on average with 35 inference steps.
Conclusion
Our model preserves bone structure, replicates CT noise patterns, and generates slice images faster than BM model approaches. Future work will incorporate a multi-label map to control soft-tissue consistency and will refine the texture of the synthetic CT (sCT) images.