CLIP-UNet for Photon-Counting CT Iodine Map Generation with Text-Conditioned Protocol Adaptation
Abstract
Purpose
Commercial iodine map generation on photon-counting CT (PCCT) systems is currently available only for high-kV protocols. Obtaining an iodine map therefore requires a higher imaging dose, which is undesirable in pediatric imaging. This study proposes a multi-modal framework that generates quantitative iodine maps from low-kV scans by conditioning image translation on both imaging data and acquisition-specific metadata.
Methods
A CLIP-UNet architecture was developed to generate iodine maps conditioned on both image and textual inputs. The model accepts three inputs: the low-energy-level and high-energy-level PCCT images acquired in a single scan, and text-based metadata comprising object width, kV, tube current, and exposure. The textual input was encoded using a pre-trained CLIP text encoder and fused with image features within a UNet-based image translation framework, enabling protocol-aware iodine map generation. Five iodine rods (0–15 mgI/cc) were embedded in 10- and 20-cm cylindrical phantoms and scanned on a Siemens Alpha PCCT system at 70, 90, 120, and 140 kV, using dual- and single-source modes and two dose levels. Mean absolute error (MAE) was used as the training loss. A conventional UNet trained with image inputs only was used for comparison.
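As an illustration of the text input described above, the following minimal Python sketch assembles the four metadata fields into a single prompt string of the kind that could be passed to a CLIP tokenizer. The prompt wording, the `ScanMetadata` and `build_prompt` names, and the example tube current and exposure values are assumptions made for illustration; the abstract does not specify the exact text format used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanMetadata:
    """Acquisition metadata named in the Methods (field names are hypothetical)."""
    object_width_cm: float
    tube_potential_kv: int
    tube_current_ma: float
    exposure_mas: float


def build_prompt(meta: ScanMetadata) -> str:
    """Format the metadata as one short sentence for a text encoder.

    The wording is an assumed template, not the format used in the study.
    """
    return (
        f"object width {meta.object_width_cm:g} cm, "
        f"tube potential {meta.tube_potential_kv} kV, "
        f"tube current {meta.tube_current_ma:g} mA, "
        f"exposure {meta.exposure_mas:g} mAs"
    )


# Illustrative values only: a 20-cm phantom scanned at 70 kV.
prompt = build_prompt(ScanMetadata(20.0, 70, 100.0, 50.0))
print(prompt)
```

A prompt of this length stays well within the 77-token context of the standard CLIP text encoder, so no truncation would be needed under this assumed template.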
Results
A four-fold cross-validation strategy was employed, in which data from three tube potentials were used for training and data from the remaining tube potential were used for validation. Model performance was evaluated using MAE and the structural similarity index (SSIM). The proposed CLIP-UNet achieved an average SSIM of 0.922 and an MAE of 0.107 mgI/cc, outperforming the image-only UNet, which achieved an average SSIM of 0.892 and an MAE of 0.125 mgI/cc.
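The leave-one-tube-potential-out scheme and the MAE metric can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the pixel values are made up for the example, and MAE is computed over a flat list of values rather than over full image volumes.

```python
from typing import Iterator, List, Sequence, Tuple

# Tube potentials reported in the Methods.
TUBE_POTENTIALS_KV = (70, 90, 120, 140)


def leave_one_kv_out(
    kvs: Sequence[int] = TUBE_POTENTIALS_KV,
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training kVs, held-out kV) pairs, one fold per tube potential."""
    for held_out in kvs:
        yield [kv for kv in kvs if kv != held_out], held_out


def mean_absolute_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equal-length sequences (mgI/cc here)."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


folds = list(leave_one_kv_out())
for train_kvs, val_kv in folds:
    print(f"train on {train_kvs} kV, validate on {val_kv} kV")

# Made-up iodine concentrations, for illustration only.
example_mae = mean_absolute_error([0.0, 5.0, 10.0], [0.0, 5.5, 9.5])
```

Holding out an entire tube potential, rather than a random subset of images, means each validation fold probes generalization to a protocol the model has not seen during training.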
Conclusion
This study demonstrates the feasibility of text-conditioned iodine map generation from PCCT acquisitions using a multi-modal deep learning framework. By integrating imaging data with acquisition metadata, the proposed CLIP-UNet improved robustness and generalizability across object sizes and imaging protocols in phantom experiments, extending iodine mapping capability beyond commercially supported settings.