Text-Conditioned Unified Deep Learning Model for Protocol-Aware Radiotherapy Treatment Planning
Abstract
Purpose
To develop a unified 3D deep learning model that predicts dose distributions across multiple treatment sites and protocols by conditioning the model on a text-embedding representation of each protocol.
Methods
We collected 2,296 planning CT scans from Princess Margaret Cancer Centre (Toronto, Canada), each paired with PTV contours and a dose distribution, spanning six treatment sites (prostate, left and right breast, lung, head and neck, and brain) and 15 distinct protocols. We split the data into 1,396, 200, and 700 patients for training, validation, and testing, respectively. We additionally evaluated the model on eight protocols that were unseen during training. The unified dose prediction model uses a 3D Residual U-Net backbone that takes the CT and PTV volumes as inputs and outputs the dose distribution. To condition the model on each protocol, we integrated CLIP-derived text embeddings via feature-wise linear modulation, enabling protocol-specific dose prediction.
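The conditioning step can be illustrated with a minimal NumPy sketch of feature-wise linear modulation: a text embedding is projected to a per-channel scale and shift, which are then applied to a 3D feature map. This is an illustration only, not the authors' implementation. The function name `film`, the 512-dimensional embedding, the channel count, and the random stand-in for a CLIP embedding are all assumptions made for the example; in a real model the projection weights would be learned and the modulation would sit inside the U-Net blocks.

```python
import numpy as np


def film(features, text_embedding, w_gamma, b_gamma, w_beta, b_beta):
    """Apply feature-wise linear modulation to a 3D feature map.

    features:       array of shape (C, D, H, W)
    text_embedding: array of shape (E,), e.g. a protocol text embedding
    w_gamma, w_beta: projection matrices of shape (E, C) (hypothetical, normally learned)
    b_gamma, b_beta: bias vectors of shape (C,)
    """
    gamma = text_embedding @ w_gamma + b_gamma  # per-channel scale, shape (C,)
    beta = text_embedding @ w_beta + b_beta     # per-channel shift, shape (C,)
    # Broadcast the per-channel parameters over the three spatial axes.
    return gamma[:, None, None, None] * features + beta[:, None, None, None]


rng = np.random.default_rng(0)
embed_dim, channels = 512, 8  # assumed sizes, not taken from the paper

# Random stand-in for a CLIP text embedding of a protocol description.
text_embedding = rng.standard_normal(embed_dim)
features = rng.standard_normal((channels, 4, 6, 6))

w_gamma = 0.01 * rng.standard_normal((embed_dim, channels))
w_beta = 0.01 * rng.standard_normal((embed_dim, channels))
b_gamma = np.ones(channels)   # start near the identity scale
b_beta = np.zeros(channels)   # start near a zero shift

modulated = film(features, text_embedding, w_gamma, b_gamma, w_beta, b_beta)
print(modulated.shape)
```

Because the scale and shift depend only on the text embedding, the same backbone weights can be reused for every protocol, and only the embedding changes between protocols.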
Results
We calculated the mean absolute difference of dose-volume histogram metrics (MAE-DVH), normalized by the prescribed dose, for each protocol. The baseline, a non-conditioned unified model, achieved an MAE-DVH of 0.0644 ± 0.09 for the seen protocols and 0.1433 ± 0.17 for the unseen protocols. Our conditioned unified model achieved an MAE-DVH of 0.0533 ± 0.07 for the seen protocols and 0.1303 ± 0.16 for the unseen protocols, outperforming the non-conditioned model by 20.83% and 9.98%, respectively (differences expressed relative to the conditioned model's MAE-DVH).
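The reported percentages can be checked from the mean MAE-DVH values alone. The short sketch below is a reader's check rather than part of the study's pipeline. It shows that 20.83% and 9.98% are obtained when the difference is divided by the conditioned model's error, and it also prints the reduction relative to the baseline error for comparison. The helper names are hypothetical.

```python
def gain_relative_to_conditioned(baseline, conditioned):
    """Improvement as a percentage of the conditioned model's error."""
    return (baseline - conditioned) / conditioned * 100.0


def reduction_relative_to_baseline(baseline, conditioned):
    """Error reduction as a percentage of the baseline model's error."""
    return (baseline - conditioned) / baseline * 100.0


# Mean MAE-DVH values as reported in the Results.
seen_baseline, seen_conditioned = 0.0644, 0.0533
unseen_baseline, unseen_conditioned = 0.1433, 0.1303

seen_gain = gain_relative_to_conditioned(seen_baseline, seen_conditioned)
unseen_gain = gain_relative_to_conditioned(unseen_baseline, unseen_conditioned)
seen_reduction = reduction_relative_to_baseline(seen_baseline, seen_conditioned)
unseen_reduction = reduction_relative_to_baseline(unseen_baseline, unseen_conditioned)

print(f"relative to conditioned: seen {seen_gain:.2f}%, unseen {unseen_gain:.2f}%")
print(f"relative to baseline:    seen {seen_reduction:.2f}%, unseen {unseen_reduction:.2f}%")
```

The two conventions give noticeably different numbers for the same pair of means, so the choice of denominator matters when these results are compared with other work.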
Conclusion
This study demonstrates the feasibility of a unified dose prediction model, where text-based conditioning enables adaptation across multiple protocols. These findings highlight the potential of the proposed text-conditioned unified model to support scalable, protocol-adaptive dose prediction in clinical radiotherapy workflows, reducing the need for protocol-specific models while maintaining robust performance across diverse treatment sites.