Development of a Machine Learning Model for Multi-Site VMAT Dose Prediction
Abstract
Purpose
To develop a unified machine learning model that can predict the dose distribution for any disease site treated with volumetric modulated arc therapy (VMAT). The model uses minimal inputs (a CT dataset and structure contours) to rapidly output a clinically acceptable dose distribution deliverable via standard VMAT arcs, without requiring any beam geometry or optimization parameters.
Methods
An automated data export and storage infrastructure was developed to download a large dataset of our clinical treatment plans. In a pilot study, a subset of 5,000 treatment plans was used to train a transformer model for preliminary dose prediction. All VMAT treatment plans in our clinical dataset were eligible, regardless of treatment site. Model inputs include 3D CT images encoded into latent space using a pretrained LTX-Video Variational Autoencoder (VAE); 3D binary masks for target volumes and organs-at-risk were encoded in the same way. Text embeddings of the treatment site and treatment technique were generated using a Contrastive Language-Image Pre-training (CLIP) model.
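As a rough illustration of the input layout described above, the following sketch encodes a CT volume and a set of binary structure masks with the same encoder and stacks the results along a channel axis. It is only a sketch under stated assumptions: `encode_volume` is a hypothetical stand-in (simple block averaging) for the pretrained LTX-Video VAE, the function and structure names are illustrative, and the CLIP text embeddings are omitted.

```python
import numpy as np


def encode_volume(vol, factor=4):
    """Hypothetical stand-in encoder: block-average downsampling.

    This is NOT the LTX-Video VAE; it only mimics the idea of mapping
    a 3D volume to a smaller latent grid.
    """
    d, h, w = (s // factor for s in vol.shape)
    # Crop so each axis is divisible by `factor`, then average each block.
    v = vol[: d * factor, : h * factor, : w * factor].astype(np.float32)
    return v.reshape(d, factor, h, factor, w, factor).mean(axis=(1, 3, 5))


def build_model_input(ct, masks, factor=4):
    """Encode the CT and every structure mask the same way, then stack.

    ct    : 3D array (depth, height, width)
    masks : dict mapping structure name -> 3D binary array, same shape as ct
    Returns an array of shape (1 + number of masks, D', H', W').
    """
    latents = [encode_volume(ct, factor)]
    latents += [encode_volume(m, factor) for m in masks.values()]
    return np.stack(latents, axis=0)


# Toy example with made-up shapes and structure names.
ct = np.zeros((8, 16, 16))
ptv = np.zeros((8, 16, 16))
ptv[0:4, 0:4, 0:4] = 1
bladder = np.zeros((8, 16, 16))
x = build_model_input(ct, {"PTV": ptv, "Bladder": bladder})
```

Encoding the masks with the same encoder as the CT keeps all spatial inputs on one latent grid, which is the property the Methods rely on; the actual channel layout used by the model is not specified here.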
Results
The model was evaluated on a held-out test set of 785 treatment plans across all available treatment sites. The overall mean absolute error (MAE) was 11.4%. Prostate showed the best performance, with an MAE of 6.8% (n = 193), and the rarest treatment site, extremities, showed the worst, with an MAE of 23% (n = 14).
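A per-site percentage MAE of the kind reported above could be computed along the following lines. This is a minimal sketch, not the evaluation code used in the study: the normalization (percent of prescription dose), the averaging over all voxels, and the names `percent_mae` and `mae_by_site` are assumptions, and the numbers in the toy example are invented.

```python
import numpy as np


def percent_mae(pred, ref, prescription):
    """Voxel-wise MAE between predicted and reference dose.

    Expressed as a percentage of the prescription dose, which is an
    assumed normalization for this sketch.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return 100.0 * np.mean(np.abs(pred - ref)) / prescription


def mae_by_site(plans):
    """Group plans by treatment site and average their percentage MAE.

    plans : iterable of (site, predicted_dose, reference_dose, prescription)
    Returns {site: (mean percentage MAE, number of plans)}.
    """
    per_site = {}
    for site, pred, ref, rx in plans:
        per_site.setdefault(site, []).append(percent_mae(pred, ref, rx))
    return {site: (float(np.mean(v)), len(v)) for site, v in per_site.items()}


# Toy plans with made-up dose values (Gy).
plans = [
    ("prostate", [60, 30], [54, 30], 60.0),
    ("prostate", [60, 60], [60, 60], 60.0),
    ("extremity", [50, 0], [40, 10], 50.0),
]
results = mae_by_site(plans)
```

Reporting the plan count alongside each site's MAE, as the Results do, makes it easier to judge how much weight a small group such as extremities (n = 14) should carry.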
Conclusion
The model achieves performance competitive with models from the recent OpenKBP Grand Challenge, and it predicts dose across multiple treatment sites without requiring any beam geometry information. The approach demonstrates that modern vision-language architectures can learn dose distribution patterns directly from treatment planning data, potentially enabling rapid dose estimation in more automated workflows, including treatment planning, plan QA, and adaptive replanning.