Quantifying Differences between Clinical Practices Using Natural Language Processing for AI Model Generalization In Radiation Oncology
Abstract
Purpose
Many artificial intelligence (AI) applications have recently been developed to improve the quality and efficiency of radiotherapy processes. Despite the success of image-based applications such as autocontouring, AI models trained on alphanumeric clinical data generalize poorly because of differences in clinical practices. In this multi-institutional study, we aim to quantify differences between clinical data from different radiation oncology (RO) practices using natural language processing, and to derive a quantity that can indicate the potential usability of an AI model in a clinic without going through the complete validation process.
Methods
Anonymized RO datasets, including tumor locations, prescriptions, treatment planning parameters, and setup parameters, were extracted from three US institutions and one European institution. Each clinical dataset was preprocessed locally into tokenized corpora and converted into multidimensional vectors using a Word2Vec model with the continuous bag-of-words methodology. The generated word vectors were then shared, and the weighted cosine similarity (Csim) between the vectors was calculated to quantify differences in prescription, plan complexity, and treatment approaches for various anatomic tumor locations across institutions.
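As an illustration of the similarity step, the sketch below computes a weighted cosine similarity between two toy corpora. The weighting scheme is not specified here, so the sketch assumes one plausible choice: each corpus is summarized by a token-frequency-weighted average of its word vectors, and Csim is the cosine between the two averages. The function names, the two-dimensional vectors, and the tokens are hypothetical stand-ins for Word2Vec output, and the sketch further assumes that the vectors being compared live in a common vector space.

```python
import math
from collections import Counter


def weighted_centroid(tokens, vectors):
    """Frequency-weighted average of the word vectors for one corpus.

    Assumption: the weight of a token is its relative frequency in the corpus.
    Tokens without a vector are ignored.
    """
    counts = Counter(t for t in tokens if t in vectors)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no token in the corpus has a word vector")
    dim = len(next(iter(vectors.values())))
    centroid = [0.0] * dim
    for token, n in counts.items():
        for i, x in enumerate(vectors[token]):
            centroid[i] += (n / total) * x
    return centroid


def cosine_similarity(a, b):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 2-D word vectors (real Word2Vec vectors have many more dimensions).
toy_vectors = {"imrt": [1.0, 0.0], "vmat": [0.0, 1.0]}

# Hypothetical tokenized corpora from two institutions.
site_a = ["imrt", "imrt", "vmat"]
site_b = ["vmat", "vmat", "imrt"]

csim_ab = cosine_similarity(
    weighted_centroid(site_a, toy_vectors),
    weighted_centroid(site_b, toy_vectors),
)
csim_aa = cosine_similarity(
    weighted_centroid(site_a, toy_vectors),
    weighted_centroid(site_a, toy_vectors),
)
print(round(csim_ab, 3))  # 0.8
print(round(csim_aa, 3))  # 1.0
```

In this toy case the two sites use the same techniques in different proportions, so the similarity is high but below 1; identical corpora give a similarity of 1.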
Results
We computed Csims for prescription patterns and plan complexity of various tumor sites between institutions. For similar practices, Csims ranged from 0.7 to 0.9; for more divergent practices, Csims were lower. For example, for gastrointestinal prescription patterns, the Csims between US institutions ranged from 0.85 to 0.87, while the Csims between the European institution and the US institutions ranged from 0.23 to 0.30, indicating an observable difference between the datasets.
Conclusion
This study shows that Csim can capture differences between clinical datasets that arise from differences in clinical practices. With the outlined vector space modeling framework, AI researchers can create suitable corpora for comparison and correlate Csim with model performance for validation and quality assurance (QA) purposes.