Poster · Poster Program · Therapy Physics

Quantifying Differences between Clinical Practices Using Natural Language Processing for AI Model Generalization In Radiation Oncology

Abstract

Purpose

Many artificial intelligence (AI) applications have been developed recently to improve the quality and efficiency of radiotherapy processes. Despite the success of image-based applications such as autocontouring, AI models trained on alphanumeric clinical data suffer from a lack of generalizability due to differences in clinical practices. In this multi-institutional study, we aim to quantify differences between clinical data from different radiation oncology (RO) practices using natural language processing, and to create a quantity that can inform the potential usability of an AI model in a clinic without going through the complete validation process.

Methods

Anonymized RO datasets, including tumor locations, prescriptions, treatment planning parameters, and setup parameters, were extracted from three US institutions and one European institution. Each clinical dataset was preprocessed locally into tokenized corpora and converted into multidimensional vectors using a Word2Vec model with the continuous bag-of-words methodology. The generated word vectors were then shared, and the weighted cosine similarity (Csim) between the vectors was calculated to quantify differences in prescription, plan complexity, and treatment approaches for various anatomic tumor locations across institutions.
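
The similarity step described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the abstract does not define the weighting scheme, so the sketch assumes each corpus is summarized by a token-frequency-weighted mean of its word vectors, and that the shared vectors are expressed in a comparable space. The function names (`weighted_centroid`, `cosine_similarity`, `csim`) and the toy two-dimensional vectors are hypothetical; Word2Vec training itself is not shown.

```python
import math
from collections import Counter


def weighted_centroid(tokens, vectors):
    """Frequency-weighted mean of the word vectors for the tokens in a corpus.

    Assumption: weights are relative token frequencies; tokens without a
    vector are ignored.
    """
    counts = Counter(t for t in tokens if t in vectors)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tokens with known vectors")
    dim = len(next(iter(vectors.values())))
    centroid = [0.0] * dim
    for token, n in counts.items():
        for i, x in enumerate(vectors[token]):
            centroid[i] += (n / total) * x
    return centroid


def cosine_similarity(u, v):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def csim(tokens_a, vectors_a, tokens_b, vectors_b):
    """Hypothetical Csim: cosine similarity of two weighted corpus centroids."""
    return cosine_similarity(
        weighted_centroid(tokens_a, vectors_a),
        weighted_centroid(tokens_b, vectors_b),
    )


# Toy, made-up data: two institutions using the same techniques in
# different proportions.
toy_vectors = {"imrt": [1.0, 0.0], "vmat": [0.0, 1.0]}
site_a_tokens = ["imrt", "imrt", "vmat"]
site_b_tokens = ["vmat", "vmat", "imrt"]

score = csim(site_a_tokens, toy_vectors, site_b_tokens, toy_vectors)
print(round(score, 3))
```

Under this assumed weighting, two corpora with identical token proportions score 1.0, and the score drops as the mix of tokens diverges.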

Results

We computed Csim values for prescription patterns and plan complexity of various tumor sites between institutions. For similar practices, Csim ranged from 0.7 to 0.9; for more divergent practices, Csim was lower. For example, for gastrointestinal prescription patterns, Csim between US institutions ranged from 0.85 to 0.87, while Csim between the European institution and the US institutions ranged from 0.23 to 0.30, indicating an observable difference between the datasets.

Conclusion

This study shows that Csim can capture differences between clinical datasets caused by differences in clinical practices. With the outlined vector space modeling framework, AI researchers can create suitable corpora for comparison and correlate Csim with model performance for validation and QA purposes.

People

Samuel Ming Ho Luk, PhD · Presenting Author · Boston University/Boston Medical Center
Petros Kalendralis, PhD · Author · Department of Radiation Oncology (Maastro), GROW School of Oncology and Reproduction, Maastricht University Medical Centre
Shaotai Hu · Author · Boston University
Alan M. Kalet, PhD · Author · Department of Radiation Oncology, Fred Hutchinson Cancer Center, University of Washington
Kaivalya Bhatt · Author · Boston University
