Developing Deep Learning Algorithms to Generate Radiation Oncology Clinical Summaries
Abstract
Purpose
Clinical notes in oncology are generally free-text documents that often repeat information copied from elsewhere in the patient’s medical record, which creates inefficiency and a risk of propagating errors. The primary objective of our research is therefore to develop and evaluate a deep learning (DL)-based algorithm capable of generating audience-tailored clinical summaries in oncology.
Methods
Our algorithm is being trained and validated using existing radiation oncology information system (OIS) data and clinical notes at our centre. Specifically, we are: (1) building and testing software that, for the clinical notes of each specialist type, parses the notes to identify information already stored in the OIS (“redundant information”) and stores it in a structured format called mCODE (minimal Common Oncology Data Elements); (2) analysing the notes to identify expected content that is not already stored in the OIS (“specialist-provided information”); (3) reviewing our findings with clinicians and patient partners; and (4) developing a DL algorithm that generates semi-populated radiation oncology clinical notes after consultations, so that only the specialist-provided information needs to be added.
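As an illustration of the distinction drawn in steps (1) and (2), the following minimal Python sketch separates sentences that repeat values already held in a structured record from sentences that do not. It is not our software: it substitutes simple case-insensitive substring matching for LLM-based labelling, and the names `label_note` and `LabelledNote`, the field names `diagnosis` and `stage`, and the example note are hypothetical and are not actual mCODE element names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LabelledNote:
    # Hypothetical container: field name -> first sentence that repeats its value.
    redundant: dict[str, str]
    # Sentences with no match in the structured record.
    specialist_provided: list[str]


def split_sentences(note: str) -> list[str]:
    # Naive splitter on full stops; adequate only for this toy example.
    return [s.strip() for s in note.split(".") if s.strip()]


def label_note(note: str, ois_record: dict[str, str]) -> LabelledNote:
    """Label each sentence as redundant or specialist-provided.

    Assumption: a sentence is "redundant" if it contains the value of any
    field in the structured record (case-insensitive substring match).
    """
    redundant: dict[str, str] = {}
    specialist: list[str] = []
    for sentence in split_sentences(note):
        lowered = sentence.lower()
        matches = [
            field for field, value in ois_record.items()
            if value.lower() in lowered
        ]
        if matches:
            for field in matches:
                redundant.setdefault(field, sentence)
        else:
            specialist.append(sentence)
    return LabelledNote(redundant, specialist)


# Illustrative (fabricated) record and note, not real patient data.
ois_record = {
    "diagnosis": "prostate adenocarcinoma",
    "stage": "T2a N0 M0",
}
note = (
    "Patient has prostate adenocarcinoma, stage T2a N0 M0. "
    "He reports mild fatigue since the last visit."
)
result = label_note(note, ois_record)
```

In this toy case, the first sentence is labelled redundant for both fields and the second is left as specialist-provided; a semi-populated note would pre-fill the former and prompt only for the latter.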
Results
We have developed a software package that leverages local installations of large language models (LLMs) on a secure hospital server to label redundant information across clinical notes using mCODE. We first validated our software on an external dataset and then used it to analyse real treatment plans and clinical notes. Early results indicate that we can reliably identify and label mCODE-compliant redundant information, including diagnosis, cancer stage, patient background, and treatment details, within clinical notes.
Conclusion
Extracting and analysing the redundant information contained in existing clinical notes written by different specialists enables us to identify the content that specialists expect in them. This is a crucial step towards properly parameterising a generative model so that it pre-populates the existing information that specialists deem relevant and prompts only for the entry of new information.