Multi-Tier Automated Pipeline for TG-263 Structure Name Standardization Using Rule-Based, NLP, and RAG-Enhanced LLM Matching
Abstract
Purpose
Big-data radiotherapy research is hampered by inconsistent structure nomenclature across large hospital networks. This study develops and validates a novel automated, multi-tier pipeline for standardizing structure names. The pipeline integrates rule-based algorithms, natural language processing (NLP), and retrieval-augmented generation (RAG) with large language models (LLMs), and was evaluated on a dataset of 14,962 treatment plans.
Methods
We analyzed 14,962 plans, encompassing 111,566 plan evaluations across 5,386 patients. Organs at risk (OARs) were delineated using TG-263-compliant AI auto-contouring tools. The dataset initially contained 7,710 unique structure names, which normalization reduced to 4,174. Ground truth, a set of 106 standard structure names, was established from 73 TG-263-aligned institutional dose-constraint protocols. To automate standardization, we developed a three-tier matching pipeline: Tier 1 (rule-based) targeted high-frequency patterns and dose-encoded strings via dictionary mapping; Tier 2 (NLP) handled minor typographic and character-level variations; and Tier 3 (RAG-enhanced LLM) used LLaMA 3.1 for semantic matching of the remaining names. Performance was benchmarked using usage-weighted match rates, which give high-frequency clinical structures greater weight so that the metric reflects real-world reliability and dosimetric relevance.
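The three tiers can be pictured as a cascade in which a name falls through to the next tier only when the previous one fails. The Python sketch below illustrates that control flow under stated assumptions; it is not the authors' implementation. The eight-name `STANDARD_NAMES` list and the `RULE_MAP` dictionary are hypothetical stand-ins for the 106-name ground truth and the institutional mappings. The trailing-dose regex is an assumed encoding. Tier 2 uses the standard library's `difflib` similarity ratio with an assumed 0.85 cutoff in place of whatever NLP method the study used. Tier 3 is a hypothetical `llm` callable (for example, a wrapper around a LLaMA 3.1 prompt) that receives retrieved candidate names as RAG context.

```python
import difflib
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical subset of TG-263 names (the study's ground truth has 106).
STANDARD_NAMES = [
    "SpinalCord", "Brainstem", "Parotid_L", "Parotid_R",
    "Lung_L", "Lung_R", "Heart", "Esophagus",
]

# Hypothetical Tier 1 dictionary of high-frequency institutional variants.
RULE_MAP = {
    "cord": "SpinalCord",
    "spinal cord": "SpinalCord",
    "brain stem": "Brainstem",
    "lt parotid": "Parotid_L",
    "rt parotid": "Parotid_R",
}

# Trailing dose tokens such as "45Gy" or "4500cGy" (assumed encoding).
DOSE_SUFFIX = re.compile(r"\s*\d+(?:\.\d+)?\s*c?gy$")

LlmFn = Callable[[str, List[str]], Optional[str]]


def normalize(name: str) -> str:
    """Lowercase, map '_' and '-' to spaces, collapse whitespace, drop a dose suffix."""
    key = re.sub(r"[_\-]+", " ", name.strip().lower())
    key = re.sub(r"\s+", " ", key).strip()
    return DOSE_SUFFIX.sub("", key)


NORM_TO_STANDARD = {normalize(s): s for s in STANDARD_NAMES}


def tier1_rule(name: str) -> Optional[str]:
    """Tier 1: exact match after normalization, then dictionary lookup."""
    key = normalize(name)
    return NORM_TO_STANDARD.get(key) or RULE_MAP.get(key)


def tier2_fuzzy(name: str, cutoff: float = 0.85) -> Optional[str]:
    """Tier 2: closest standard name by difflib similarity, if above the cutoff."""
    hits = difflib.get_close_matches(
        normalize(name), list(NORM_TO_STANDARD), n=1, cutoff=cutoff
    )
    return NORM_TO_STANDARD[hits[0]] if hits else None


def retrieve_candidates(name: str, k: int = 5) -> List[str]:
    """Retrieval step for Tier 3: the k standard names most similar to the query."""
    key = normalize(name)
    return sorted(
        STANDARD_NAMES,
        key=lambda s: difflib.SequenceMatcher(None, key, normalize(s)).ratio(),
        reverse=True,
    )[:k]


def match(name: str, llm: Optional[LlmFn] = None) -> Tuple[Optional[str], str]:
    """Run the tiers in order and report which tier resolved the name."""
    hit = tier1_rule(name)
    if hit:
        return hit, "tier1"
    hit = tier2_fuzzy(name)
    if hit:
        return hit, "tier2"
    if llm is not None:
        answer = llm(name, retrieve_candidates(name))
        # Accept only answers that are valid standard names (guards against hallucination).
        if answer in STANDARD_NAMES:
            return answer, "tier3"
    return None, "unmatched"
```

For example, `match("Cord_4500cGy")` is resolved by the dictionary once the dose suffix is stripped, while `match("Brainstm")` falls through to the similarity tier. Ordering the tiers from cheapest to most expensive means the LLM is consulted only for the residue that rules and string similarity cannot resolve. Validating its answer against the standard list keeps Tier 3 from introducing names outside the ground truth.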
Results
The proposed pipeline achieved a 96.1% usage-weighted match rate, substantially improving compliance from a 66.4% baseline. Usage-pattern analysis revealed a highly skewed distribution: 127 active protocols (13% of observed protocols) generated 85.7% of the total clinical volume, which supports the Tier 1-centric design. Tier 1 captured 82.2% of the volume, Tier 2 captured 10.6%, and Tier 3 resolved 3.3%, leaving only 3.9% unmatched. Although the data contained 41-fold more unique structure names than the standardized set, the pipeline effectively handled these systematic deviations.
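The abstract does not give an explicit formula for the usage-weighted match rate. A natural reading, assumed here, is that each unique structure name is weighted by how often it occurs (for example, its count across plan evaluations), so the rate is the matched share of total usage. Under this reading the reported tier shares add up as expected: 82.2% + 10.6% + 3.3% = 96.1% matched, and 96.1% + 3.9% = 100%. The Python sketch below computes the per-tier shares on hypothetical toy counts (the names, counts, and tier assignments are invented for illustration) and contrasts them with an unweighted per-name rate.

```python
from collections import Counter
from typing import Dict, Optional


def usage_weighted_rates(
    usage: Dict[str, int], tier_of: Dict[str, Optional[str]]
) -> Dict[str, float]:
    """Percentage of total usage resolved by each tier; unresolved names count as 'unmatched'."""
    total = sum(usage.values())
    by_tier: Counter = Counter()
    for name, count in usage.items():
        by_tier[tier_of.get(name) or "unmatched"] += count
    return {tier: 100.0 * c / total for tier, c in by_tier.items()}


def unweighted_rate(tier_of: Dict[str, Optional[str]]) -> float:
    """Percentage of unique names matched, ignoring how often each is used."""
    matched = sum(1 for tier in tier_of.values() if tier)
    return 100.0 * matched / len(tier_of)


# Hypothetical toy data: usage count per raw name and the tier that resolved it.
usage = {"SpinalCord": 500, "Cord_4500cGy": 300, "Brainstm": 120,
         "Lt Parotid gland": 40, "z_ring_opt": 40}
tier_of = {"SpinalCord": "tier1", "Cord_4500cGy": "tier1", "Brainstm": "tier2",
           "Lt Parotid gland": "tier3", "z_ring_opt": None}

rates = usage_weighted_rates(usage, tier_of)
weighted_match = 100.0 - rates.get("unmatched", 0.0)
print(rates)                     # {'tier1': 80.0, 'tier2': 12.0, 'tier3': 4.0, 'unmatched': 4.0}
print(weighted_match)            # 96.0
print(unweighted_rate(tier_of))  # 80.0
```

Weighting by usage lets a small number of very common names dominate the metric. This is why a rule-based first tier that covers the most frequent names can account for most of the clinical volume, even when many rare variants remain.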
Conclusion
Substantial nomenclature deviations persist despite TG-263-compliant auto-contouring. Our multi-tier pipeline systematically standardizes non-compliant structure names, resolving naming variations that arise in clinical practice. This enables robust large-scale data aggregation and facilitates multi-institutional clinical trials by ensuring that participating institutions' plan submissions strictly adhere to standardized protocols.