Poster Program | Clinical Trials Specialty Program

A Retrospective Study Protocol Benchmarking Natural Language Processing and Computer Vision Models for Named Entity Recognition In Prostate Cancer: Extracting Staging, Histology, and Biomarker Data from Clinical Text (HL7 FHIR mCODE)

Abstract
Purpose

Approximately 80% of clinically actionable oncology data remains siloed in unstructured formats, including clinical narratives and pathology reports. This study will validate an end-to-end Natural Language Processing (NLP) pipeline for extracting oncological Named Entities and transforming them into HL7 FHIR resources compliant with the Minimal Common Oncology Data Elements (mCODE) Implementation Guide, enabling computable, interoperable cancer data at scale.

Methods

This retrospective, observational validation study analyzes de-identified longitudinal clinical notes from 50 prostate cancer patients receiving radiation therapy. Two domain experts will independently annotate clinical history and workup summaries using standardized guidelines within a dedicated platform, with inter-annotator agreement assessed using Cohen's kappa. A commercial NLP and Computer Vision platform will be benchmarked against transformer-based open-source models (BioBERT, SciSpacy, GPT-based architectures) for Named Entity Recognition across mCODE-defined elements, including TNM staging, histologic grade, biomarker status, treatment response, and 409 other elements. Extracted entities will be programmatically mapped to Value Set Authority Center terminologies (SNOMED CT, RxNorm, LOINC) via FHIR CodeX specifications. Entity-level performance will be quantified using accuracy with Wilson score 95% confidence intervals. McNemar's test will evaluate discordant pairs between predictions and the gold standard to detect significant performance differences. A one-sided binomial test will assess whether models achieve the 95% accuracy threshold. Subgroup analyses will examine performance variation across mCODE data elements.
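The three statistical procedures named above can be sketched with standard-library Python only. This is a minimal illustration, not the study's actual analysis code: the function names are hypothetical, and the exact (binomial) form of McNemar's test is assumed, which is appropriate for the small discordant-pair counts a 50-patient cohort is likely to yield.

```python
import math

def wilson_ci(k, n, z=1.96):
    """Wilson score 95% confidence interval for an observed accuracy k/n."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

def binom_sf(k, n, p):
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value from discordant-pair counts:
    b = entities model A got right and model B got wrong, c = the reverse."""
    n = b + c
    if n == 0:
        return 1.0
    return min(1.0, 2 * binom_sf(max(b, c), n, 0.5))

def threshold_test(k, n, p0=0.95):
    """One-sided binomial p-value for H0: accuracy <= p0 vs H1: accuracy > p0."""
    return binom_sf(k, n, p0)

# Example: 95 of 100 staging entities extracted correctly.
lo, hi = wilson_ci(95, 100)          # interval straddles the 95% threshold
p_mcnemar = mcnemar_exact(8, 2)      # 10 discordant pairs between two models
p_thresh = threshold_test(95, 100)   # evidence accuracy exceeds 95%
```

Because the Wilson interval for 95/100 spans roughly 0.89 to 0.98, element-level sample sizes of this order cannot by themselves confirm the 95% accuracy target, which is one reason the protocol specifies pooled and subgroup analyses.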

Results

Primary outcomes include element-wise accuracy with confidence intervals for each mCODE category and comparative performance rankings across models. Secondary outcomes will characterize error patterns, identify concepts with suboptimal extraction, and quantify FHIR mapping fidelity.

Conclusion

This study establishes a rigorous validation framework for clinical NLP pipelines in oncology. Findings will inform technology selection, identify therapeutic targets, and de-risk institutional AI investment. Successful validation will accelerate automated clinical trial screening, real-time quality measure reporting, and population-level cancer surveillance, bridging unstructured documentation and actionable, standards-compliant oncology data.
