Poster Poster Program Diagnostic and Interventional Radiology Physics

Staged Alignment of Decoder Large Language Models for Oncology Tasks Via Radiology-Pathology Note Pairing

Abstract
Purpose

Decoder large language models (LLMs) like ChatGPT and Claude show excellent text understanding in general domains but remain under-aligned to oncology-specific clinical context. This study designs and validates a staged alignment pipeline that trains decoder-based LLMs to produce progressively refined, oncology-relevant representations.

Methods

We designed a multi-stage alignment pipeline to repurpose Llama-3.1-8B into a clinical text embedder using LLM2Vec on 97,398 matched radiology-pathology reports covering nine brain tumor types. Stage 1 used masked next-token prediction to encourage bidirectional attention within the decoder. Stage 2 applied Simple Contrastive Learning of Sentence Embeddings (SimCSE), a self-supervised contrastive learning method, on unlabeled reports. Stage 3 performed supervised contrastive alignment on paired reports to learn oncology-relevant representations. We evaluated the embeddings with 5-fold cross-validation on three downstream prediction tasks: tumor type classification (n=4,046), MGMT promoter methylation prediction (n=201), and one-year survival prediction (n=539), with zero-shot GPT-4o as the decoder LLM baseline.
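
The abstract does not state the exact objective used for the Stage 3 supervised contrastive alignment. As a minimal sketch, assuming an InfoNCE-style loss with in-batch negatives (a common choice for paired-text alignment), each radiology embedding is pulled toward its matched pathology embedding and pushed away from the other pathology embeddings in the batch. The function names, the toy vectors, and the temperature value are illustrative assumptions, not details from the study.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_loss(rad_embs, path_embs, temperature=0.05):
    """Mean cross-entropy of matching radiology embedding i to pathology embedding i.

    Assumption: row i of rad_embs and row i of path_embs come from the same
    patient; every other pathology row in the batch acts as a negative.
    """
    rad = [l2_normalize(v) for v in rad_embs]
    path = [l2_normalize(v) for v in path_embs]
    total = 0.0
    for i, r in enumerate(rad):
        # Temperature-scaled cosine similarity to every pathology embedding.
        logits = [sum(a * b for a, b in zip(r, p)) / temperature for p in path]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(rad)


# Toy batch of two pairs (hypothetical 2-dimensional embeddings).
radiology = [[1.0, 0.0], [0.0, 1.0]]
pathology_matched = [[1.0, 0.0], [0.0, 1.0]]
pathology_shuffled = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = info_nce_loss(radiology, pathology_matched)
loss_shuffled = info_nce_loss(radiology, pathology_shuffled)
```

In this toy batch the correctly paired embeddings give a loss near zero, while the shuffled pairing gives a large loss. That difference is the signal that drives paired reports toward each other in embedding space.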

Results

Across all endpoints, performance improved from Stage 1 through Stage 3, with all stages substantially outperforming zero-shot GPT-4o. Results are reported as accuracy / F1-macro / AUROC. For tumor type classification, performance rose from 61.5% / 49.0% / 81.4% at Stage 1 to 73.4% / 59.4% / 90.5% at Stage 2, and reached 87.2% / 86.1% / 96.2% at Stage 3, versus 85.5% / 84.0% / 35.1% for GPT-4o. For one-year survival, Stage 2 achieved 71.8% / 69.4% / 79.5%, compared with 71.1% / 56.4% / 30.0% for GPT-4o. For MGMT methylation, Stage 3 reached 77.1% / 60.0% / 76.1%, far exceeding GPT-4o (45.6% / 33.1% / 54.5%).
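
The three reported metrics answer different questions, which is why they can diverge for the same model. The sketch below shows how accuracy, F1-macro, and AUROC could be computed for a binary task such as one-year survival. It is a plain-Python illustration on made-up labels and scores, not the evaluation code used in the study, and the 0.5 decision threshold is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def auroc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. 0.5 corresponds to random ranking, and values
    below 0.5 mean positives tend to be ranked below negatives.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: four patients, predicted probability of the positive class.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

acc = accuracy(y_true, y_pred)
f1 = f1_macro(y_true, y_pred)
auc = auroc(y_true, scores)
```

Accuracy and F1-macro depend on the thresholded labels, while AUROC depends only on how the scores rank the cases. A model can therefore reach reasonable accuracy while ranking cases poorly, or the reverse.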

Conclusion

The fully aligned decoder achieved strong downstream performance, with consistent stage-wise gains demonstrating both the efficacy of the multi-stage pipeline design and the value of radiology-pathology matching as an alignment task for adapting generative LLMs to various oncology-specific predictive tasks.
