Clinical Reports as Semantic Priors: Reducing False Positives in Automated PSMA PET/CT Segmentation
Abstract
Purpose
Automated quantification of tumor burden in PSMA PET/CT imaging is hampered by the low specificity of image-only AI models, which frequently misclassify physiological uptake as disease. These errors require time-consuming manual correction, limiting clinical utility. We hypothesize that integrating routinely available clinical reports can provide the semantic context needed to suppress false positives (FPs) and improve segmentation specificity without compromising sensitivity.
Methods
We developed a multimodal 3D SegResNet that fuses PET/CT volumes with clinical report embeddings. Text features were extracted using three distinct encoders (BiomedVLP, RadGraph, GPT) and integrated via cross-attention gates at three decoder scales. The framework was trained and validated using 5-fold cross-validation on a dataset of 284 prostate cancer patients with expert-curated whole-body masks. Performance was assessed on a hold-out test set (N=41) using overlap metrics (Dice, IoU), boundary accuracy (ASSD, HD95, Surface Dice), and detection metrics (lesion-level FPs and sensitivity).
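To make the fusion step concrete, the following is a minimal NumPy sketch of one possible cross-attention gate, applied at three decoder scales. It is an illustration under stated assumptions, not the architecture used in this work: the gate form (voxel features as queries, report token embeddings as keys and values, a sigmoid gate multiplying the features), the function name `cross_attention_gate`, the random weights, and all shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_gate(feat, text, w_q, w_k, w_v, w_g):
    """Gate a decoder feature map with report embeddings (hypothetical form).

    feat: (C, D, H, W) image features; text: (T, E) report token embeddings.
    Returns the gated features (C, D, H, W) and the gate map (D, H, W).
    """
    c = feat.shape[0]
    voxels = feat.reshape(c, -1).T                 # (N, C), one row per voxel
    q = voxels @ w_q                               # (N, A) queries from the image
    k = text @ w_k                                 # (T, A) keys from the report
    v = text @ w_v                                 # (T, A) values from the report
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))  # (N, T) voxel-to-token weights
    context = attn @ v                             # (N, A) text context per voxel
    gate = 1.0 / (1.0 + np.exp(-(context @ w_g)))  # (N, 1) sigmoid gate in (0, 1)
    gated = voxels * gate                          # attenuate voxel features
    return gated.T.reshape(feat.shape), gate.reshape(feat.shape[1:])


rng = np.random.default_rng(0)
text = rng.standard_normal((5, 16))                # 5 tokens, 16-dim (assumed sizes)
attn_dim = 12                                      # assumed attention width

results = []
# Three decoder scales with assumed channel counts and spatial sizes.
for channels, size in [(32, 4), (16, 8), (8, 16)]:
    feat = rng.standard_normal((channels, size, size, size))
    w_q = rng.standard_normal((channels, attn_dim)) * 0.1
    w_k = rng.standard_normal((text.shape[1], attn_dim)) * 0.1
    w_v = rng.standard_normal((text.shape[1], attn_dim)) * 0.1
    w_g = rng.standard_normal((attn_dim, 1)) * 0.1
    gated, gate = cross_attention_gate(feat, text, w_q, w_k, w_v, w_g)
    results.append((feat, gated, gate))
```

Because the gate lies strictly between 0 and 1, it can only attenuate image features. This matches the intuition of using report context to suppress spurious activations, although a trained model could equally use an additive or residual fusion.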
Results
Report guidance significantly improved segmentation specificity. The BiomedVLP-guided model achieved the largest reduction in errors, lowering false-positive lesion counts by approximately 3.8 per patient across matching criteria (a reduction of ~76%; p<0.001). Boundary adherence also improved markedly: compared to the image-only baseline, BiomedVLP reduced the 95th-percentile Hausdorff Distance (HD95) by 73% (a 331 mm decrease; p<0.001) and the Average Symmetric Surface Distance (ASSD) by 78% (an 82 mm decrease; p<0.001). Crucially, these gains in specificity came with negligible impact on lesion-level sensitivity (relative changes below 4%; p=1.0).
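For readers unfamiliar with lesion-level detection metrics, the sketch below shows one way false-positive lesion counts and sensitivity can be computed from binary masks. It assumes 6-connected components and an any-overlap matching criterion; the matching criteria actually used in this study are not specified here, and the helper names `label_components` and `lesion_detection` are hypothetical. In practice a library routine such as `scipy.ndimage.label` would typically replace the hand-written labeling.

```python
from collections import deque

import numpy as np


def label_components(mask):
    """Label 6-connected components of a 3D binary mask with a simple BFS."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=np.int32)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    count = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # voxel already belongs to a labeled component
        count += 1
        labels[seed] = count
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in offsets:
                n = (z + dz, y + dy, x + dx)
                inside = all(0 <= n[i] < mask.shape[i] for i in range(3))
                if inside and mask[n] and not labels[n]:
                    labels[n] = count
                    queue.append(n)
    return labels, count


def lesion_detection(pred, truth):
    """Return (false-positive lesion count, lesion-level sensitivity).

    Assumed criterion: a predicted component is a false positive if it
    overlaps no ground-truth voxel; a ground-truth lesion is detected if
    any predicted voxel overlaps it.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    pred_lab, n_pred = label_components(pred)
    true_lab, n_true = label_components(truth)
    false_pos = sum(1 for i in range(1, n_pred + 1) if not truth[pred_lab == i].any())
    detected = sum(1 for j in range(1, n_true + 1) if pred[true_lab == j].any())
    sensitivity = detected / n_true if n_true else float("nan")
    return false_pos, sensitivity


# Toy volume: two true lesions; the prediction hits one and adds one spurious blob.
truth = np.zeros((4, 8, 8), dtype=bool)
truth[1, 1:3, 1:3] = True      # lesion 1
truth[2, 5:7, 5:7] = True      # lesion 2 (missed by the prediction)

pred = np.zeros_like(truth)
pred[1, 2:4, 2:4] = True       # overlaps lesion 1 at voxel (1, 2, 2)
pred[3, 0, 7] = True           # isolated spurious prediction

fp_count, sens = lesion_detection(pred, truth)
print(fp_count, sens)          # 1 false positive, sensitivity 0.5
```

Stricter criteria (for example, a minimum overlap fraction per lesion) would change both counts, which is one reason results are usually reported across several matching criteria.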
Conclusion
Fusing report semantics with visual data effectively suppresses spurious AI predictions in PSMA PET/CT segmentation. This approach leverages existing clinical infrastructure to produce cleaner, more robust automated contours, which can substantially reduce the manual correction burden in quantitative workflows.