Segment Anything Model 3 for Concept-Driven Whole-Body Lesion Segmentation: An Experimental Study
Abstract
Purpose
Accurate lesion segmentation is fundamental to medical image analysis, yet most methods are tailored to specific anatomical sites or modalities, limiting their generalizability in diverse clinical settings. While recent vision-language foundation models enable concept-driven segmentation in natural images, the capability of Segment Anything Model 3 (SAM3) for concept-prompt-based lesion segmentation in medical imaging remains underexplored. We systematically evaluated SAM3 for concept-level lesion segmentation across diverse imaging modalities, assessing its whole-body generalization performance.
Methods
We curated 13 datasets spanning 12 lesion types across whole-body regions and multiple imaging modalities, including CT, multiparametric MRI, PET, dermoscopy, endoscopy, and ultrasound. We evaluated SAM3 using text- and/or image-exemplar-based concept prompts, which provide semantic guidance via textual descriptions or representative images, enabling lesion segmentation from high-level appearance cues rather than explicit spatial annotations. To enhance clinical robustness, we further incorporated patient-specific prior scans and annotations from MR-guided adaptive radiotherapy workflows as additional input channels, and evaluated this strategy on two in-house MR-LINAC datasets: Mix-Seq-Brain (a brain dataset with mixed MR sequences) and One-Seq-Liver (a liver dataset with a single MR sequence).
Results
Extensive experiments demonstrate that SAM3 achieves strong cross-modality generalizability for whole-body lesion segmentation under concept-based prompting. With combined text and image-exemplar prompts, SAM3 attains mean Dice scores of 77.29% (brain metastases), 64.95% (ischemic stroke lesions), 86.97% (lung tumors), 77.45% (liver tumors), 84.97% (pancreatic tumors), 83.86% (colon tumors), 90.84% (kidney tumors), 69.69% (soft-tissue sarcoma), 73.65% (skin lesions), 76.51% (polyps), and 87.92% (breast lesions). Furthermore, incorporating prior MRI scans and annotations improves SAM3's performance on the MR-LINAC datasets, yielding mean Dice scores of 85.08% on Mix-Seq-Brain and 84.14% on One-Seq-Liver.
Conclusion
These results highlight the potential of concept-driven foundation models for scalable medical image segmentation in real-world clinical settings, while the performance variability across datasets indicates that task- or domain-specific fine-tuning may still be required.