ZA-Net: Zero-Annotation Nuclei Segmentation in Pathology Images with Vision–Language Pre-trained Models
Abstract
Purpose
Nuclei segmentation in histopathology images is fundamental for cancer diagnosis and quantitative analysis, yet existing supervised and weakly supervised methods require extensive manual annotations. Although vision–language pre-trained models enable zero-shot object detection in natural images, zero-annotation nuclei segmentation in H&E pathology images remains largely unexplored due to the severe domain gap and the task mismatch between detection and segmentation. This work aims to develop a fully label-free nuclei segmentation framework that eliminates the need for any manual annotations.
Methods
We propose ZA-Net, a zero-annotation nuclei segmentation framework that integrates a vision–language pre-trained object detector with convolutional neural networks in a coarse-to-fine pipeline. ZA-Net operates in three stages: (1) zero-shot coarse nuclei detection using a vision–language detector with customized text prompts, (2) fine-stage nuclei detection that refines coarse detections into accurate nucleus centers using Gaussian confidence maps and a precision learning strategy, and (3) nuclei segmentation using coarse pixel-level pseudo-labels generated from refined point detections via Voronoi partitioning and clustering. All stages are trained without any manual pixel-level or point-level annotations.
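Stage (3) above can be illustrated with a minimal sketch of pseudo-label generation from point detections. This is not the paper's implementation; the function name and all details (grayscale input, nearest-center assignment as the Voronoi partition, k-means intensity clustering with nuclei assumed darker in H&E) are illustrative assumptions:

```python
# Hypothetical sketch of stage (3): coarse pixel-level pseudo-labels from
# refined nucleus centers via Voronoi partitioning and clustering.
# All names and design choices here are illustrative, not ZA-Net's code.
import numpy as np
from scipy.spatial import cKDTree
from sklearn.cluster import KMeans

def voronoi_pseudo_labels(gray, points, k=2):
    """gray: (H, W) grayscale image in [0, 1]; points: (N, 2) nucleus
    centers as (row, col). Returns (mask, cell): a binary pseudo-mask
    (1 = nucleus, 0 = background) and the Voronoi cell index per pixel."""
    h, w = gray.shape
    yy, xx = np.mgrid[0:h, 0:w]
    pix = np.stack([yy.ravel(), xx.ravel()], axis=1)
    # Voronoi partition: assign each pixel to its nearest detected center.
    cell = cKDTree(points).query(pix)[1].reshape(h, w)
    # Cluster pixel intensities into k groups; in H&E, nuclei (hematoxylin)
    # are darker than background, so take the darkest cluster as "nucleus".
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(gray.reshape(-1, 1))
    dark = int(np.argmin(km.cluster_centers_.ravel()))
    mask = (km.labels_.reshape(h, w) == dark).astype(np.uint8)
    return mask, cell
```

In the actual pipeline, labels of this kind would serve only as noisy supervision for the segmentation network, with the Voronoi boundaries keeping adjacent nuclei separated.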
Results
ZA-Net is evaluated on two public H&E-stained nuclei segmentation benchmarks, MoNuSeg and CPM. The proposed method achieves Dice scores of 0.72 on MoNuSeg and 0.71 on CPM, outperforming the label-free Segment Anything Model by approximately 30 percentage points. Despite using no annotations, ZA-Net remains competitive with state-of-the-art weakly supervised methods that rely on tens of thousands of labeled nuclei. Cross-dataset experiments further demonstrate improved robustness and generalization relative to weakly supervised baselines.
Conclusion
ZA-Net provides an effective and annotation-free solution for nuclei segmentation in pathology images by bridging vision–language zero-shot detection and pixel-wise segmentation through a coarse-to-fine strategy. The proposed framework substantially reduces annotation cost while achieving strong segmentation performance, making it well suited for large-scale and cross-domain pathology studies.