Reinforcement Learning–Guided Pseudo-Label Purification for Robust Unsupervised Domain Adaptation in Medical Image Segmentation
Abstract
Purpose
Accurate medical image segmentation is crucial for clinical workflows such as radiotherapy planning and longitudinal disease monitoring. However, high-quality pixel/voxel annotations are costly and time-consuming to obtain, limiting large-scale supervised training. In addition, segmentation models often suffer from domain shift caused by differences in scanners, protocols, and acquisition conditions, leading to performance degradation when deployed across hospitals or modalities. To address these issues, we propose an unsupervised domain adaptation (UDA) framework that improves cross-domain segmentation by refining noisy pseudo-labels and enhancing robustness in diverse clinical environments.
Methods
We introduce a reinforcement learning (RL)-based UDA method for medical image segmentation. The target segmentation is decomposed into spatial regions via anchor patches, enabling region-wise composition and decomposition to estimate pseudo-label reliability. Based on these reliability cues, an RL agent adaptively refines pseudo-labels by modulating supervision strength: confident regions receive stronger learning signals, while uncertain or ambiguous regions are down-weighted to suppress noise. This dynamic refinement strategy reduces overfitting to erroneous pseudo-labels and promotes stable generalization under domain shift.
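The supervision-modulation idea can be illustrated with a minimal sketch, which is not the method evaluated in this work. It makes two loud assumptions: non-overlapping square patches stand in for the anchor patches, and a fixed heuristic (mean maximum softmax probability per patch) stands in for the learned RL policy. The function names and the parameters `patch` and `floor` are hypothetical and chosen only for illustration.

```python
import numpy as np


def patch_reliability(probs, patch):
    """Mean max-softmax confidence per non-overlapping patch.

    probs: array of shape (C, H, W) holding per-class probabilities.
    Returns an array of shape (H // patch, W // patch).
    ASSUMPTION: square grid patches stand in for the anchor patches.
    """
    conf = probs.max(axis=0)                      # (H, W) per-pixel confidence
    gh, gw = conf.shape[0] // patch, conf.shape[1] // patch
    conf = conf[: gh * patch, : gw * patch]       # crop to a whole number of patches
    return conf.reshape(gh, patch, gw, patch).mean(axis=(1, 3))


def weighted_pseudo_label_loss(probs, pseudo, patch=4, floor=0.5):
    """Cross-entropy against pseudo-labels, weighted by patch reliability.

    ASSUMPTION: a fixed linear mapping from reliability to weight replaces
    the RL agent; patches at or below `floor` confidence get weight 0.
    """
    rel = patch_reliability(probs, patch)
    w_patch = np.clip((rel - floor) / (1.0 - floor), 0.0, 1.0)
    w = np.kron(w_patch, np.ones((patch, patch)))  # expand weights to pixel grid
    h, wd = w.shape
    p = probs[:, :h, :wd]
    y = pseudo[:h, :wd]
    picked = np.take_along_axis(p, y[None], axis=0)[0]  # prob of pseudo-label class
    ce = -np.log(np.clip(picked, 1e-8, 1.0))
    loss = float((w * ce).sum() / max(w.sum(), 1e-8))
    return loss, w_patch


if __name__ == "__main__":
    # Toy two-class prediction: left half confident, right half ambiguous.
    probs = np.empty((2, 8, 8))
    probs[0, :, :4], probs[1, :, :4] = 0.95, 0.05
    probs[0, :, 4:], probs[1, :, 4:] = 0.55, 0.45
    pseudo = probs.argmax(axis=0)
    loss, weights = weighted_pseudo_label_loss(probs, pseudo)
    print(loss)
    print(weights)
```

In this toy case the confident patches receive larger weights than the ambiguous ones, so the ambiguous region contributes less to the loss than it would under uniform weighting. A learned policy would replace the fixed `floor`-based mapping with weights chosen from the reliability cues.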
Results
The proposed approach is evaluated on three cross-domain segmentation benchmarks: (1) CT–CBCT segmentation for nasopharyngeal target, parotid gland, breast target, and heart (averaged); (2) Learn2Reg CT–MR segmentation for liver, spleen, and kidneys (averaged); and (3) AMOS CT–MR segmentation for liver, spleen, and kidneys (averaged). Compared with strong UDA baselines (e.g., CDDSA, CSDASA, FPL+, PSIGAN, RAM-Sir, DDSPSeg, DBSN, and DAFormer), our method consistently delivers state-of-the-art performance across tasks, demonstrating improved accuracy and robustness.
Conclusion
Combining RL-driven supervision control with target-domain pseudo-label refinement provides an effective solution for UDA in medical image segmentation, mitigating domain shift and pseudo-label noise to achieve reliable cross-domain performance.