Analyzing CycleGAN Training Behavior with Offline Average Loss for CBCT-to-CT Image Translation
Abstract
Purpose
While image translation models like CycleGANs are increasingly used in medical imaging tasks, evaluating their training behavior remains challenging. Current practices often rely on batch-wise loss curves or selected visual outputs, both of which can be noisy and may fail to reveal training instabilities or hallucinated features. Although our previous work introduced different masking strategies for CBCT-to-CT image translation, this study specifically focuses on how to evaluate training progress. We explore the use of offline average loss curves, computed after training with frozen model weights, to provide a more stable and interpretable view of CycleGAN learning dynamics.
Methods
We trained multiple CycleGAN models using previously established masking strategies: no mask, patch-weighted mask, and enforced pixel-level masking. Rather than monitoring volatile batch-level losses during training, we computed offline epoch-level average generator and discriminator losses over saved checkpoints, with network weights frozen and gradients disabled. These loss curves were then analyzed to assess training convergence, stability, and the effectiveness of the different training configurations.
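To make the offline evaluation concrete, the following is a minimal PyTorch sketch of how epoch-level average losses can be computed for a saved checkpoint with gradients disabled. It assumes an LSGAN adversarial objective and a cycle-consistency weight of 10; the network names (G_AB, G_BA, D_A, D_B), the data loader, and the loss weighting are illustrative placeholders, not the study's actual implementation.

import torch

@torch.no_grad()
def offline_average_losses(G_AB, G_BA, D_A, D_B, loader, device="cuda"):
    # Evaluate a frozen checkpoint: all networks in eval mode, no gradients,
    # so the resulting curve reflects the model state rather than the noisy
    # batch-wise values logged during training.
    adv = torch.nn.MSELoss()  # assuming an LSGAN adversarial objective
    cyc = torch.nn.L1Loss()   # cycle-consistency term
    for net in (G_AB, G_BA, D_A, D_B):
        net.eval()

    g_total, d_total, n = 0.0, 0.0, 0
    for real_A, real_B in loader:  # e.g. unpaired CBCT (A) and CT (B) batches
        real_A, real_B = real_A.to(device), real_B.to(device)
        fake_B, fake_A = G_AB(real_A), G_BA(real_B)

        # Generator loss: adversarial terms plus cycle-consistency reconstruction.
        pred_fake_B, pred_fake_A = D_B(fake_B), D_A(fake_A)
        g_loss = (adv(pred_fake_B, torch.ones_like(pred_fake_B))
                  + adv(pred_fake_A, torch.ones_like(pred_fake_A))
                  + 10.0 * (cyc(G_BA(fake_B), real_A) + cyc(G_AB(fake_A), real_B)))

        # Discriminator loss: real samples scored as real, translations as fake.
        pred_real_B, pred_real_A = D_B(real_B), D_A(real_A)
        d_loss = 0.5 * (adv(pred_real_B, torch.ones_like(pred_real_B))
                        + adv(pred_fake_B, torch.zeros_like(pred_fake_B))
                        + adv(pred_real_A, torch.ones_like(pred_real_A))
                        + adv(pred_fake_A, torch.zeros_like(pred_fake_A)))

        g_total += g_loss.item()
        d_total += d_loss.item()
        n += 1
    return g_total / n, d_total / n

Running this function once per saved epoch checkpoint yields one (generator, discriminator) point per epoch, which can then be plotted as the offline average loss curves analyzed below.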
Results
Among the evaluated approaches, the model using enforced masking demonstrated the most stable learning behavior, with consistently decreasing average losses across training. The discriminator loss in particular showed a balanced adversarial pattern, suggesting reduced overfitting and improved anatomical consistency. Offline average loss curves allowed us to detect subtle instabilities and convergence failures that were not apparent from sample images or standard training logs.
Conclusion
This work introduces a loss analysis framework focused on post-training average loss evaluation as a way to monitor CycleGAN training quality. While the masking strategies were explored previously, the current study emphasizes the value of analyzing offline loss trends to assess training sufficiency and model reliability. This method reduces dependence on visual inspection and may support more robust quality control for generative models in clinical imaging tasks.