An Ensemble Learning–Inspired Two-Stage Multi-Expert Framework for MRI Brain Tumor Classification
Abstract
Purpose
Convolutional neural networks (CNNs) are effective at modeling local textures and fine-grained spatial structures, whereas Coordinate Attention captures long-range spatial dependencies, enhancing global contextual awareness while preserving positional information. The two mechanisms are therefore complementary. A common strategy directly fuses CNN and Coordinate Attention features, but doing so typically increases the number of model parameters and the risk of overfitting, particularly in medical imaging, where labeled data are limited. This raises the question of whether their complementary strengths can be combined synergistically without substantially increasing model complexity. To address this, we propose a two-stage multi-expert fusion framework.
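The coordinate attention mechanism mentioned above can be illustrated with a minimal NumPy forward pass that follows the general Coordinate Attention block design. The input is pooled separately along width and height, so each attention weight stays tied to a row or column position. The weights are random, biases and BatchNorm are omitted, the reduction ratio r = 4 and the single unbatched feature map are illustrative choices, and this is not the authors' expert network.

```python
import numpy as np

def h_swish(x):
    # Hard-swish nonlinearity: x * ReLU6(x + 3) / 6
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, w_shared, w_h, w_w):
    """Forward pass of a Coordinate Attention block on one feature map.

    x:        (C, H, W) input feature map
    w_shared: (C // r, C) weights of the shared 1x1 conv (bias/BatchNorm omitted)
    w_h, w_w: (C, C // r) weights of the two direction-specific 1x1 convs
    """
    h = x.shape[1]
    # Direction-aware pooling: averaging over width keeps row positions,
    # averaging over height keeps column positions.
    z_h = x.mean(axis=2)                                        # (C, H)
    z_w = x.mean(axis=1)                                        # (C, W)
    # Concatenate along the spatial axis; a 1x1 conv is a matrix product here.
    f = h_swish(w_shared @ np.concatenate([z_h, z_w], axis=1))  # (C//r, H+W)
    f_h, f_w = f[:, :h], f[:, h:]
    # Per-direction attention maps with values in (0, 1).
    g_h = sigmoid(w_h @ f_h)                                    # (C, H)
    g_w = sigmoid(w_w @ f_w)                                    # (C, W)
    # Reweight every position by its row and column attention.
    return x * g_h[:, :, None] * g_w[:, None, :]

rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 10, 4
x = rng.standard_normal((C, H, W))
y = coordinate_attention(
    x,
    rng.standard_normal((C // r, C)) * 0.1,
    rng.standard_normal((C, C // r)) * 0.1,
    rng.standard_normal((C, C // r)) * 0.1,
)
print(y.shape)  # (16, 8, 10)
```

Because each gate lies in (0, 1), the block can only rescale features. Encoding the row and column position separately is what lets a lightweight block relate distant locations along each axis.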
Methods
We evaluate the proposed framework on an MRI brain tumor classification dataset. Training follows a two-stage strategy inspired by ensemble learning. In Stage 1, a CNN-based expert and a Coordinate-Attention-based expert are trained independently on the same training split to learn complementary representations: the CNN expert focuses on local textures and fine-grained spatial structures, while the Coordinate Attention expert emphasizes long-range spatial dependencies and global contextual information. In Stage 2, the feature extractors of both experts are frozen, and only a lightweight fusion module is trained to produce the final predictions. By decoupling representation learning from expert fusion, this two-stage design mitigates overfitting and yields more stable performance with minimal additional parameters.
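The Stage 2 procedure can be sketched in NumPy under several assumptions. The two frozen experts are replaced by hypothetical fixed random projections (`expert1`, `expert2`). The fusion module is assumed to be a linear softmax layer on the concatenated expert features, since the abstract does not specify its form. The data are synthetic. The sketch shows that only the fusion head's weights are updated and that its trainable parameter count is small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two Stage 1 experts: fixed (frozen) random
# projections. In the real framework these would be the trained networks.
D_IN, D1, D2, K = 64, 32, 32, 4   # input dim, expert feature dims, classes
W_expert1 = rng.standard_normal((D_IN, D1)) / np.sqrt(D_IN)
W_expert2 = rng.standard_normal((D_IN, D2)) / np.sqrt(D_IN)

def expert1(x):
    return np.maximum(x @ W_expert1, 0.0)   # ReLU features

def expert2(x):
    return np.tanh(x @ W_expert2)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_fusion_head(x, y, epochs=200, lr=0.1):
    """Stage 2: train only a linear fusion head on frozen expert features."""
    # Expert features are computed once; no update ever reaches the experts.
    feats = np.concatenate([expert1(x), expert2(x)], axis=1)  # (N, D1 + D2)
    n = feats.shape[0]
    W = np.zeros((feats.shape[1], K))
    b = np.zeros(K)
    onehot = np.eye(K)[y]
    losses = []
    for _ in range(epochs):
        p = softmax(feats @ W + b)
        losses.append(-np.mean(np.log(p[np.arange(n), y] + 1e-12)))
        grad = (p - onehot) / n             # gradient of mean cross-entropy w.r.t. logits
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b, losses

# Synthetic data: labels depend on the input through a random linear rule.
x = rng.standard_normal((512, D_IN))
y = np.argmax(x @ rng.standard_normal((D_IN, K)), axis=1)

W1_before, W2_before = W_expert1.copy(), W_expert2.copy()
W, b, losses = train_fusion_head(x, y)
n_trainable = W.size + b.size
print(n_trainable)               # 260 = (32 + 32) * 4 + 4
print(losses[-1] < losses[0])    # True
```

In this sketch the Stage 2 trainable parameters are only (D1 + D2) · K + K, independent of the experts' size. This is the sense in which the fusion step adds few parameters and leaves the experts' learned representations untouched.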
Results
The independently trained Coordinate-Attention and Rotate-to-Attend experts achieve different levels of performance on the MRI brain tumor classification task. The Coordinate-Attention expert reaches a Top-1 accuracy of 95.37%, while the Rotate-to-Attend expert achieves 89.61% at its optimal epoch. This gap suggests that the two models emphasize distinct aspects of the image representation, supporting their complementary roles within the proposed framework.
Conclusion
The proposed two-stage framework integrates the complementary strengths of different networks while avoiding the overfitting risk associated with cascaded networks, and thereby enhances the performance of medical image classification systems.