Biologically Informed Graph Neural Network Construction Via Pathway-Based Gene Interaction Graphs
Abstract
Purpose
Biological systems are inherently structured, with genes and molecular components interacting through organized pathways and networks. Graph-based representations are therefore widely used in biological analysis. However, many existing deep learning approaches on biological graphs focus primarily on molecular feature values (e.g., gene expression) while making limited use of the underlying graph structure. Consequently, information about biological organization and pathway-level interactions may be underutilized. There is a need for end-to-end learning frameworks that explicitly model biological structure and integrate it into downstream predictive tasks.
Methods
We propose an end-to-end graph-based learning framework that captures sample-specific biological structure. Pathway-informed gene interaction graphs are constructed per sample, with genes as nodes and molecular measurements as node features. A graph-level classifier is first applied to each sample-specific graph to learn a compact embedding summarizing its properties. This embedding is then combined with complementary molecular features and passed to a downstream classifier for prediction. This unified pipeline jointly optimizes structure-aware representation learning and feature-based prediction and is evaluated on publicly available cancer datasets.
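The pipeline described above can be illustrated with a minimal, untrained sketch. It is not the authors' implementation. Several things here are assumptions made purely for illustration: the toy gene list and pathway memberships, the rule that two genes are linked when they share a pathway, the use of one shared topology with sample-specific node features, and the helper names `pathway_adjacency` and `graph_embedding`. A single mean-aggregation layer with random weights stands in for the graph-level model, which in the actual framework would be trained jointly with the downstream classifier.

```python
import numpy as np

def pathway_adjacency(genes, pathways):
    """Symmetric 0/1 adjacency: link two genes if they share a pathway (assumed rule)."""
    index = {g: i for i, g in enumerate(genes)}
    adj = np.zeros((len(genes), len(genes)))
    for members in pathways.values():
        ids = [index[g] for g in members if g in index]
        for i in ids:
            for j in ids:
                if i != j:
                    adj[i, j] = 1.0
    return adj

def graph_embedding(node_features, adj, weight):
    """One round of mean-neighbour aggregation, ReLU, then mean-pool readout."""
    a_hat = adj + np.eye(adj.shape[0])                  # add self-loops
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)   # row-normalise
    hidden = np.maximum(a_norm @ node_features @ weight, 0.0)
    return hidden.mean(axis=0)                          # graph-level vector

# Toy inputs (illustrative only, not taken from the paper).
genes = ["TP53", "MDM2", "EGFR", "KRAS"]
pathways = {
    "p53_signalling": ["TP53", "MDM2"],
    "mapk_signalling": ["EGFR", "KRAS"],
}
adj = pathway_adjacency(genes, pathways)

# Made-up expression values for one sample: one scalar feature per gene.
expression = np.array([2.1, 0.4, 1.7, 3.0])
node_features = expression.reshape(-1, 1)

# Random, untrained weights standing in for a learned GNN layer.
rng = np.random.default_rng(0)
weight = rng.normal(size=(1, 8))

embedding = graph_embedding(node_features, adj, weight)

# Combine the graph embedding with the raw molecular features; this
# vector is what a downstream classifier would receive.
combined = np.concatenate([embedding, expression])
```

In this sketch the structure enters only through the normalised adjacency matrix, so changing the pathway definitions changes the embedding even when expression values stay the same.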
Results
The proposed framework learns discriminative graph embeddings that capture sample-specific biological organization. On the TCGA pan-cancer dataset, modeling pathway-informed graph structure improved accuracy by up to 34% over state-of-the-art GNNs that rely primarily on molecular features. These findings demonstrate that incorporating biological structure at the graph level provides meaningful additional information for cancer classification tasks.
Conclusion
We present an end-to-end graph learning framework that integrates pathway-informed biological structure into predictive modeling. By learning sample-specific graph representations and coupling them with downstream classification, the proposed approach captures both molecular features and their biological organization. This framework offers a structured and biologically meaningful foundation for graph-based modeling in cancer research and other complex biological applications.