The study of protein-protein interactions (PPIs) holds immense significance in understanding various biological activities, as well as in drug discovery and disease diagnosis. Existing deep learning methods for PPI prediction, including graph neural networks (GNNs), have been widely employed as the solutions, while they often experience a decline in performance in the real world. We claim that the topological shortcut is one of the key problems contributing negatively to the performance, according to our analysis. By modeling the PPIs as a graph with protein as nodes and interactions as edge types, the prevailing models tend to learn the pattern of nodes' degrees rather than intrinsic sequence-structure profiles, leading to the problem termed topological shortcut. The huge data growth of PPI leads to intensive computational costs and challenges computing devices, causing infeasibility in practice. To address the discussed problems, we propose a label-aware hierarchical subgraph learning method (laruGL-PPI) that can effectively infer PPIs while being interpretable. Specifically, we introduced edge-based subgraph sampling to effectively alleviate the problems of topological shortcuts and high computing costs. Besides, the inner-outer connections of PPIs are modeled as a hierarchical graph, together with the dependencies between interaction types constructed by a label graph. Extensive experiments conducted across various scales of PPI datasets have conclusively demonstrated that the laruGL-PPI method surpasses the most advanced PPI prediction techniques currently available, particularly in the testing of unseen proteins. Also, our model can recognize crucial sites of proteins, such as surface sites for binding and active sites for catalysis.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1016/j.jmb.2024.168737 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!