Motivation: Protein representation learning methods have shown great potential to many downstream tasks in biological applications. A few recent studies have demonstrated that the self-supervised learning is a promising solution to addressing insufficient labels of proteins, which is a major obstacle to effective protein representation learning. However, existing protein representation learning is usually pretrained on protein sequences without considering the important protein structural information.
Results: In this work, we propose a novel structure-aware protein self-supervised learning method to effectively capture structural information of proteins. In particular, a graph neural network model is pretrained to preserve the protein structural information with self-supervised tasks from a pairwise residue distance perspective and a dihedral angle perspective, respectively. Furthermore, we propose to leverage the available protein language model pretrained on protein sequences to enhance the self-supervised learning. Specifically, we identify the relation between the sequential information in the protein language model and the structural information in the specially designed graph neural network model via a novel pseudo bi-level optimization scheme. We conduct experiments on three downstream tasks: the binary classification into membrane/non-membrane proteins, the location classification into 10 cellular compartments, and the enzyme-catalyzed reaction classification into 384 EC numbers, and these experiments verify the effectiveness of our proposed method.
Availability And Implementation: The Alphafold2 database is available in https://alphafold.ebi.ac.uk/. The PDB files are available in https://www.rcsb.org/. The downstream tasks are available in https://github.com/phermosilla/IEConv\_proteins/tree/master/Datasets. The code of the proposed method is available in https://github.com/GGchen1997/STEPS_Bioinformatics.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10139775 | PMC |
http://dx.doi.org/10.1093/bioinformatics/btad189 | DOI Listing |
Nat Commun
January 2025
Los Alamos National Laboratory, EES-17 National Security Earth Science, Los Alamos, NM, 87545, USA.
Significant progress has been made in probing the state of an earthquake fault by applying machine learning to continuous seismic waveforms. The breakthroughs were originally obtained from laboratory shear experiments and numerical simulations of fault shear, then successfully extended to slow-slipping faults. Here we apply the Wav2Vec-2.
View Article and Find Full Text PDFSci Rep
January 2025
Department of Electrical Power, Adama Science and Technology University, Adama, 1888, Ethiopia.
Although the Transformer architecture has established itself as the industry standard for jobs involving natural language processing, it still has few uses in computer vision. In vision, attention is used in conjunction with convolutional networks or to replace individual convolutional network elements while preserving the overall network design. Differences between the two domains, such as significant variations in the scale of visual things and the higher granularity of pixels in images compared to words in the text, make it difficult to transfer Transformer from language to vision.
View Article and Find Full Text PDFNeural Netw
January 2025
School of Cyber Science and Engineering, Xi'an Jiaotong University, China. Electronic address:
Detecting anomalies in attributed networks has become a subject of interest in both academia and industry due to its wide spectrum of applications. Although most existing methods achieve desirable performance by the merit of various graph neural networks, the way they bundle node-affiliated multidimensional attributes into a whole for embedding calculation hinders their ability to model and analyze anomalies at the fine-grained feature level. To characterize anomalies from each feature dimension, we propose Eagle, a deep framework based on bipartitE grAph learninG for anomaLy dEtection.
View Article and Find Full Text PDFMed Image Anal
January 2025
ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, Strasbourg, France.
Instance segmentation of surgical instruments is a long-standing research problem, crucial for the development of many applications for computer-assisted surgery. This problem is commonly tackled via fully-supervised training of deep learning models, requiring expensive pixel-level annotations to train. In this work, we develop a framework for instance segmentation not relying on spatial annotations for training.
View Article and Find Full Text PDFHypertension is a critical risk factor and cause of mortality in cardiovascular diseases, and it remains a global public health issue. Therefore, understanding its mechanisms is essential for treating and preventing hypertension. Gene expression data is an important source for obtaining hypertension biomarkers.
View Article and Find Full Text PDFEnter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!