Protein annotation has long been a challenging task in computational biology. Gene Ontology (GO) has become one of the most popular frameworks for describing protein functions and their relationships. Predicting the proper GO terms for a protein demands high-quality GO term representation learning, which aims to learn, for each functional label, a low-dimensional dense vector (an embedding) that carries semantic meaning. However, existing GO term embedding methods, which mainly take into account ancestral co-occurrence information, have yet to capture the full topological information in the GO directed acyclic graph (DAG). In this study, we propose a novel GO term representation learning method, PO2Vec, which uses partial order relationships to improve GO term representations. Extensive evaluations show that PO2Vec outperforms existing embedding methods in a variety of downstream biological tasks. Based on PO2Vec, we further developed a new protein function prediction method, PO2GO, which demonstrates superior performance across multiple metrics and in annotation specificity, as well as few-shot prediction capability, in the benchmarks. These results suggest that a high-quality representation of the GO structure is critical for diverse biological tasks, including computational protein annotation.
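
To make the notion of a partial order on a DAG concrete, the following is a minimal sketch, not the PO2Vec algorithm itself. It assumes a toy ontology with made-up term names (not real GO identifiers) and hypothetical helper functions. It shows how reachability in the graph separates term pairs that are ordered (one is an ancestor of the other) from pairs that are incomparable. How an embedding method would consume these sets during training is not specified here.

```python
from functools import lru_cache

# Hypothetical toy ontology: each term maps to its direct parents.
# Term names are placeholders, not real GO identifiers.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "ion_binding": ["binding"],
    "metal_ion_binding": ["ion_binding"],
    "metalloenzyme": ["metal_ion_binding", "catalysis"],
}


@lru_cache(maxsize=None)
def ancestors(term):
    """Return all proper ancestors of `term` as a frozenset."""
    found = set()
    for parent in PARENTS[term]:
        found.add(parent)
        found |= ancestors(parent)
    return frozenset(found)


def precedes(a, b):
    """Partial order: a <= b if a equals b or b is an ancestor of a."""
    return a == b or b in ancestors(a)


def comparable(a, b):
    """Two terms are comparable if either one precedes the other."""
    return precedes(a, b) or precedes(b, a)


def contrast_sets(term):
    """Split all other terms into order-related and incomparable terms."""
    others = [t for t in PARENTS if t != term]
    related = sorted(t for t in others if comparable(term, t))
    unrelated = sorted(t for t in others if not comparable(term, t))
    return related, unrelated


if __name__ == "__main__":
    related, unrelated = contrast_sets("catalysis")
    print("ordered with catalysis:", related)
    print("incomparable with catalysis:", unrelated)
```

In this toy graph, "binding" and "catalysis" share the ancestor "root" yet are incomparable, which is the kind of structural distinction that ancestor co-occurrence alone does not express.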

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10917077
DOI: http://dx.doi.org/10.1093/bib/bbae077

Similar Publications

Annotation of RxLR Effectors in Oomycete Genomes.

Methods Mol Biol

December 2024

Horticultural Crops Disease and Pest Management Research Unit, United States Department of Agriculture-Agricultural Research Service, Corvallis, OR, USA.

Pathogens have evolved effector proteins to suppress host immunity and facilitate plant infections. RxLR effectors are small, secreted effector proteins with conserved RxLR and dEER amino acid motifs at the N terminus and highly variable C termini and are commonly found in oomycete species. We provide computational approaches to annotate RxLR candidate effector genes in a genome assembly in FASTA format with an available GFF file.

Exploring the Venom Gland Transcriptome of and : De Novo Assembly and Analysis of Novel Toxic Proteins.

Toxins (Basel)

November 2024

Facultad de Ciencias Exactas y Naturales, Pontificia Universidad Católica del Ecuador, Quito 170525, Ecuador.

Previous proteomic studies of viperid venom revealed that it is mainly composed of metalloproteinases (SVMPs), serine proteinases (SVSPs), phospholipase A2 (PLA2), and C-type lectins (CTLs). However, other proteins appear in minor amounts that affect prey and need to be identified. This study aimed to identify novel toxic proteins in the venom gland transcriptome of and , using data from NCBI.

A pathogen strain responsible for sweet potato stem and foliage scab disease was isolated from sweet potato stems. Through a phylogenetic analysis based on the rDNA internal transcribed spacer (ITS) region, combined with morphological methods, the isolated strain was identified as To comprehensively analyze the pathogenicity of the isolated strain from a genetic perspective, the whole-genome sequencing of HD-1 was performed using both the PacBio and Illumina platforms. The genome of HD-1 is about 26.

Carbohydrate-binding modules (CBMs) are essential virulence factors in phytopathogens, particularly the extensively studied members from the CBM50 gene family, which are known as lysin motif (LysM) effectors and which play crucial roles in plant-pathogen interactions. However, the function of CBM50 in has yet to be fully studied. In this study, we identified seven CBM50 genes from the genome through complete sequence analysis and functional annotation.

Genome Sequencing and Metabolic Potential Analysis of .

J Fungi (Basel)

December 2024

Hubei Key Laboratory of Natural Medicinal Chemistry and Resource Evaluation, School of Pharmacy, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China.

is an edible and medicinal macrofungus with significant biological activity and broad pharmaceutical prospects that has received increasing attention in recent years. Although it is an important resource for macrofungi, knowledge of it remains limited. In this study, we sequenced, de novo assembled, and annotated the whole genome of using a PacBio Sequel II sequencer.
