Motivation: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap, especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies. The superfamilies are sub-classified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer.

Results: FunFHMMer generates more functionally coherent groupings of protein sequences than other domain-based protein classifications. This has been validated using known functional information. The conserved positions predicted by the FunFams are also found to be enriched in known functional residues. Moreover, the functional annotations provided by the FunFams are found to be more precise than those of other domain-based resources. FunFHMMer currently identifies 110,439 FunFams in 2735 superfamilies, which can be used to functionally annotate >16 million domain sequences.
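To illustrate what a "conserved position" in a family alignment can mean, the following is a minimal sketch that scores each column of a multiple sequence alignment by normalized Shannon entropy. It is not the scoring scheme used by FunFHMMer, which the abstract does not specify. The function names, the toy alignment, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Return one score in [0, 1] per alignment column (1.0 = fully conserved).

    Assumption: conservation is 1 minus the Shannon entropy of the column,
    normalized by log2(20) for the 20 standard amino acids. Gaps are ignored.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(20)
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != gap]
        if not residues:
            # A column made only of gaps carries no conservation signal.
            scores.append(0.0)
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


def conserved_positions(alignment, threshold=0.9):
    """Return 0-based column indices whose score meets the (hypothetical) threshold."""
    return [i for i, s in enumerate(column_conservation(alignment)) if s >= threshold]


# Toy alignment (invented data): only the first column is identical in all sequences.
toy_alignment = ["HKDG", "HRDG", "HKEG", "HKDA"]
print(conserved_positions(toy_alignment))
```

In practice, positions flagged this way would be compared against experimentally known functional residues to test for enrichment, which is the kind of validation the Results describe.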

Availability And Implementation: All FunFam annotation data are made available through the CATH webpages (http://www.cathdb.info). The FunFHMMer webserver (http://www.cathdb.info/search/by_funfhmmer) allows users to submit query sequences for assignment to a CATH FunFam.

Contact: sayoni.das.12@ucl.ac.uk

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4612221
DOI: http://dx.doi.org/10.1093/bioinformatics/btv398
