MS-kNN: protein function prediction by integrating multiple data sources.

BMC Bioinformatics

Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA.

Published: July 2013

Background: Protein function determination is a key challenge in the post-genomic era. Experimental determination of protein functions is accurate, but time-consuming and resource-intensive. A cost-effective alternative is to use the known information about sequence, structure, and functional properties of genes and proteins to predict functions using statistical methods. In this paper, we describe the Multi-Source k-Nearest Neighbor (MS-kNN) algorithm for function prediction, which finds the k nearest neighbors of a query protein under several different similarity measures and predicts its function by weighted averaging of its neighbors' functions. Specifically, we used three data sources to calculate the similarity scores: sequence similarity, protein-protein interactions, and gene expression.
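The idea of per-source nearest neighbors followed by weighted averaging can be illustrated with a minimal Python sketch. The abstract does not give the exact scoring formula, so everything here is an assumption made for illustration: the function names `knn_scores` and `ms_knn`, the use of similarity-normalized votes within each source, the equal default source weights, and the toy protein and GO identifiers. It should not be read as the authors' implementation.

```python
from typing import Dict, List, Optional, Set


def knn_scores(similarities: Dict[str, float],
               annotations: Dict[str, Set[str]],
               k: int) -> Dict[str, float]:
    """Score GO terms for one query within one data source.

    Each of the k most similar annotated proteins votes for its terms,
    weighted by its share of the total similarity (assumed scheme).
    """
    neighbors = sorted(
        ((sim, pid) for pid, sim in similarities.items()
         if pid in annotations and sim > 0),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    scores: Dict[str, float] = {}
    if total == 0:
        return scores  # the query has no usable neighbors in this source
    for sim, pid in neighbors:
        for term in annotations[pid]:
            scores[term] = scores.get(term, 0.0) + sim / total
    return scores


def ms_knn(sources: List[Dict[str, float]],
           annotations: Dict[str, Set[str]],
           k: int,
           weights: Optional[List[float]] = None) -> Dict[str, float]:
    """Combine per-source scores by a weighted average (assumed scheme).

    Sources in which the query has no neighbors are skipped, so a protein
    covered by a single source still receives a prediction.
    """
    weights = weights or [1.0] * len(sources)
    combined: Dict[str, float] = {}
    used = 0.0
    for weight, sims in zip(weights, sources):
        scores = knn_scores(sims, annotations, k)
        if not scores:
            continue
        used += weight
        for term, score in scores.items():
            combined[term] = combined.get(term, 0.0) + weight * score
    return {t: s / used for t, s in combined.items()} if used else {}


# Toy data: hypothetical training annotations and similarities to one query.
annotations = {"P1": {"GO:A"}, "P2": {"GO:A", "GO:B"}, "P3": {"GO:B"}}
sequence_sim = {"P1": 0.9, "P2": 0.3, "P3": 0.1}
ppi_sim = {"P2": 1.0, "P3": 1.0}
expression_sim: Dict[str, float] = {}  # query absent from this source

print(ms_knn([sequence_sim, ppi_sim, expression_sim], annotations, k=2))
```

In this toy case the empty expression source is ignored and the two remaining sources are averaged, which mirrors the situation described in the Results where many test proteins are represented in only one or two sources.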

Results: We report the results in the context of the 2011 Critical Assessment of Function Annotation (CAFA). Prior to the CAFA submission deadline, we evaluated our algorithm on 1,302 human test proteins that were represented in all three data sources. Using only the sequence similarity information, MS-kNN had a term-based Area Under the Curve (AUC) accuracy for Gene Ontology (GO) molecular function predictions of 0.728 when 7,412 human training proteins were used, and 0.819 when 35,622 training proteins from multiple eukaryotic and prokaryotic organisms were used. By aggregating predictions from all three sources, the AUC was further improved to 0.848. Similar results were observed for prediction of GO biological processes. Testing on 595 proteins that were annotated after the CAFA submission deadline showed that overall MS-kNN accuracy was higher than that of the baseline algorithms Gotcha and BLAST, which were based solely on sequence similarity information. Since only 10 of the 595 proteins were represented by all three data sources, and 66 by two data sources, the difference between three-source and single-source MS-kNN was rather small.
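For readers unfamiliar with the metric, the following Python sketch shows one common reading of a term-based AUC: an AUC is computed separately for each GO term across the test proteins, and the per-term values are averaged. The abstract does not spell out the exact definition, so the averaging scheme, the handling of ties, the function names `auc` and `term_based_auc`, and the toy data are all assumptions.

```python
from typing import Dict, List, Optional, Set


def auc(labels: List[bool], scores: List[float]) -> Optional[float]:
    """AUC as the probability that a positive outscores a negative.

    Ties count as one half. Returns None when a class is empty,
    because the AUC is undefined in that case.
    """
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def term_based_auc(truth: Dict[str, Set[str]],
                   predictions: Dict[str, Dict[str, float]],
                   terms: List[str]) -> Optional[float]:
    """Average the per-term AUCs over all terms where AUC is defined."""
    proteins = sorted(truth)
    per_term = []
    for term in terms:
        labels = [term in truth[p] for p in proteins]
        scores = [predictions.get(p, {}).get(term, 0.0) for p in proteins]
        value = auc(labels, scores)
        if value is not None:
            per_term.append(value)
    return sum(per_term) / len(per_term) if per_term else None


# Toy data: hypothetical true annotations and predicted scores.
truth = {"Q1": {"GO:A"}, "Q2": {"GO:B"}, "Q3": {"GO:A", "GO:B"}}
predictions = {
    "Q1": {"GO:A": 0.9, "GO:B": 0.2},
    "Q2": {"GO:A": 0.4, "GO:B": 0.1},
    "Q3": {"GO:A": 0.8, "GO:B": 0.7},
}
print(term_based_auc(truth, predictions, ["GO:A", "GO:B"]))  # 0.75
```

Here GO:A is ranked perfectly (AUC 1.0) and GO:B is ranked at chance (AUC 0.5), so the term-based average is 0.75.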

Conclusions: Our results support several insights: (1) the k-nearest neighbor algorithm is an efficient and effective model for protein function prediction; (2) it is beneficial to transfer functions across a wide range of organisms; (3) it is helpful to integrate multiple sources of protein information.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3584913
DOI: http://dx.doi.org/10.1186/1471-2105-14-S3-S8
