Data mining of enzymes using specific peptides.

BMC Bioinformatics

School of Physics and Astronomy, Tel Aviv University, Tel Aviv 69978, Israel.

Published: December 2009

Background: Predicting the function of a protein from its sequence is a long-standing challenge in bioinformatics research, typically addressed using either sequence similarity or sequence motifs. We employ a novel motif method based on Specific Peptides (SPs), which are unique to specific branches of the Enzyme Commission (EC) functional classification. We devise the Data Mining of Enzymes (DME) methodology, which searches for SPs in arbitrary proteins and determines from a protein's sequence whether it is an enzyme and, if so, what its EC classification is.

Results: We extract novel SP sets from Swiss-Prot enzyme data. Using a training set of July 2006 and test sets of July 2008, we find that the predictive power of SPs, both for true positives (enzymes) and true negatives (non-enzymes), depends on the coverage length of all SP matches (the number of amino acids matched on the protein sequence). DME is quite different from BLAST. Comparing the two on an enzyme test set of July 2008, we find that DME has lower recall. On the other hand, DME can provide predictions for proteins regarded by BLAST as having low homology with known enzymes, thus supplying complementary information. We test our method on a set of proteins belonging to 10 bacteria, dated July 2008, establishing the usefulness of the coverage-length cutoff for determining true negatives. Moreover, sifting through our predictions, we find that some of them had been substantiated by Swiss-Prot annotations by July 2009. Finally, we extract, for production purposes, a novel SP set trained on all Swiss-Prot enzymes as of July 2009. This new set considerably increases the recall of DME. The new SP set is being applied to three metagenomes: the Sargasso Sea, with over 1,000,000 proteins and predictions of over 220,000 enzymes, and two human gut metagenomes. The outcome of these analyses can be characterized by the enzymatic profile of each metagenome, which describes the relative numbers of enzymes observed for different EC categories.
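The enzymatic profile mentioned above can be illustrated with a minimal sketch. It assumes that each predicted enzyme is represented by a dotted EC number string and that the profile is the fraction of predictions falling into each EC category at a chosen level. The function name `enzymatic_profile`, the `level` parameter, and the sample EC numbers are hypothetical and are not taken from the paper.

```python
from collections import Counter


def enzymatic_profile(ec_predictions, level=1):
    """Return the fraction of predicted enzymes in each EC category.

    ec_predictions: iterable of dotted EC strings such as "3.4.21.4"
    level: how many leading EC digits define a category (1 = top-level class)
    """
    # Truncate every EC number to the requested depth and count occurrences.
    counts = Counter(".".join(ec.split(".")[:level]) for ec in ec_predictions)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Relative numbers: each count divided by the total number of predictions.
    return {category: n / total for category, n in counts.items()}


# Hypothetical predictions for a handful of proteins.
profile = enzymatic_profile(["1.1.1.1", "1.2.3.4", "3.4.21.4", "2.7.11.1"])
```

Setting `level=2` would give a finer profile over EC sub-classes; which depth suits a given metagenome comparison is a choice this sketch leaves open.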

Conclusions: Employing SPs to predict the enzymatic activity of proteins works well once coverage-length criteria are applied. In our analysis, a coverage length of L ≥ 7 has led to highly accurate results.
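As a rough illustration of the coverage-length criterion, the sketch below counts the amino acids of a sequence that are covered by at least one exact peptide match and compares that count with the cutoff of 7. It makes several assumptions: SPs are matched as exact substrings, residues under overlapping matches are counted once, and agreement between the EC assignments of the matched SPs is not modeled. The function names and the toy peptides are hypothetical and do not come from the published SP sets.

```python
def coverage_length(sequence, peptides):
    """Count residues of `sequence` covered by at least one peptide match."""
    covered = [False] * len(sequence)
    for pep in peptides:
        if not pep:
            continue  # an empty peptide would match everywhere; skip it
        start = sequence.find(pep)
        while start != -1:
            # Mark every position under this match as covered.
            for i in range(start, start + len(pep)):
                covered[i] = True
            # Continue one position later so overlapping matches are found.
            start = sequence.find(pep, start + 1)
    return sum(covered)


def passes_cutoff(sequence, peptides, min_coverage=7):
    """Apply the coverage-length cutoff L >= min_coverage."""
    return coverage_length(sequence, peptides) >= min_coverage


# Toy example with made-up peptides (not real SPs).
seq = "MKTAYIAKQR"
short = coverage_length(seq, ["KTAY", "AYIA"])   # overlapping matches
longer = passes_cutoff(seq, ["KTAY", "AYIAK"])
```

A linear scan with `str.find` is adequate for a few peptides; a large SP set would more plausibly be searched with a multi-pattern structure, which this sketch does not attempt.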


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2811123
DOI: http://dx.doi.org/10.1186/1471-2105-10-446


