Enzyme classification with peptide programs: a comparative study.

BMC Bioinformatics

Department of Informatics, Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal.

Published: July 2009

Background: Efficient and accurate prediction of protein function from sequence is one of the long-standing problems in biology. The widespread use of sequence alignments for inferring function promotes the propagation of errors, and its applicability is limited. Several machine learning methods have been applied to predict protein function, but they lose much of the information encoded in protein sequences because they must first transform the sequences into fixed-length data.

Results: We have developed a machine learning methodology, called peptide programs (PPs), that deals directly with protein sequences, and compared its performance with that of Support Vector Machines (SVMs) and BLAST in detailed enzyme classification tasks. Overall, the PPs and SVMs performed similarly in terms of Matthews Correlation Coefficient, but the PPs generally had higher precision. BLAST performed better overall than both methodologies, but the PPs outperformed both BLAST and SVMs on the smaller datasets.
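For readers unfamiliar with the two measures named above, the following is a minimal sketch of how precision and the Matthews Correlation Coefficient are conventionally computed from a binary confusion matrix. The function names and the zero-denominator conventions are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
import math

def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are correct: TP / (TP + FP)."""
    predicted_positive = tp + fp
    # Assumed convention: report 0.0 when nothing was predicted positive.
    return tp / predicted_positive if predicted_positive else 0.0

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0

if __name__ == "__main__":
    # Made-up counts, purely to show the call pattern.
    print(precision(tp=8, fp=2))
    print(mcc(tp=8, tn=85, fp=2, fn=5))
```

Precision only penalises false positives, whereas MCC uses all four cells of the confusion matrix, which is why two methods can have a similar MCC and still differ in precision.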

Conclusion: The higher precision of the PPs in comparison to the SVMs suggests that dealing with sequences is advantageous for detailed protein classification, as precision is essential to avoid annotation errors. The fact that the PPs performed better than BLAST for the smaller datasets demonstrates the potential of the methodology, but the drop in performance observed for the larger datasets indicates that further development is required. Possible strategies to address this issue include partitioning the datasets into smaller subsets and training individual PPs for each subset, or training several PPs for each dataset and combining them using a bagging strategy.
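The bagging strategy mentioned as a possible direction can be sketched generically as follows. This is not the authors' method: `train_fn` is a hypothetical stand-in for any procedure that returns a classifier (a callable mapping a sequence to a label), and majority voting is one common way of combining the models, assumed here for illustration.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def train_bagged(train_fn, data, n_models=5, seed=0):
    """Train n_models classifiers, each on its own bootstrap sample.

    train_fn is a hypothetical trainer: it takes a list of examples
    and returns a callable classifier.
    """
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagged_predict(models, sequence):
    """Combine the individual predictions by majority vote."""
    votes = Counter(model(sequence) for model in models)
    return votes.most_common(1)[0][0]
```

Each model sees a slightly different resampling of the training set, so averaging their votes tends to reduce the variance of any single model.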


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2724424
DOI: http://dx.doi.org/10.1186/1471-2105-10-231

