Background: The automated prediction of the enzymatic functions of uncharacterized proteins is a crucial topic in bioinformatics. Although several methods and tools have been proposed to classify enzymes, most of these studies are limited to specific functional classes and levels of the Enzyme Commission (EC) number hierarchy. In addition, most previous methods incorporated only a single input feature type, which limits their applicability across the wide functional space. Here, we propose a novel enzymatic function prediction tool, ECPred, based on an ensemble of machine learning classifiers.

Results: In ECPred, each EC number constituted an individual class and therefore, had an independent learning model. Enzyme vs. non-enzyme classification is incorporated into ECPred along with a hierarchical prediction approach exploiting the tree structure of the EC nomenclature. ECPred provides predictions for 858 EC numbers in total including 6 main classes, 55 subclass classes, 163 sub-subclass classes and 634 substrate classes. The proposed method is tested and compared with the state-of-the-art enzyme function prediction tools by using independent temporal hold-out and no-Pfam datasets constructed during this study.
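
The hierarchical idea described above (one independent model per EC number, an enzyme vs. non-enzyme gate, then a descent through the EC tree) can be illustrated with a minimal Python sketch. This is not ECPred's actual implementation: the function names `children` and `predict_ec`, the single shared `threshold`, and the constant toy scorers are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical type: a per-class model mapping a sequence to a score in [0, 1].
Scorer = Callable[[str], float]


def children(ec: str, known: List[str]) -> List[str]:
    """Return the known EC numbers exactly one level below `ec` ("" is the root)."""
    depth = ec.count(".") + 1 if ec else 0
    prefix = ec + "." if ec else ""
    return [k for k in known
            if k.startswith(prefix) and k.count(".") + 1 == depth + 1]


def predict_ec(sequence: str,
               is_enzyme: Scorer,
               models: Dict[str, Scorer],
               threshold: float = 0.5) -> Optional[str]:
    """Descend the EC tree, keeping the best-scoring child at each level.

    Returns None for a predicted non-enzyme, "" for an enzyme with no
    confident main class, otherwise the most specific confident EC prefix.
    The shared threshold is an assumption of this sketch.
    """
    if is_enzyme(sequence) < threshold:
        return None
    current = ""
    known = list(models)
    while True:
        scored = [(models[c](sequence), c) for c in children(current, known)]
        scored = [pair for pair in scored if pair[0] >= threshold]
        if not scored:
            return current
        current = max(scored)[1]  # highest score wins at this level


# Toy models with constant scores (purely illustrative, not real classifiers).
toy_models: Dict[str, Scorer] = {
    "1": lambda s: 0.9,
    "2": lambda s: 0.2,
    "1.1": lambda s: 0.8,
    "1.2": lambda s: 0.6,
    "1.1.1": lambda s: 0.3,
}
enzyme_call = predict_ec("MKT", lambda s: 0.95, toy_models)
non_enzyme_call = predict_ec("MKT", lambda s: 0.10, toy_models)
```

In this toy run the descent stops at the subclass level because the only sub-subclass model scores below the threshold, which mirrors how a hierarchical predictor can return a partial EC number when deeper levels are not confident.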

Conclusions: ECPred is presented both as a stand-alone and a web-based tool to provide probabilistic enzymatic function predictions (at all five levels of EC) for uncharacterized protein sequences. In addition, the datasets of this study will be a valuable resource for future benchmarking studies. ECPred is available for download, together with all of the datasets used in this study, at: https://github.com/cansyl/ECPred . The ECPred webserver can be accessed through http://cansyl.metu.edu.tr/ECPred.html .

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6150975
DOI: http://dx.doi.org/10.1186/s12859-018-2368-y

Similar Publications

Background: An association between impaired exercise capacity and risk of mortality has been reported among adults with congenital heart disease (CHD). Over the years, treatment methods have improved and may influence outcome. Hence, we report data from a national cohort reflecting a contemporary population.

Predicting enzymatic function of protein sequences with attention.

Bioinformatics

October 2023

Univ Rennes, Inria, CNRS, IRISA-UMR 6074, Rennes 35000, France.

Motivation: There is a growing number of available protein sequences, but only a limited amount has been manually annotated. For example, only 0.25% of all entries of UniProtKB are reviewed by human annotators.

Assigning enzyme commission (EC) numbers using sequence information alone has been the subject of recent classification algorithms where statistics, homology and machine-learning based methods are used. This work benchmarks performance of a few of these algorithms as a function of sequence features such as chain length and amino acid composition (AAC). This enables determination of optimal classification windows for sequence generation and enzyme design.

Classification and annotation of enzyme proteins are fundamental for enzyme research on biological metabolism. Enzyme Commission (EC) numbers provide a standard for hierarchical enzyme class prediction, on which several computational methods have been proposed. However, most of these methods are dependent on prior distribution information and none explicitly quantifies amino-acid-level relations and possible contribution of sub-sequences.

Seq2Enz: An application of mask BLAST methodology with a new chemical logic of amino acids for improved enzyme function prediction.

Biochim Biophys Acta Proteins Proteom

January 2022

Department of Chemistry, Indian Institute of Technology, Hauz Khas, New Delhi 110016, India; Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi 110016, India; Kusuma School of Biological Sciences, Indian Institute of Technology, Hauz Khas, New Delhi 110016, India.

The Seq2Enz method is a new way to identify whether a query protein sequence is an enzyme and to assign an enzyme class to the protein sequence. The method is based on mask BLAST fortified with some novel structural-chemical properties (NCL) of the building blocks of proteins. All available reviewed enzyme sequences (267,276 in number) in UniProt/Swiss-Prot and the most recent depositions (7062) not used for training in ECPred, a state-of-the-art software for enzyme class prediction, are taken for assessment, and the results are compared with those from conventional BLAST and ECPred, respectively.
