Background: The automated prediction of the enzymatic functions of uncharacterized proteins is a crucial topic in bioinformatics. Although several methods and tools have been proposed to classify enzymes, most of these studies are limited to specific functional classes and levels of the Enzyme Commission (EC) number hierarchy. In addition, most previous methods incorporated only a single input feature type, which limits their applicability across the wide functional space. Here, we propose a novel enzymatic function prediction tool, ECPred, based on an ensemble of machine learning classifiers.
Results: In ECPred, each EC number constitutes an individual class and therefore has an independent learning model. Enzyme vs. non-enzyme classification is incorporated into ECPred along with a hierarchical prediction approach that exploits the tree structure of the EC nomenclature. ECPred provides predictions for 858 EC numbers in total, including 6 main classes, 55 subclasses, 163 sub-subclasses and 634 substrate classes. The proposed method was tested and compared with state-of-the-art enzyme function prediction tools using independent temporal hold-out and no-Pfam datasets constructed during this study.
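The hierarchical scheme described above, with one independent model per EC number and an enzyme vs. non-enzyme decision at the top, can be illustrated with a minimal top-down sketch. Everything in it is an assumption for illustration: the names `predict_hierarchical`, `children_of` and `ensemble`, the unweighted averaging of component classifiers, and the single global threshold of 0.5. ECPred's actual features, ensemble weighting and decision rules are not specified in this abstract and may differ.

```python
from typing import Callable, Dict, List

# Hypothetical scorer type: maps a protein sequence to a probability in [0, 1].
Scorer = Callable[[str], float]


def ensemble(*scorers: Scorer) -> Scorer:
    """Combine several feature-specific classifiers by unweighted averaging (an assumption)."""
    return lambda seq: sum(s(seq) for s in scorers) / len(scorers)


def children_of(ec: str, all_ecs: List[str]) -> List[str]:
    """Return the EC numbers exactly one level below `ec` (e.g. '1' -> '1.1', '1.2')."""
    child_depth = ec.count(".") + 2
    return [c for c in all_ecs
            if c.startswith(ec + ".") and c.count(".") + 1 == child_depth]


def predict_hierarchical(seq: str, models: Dict[str, Scorer],
                         enzyme_model: Scorer, threshold: float = 0.5) -> List[str]:
    """Top-down sketch: descend the EC tree while the best child passes the threshold."""
    # Level 0: enzyme vs. non-enzyme; non-enzymes get no EC prediction.
    if enzyme_model(seq) < threshold:
        return []
    all_ecs = list(models)
    path: List[str] = []
    # Start from the main classes (EC numbers without a dot, e.g. '1' to '6').
    candidates = [ec for ec in all_ecs if "." not in ec]
    while candidates:
        # Each EC number has its own independent model; score every candidate child.
        scores = {ec: models[ec](seq) for ec in candidates}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break  # stop at the deepest level the models are confident about
        path.append(best)
        candidates = children_of(best, all_ecs)
    return path


if __name__ == "__main__":
    # Toy constant-score models standing in for trained classifiers.
    toy_models: Dict[str, Scorer] = {
        "1": ensemble(lambda s: 0.8, lambda s: 1.0),
        "2": lambda s: 0.2,
        "1.1": lambda s: 0.8,
        "1.2": lambda s: 0.3,
        "1.1.1": lambda s: 0.7,
        "1.1.1.1": lambda s: 0.4,
    }
    print(predict_hierarchical("MKTAYIAK", toy_models, enzyme_model=lambda s: 0.95))
    # prints ['1', '1.1', '1.1.1']
```

Stopping at the deepest confident level mirrors the idea of returning predictions at whichever EC level the evidence supports, rather than forcing a full four-level EC number for every query.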
Conclusions: ECPred is presented both as a stand-alone and as a web-based tool that provides probabilistic enzymatic function predictions (at all five levels of EC) for uncharacterized protein sequences. The datasets of this study should also be a valuable resource for future benchmarking studies. ECPred is available for download, together with all of the datasets used in this study, at https://github.com/cansyl/ECPred. The ECPred web server can be accessed at http://cansyl.metu.edu.tr/ECPred.html.
| Link | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6150975 | PMC |
| http://dx.doi.org/10.1186/s12859-018-2368-y | DOI |
JACC Adv
July 2023
Department of Public Health and Clinical Medicine, Umeå University, Umeå, Sweden.
Background: An association between impaired exercise capacity and risk of mortality has been reported among adults with congenital heart disease (CHD). Over the years, treatment methods have improved and may influence outcome. Hence, we report data from a national cohort reflecting a contemporary population.
Bioinformatics
October 2023
Univ Rennes, Inria, CNRS, IRISA-UMR 6074, Rennes 35000, France.
Motivation: There is a growing number of available protein sequences, but only a limited amount has been manually annotated. For example, only 0.25% of all entries of UniProtKB are reviewed by human annotators.
Biochem Eng J
November 2022
Department of Chemical and Biological Engineering, Iowa State University.
Assigning enzyme commission (EC) numbers using sequence information alone has been the subject of recent classification algorithms where statistics, homology and machine-learning based methods are used. This work benchmarks performance of a few of these algorithms as a function of sequence features such as chain length and amino acid composition (AAC). This enables determination of optimal classification windows for sequence generation and enzyme design.
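As an illustration of the two sequence features named above, the following sketch computes chain length and amino acid composition (the fraction of each of the 20 standard residues) for one sequence. The function name `sequence_features` and the choice to count nonstandard letters toward the length but not toward the composition are assumptions, not details taken from the benchmarked algorithms.

```python
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sequence_features(seq: str) -> Dict[str, float]:
    """Return chain length and the fraction of each standard residue (AAC)."""
    seq = seq.upper()
    counts = Counter(seq)
    length = len(seq)
    features: Dict[str, float] = {"length": float(length)}
    for aa in AMINO_ACIDS:
        # Nonstandard letters (e.g. X, U) add to the length but get no AAC entry.
        features[f"AAC_{aa}"] = counts[aa] / length if length else 0.0
    return features


if __name__ == "__main__":
    feats = sequence_features("AACD")
    print(feats["length"], feats["AAC_A"], feats["AAC_C"])  # prints 4.0 0.5 0.25
```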
Front Genet
April 2022
College of Artificial Intelligence, Nankai University, Tianjin, China.
Classification and annotation of enzyme proteins are fundamental for enzyme research on biological metabolism. Enzyme Commission (EC) numbers provide a standard for hierarchical enzyme class prediction, on which several computational methods have been proposed. However, most of these methods are dependent on prior distribution information and none explicitly quantifies amino-acid-level relations and possible contribution of sub-sequences.
Biochim Biophys Acta Proteins Proteom
January 2022
Department of Chemistry, Indian Institute of Technology, Hauz Khas, New Delhi 110016, India; Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi 110016, India; Kusuma School of Biological Sciences, Indian Institute of Technology, Hauz Khas, New Delhi 110016, India. Electronic address:
The Seq2Enz method is a new way to identify whether a query protein sequence is an enzyme and to assign an enzyme class to it. The method is based on mask BLAST fortified with some novel structural-chemical properties (NCL) of the building blocks of proteins. All available reviewed enzyme sequences in UniProt/Swiss-Prot (267,276 in number) and the most recent depositions not used for training ECPred, a state-of-the-art software for enzyme class prediction (7062), were used for assessment, and the results were compared with those from conventional BLAST and ECPred, respectively.