From sequence to enzyme mechanism using multi-label machine learning.

BMC Bioinformatics

Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland KY16 9ST, UK.

Published: May 2014

AI Article Synopsis

  • The study enhances enzyme function prediction by detailing not just the potential reactions an enzyme can perform, but also the specific mechanisms, cofactors, and susceptibility to drugs or inhibitors.
  • It predicts enzyme mechanisms with 96% accuracy using sequence signatures and an off-the-shelf K-Nearest Neighbours multi-label algorithm, evaluated on 248 proteins drawn from the MACiE, EzCatDb and SFLD databases.
  • The research highlights that InterPro signatures play a crucial role in improving prediction accuracy, while adding Catalytic Site Atlas features does not significantly enhance results.

Article Abstract

Background: In this work we predict enzyme function at the level of chemical mechanism, providing a finer granularity of annotation than traditional Enzyme Commission (EC) classes. Hence we can predict not only whether a putative enzyme in a newly sequenced organism has the potential to perform a certain reaction, but also how the reaction is performed, using which cofactors and with susceptibility to which drugs or inhibitors. These details have important consequences for drug and enzyme design. Approaches that predict enzyme catalytic activity from 3D protein structure features restrict mechanism prediction to proteins that already have either a solved structure or a close relative suitable for homology modelling.

Results: In this study, we evaluate whether sequence identity, InterPro or Catalytic Site Atlas sequence signatures provide enough information for bulk prediction of enzyme mechanism. By splitting MACiE (Mechanism, Annotation and Classification in Enzymes database) mechanism labels to a finer granularity, which includes the role of the protein chain in the overall enzyme complex, the method can predict at 96% accuracy (and 96% micro-averaged precision, 99.9% macro-averaged recall) the MACiE mechanism definitions of 248 proteins available in the MACiE, EzCatDb (Database of Enzyme Catalytic Mechanisms) and SFLD (Structure Function Linkage Database) databases using an off-the-shelf K-Nearest Neighbours multi-label algorithm.
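The idea of multi-label nearest-neighbour prediction over sequence signatures, and the difference between micro- and macro-averaged metrics, can be illustrated with a minimal Python sketch. This is not the ml2db code and not the specific off-the-shelf algorithm used in the study: it assumes a simplified majority vote among the k most similar proteins, Jaccard similarity over signature sets, and made-up signature and mechanism identifiers (`SIG1`, `mech_A`, and so on) that stand in for real InterPro signatures and MACiE labels.

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity between two sets of signature IDs."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def predict_labels(query, training, k=3):
    """Return labels carried by a strict majority of the k most similar proteins.

    `training` is a list of (signature_set, label_set) pairs.
    This is a simplified voting scheme, assumed here for illustration only.
    """
    ranked = sorted(training, key=lambda item: jaccard(query, item[0]), reverse=True)
    neighbours = ranked[:k]
    votes = Counter(label for _, labels in neighbours for label in labels)
    return {label for label, n in votes.items() if n > len(neighbours) / 2}

def micro_precision(true_sets, pred_sets):
    """Pool all (protein, label) predictions, then compute precision once."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    predicted = sum(len(p) for p in pred_sets)
    return tp / predicted if predicted else 0.0

def macro_recall(true_sets, pred_sets):
    """Compute recall per label, then average so every label counts equally."""
    labels = sorted(set().union(*true_sets))
    recalls = []
    for label in labels:
        relevant = sum(1 for t in true_sets if label in t)
        hits = sum(1 for t, p in zip(true_sets, pred_sets) if label in t and label in p)
        recalls.append(hits / relevant)
    return sum(recalls) / len(recalls)

# Hypothetical toy data: signature sets paired with mechanism label sets.
training = [
    ({"SIG1", "SIG2"}, {"mech_A"}),
    ({"SIG1", "SIG2", "SIG3"}, {"mech_A"}),
    ({"SIG1", "SIG3"}, {"mech_A", "mech_B"}),
    ({"SIG4", "SIG5"}, {"mech_C"}),
    ({"SIG4", "SIG6"}, {"mech_C"}),
    ({"SIG5", "SIG6"}, {"mech_C"}),
]

if __name__ == "__main__":
    print(predict_labels({"SIG1", "SIG2"}, training))
    print(predict_labels({"SIG4", "SIG5", "SIG6"}, training))
```

Micro-averaging pools every prediction, so frequent labels dominate the score, whereas macro-averaging gives rare mechanism labels the same weight as common ones. That is why the two kinds of figure quoted above can differ on the same predictions.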

Conclusion: We find that InterPro signatures are critical for accurate prediction of enzyme mechanism. We also find that incorporating Catalytic Site Atlas attributes does not seem to provide additional accuracy. The software code (ml2db), data and results are available online at http://sourceforge.net/projects/ml2db/ and as supplementary files.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4229970
DOI: http://dx.doi.org/10.1186/1471-2105-15-150
