Predicting novel metabolic pathways through subgraph mining.

Bioinformatics

Initiative for Biological Systems Engineering (IBSE), Interdisciplinary Laboratory for Data Sciences.

Published: December 2017

AI Article Synopsis

  • The ability to predict how new molecules can be synthesized through biochemical transformations is essential for metabolic engineering, but current methods often require detailed knowledge of reaction mechanisms, which can be difficult to gather.
  • A new method using subgraph mining allows for the mapping and prediction of reactions solely based on the chemical structures of reactants and products, enabling the identification of reaction pathways even for unknown molecules.
  • The approach predominantly recovers natural biosynthetic pathways from among hundreds of possible alternatives, and a Java implementation is available at https://github.com/RamanLab/ReactionMiner.

Article Abstract

Motivation: The ability to predict pathways for biosynthesis of metabolites is very important in metabolic engineering. It is possible to mine the repertoire of biochemical transformations from reaction databases, and apply the knowledge to predict reactions to synthesize new molecules. However, this usually involves a careful understanding of the mechanism and the knowledge of the exact bonds being created and broken. There is a need for a method to rapidly predict reactions for synthesizing new molecules, which relies only on the structures of the molecules, without demanding additional information such as thermodynamics or hand-curated reactant mapping, which are often hard to obtain accurately.

Results: We here describe a robust method based on subgraph mining, to predict a series of biochemical transformations, which can convert between two (even previously unseen) molecules. We first describe a reliable method based on subgraph edit distance to map reactants and products, using only their chemical structures. Having mapped reactants and products, we identify the reaction centre and its neighbourhood, the reaction signature, and store this in a reaction rule network. This novel representation enables us to rapidly predict pathways, even between previously unseen molecules. We demonstrate this ability by predicting pathways to molecules not present in the KEGG database. We also propose a heuristic that predominantly recovers natural biosynthetic pathways from amongst hundreds of possible alternatives, through a directed search of the reaction rule network, enabling us to provide a reliable ranking of the different pathways. Our approach scales well, even to databases with >100 000 reactions.
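
As a rough illustration of the reaction centre and reaction signature described above, the following Python sketch assumes the atom mapping between reactant and product is already known (the paper derives it via subgraph edit distance; that step is not reproduced here). The function names, the bond-dictionary representation, and the `radius` parameter are hypothetical choices made for this example and are not taken from the ReactionMiner code base.

```python
from collections import deque

# Assumption: each molecule is a dict mapping a bond (frozenset of two
# mapped atom labels) to its bond order. Atom labels are shared between
# reactant and product, i.e. the atom mapping is taken as given.

def reaction_centre(reactant_bonds, product_bonds):
    """Return atoms incident to any bond that is created, broken, or changes order."""
    centre = set()
    for bond in reactant_bonds.keys() | product_bonds.keys():
        if reactant_bonds.get(bond) != product_bonds.get(bond):
            centre |= bond
    return centre

def reaction_signature(centre, bonds, radius=1):
    """Return the centre atoms plus every atom within `radius` bonds of the centre."""
    adjacency = {}
    for bond in bonds:
        a, b = tuple(bond)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = set(centre)
    frontier = deque((atom, 0) for atom in centre)
    while frontier:
        atom, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the requested neighbourhood
        for neighbour in adjacency.get(atom, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, dist + 1))
    return seen

# Toy example (heavy atoms only): propan-1-ol -> propanal.
# The C3-O4 single bond becomes a double bond; the carbon chain is unchanged.
reactant = {
    frozenset({"C1", "C2"}): 1,
    frozenset({"C2", "C3"}): 1,
    frozenset({"C3", "O4"}): 1,
}
product = {
    frozenset({"C1", "C2"}): 1,
    frozenset({"C2", "C3"}): 1,
    frozenset({"C3", "O4"}): 2,
}

centre = reaction_centre(reactant, product)
print(sorted(centre))                                   # ['C3', 'O4']
print(sorted(reaction_signature(centre, reactant, 1)))  # ['C2', 'C3', 'O4']
```

In this toy representation, a larger `radius` yields a more specific signature (more chemical context around the centre), while a smaller one yields a more general rule; how the published method chooses and stores the neighbourhood in its reaction rule network is described in the full paper, not here.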

Availability And Implementation: A Java-based implementation of our algorithms is available at https://github.com/RamanLab/ReactionMiner.

Contact: sayanranu@cse.iitd.ac.in or kraman@iitm.ac.in.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/btx481
