G2GT: Retrosynthesis Prediction with Graph-to-Graph Attention Neural Network and Self-Training.

J Chem Inf Model

Stone Wise, Room 918, Eighth Floor, Building 1, No. 6 Danling Street, Haidian District, Beijing, China 100089.

Published: April 2023

Retrosynthesis prediction, the task of identifying reactant molecules that can be used to synthesize product molecules, is a fundamental challenge in organic chemistry and related fields. To address this challenge, we propose a novel graph-to-graph transformation model, G2GT. The model is built on the standard transformer structure and utilizes graph encoders and decoders. Additionally, we demonstrate the effectiveness of self-training, a data augmentation technique that utilizes unlabeled molecular data, in improving the performance of the model. To further enhance diversity, we propose a weak ensemble method, inspired by reaction-type labels and ensemble learning. This method incorporates beam search, nucleus sampling, and top-k sampling to improve inference diversity. A simple ranking algorithm is employed to retrieve the final top-10 results. We achieved new state-of-the-art results on both the USPTO-50K data set, with a top-1 accuracy of 54%, and the larger, more challenging USPTO-Full data set, with a top-1 accuracy of 49.3% and competitive top-10 results. Our model can also be generalized to all other graph-to-graph transformation tasks. Data and code are available at https://github.com/Anonnoname/G2GT_2.
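
The sampling strategies named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the G2GT code base: the function names are hypothetical, the filters operate on a plain list of token probabilities rather than on model outputs, and the reciprocal-rank scoring in rank_candidates is an assumed stand-in, since the abstract does not describe the ranking algorithm itself.

```python
from collections import defaultdict


def top_k_filter(probs, k):
    """Top-k sampling support: keep the k most probable tokens, renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def nucleus_filter(probs, p):
    """Nucleus (top-p) sampling support: keep the smallest set of most
    probable tokens whose cumulative probability reaches p, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def rank_candidates(candidate_lists, top_n=10):
    """Hypothetical ranking (an assumption, not the paper's method):
    merge candidate lists from several decoding strategies by summing
    reciprocal-rank scores, then return the top_n candidates."""
    scores = defaultdict(float)
    for candidates in candidate_lists:
        for rank, candidate in enumerate(candidates):
            scores[candidate] += 1.0 / (rank + 1)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]
    print(top_k_filter(probs, 2))
    print(nucleus_filter(probs, 0.9))
    print(rank_candidates([["A", "B", "C"], ["B", "A"], ["B", "D"]], top_n=3))
```

In this sketch, each decoding strategy would contribute one candidate list (for example, reactant SMILES strings), and candidates proposed by several strategies at high ranks rise to the top of the merged list.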

Source
http://dx.doi.org/10.1021/acs.jcim.2c01302

Similar Publications

Simple User-Friendly Reaction Format.

Mol Inform

January 2025

Department of Biosystems Science and Engineering, ETH Zurich, Klingelbergstrasse 48, 4056, Basel, Switzerland.

Utilizing the growing wealth of chemical reaction data can boost synthesis planning and increase success rates. Yet, the effectiveness of machine learning tools for retrosynthesis planning and forward reaction prediction relies on accessible, well-curated data presented in a structured format. Although some public and licensed reaction databases exist, they often lack essential information about reaction conditions.

Inferring appropriate synthesis reaction (i.e., retrosynthesis) routes for newly designed molecules is vital.

CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions.

J Cheminform

January 2025

Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Key Laboratory of Quantitative Synthetic Biology, Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.

Predicting EC numbers for chemical reactions enables efficient enzymatic annotations for computer-aided synthesis planning. However, conventional machine learning approaches encounter challenges due to data scarcity and class imbalance. Here, we introduce CLAIRE (Contrastive Learning-based AnnotatIon for Reaction's EC), a novel framework leveraging contrastive learning, pre-trained language model-based reaction embeddings, and data augmentation to address these limitations.

Deep generative models have garnered significant attention for their efficiency in drug discovery, yet the synthesis of proposed molecules remains a challenge. Retrosynthetic planning, a part of computer-assisted synthesis planning, addresses this challenge by recursively decomposing molecules using symbolic rules and machine-trained scoring functions. However, current methods often treat each molecule independently, missing the opportunity to utilize shared synthesis patterns and repeat pathways, which may contribute from known synthesis routes to newly emerging, similar molecules, a notable challenge with AI-generated small molecules.

Chemoenzymatic Synthesis Planning Guided by Reaction Type Score.

J Chem Inf Model

December 2024

NSF Molecule Maker Lab Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.

Thanks to the growing interest in computer-aided synthesis planning (CASP), a wide variety of retrosynthesis and retrobiosynthesis tools have been developed in the past decades. However, synthesis planning tools for multistep chemoenzymatic reactions are still rare despite the widespread use of enzymatic reactions in chemical synthesis. Herein, we report a reaction type score (RTscore)-guided chemoenzymatic synthesis planning (RTS-CESP) strategy.
