AI Article Synopsis

  • Carrying out chemical reactions in the lab is context-dependent and time-consuming, and usually relies on years of laboratory experience or on searching for similar existing protocols.
  • Data-driven approaches such as retrosynthetic models are becoming established, but converting a proposed synthetic route into an experimental procedure still requires expert intervention.
  • This study introduces models that predict the full sequence of synthesis steps from a textual representation of a chemical equation. They were trained on 693,517 equations with associated action sequences extracted from patents, and a trained chemist judged the predicted sequences adequate for execution without human intervention in more than 50% of cases.

Article Abstract

The experimental execution of chemical reactions is a context-dependent and time-consuming process, often solved using experience collected over decades of laboratory work or by searching for similar, already executed experimental protocols. Although data-driven schemes, such as retrosynthetic models, are becoming established technologies in synthetic organic chemistry, the conversion of proposed synthetic routes to experimental procedures remains a burden on the shoulders of domain experts. In this work, we present data-driven models for predicting the entire sequence of synthesis steps starting from a textual representation of a chemical equation, for application in batch organic chemistry. We generated a data set of 693,517 chemical equations and associated action sequences by extracting and processing experimental procedure text from patents, using state-of-the-art natural language models. We used the resulting data set to train three different models: a nearest-neighbor model based on recently introduced reaction fingerprints, and two deep-learning sequence-to-sequence models based on the Transformer and BART architectures. An analysis by a trained chemist revealed that the predicted action sequences are adequate for execution without human intervention in more than 50% of the cases.
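The nearest-neighbor idea mentioned in the abstract can be illustrated with a minimal sketch: given a query reaction, return the action sequence stored for the most similar known reaction. Everything below is an assumption made for illustration only. The abstract does not specify the fingerprint or similarity measure, so this sketch substitutes character n-grams of the reaction string and Tanimoto similarity, and the function names (`toy_fingerprint`, `tanimoto`, `nearest_actions`), the two-entry `DATASET`, and the action labels are hypothetical rather than taken from the study.

```python
from typing import List, Set, Tuple


def toy_fingerprint(reaction: str, n: int = 3) -> Set[str]:
    """Set of character n-grams of the reaction string.

    Assumption: a crude stand-in for the reaction fingerprints
    mentioned in the abstract, which this sketch does not reproduce.
    """
    return {reaction[i:i + n] for i in range(max(len(reaction) - n + 1, 1))}


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_actions(query: str, dataset: List[Tuple[str, List[str]]]) -> List[str]:
    """Return the action sequence of the stored reaction most similar to the query."""
    query_fp = toy_fingerprint(query)
    best = max(dataset, key=lambda pair: tanimoto(query_fp, toy_fingerprint(pair[0])))
    return best[1]


# Hypothetical data set: (reaction string, illustrative action sequence).
DATASET = [
    ("CC(=O)O.OCC>>CC(=O)OCC",
     ["ADD acetic acid", "ADD ethanol", "STIR", "CONCENTRATE"]),
    ("c1ccccc1Br.OB(O)c1ccccc1>>c1ccccc1-c1ccccc1",
     ["ADD aryl bromide", "ADD boronic acid", "REFLUX", "FILTER"]),
]

# A query close to the first entry retrieves that entry's action sequence.
print(nearest_actions("CC(=O)O.OCCC>>CC(=O)OCCC", DATASET))
```

A retrieval baseline of this kind can only reuse procedures already present in the data set, which is one reason the abstract also describes sequence-to-sequence models that generate action sequences.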

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8102565
DOI: http://dx.doi.org/10.1038/s41467-021-22951-1

Publication Analysis

Top Keywords

experimental procedures: 8
chemical reactions: 8
organic chemistry: 8
data set: 8
action sequences: 8
models: 5
inferring experimental: 4
procedures text-based: 4
text-based representations: 4
chemical: 4
