Recent years have seen revived interest in computer-assisted organic synthesis. The use of reaction- and neural-network algorithms that can plan multistep synthetic pathways has revolutionized this field, with examples that include routes to advanced natural products. Such methods typically operate on full, literature-derived 'substrate(s)-to-product' reaction rules and cannot easily be extended to the analysis of reaction mechanisms.
Multi-metal oxides in general, and perovskite oxides in particular, have attracted considerable attention as oxygen evolution electrocatalysts. Although numerous theoretical studies have been undertaken, the most promising perovskite-based catalysts continue to emerge from human-driven experimental campaigns rather than from data-driven machine learning protocols, which are often limited by the scarcity of experimental data on which to train the models. This work promises to break this impasse by demonstrating that active learning on even small datasets, when supplemented by informative structural-characterization data and coupled with closed-loop experimentation, can yield materials of outstanding performance.
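The closed-loop idea can be sketched in a few lines. This is a minimal sketch built on assumed choices that do not come from the work. A bootstrap ensemble of k-nearest-neighbour regressors stands in for the surrogate model, and the spread of its predictions serves as the uncertainty estimate. An upper-confidence-bound rule picks the next candidate, and `measure` is a hypothetical stand-in for the experiment. Candidates are plain descriptor tuples, for example composition or structural-characterization features.

```python
import random
import statistics

def knn_predict(train, x, k=3):
    """Mean measured activity of the k training points nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    return statistics.mean([y for _, y in nearest])

def ensemble_stats(train, x, n_models=20, seed=0):
    """Bootstrap ensemble of k-NN models (assumed surrogate); the spread of
    their predictions is used as a rough uncertainty estimate."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]
        preds.append(knn_predict(sample, x))
    return statistics.mean(preds), statistics.pstdev(preds)

def select_next(train, pool, kappa=1.0):
    """Upper-confidence-bound acquisition: favour candidates with high
    predicted activity and high uncertainty."""
    def ucb(x):
        mu, sigma = ensemble_stats(train, x)
        return mu + kappa * sigma
    return max(pool, key=ucb)

def closed_loop(measure, initial, candidates, n_rounds=5):
    """Alternate model-guided selection with a measurement.
    `measure` is a hypothetical callable standing in for the experiment."""
    train = [(x, measure(x)) for x in initial]
    pool = [x for x in candidates if x not in initial]
    for _ in range(min(n_rounds, len(pool))):
        x = select_next(train, pool)
        pool.remove(x)
        train.append((x, measure(x)))
    best = max(train, key=lambda p: p[1])
    return best, train
```

With a simulated response such as `lambda x: -(x[0] - 0.7) ** 2` and a grid of 21 one-dimensional candidates, each round adds exactly one measured point. `kappa` trades exploitation against exploration. On very small datasets, the uncertainty term keeps the loop from fixating early on one region.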
General conditions for organic reactions are important but rare, and efforts to identify them usually consider only narrow regions of chemical space. Discovering more general reaction conditions requires considering vast regions of chemical space derived from a large matrix of substrates crossed with a high-dimensional matrix of reaction conditions, rendering exhaustive experimentation impractical. Here, we report a simple closed-loop workflow that leverages data-guided matrix down-selection, uncertainty-minimizing machine learning, and robotic experimentation to discover general reaction conditions.
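Two ingredients of such a workflow can be illustrated separately: choosing a small, diverse substrate subset, and scoring how general a condition is over that subset. This is a hedged sketch, not the reported method. Greedy farthest-point sampling is swapped in as the down-selection heuristic, and "generality" is assumed to be the fraction of substrates whose yield clears a threshold (50% here, an arbitrary choice). The machine-learning and robotic parts are not modelled.

```python
import math

def farthest_point_subset(descriptors, k):
    """Greedy max-min down-selection (assumed heuristic): start from substrate 0,
    then repeatedly add the substrate farthest from everything already chosen."""
    chosen = [0]
    while len(chosen) < min(k, len(descriptors)):
        remaining = [i for i in range(len(descriptors)) if i not in chosen]
        chosen.append(max(remaining,
                          key=lambda i: min(math.dist(descriptors[i], descriptors[j])
                                            for j in chosen)))
    return chosen

def generality(yields, threshold=50.0):
    """Fraction of substrates whose yield under one condition reaches the threshold."""
    return sum(y >= threshold for y in yields) / len(yields)

def most_general(yield_matrix, substrate_ids, threshold=50.0):
    """yield_matrix[condition][s] is the yield of substrate s under that condition.
    Returns the best condition and the per-condition generality scores."""
    scores = {c: generality([row[s] for s in substrate_ids], threshold)
              for c, row in yield_matrix.items()}
    return max(scores, key=scores.get), scores
```

Choosing a diverse subset first keeps the substrate × condition matrix small. A condition that scores well on it is then a candidate for broader testing.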
Applications of machine learning (ML) to synthetic chemistry rely on the assumption that large numbers of literature-reported examples should enable the construction of accurate and predictive models of chemical reactivity. This paper demonstrates that an abundance of carefully curated literature data may be insufficient for this purpose. Using the example of Suzuki-Miyaura coupling with heterocyclic building blocks, and a carefully selected database of more than 10,000 literature examples, we show that ML models cannot offer any meaningful predictions of optimal reaction conditions, even when the search space is restricted to solvents and bases alone.
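One common sanity check for claims of this kind compares a model against a trivial baseline that always predicts the most frequently reported conditions. This is not necessarily the paper's protocol. The sketch below is hedged: the (solvent, base) labels are placeholders, not data from the study, and top-1 accuracy is an assumed metric.

```python
from collections import Counter

def popularity_baseline(train_conditions):
    """Most frequently reported (solvent, base) pair in the training set."""
    return Counter(train_conditions).most_common(1)[0][0]

def top1_accuracy(predicted, observed):
    """Fraction of reactions whose predicted conditions exactly match the reported ones."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

def beats_baseline(model_predictions, train_conditions, test_conditions):
    """Compare a model's top-1 accuracy with the popularity baseline."""
    baseline = popularity_baseline(train_conditions)
    base_acc = top1_accuracy([baseline] * len(test_conditions), test_conditions)
    model_acc = top1_accuracy(model_predictions, test_conditions)
    return model_acc > base_acc, model_acc, base_acc
```

When literature data are dominated by a few popular conditions, the baseline alone can score well. A model that does not clear it offers no meaningful predictive value.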
This work describes a method to vectorize the non-covalent interactions responsible for scaffold-directed reactions important in synthetic chemistry, and to learn them with machine learning (ML). Models trained on this representation predict the correct face of approach in approximately 90% of Michael additions or Diels-Alder cycloadditions.
When an organometallic catalyst is tethered onto a nanoparticle and is embedded in a monolayer of longer ligands terminated in "gating" end-groups, these groups can control the access and orientation of the incoming substrates. In this way, a nonspecific catalyst can become enzyme-like: it can select only certain substrates from substrate mixtures and, quite remarkably, can also preorganize these substrates such that only some of their otherwise equivalent sites react. For a simple, copper-based click reaction catalyst and for gating ligands terminated in charged groups, both substrate- and site-selectivities are on the order of 100, which is all the more notable given the relative simplicity of the on-particle monolayers compared to the intricacy of enzymes' active sites.
Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years. However, the field has progressed greatly since the development of early programs such as LHASA, for which reaction choices at each step were made by human operators. Multiple software platforms are now capable of completely autonomous planning.
A computer program for retrosynthetic planning helps develop multiple "synthetic contingency" plans for hydroxychloroquine, as well as routes leading to remdesivir, both promising but as yet unproven medications against COVID-19. These plans are designed to navigate around known and patented routes as far as possible and to commence from inexpensive and diverse starting materials, so as to ensure supply in case of anticipated market shortages of commonly used substrates. Looking beyond the current COVID-19 pandemic, the development of similar contingency syntheses is advocated for other already-approved medications, in case such medications become urgently needed in mass quantities during other public-health emergencies.
The challenge of prebiotic chemistry is to trace the syntheses of life's key building blocks from a handful of primordial substrates. Here we report a forward-synthesis algorithm that generates a full network of prebiotic chemical reactions accessible from these substrates under generally accepted conditions. This network contains both reported and previously unidentified routes to biotic targets, as well as plausible syntheses of abiotic molecules.
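Forward generation of a reaction network can be pictured as repeatedly applying reaction rules to everything made so far, one generation at a time. The sketch below is a toy under stated assumptions and not the published algorithm. Molecules are plain strings, and `join` is a hypothetical bimolecular "rule" that concatenates labels up to three units. A real implementation would apply chemically meaningful transforms to molecular graphs.

```python
from itertools import combinations_with_replacement

def expand_network(substrates, rules, max_generations):
    """Apply bimolecular rules generation by generation.

    Returns {molecule: generation first reached} and a list of
    (reactant_a, reactant_b, rule_name, product) edges."""
    known = {m: 0 for m in substrates}
    frontier = set(substrates)
    edges = []
    for gen in range(1, max_generations + 1):
        new = {}
        for a, b in combinations_with_replacement(sorted(known), 2):
            # Pairs of older molecules were already tried in an earlier generation.
            if a not in frontier and b not in frontier:
                continue
            for rule in rules:
                for product in rule(a, b):
                    edges.append((a, b, rule.__name__, product))
                    if product not in known:
                        new.setdefault(product, gen)
        if not new:  # nothing new is reachable, so the network is complete
            break
        known.update(new)
        frontier = set(new)
    return known, edges

def join(a, b):
    """Hypothetical toy rule: 'condense' two labels, capped at three units."""
    return [a + b] if len(a + b) <= 3 else []
```

Calling `expand_network(["C", "N"], [join], 5)` yields three products in generation 1 and four in generation 2, then stops. Recording the generation in which each molecule first appears gives the minimum number of steps needed to reach it from the primordial substrates.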
Current protocols of theozyme design still lead to biocatalysts with much lower catalytic activity than naturally occurring enzymes, and so far the only avenue of improvement has been in vitro laboratory-directed evolution (LDE). In this paper, we propose a different strategy based on a "reversed" methodology of mutation prediction. Instead of the common "top-down" approach, which requires numerous assumptions and vast computational effort, we argue for a "bottom-up" approach based on catalytic fields derived directly from the wave functions of the transition state and the reactant complex.
The ability to estimate the acidity of C-H groups within organic molecules in non-aqueous solvents is important in synthetic planning, to correctly predict which protons will be abstracted in reactions such as alkylations, Michael additions, or aldol condensations. This Article describes the use of so-called graph convolutional neural networks (GCNNs) to make such predictions on millisecond time scales, with accuracy that compares favorably with state-of-the-art solutions, including commercial ones. The crux of the method is to train GCNNs using descriptors that reflect not only the topological but also the chemical properties of atomic environments.
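The core operation of a graph convolutional network is to let each atom mix its own features with those of its bonded neighbours. The sketch below shows a single mean-aggregation layer in plain Python. It is an illustration, not the architecture from the Article. The per-atom features (is carbon, is oxygen, number of attached H) are a hypothetical choice, and a "chemical" descriptor such as a partial charge could be appended as an extra column.

```python
def graph_conv(adjacency, features, weights):
    """One mean-aggregation graph convolution layer.

    adjacency: n x n 0/1 bond matrix; features: n x f atom features;
    weights: f x g matrix. Each atom averages its own and its neighbours'
    feature vectors, applies the linear map, then a ReLU."""
    n = len(adjacency)
    n_feat = len(features[0])
    out = []
    for i in range(n):
        neigh = [i] + [j for j in range(n) if adjacency[i][j]]
        agg = [sum(features[j][f] for j in neigh) / len(neigh) for f in range(n_feat)]
        out.append([max(0.0, sum(a * w for a, w in zip(agg, col)))
                    for col in zip(*weights)])
    return out
```

For the heavy-atom skeleton of ethanol (CH3-CH2-OH) with features `[[1, 0, 3], [1, 0, 2], [0, 1, 1]]` and an identity weight matrix, the central carbon's new vector is the average of all three atoms. Stacking several such layers widens each atom's receptive field before a per-site readout would predict the acidity of a given C-H group.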
Machine learning can predict the major regio-, site-, and diastereoselective outcomes of Diels-Alder reactions better than standard quantum-mechanical methods, and with accuracies exceeding 90%, provided that (i) the diene/dienophile substrates are represented by "physical-organic" descriptors reflecting the electronic and steric characteristics of their substituents, and (ii) the positions of such substituents relative to the reaction core are encoded ("vectorized") in an informative way.
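Position-aware vectorization can be illustrated by giving each carbon of the reaction core a fixed slot in the feature vector and filling that slot with descriptors of its substituent. The sketch below is hedged and is not the paper's encoding. It assumes one substituent per core carbon and uses Hammett σp as the electronic descriptor. Heavy-atom count is a crude, hypothetical stand-in for a proper steric parameter.

```python
# Hammett sigma_para constants (electronic descriptor).
SIGMA_P = {"H": 0.00, "CH3": -0.17, "OMe": -0.27, "Cl": 0.23, "CN": 0.66, "NO2": 0.78}
# Hypothetical steric stand-in: number of heavy atoms in the substituent.
HEAVY_ATOMS = {"H": 0, "CH3": 1, "OMe": 2, "Cl": 1, "CN": 2, "NO2": 3}

# Fixed slots relative to the reaction core (assumed labelling).
DIENE_SLOTS = ("C1", "C2", "C3", "C4")
DIENOPHILE_SLOTS = ("C5", "C6")

def vectorize(diene, dienophile):
    """diene/dienophile map slot -> substituent name; unspecified slots carry H.
    Returns [sigma_p, size] for every slot in a fixed order."""
    vec = []
    for slots, subs in ((DIENE_SLOTS, diene), (DIENOPHILE_SLOTS, dienophile)):
        for slot in slots:
            s = subs.get(slot, "H")
            vec.extend([SIGMA_P[s], HEAVY_ATOMS[s]])
    return vec
```

Because every slot has a fixed position, a methoxy group on C1 and one on C4 produce different vectors. This distinction is what lets a model learn regio- and site-selectivity rather than just substituent identity.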
Catalytic fields illustrate the topology of the optimal charge distribution of a molecular environment that reduces the activation energy of any process involving barrier crossing, such as a chemical reaction or a bond rotation. To date, this technique has been successfully applied to predict catalytic effects resulting from intermolecular interactions with individual water molecules of the first hydration shell, from amino acid mutations in enzymes, and from Si→Al substitutions in zeolites. In this contribution, hydrogen-to-fluorine (H→F) substitution effects have been examined for two model reactions, indicating that the catalytic field concept is qualitatively applicable to systems involving intramolecular interactions.
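The reasoning behind a catalytic field can be summarized with a first-order electrostatic argument. The form below is an assumed simplification, not the exact formulation used in this work. A probe charge $q$ at position $\mathbf{r}$ interacts with the transition state (TS) and with the reactants (R). The barrier therefore shifts by $q$ times the difference of their electrostatic potentials, written here for atomic partial charges $q_a^{X}$ at positions $\mathbf{r}_a$ in atomic units:

```latex
\Delta\Delta E^{\ddagger}(\mathbf{r}) \approx q\left[V_{\mathrm{TS}}(\mathbf{r}) - V_{\mathrm{R}}(\mathbf{r})\right] = q\,\Delta V(\mathbf{r}),
\qquad
V_{X}(\mathbf{r}) = \sum_{a} \frac{q_{a}^{X}}{\lvert \mathbf{r} - \mathbf{r}_{a} \rvert}, \quad X \in \{\mathrm{R}, \mathrm{TS}\}
```

Under this approximation, a negative charge lowers the barrier wherever $\Delta V(\mathbf{r}) > 0$, and a positive charge lowers it wherever $\Delta V(\mathbf{r}) < 0$. Mapping the sign of $\Delta V$ around the molecule gives the qualitative picture of where an electronegative substituent such as F should be catalytic.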
We propose a simple atomic multipole electrostatic model to rapidly evaluate the effects of mutation on enzyme activity and test its performance on wild-type and mutant ketosteroid isomerase. The predictions of our atomic multipole model are similar to those obtained with symmetry-adapted perturbation theory at a fraction of the computational cost. We further show that this approach is relatively insensitive to the precise amino acid side chain conformation in mutants and may thus be useful in computational enzyme (re)design.
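The lowest-order term of such a model can be sketched with point charges only. A residue's effect on the barrier is its interaction energy with the transition state minus its interaction energy with the reactant complex. A mutation's effect is the change in that difference. This sketch truncates the expansion at monopoles, whereas the published model includes higher atomic multipoles. All charges and coordinates used with it are hypothetical inputs.

```python
import math

COULOMB_KCAL = 332.0637  # e^2/(4*pi*eps0) in kcal*Angstrom/(mol*e^2), approximately

def coulomb_energy(frag_a, frag_b):
    """Monopole-only interaction energy (kcal/mol) between two fragments.
    Each fragment is a list of (charge in e, (x, y, z) in Angstrom)."""
    return COULOMB_KCAL * sum(qa * qb / math.dist(ra, rb)
                              for qa, ra in frag_a for qb, rb in frag_b)

def barrier_shift(residue, reactant, transition_state):
    """Residue's electrostatic contribution to the barrier:
    negative means the residue stabilizes the TS more than the reactant."""
    return coulomb_energy(residue, transition_state) - coulomb_energy(residue, reactant)

def mutation_effect(wild_residue, mutant_residue, reactant, transition_state):
    """Change in barrier on mutation (positive means the mutant is predicted slower)."""
    return (barrier_shift(mutant_residue, reactant, transition_state)
            - barrier_shift(wild_residue, reactant, transition_state))
```

For example, a residue charge of -1 e located 3 Å from a substrate atom that changes from 0 to +0.5 e on reaching the transition state gives a negative `barrier_shift`. Neutralizing that residue then gives a positive `mutation_effect`.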
Fatty acid amide hydrolase (FAAH) is an enzyme responsible for the deactivating hydrolysis of fatty acid ethanolamide neuromodulators. FAAH inhibitors have gained considerable interest due to their possible application in the treatment of anxiety, inflammation, and pain. In the context of inhibitor design, reliable computational prediction of binding affinity remains challenging, and it is now well understood that empirical scoring functions have several limitations that could, in principle, be overcome by quantum mechanics.
The relative stability of biologically relevant hydrogen-bonded complexes with shortened distances can be assessed at low cost, and more successfully than with ab initio methods, using the electrostatic multipole term alone. These results imply that atomic multipole moments may help improve ligand-receptor ranking predictions, particularly when accurate structural data are not available.
The concept of polarization-justified Fukui functions has been tested on a set of model molecules: imidazole, oxazole, and thiazole. The Fukui functions are calculated from a molecular polarizability analysis, which potentially makes them a more sensitive analytical tool than the classical density functional theory proposals, which are typically built on electron density alone. The three molecules show distinct differences in their reactivity patterns despite very similar geometries and electronic structures.
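For comparison, the classical density-based approach is often applied in its condensed, finite-difference form. In this form, atomic charges from calculations on the N-, (N+1)- and (N-1)-electron systems give per-atom indices for nucleophilic attack (f+), electrophilic attack (f-) and radical attack (f0). The sketch below implements that classical scheme, not the polarization-justified variant. The charges in the usage example are illustrative numbers for a hypothetical three-atom fragment.

```python
def condensed_fukui(q_neutral, q_anion, q_cation):
    """Finite-difference condensed Fukui indices from atomic charges.

    f+ (site for nucleophilic attack):  q_k(N) - q_k(N+1)
    f- (site for electrophilic attack): q_k(N-1) - q_k(N)
    f0 (radical attack): average of f+ and f-"""
    f_plus = [qn - qa for qn, qa in zip(q_neutral, q_anion)]
    f_minus = [qc - qn for qc, qn in zip(q_cation, q_neutral)]
    f_zero = [(p + m) / 2 for p, m in zip(f_plus, f_minus)]
    return f_plus, f_minus, f_zero
```

With illustrative charges `[0.1, -0.3, 0.2]` (neutral), `[-0.2, -0.5, -0.3]` (anion) and `[0.4, 0.0, 0.6]` (cation), f+ sums to one electron and peaks on the third atom. Because the condensed indices depend only on charge differences, small electronic differences between similar heterocycles can shift the preferred site.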