The current generation of large language models (LLMs) has limited chemical knowledge. Recently, it has been shown that these LLMs can learn and predict chemical properties through fine-tuning. Using natural language to train machine learning models opens doors to a wider chemical audience, as field-specific featurization techniques can be omitted.
Chiral cyclopentadienyl (Cp) metal complexes are frequently used in asymmetric catalysis by virtue of their high reactivity and selectivity. Planar-chiral-only rhodium and iridium cyclopentadienyl complexes are particularly promising due to unrestricted chemical space for Cp ligand design while retaining structural simplicity. However, they are currently still niche because of a lack of efficient synthetic strategies that avoid lengthy chiral auxiliary routes or chiral preparative HPLC resolution of the complexes.
Bayesian optimization (BO) is an efficient method for solving complex optimization problems, including those in chemical research, where it is gaining significant popularity. Although effective in guiding experimental design, BO does not account for experimentation costs: testing readily available reagents under different conditions could be more cost- and time-effective than synthesizing or buying additional ones. To address this issue, we present cost-informed BO (CIBO), an approach tailored for the rational planning of chemical experimentation that prioritizes the most cost-effective experiments.
Singlet fission has shown potential for boosting the efficiency of solar cells, but the scarcity of suitable molecular materials hinders its implementation. We introduce an uncertainty-controlled genetic algorithm (ucGA) based on ensemble machine learning predictions from different molecular representations that concurrently optimizes excited state energies, synthesizability, and exciton size for the discovery of singlet fission materials. The ucGA allows us to efficiently explore the chemical space spanned by the reFORMED fragment database, which consists of 45,000 cores and 5,000 substituents derived from crystallographic structures assembled in the FORMED repository.
Exploiting crystallographic data repositories for large-scale quantum chemical computations requires the rapid and accurate extraction of the molecular structure, charge and spin from the crystallographic information file. Here, we develop a general approach to assign the ground state spin of transition metal complexes, in complement to our previous efforts on determining metal oxidation states and bond order within the software. Starting from a database of 31k transition metal complexes extracted from the Cambridge Structural Database with , we construct the TM-GSspin dataset, which contains 2063 mononuclear first row transition metal complexes and their computed ground state spins.
Geometric deep learning models, which incorporate the relevant molecular symmetries within the neural network architecture, have considerably improved the accuracy and data efficiency of predictions of molecular properties. Building on this success, we introduce 3DReact, a geometric deep learning model to predict reaction properties from three-dimensional structures of reactants and products. We demonstrate that the invariant version of the model is sufficient for existing reaction data sets.
The prediction of reaction selectivity is a challenging task for computational chemistry, not only because many molecules adopt multiple conformations but also due to the exponential relationship between effective activation energies and rate constants. To account for molecular flexibility, an increasing number of methods exist that generate conformational ensembles of transition state (TS) structures. Typically, these TS ensembles are Boltzmann weighted and used to compute selectivity assuming Curtin-Hammett conditions.
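As a hedged illustration of the Boltzmann weighting described in this abstract (not the authors' implementation), the sketch below collapses each TS ensemble into an effective free energy barrier and converts the difference between two ensembles into a product ratio, assuming Curtin-Hammett conditions. The function names, the kJ/mol units, and the example barriers are assumptions made purely for illustration.

```python
import math

# Molar gas constant in kJ mol^-1 K^-1
R_KJ = 8.314462618e-3


def effective_barrier(barriers_kj_mol, temperature=298.15):
    """Boltzmann-averaged free energy barrier of a TS conformer ensemble.

    G_eff = -RT * ln(sum_i exp(-G_i / RT)), evaluated relative to the
    lowest barrier for numerical stability.
    """
    rt = R_KJ * temperature
    lowest = min(barriers_kj_mol)
    partition = sum(math.exp(-(g - lowest) / rt) for g in barriers_kj_mol)
    return lowest - rt * math.log(partition)


def selectivity_ratio(major_barriers, minor_barriers, temperature=298.15):
    """Product ratio (major : minor) under Curtin-Hammett conditions."""
    rt = R_KJ * temperature
    ddg = (effective_barrier(minor_barriers, temperature)
           - effective_barrier(major_barriers, temperature))
    return math.exp(ddg / rt)


if __name__ == "__main__":
    # Hypothetical TS conformer barriers (kJ/mol) for two competing pathways
    pathway_a = [60.0, 61.5, 63.0]
    pathway_b = [64.0, 66.0]
    print(f"A : B = {selectivity_ratio(pathway_a, pathway_b):.1f} : 1")
```

Because every conformer contributes to the sum, a pathway with many low-lying TS conformers is favored over one with a single conformer of the same energy; two degenerate conformers, for instance, lower the effective barrier by RT ln 2.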
Molecular volcano plots, which facilitate the rapid prediction of the activity and selectivity of prospective catalysts, have emerged as powerful tools for computational catalysis. Here, we integrate microkinetic modeling into the volcano plot framework to develop "microkinetic molecular volcano plots". The resulting unified computational framework allows the influence of important reaction parameters, including temperature, reaction time, and concentration, to be quickly incorporated and more complex situations, such as off-cycle resting states and coupled catalytic cycles, to be tackled.
Chiral ligands are important components in asymmetric homogeneous catalysis, but their synthesis and screening can be both time-consuming and resource-intensive. Data-driven approaches, in contrast to screening procedures based on intuition, have the potential to reduce the time and resources needed for reaction optimization by more rapidly identifying an ideal catalyst. These approaches, however, are often nontransferable and cannot be applied across different reactions.
Frustrated Lewis pairs (FLPs), featuring reactive combinations of Lewis acids and Lewis bases, have been utilized for myriad metal-free homogeneous catalytic processes. Immobilizing the active Lewis sites to a solid support, especially to porous scaffolds, has shown great potential to ameliorate FLP catalysis by circumventing some of its inherent drawbacks, such as poor product separation and catalyst recyclability. Nevertheless, designing immobilized Lewis pair active sites (LPASs) is challenging due to the requirement of placing the donor and acceptor centers in appropriate geometric arrangements while maintaining the necessary chemical environment to perform catalysis, and clear design rules have not yet been established.
A catalyst possessing a broad substrate scope, in terms of both turnover and enantioselectivity, is sometimes called "general". Despite their great utility in asymmetric synthesis, truly general catalysts are difficult or expensive to discover via traditional high-throughput screening and are, therefore, rare. Existing computational tools accelerate the evaluation of reaction conditions from a pre-defined set of experiments to identify the most general ones, but cannot generate entirely new catalysts with enhanced substrate breadth.
In this account, we discuss the use of genetic algorithms in the inverse design process of homogeneous catalysts for chemical transformations. We describe the main components of evolutionary experiments, specifically the nature of the fitness function to optimize, the library of molecular fragments from which potential catalysts are assembled, and the settings of the genetic algorithm itself. While not exhaustive, this review summarizes the key challenges and characteristics of our own (i.
In this minireview, we overview a computational pipeline developed within the framework of NCCR Catalysis that can be used to successfully reproduce the enantiomeric ratios of homogeneous catalytic reactions. At the core of this pipeline is the SCINE Molassembler module, a graph-based software that provides algorithms for molecular construction of all periodic table elements. With this pipeline, we are able to simultaneously functionalize and generate ensembles of transition state conformers, which permits facile exploration of the influence of various substituents on the overall enantiomeric ratio.
The high-throughput exploration and screening of molecules for organic electronics involves either a 'top-down' curation and mining of existing repositories, or a 'bottom-up' assembly of user-defined fragments based on known synthetic templates. Both are time-consuming approaches requiring significant resources to compute electronic properties accurately. Here, 'top-down' is combined with 'bottom-up' through automatic assembly and statistical models, thus providing a platform for the fragment-based discovery of organic electronic materials.
It is well known that the activity and function of proteins are strictly correlated with their secondary, tertiary, and quaternary structures. Their biological role is regulated by their conformational flexibility and global fold, which, in turn, are largely governed by complex noncovalent interaction networks. Because of the large size of proteins, the analysis of their noncovalent interaction networks is challenging, but can provide insights into the energetics of conformational changes or protein-protein and protein-ligand interactions.
The noncovalent interaction (NCI) index is nowadays a well-known strategy to detect NCIs in molecular systems. Even though it initially provided only qualitative descriptions, the technique has recently been extended to also extract quantitative information. To accomplish this task, integrals of powers of the electron distribution were considered, with the requirement that the overall electron density can be clearly decomposed as a sum of distinct fragment contributions to enable the definition of the (noncovalent) integration region.
The automated construction of datasets has become increasingly relevant in computational chemistry. While transition-metal catalysis has greatly benefitted from bottom-up or top-down strategies for the curation of organometallic complex libraries, the field of organocatalysis is mostly dominated by case-by-case studies, with a lack of transferable data-driven tools that facilitate both the exploration of a wider range of catalyst space and the optimization of reaction properties. For these reasons, we introduce OSCAR, a repository of 4000 experimentally derived organocatalysts along with their corresponding building blocks and combinatorially enriched structures.
Phys Chem Chem Phys
November 2022
The allene radical cation can be stabilized both by Jahn-Teller distortion of the bond lengths and by torsion of the end-groups. However, only the latter happens and the allene radical cation relaxes into a twisted symmetry structure with equal double-bond lengths. Here we revisit the Jahn-Teller distortion of allene and spiropentadiene by assessing the possible implications of their helical π-systems in the radical cations.
We provide a comprehensive overview of the chemical information from electron density: not only how to extract information, but also how to obtain and how to assess the quality of the electron density itself. After introducing several indexes derived from electron density, which allow bonding to be revealed, we focus on the various potential sources of electron density and explain the error trends they show, so that methods can be chosen judiciously and their limitations are clearly laid out. Computational, experimental-computational combinations, and machine learning efforts are covered in this work.
Volcano plots and activity maps are powerful tools for studying homogeneous catalysis. Once constructed, they can be used to estimate and predict the performance of a catalyst from one or more descriptor variables. The relevance and utility of these tools have been demonstrated in several areas of catalysis, with recent applications to homogeneous catalysts having been pioneered by our research group.
The computation of reaction selectivity represents an appealing complementary route to experimental studies and a powerful means to refine catalyst design strategies. Accurately establishing the selectivity of reactions facilitated by molecular catalysts, however, remains a challenging task for computational chemistry. The small free energy differences that lead to large variations in the enantiomeric ratio () represent particularly tricky quantities to predict with sufficient accuracy to be helpful for prioritizing experiments.
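To make the sensitivity described in this abstract concrete, the following minimal sketch converts a free energy difference between two competing enantiodetermining transition states into an enantiomeric ratio via the standard exponential relation exp(ΔΔG‡/RT). It is an illustration only, not the method of the article; the function names, the kcal/mol units, and the sample values are assumptions.

```python
import math

# Molar gas constant in kcal mol^-1 K^-1
R_KCAL = 1.987204259e-3


def enantiomeric_ratio(ddg_kcal_mol, temperature=298.15):
    """Major : minor enantiomer ratio for a barrier difference ddG (kcal/mol)."""
    return math.exp(ddg_kcal_mol / (R_KCAL * temperature))


def enantiomeric_excess(ddg_kcal_mol, temperature=298.15):
    """Enantiomeric excess as a fraction between 0 and 1."""
    ratio = enantiomeric_ratio(ddg_kcal_mol, temperature)
    return (ratio - 1.0) / (ratio + 1.0)


if __name__ == "__main__":
    # Hypothetical barrier differences; note how fast the ratio grows
    for ddg in (0.5, 1.0, 1.5, 2.0):
        print(f"ddG = {ddg:.1f} kcal/mol -> ratio = {enantiomeric_ratio(ddg):6.1f}, "
              f"ee = {100 * enantiomeric_excess(ddg):5.1f}%")
```

Since the ratio depends exponentially on the barrier difference, doubling the difference squares the ratio, which is why errors of a fraction of a kcal/mol in computed barriers can change the predicted selectivity substantially.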
Non-covalent bonding patterns are commonly harvested as a design principle in the fields of catalysis, supramolecular chemistry, and functional materials, to name a few. Yet, their computational description generally neglects finite temperature and environment effects, which promote competing interactions and alter their static gas-phase properties. Recently, neural network potentials (NNPs) trained on density functional theory (DFT) data have become increasingly popular to simulate molecular phenomena in the condensed phase with an accuracy comparable to ab initio methods.
The immobilization of molecular catalysts imposes spatial constraints on their active site. We reveal that in bifunctional catalysis such constraints can also be utilized as an appealing handle to boost intrinsic activity through judicious control of the active site geometry. To demonstrate this, we develop a pragmatic approach, based on nonlinear scaling relationships, to map the spatial arrangements of the acid-base components of frustrated Lewis pairs (FLPs) to their performance in the catalytic hydrogenation of CO2.
Hundreds of catalytic methods are developed each year to meet the demand for high-purity chiral compounds. The computational design of enantioselective organocatalysts remains a significant challenge, as catalysts are typically discovered through experimental screening. Recent advances in combining quantum chemical computations and machine learning (ML) hold great potential to propel the next leap forward in asymmetric catalysis.
The human apoptosis-inducing factor (hAIF) is a moonlight flavoprotein involved in mitochondrial respiratory complex assembly and caspase-independent programmed cell death. These functions might be modulated by its redox-linked structural transition that enables hAIF to act as a NAD(H/+) redox sensor. Upon reduction with NADH, hAIF undergoes a conformational reorganization in two specific insertions (the flexible regulatory C-loop and the 190-202 β-hairpin), promoting protein dimerization and the stabilization of a long-life charge transfer complex (CTC) that modulates its monomer-dimer equilibrium and its protein interaction network in healthy mitochondria.