Machine learning (ML) systems have enabled the modelling of quantitative structure-property relationships (QSPR) and structure-activity relationships (QSAR) using existing experimental data to predict target properties for new molecules. These property predictors hold significant potential in accelerating drug discovery by guiding generative artificial intelligence (AI) agents to explore desired chemical spaces. However, they often struggle to generalize due to the limited scope of the training data.
How many near-neighbors does a molecule have? This fundamental question in chemistry is crucial for molecular optimization problems under the similarity principle assumption. Generative models can sample molecules from a vast chemical space but lack explicit knowledge about molecular similarity. Therefore, these models need guidance from reinforcement learning to sample a relevant similar chemical space.
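The near-neighbor question above can be illustrated with a minimal sketch. It assumes each molecule is represented as a set of integer feature identifiers (a stand-in for a real fingerprint) and that similarity is measured with the Tanimoto coefficient at a threshold of 0.5. The abstract names neither the metric nor a threshold, so both are assumptions, and the function names and data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty feature sets: define similarity as 0.0 (a convention, assumed here).
        return 0.0
    return len(a & b) / len(union)


def count_near_neighbors(query, library, threshold=0.5):
    """Count library molecules whose similarity to the query meets the threshold."""
    return sum(1 for mol in library if tanimoto(query, mol) >= threshold)


# Hypothetical feature sets, for illustration only.
query = {1, 2, 3, 4}
library = [
    {1, 2, 3, 4, 5},  # shares 4 of 5 features in the union
    {1, 2, 7, 8},     # shares 2 of 6 features in the union
    {1, 2, 3},        # shares 3 of 4 features in the union
]
neighbor_count = count_near_neighbors(query, library)
```

With real molecules the feature sets would come from a cheminformatics toolkit; the counting logic stays the same.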
Designing compounds with a range of desirable properties is a fundamental challenge in drug discovery. In pre-clinical early drug discovery, novel compounds are often designed based on an already existing promising starting compound through structural modifications for further property optimization. Recently, transformer-based deep learning models have been explored for the task of molecular optimization by training on pairs of similar molecules.
In the pursuit of improved compound identification and database search tasks, this study explores heteronuclear single quantum coherence (HSQC) spectra simulation and matching methodologies. HSQC spectra serve as unique molecular fingerprints, enabling a valuable balance of data collection time and information richness. We conducted a comprehensive evaluation of the following four HSQC simulation techniques: ACD/Labs (ACD), MestReNova (MNova), Gaussian NMR calculations (DFT), and a graph-based neural network (ML).
Reinforcement learning (RL) is a powerful and flexible paradigm for searching for solutions in high-dimensional action spaces. However, bridging the gap between playing computer games with thousands of simulated episodes and solving real scientific problems with complex and involved environments (up to actual laboratory experiments) requires improvements in terms of sample efficiency to make the most of expensive information. The discovery of new drugs is a major commercial application of RL, motivated by the very large nature of the chemical space and the need to perform multiparameter optimization (MPO) across different properties.
We investigate the potential of graph neural networks for transfer learning and for improving molecular property prediction on sparse, expensive-to-acquire high-fidelity data by leveraging low-fidelity measurements as an inexpensive proxy for a targeted property of interest. This problem arises in discovery processes that rely on screening funnels for trading off the overall costs against throughput and accuracy. Typically, individual stages in these processes are loosely connected and each one generates data at a different scale and fidelity.
REINVENT 4 is a modern open-source generative AI framework for the design of small molecules. The software utilizes recurrent neural networks and transformer architectures to drive molecule generation. These generators are seamlessly embedded within general machine learning optimization algorithms: transfer learning, reinforcement learning, and curriculum learning.
Atom-centred neural networks represent the state-of-the-art for approximating the quantum chemical properties of molecules, such as internal energies. While the design of machine learning architectures that respect chemical principles has continued to advance, the final atom pooling operation that is necessary to convert from atomic to molecular representations in most models remains relatively undeveloped. The most common choices, sum and average pooling, compute molecular representations that are naturally a good fit for many physical properties, while satisfying properties such as permutation invariance which are desirable from a geometric deep learning perspective.
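As a small illustration of the sum and average pooling mentioned above, the sketch below pools per-atom feature vectors into one molecular vector and checks that reordering the atoms leaves the result unchanged. The atom features are made-up numbers and the function names are hypothetical; this is not the architecture from the paper.

```python
def sum_pool(atom_vectors):
    """Sum per-atom feature vectors column-wise into one molecular vector."""
    return [sum(column) for column in zip(*atom_vectors)]


def mean_pool(atom_vectors):
    """Average per-atom feature vectors column-wise into one molecular vector."""
    n_atoms = len(atom_vectors)
    return [total / n_atoms for total in sum_pool(atom_vectors)]


# Three hypothetical atoms, each with a two-dimensional learned representation.
atoms = [[1.0, 0.5], [2.0, -1.0], [3.0, 0.25]]
reordered = [atoms[2], atoms[0], atoms[1]]  # same atoms, different order

# Permutation invariance: the pooled vectors do not depend on atom ordering.
same_sum = sum_pool(atoms) == sum_pool(reordered)
same_mean = mean_pool(atoms) == mean_pool(reordered)
```

Sum pooling scales with the number of atoms, which suits size-extensive quantities such as internal energies, whereas mean pooling does not; which one fits a given property is a modelling choice.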
Understanding allosteric regulation in biomolecules is of great interest to pharmaceutical research, and computational methods have emerged over the last decades to characterize allosteric coupling. However, the prediction of allosteric sites in a protein structure remains a challenging task. Here, we integrate local binding site information, coevolutionary information, and information on dynamic allostery into a structure-based three-parameter model to identify potentially hidden allosteric sites in ensembles of protein structures with orthosteric ligands.
In drug discovery, computational methods are a key part of making informed design decisions and prioritising experiments. In particular, optimizing compound affinity is a central concern during the early stages of development. In the last 10 years, alchemical free energy (FE) calculations have transformed our ability to incorporate accurate in silico potency predictions in design decisions, and represent the 'gold standard' for augmenting experiment-driven drug discovery.
High-throughput screening (HTS), as one of the key techniques in drug discovery, is frequently used to identify promising drug candidates in a largely automated and cost-effective way. One of the necessary conditions for successful HTS campaigns is a large and diverse compound library, enabling hundreds of thousands of activity measurements per project. Such collections of data hold great promise for computational and experimental drug discovery efforts, especially when leveraged in combination with modern deep learning techniques, and can potentially lead to improved drug activity predictions and cheaper and more effective experimental design.
Curr Opin Struct Biol
June 2023
In this mini review, we capture the latest progress of applying artificial intelligence (AI) techniques based on deep learning architectures to molecular de novo design with a focus on integration with experimental validation. We will cover the progress and experimental validation of novel generative algorithms, the validation of QSAR models and how AI-based molecular de novo design is starting to become connected with chemistry automation. While progress has been made in the last few years, it is still early days.
Strategies for machine-learning (ML)-accelerated discovery that are general across material composition spaces are essential, but demonstrations of ML have been primarily limited to narrow composition variations. By addressing the scarcity of data in promising regions of chemical space for challenging targets such as open-shell transition-metal complexes, general representations and transferable ML models that leverage known relationships in existing data will accelerate discovery. Over a large set (∼1000) of isovalent transition-metal complexes, we quantify evident relationships for different properties (i.
Recently, we have released the de novo design platform REINVENT in version 2.0. This improved and extended iteration supports far more features and scoring function components, which allows bespoke and tailor-made protocols to maximize impact in small molecule drug discovery projects.
We have demonstrated the utility of a 3D shape and pharmacophore similarity scoring component in molecular design with a deep generative model trained with reinforcement learning. Using dopamine receptor type 2 (DRD2) as an example and its antagonist haloperidol 1 as a starting point in a ligand-based design context, we have shown in a retrospective study that a 3D-similarity-enabled generative model can discover new leads in the absence of any other information. It can be efficiently used for scaffold hopping and generation of novel series.
The variability of chemical bonding in open-shell transition-metal complexes not only motivates their study as functional materials and catalysts but also challenges conventional computational modeling tools. Here, tailoring ligand chemistry can alter preferred spin or oxidation states as well as electronic structure properties and reactivity, creating vast regions of chemical space to explore when designing new materials atom by atom. Although first-principles density functional theory (DFT) remains the workhorse of computational chemistry in mechanism deduction and property prediction, it is of limited use here.
Millions of distinct metal-organic frameworks (MOFs) can be made by combining metal nodes and organic linkers. At present, over 90,000 MOFs have been synthesized and over 500,000 predicted. This raises the question of whether a new experimental or predicted structure adds new information.
The accelerated discovery of materials for real world applications requires the achievement of multiple design objectives. The multidimensional nature of the search necessitates exploration of multimillion compound libraries over which even density functional theory (DFT) screening is intractable. Machine learning (e.
Determination of ground-state spins of open-shell transition-metal complexes is critical to understanding catalytic and materials properties but also challenging with approximate electronic structure methods. As an alternative approach, we demonstrate how structure alone can be used to guide assignment of ground-state spin from experimentally determined crystal structures of transition-metal complexes. We first identify the limits of distance-based heuristics from distributions of metal-ligand bond lengths of over 2000 unique mononuclear Fe(II)/Fe(III) transition-metal complexes.
Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration.
High-throughput computational screening for chemical discovery mandates the automated and unsupervised simulation of thousands of new molecules and materials. In challenging materials spaces, such as open-shell transition-metal chemistry, characterization requires time-consuming first-principles simulation that often necessitates human intervention. These calculations can frequently lead to a null result, e.
Recent transformative advances in computing power and algorithms have made computational chemistry central to the discovery and design of new molecules and materials. First-principles simulations are increasingly accurate and applicable to large systems with the speed needed for high-throughput computational screening. Despite these strides, the combinatorial challenges associated with the vastness of chemical space mean that more than just fast and accurate computational tools are needed for accelerated chemical discovery.