Metagenomics can provide insight into the microbial taxa present in a sample and, through gene identification, the functional potential of the community. However, taxonomic and functional information are typically considered separately in downstream analyses. We develop interpretable machine learning (ML) approaches for modelling metagenomic data, combining the biological representation of species with their associated genetically encoded functions within models.
Conjugated organic photoredox catalysts (OPCs) can promote a wide range of chemical transformations. It is challenging to predict the catalytic activities of OPCs from first principles, either by expert knowledge or by using a priori calculations, as catalyst activity depends on a complex range of interrelated properties. Organic photocatalysts and other catalyst systems have often been discovered by a mixture of design and trial and error.
Curr Opin Struct Biol
August 2024
Biomolecular simulation can act as both a digital microscope and a crystal ball, offering the potential for a deeper understanding of experimental observations whilst also presenting a forward-looking avenue for the in silico design and evaluation of hitherto unsynthesized compounds. Indeed, as the intricacy of our scientific inquiries has grown, so too has the computational prowess we seek to deploy in our pursuit of answers. As we enter the Exascale era, this mini-review surveys the computational landscape from the point of view of both the development of new and ever more powerful systems and the simulations that are run on them.
Angew Chem Int Ed Engl
January 2023
The optimization of multistep chemical syntheses is critical for the rapid development of new pharmaceuticals. However, concatenating individually optimized reactions can lead to inefficient multistep syntheses, owing to chemical interdependencies between the steps. Herein, we develop an automated continuous flow platform for the simultaneous optimization of telescoped reactions.
In molecular discovery and drug design, structure-property relationships and activity landscapes are often qualitatively or quantitatively analyzed to guide the navigation of chemical space. The roughness (or smoothness) of these molecular property landscapes is one of their most studied geometric attributes, as it can characterize the presence of activity cliffs, with rougher landscapes generally expected to pose tougher optimization challenges. Here, we introduce a general, quantitative measure for describing the roughness of molecular property landscapes.
High-throughput virtual screening is an indispensable technique utilized in the discovery of small molecules. In cases where the library of molecules is exceedingly large, the cost of an exhaustive virtual screen may be prohibitive. Model-guided optimization has been employed to lower these costs through dramatic increases in sample efficiency compared to random selection.
Inflammatory bowel diseases (IBDs), including ulcerative colitis and Crohn's disease, affect several million individuals worldwide. These diseases are heterogeneous at the clinical, immunological and genetic levels and result from complex host and environmental interactions. Investigating drug efficacy for IBD can improve our understanding of why treatment response can vary between patients.
While energy-structure-function (ESF) maps are a powerful new tool for in silico materials design, the cost of acquiring an ESF map for many properties is too high for routine integration into high-throughput virtual screening workflows. Here, we propose the next evolution of the ESF map. This uses parallel Bayesian optimization to selectively acquire energy and property data, generating the same levels of insight at a fraction of the computational cost.
The circadian clock is an important adaptation to life on Earth. Here, we use machine learning to predict complex, temporal, and circadian gene expression patterns in Most significantly, we classify circadian genes using DNA sequence features generated de novo from public, genomic resources, facilitating downstream application of our methods with no experimental work or prior knowledge needed. We use local model explanation that is transcript specific to rank DNA sequence features, providing a detailed profile of the potential circadian regulatory mechanisms for each transcript.
Alterations in the human microbiome have been observed in a variety of conditions such as asthma, gingivitis, dermatitis and cancer, and much remains to be learned about the links between the microbiome and human health. The fusion of artificial intelligence with rich microbiome datasets can offer an improved understanding of the microbiome's role in human health. To gain actionable insights it is essential to consider both the predictive power and the transparency of the models by providing explanations for the predictions.
During the development of new drugs or compounds there is a requirement for preclinical trials, commonly involving animal tests, to ascertain the safety of the compound prior to human trials. Machine learning techniques could provide an in silico alternative to animal models for assessing drug toxicity, thus reducing expensive and invasive animal testing during clinical trials, for drugs that are most likely to fail safety tests. Here we present a machine learning model to predict kidney dysfunction, as a proxy for drug-induced renal toxicity, in rats.
Chemical representations derived from deep learning are emerging as a powerful tool in areas such as drug discovery and materials innovation. Currently, this methodology has three major limitations: the cost of representation generation, the risk of inherited bias, and the requirement for large amounts of data. We propose the use of multi-task learning in tandem with transfer learning to address these limitations directly.
We present a machine learning approach to automated force field development in dissipative particle dynamics (DPD). The approach employs Bayesian optimization to parametrize a DPD force field against experimentally determined partition coefficients. The optimization process covers a discrete space of over 40 000 000 points, where each point represents the set of potentials that jointly forms a force field.
Background: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra.
Simulation and data analysis have evolved into powerful methods for discovering and understanding molecular modes of action and designing new compounds to exploit these modes. The combination provides a strong impetus to create and exploit new tools and techniques at the interfaces between physics, biology, and data science as a pathway to new scientific insight and accelerated discovery. In this context, we explore the rational design of novel antimicrobial peptides (short protein sequences exhibiting broad activity against multiple species of bacteria).
The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.
Acta Crystallogr B Struct Sci Cryst Eng Mater
August 2016
We present a re-parameterization of a popular intermolecular force field for describing intermolecular interactions in the organic solid state. Specifically, we optimize the performance of the exp-6 force field when used in conjunction with atomic multipole electrostatics. We also parameterize force fields that are optimized for use with multipoles derived from polarized molecular electron densities, to account for induction effects in molecular crystals.
Small structural changes in organic molecules can have a large influence on solid-state crystal packing, and this often thwarts attempts to produce isostructural series of crystalline solids. For metal-organic frameworks and covalent organic frameworks, this has been addressed by using strong, directional intermolecular bonding to create families of isoreticular solids. Here, we show that an organic directing solvent, 1,4-dioxane, has a dominant effect on the lattice energy for a series of organic cage molecules.
We synthesize a series of imine cage molecules where increasing the chain length of the alkanediamine precursor results in an odd-even alternation between [2 + 3] and [4 + 6] cage macrocycles. A computational procedure is developed to predict the thermodynamically preferred product and the lowest energy conformer, hence rationalizing the observed alternation and the 3D cage structures, based on knowledge of the precursors alone.