Publications by authors named "Coley C"

Automated chemistry platforms hold the potential to enable large-scale organic synthesis campaigns, such as producing a library of compounds for biological evaluation. The efficiency of such platforms will depend on the schedule according to which the synthesis operations are executed. In this work, we study the scheduling problem for chemical library synthesis, where operations from interdependent synthetic routes are scheduled to minimize the makespan, that is, the total duration of the synthesis campaign.
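As a rough illustration of the makespan objective described above, the sketch below schedules a handful of operations with precedence constraints on a fixed number of identical workers and reports the resulting makespan. The operation names, durations, worker count, and the greedy list-scheduling rule are all assumptions made for illustration; this is not the formulation or solver used in the paper, and a greedy rule does not guarantee a minimal makespan in general.

```python
from typing import Dict, List, Tuple


def greedy_makespan(
    durations: Dict[str, float],
    deps: Dict[str, List[str]],
    n_workers: int,
) -> Tuple[float, Dict[str, Tuple[float, float]]]:
    """Greedy list scheduling. Returns (makespan, {operation: (start, end)})."""
    worker_free = [0.0] * n_workers  # time at which each worker becomes idle
    schedule: Dict[str, Tuple[float, float]] = {}
    remaining = set(durations)

    def earliest_start(op: str) -> float:
        # An operation cannot start before all of its prerequisites have ended.
        return max((schedule[d][1] for d in deps.get(op, [])), default=0.0)

    while remaining:
        # Operations whose prerequisites have all been scheduled already.
        ready = [op for op in remaining
                 if all(d in schedule for d in deps.get(op, []))]
        if not ready:
            raise ValueError("dependencies contain a cycle or an unknown operation")
        # Hypothetical priority rule: earliest possible start, ties broken by name.
        op = min(ready, key=lambda o: (earliest_start(o), o))
        # Assign the operation to the worker that becomes idle first.
        w = min(range(n_workers), key=lambda i: worker_free[i])
        start = max(worker_free[w], earliest_start(op))
        end = start + durations[op]
        schedule[op] = (start, end)
        worker_free[w] = end
        remaining.remove(op)

    makespan = max((end for _, end in schedule.values()), default=0.0)
    return makespan, schedule


# Toy campaign (hypothetical): two routes that share one intermediate.
durations = {
    "make_intermediate": 4.0,
    "route_A_step": 3.0,
    "route_B_step": 2.0,
    "purify_A": 1.0,
}
deps = {
    "route_A_step": ["make_intermediate"],
    "route_B_step": ["make_intermediate"],
    "purify_A": ["route_A_step"],
}

makespan, schedule = greedy_makespan(durations, deps, n_workers=2)
print(makespan)  # 8.0 for this toy input
```

With two workers, the toy campaign finishes in 8.0 time units, which equals the longest dependency chain (4 + 3 + 1). With a single worker, the same operations run back to back and the makespan grows to 10.0, the sum of all durations.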


Data-driven reaction discovery and development is a growing field that relies on the use of molecular descriptors to capture key information about substrates, ligands, and targets. Broad adoption of this strategy is hindered by the associated computational cost of descriptor calculation, especially when considering conformational flexibility. Descriptor libraries can be precomputed agnostic of application to reduce the computational burden of data-driven reaction development.


Recently, a new class of synthetic methyl methacrylate-based random heteropolymers (MMA-based RHPs) has displayed protein-like properties. Their function appears to be insensitive to the precise sequence. Here, through atomistic molecular dynamics simulation, we show that there are universal protein-like features of MMA-based RHPs that are insensitive to the sequence, and mostly depend on the overall composition.


The popularity of data-driven approaches and machine learning (ML) techniques in the field of organic chemistry and its various subfields has increased the value of structured reaction data. Most data in chemistry is represented by unstructured text, and despite the vastness of the organic chemistry literature (papers, patents), conversion from unstructured text to structured data remains a largely manual endeavor. Software tools for this task would facilitate downstream applications such as reaction prediction and condition recommendation.


Artificial intelligence (AI) is accelerating how we conduct science, from folding proteins with AlphaFold and summarizing literature findings with large language models, to annotating genomes and prioritizing newly generated molecules for screening using specialized software. However, the application of AI to emulate human cognition in natural product research and its subsequent impact has so far been limited. One reason for this limited impact is that available natural product data is multimodal, unbalanced, unstandardized, and scattered across many data repositories.


Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and in principle, reaction discovery. While several machine learning models have sought to address the task of predicting reaction products, their extension to predicting reaction mechanisms has been impeded by the lack of a corresponding mechanistic dataset. In this study, we construct such a dataset by imputing intermediates between experimentally reported reactants and products using expert reaction templates and train several machine learning models on the resulting dataset of 5,184,184 elementary steps.


Information extraction from chemistry literature is vital for constructing up-to-date reaction databases for data-driven chemistry. Complete extraction requires combining information across text, tables, and figures, whereas prior work has mainly investigated extracting reactions from single modalities. In this paper, we present OpenChemIE to address this complex challenge and enable the extraction of reaction data at the document level.


Small molecules exhibiting desirable property profiles are often discovered through an iterative process of designing, synthesizing and testing sets of molecules. The selection of molecules to synthesize from all possible candidates is a complex decision-making process that typically relies on expert chemist intuition. Here we propose a quantitative decision-making framework, SPARROW, that prioritizes molecules for evaluation by balancing expected information gain and synthetic cost.


The application of machine learning models to the prediction of reaction outcomes currently needs large and/or highly featurized data sets. We show that a chemistry-aware model, NERF, which mimics the bonding changes that occur during reactions, allows for highly accurate predictions of the outcomes of Diels-Alder reactions using a relatively small training set, with no pretraining and no additional features. We establish a diverse data set of 9537 intramolecular, hetero-, aromatic, and inverse electron demand Diels-Alder reactions.


Despite the increased use of computational tools to supplement medicinal chemists' expertise and intuition in drug design, predicting synthetic yields in medicinal chemistry endeavors remains an unsolved challenge. Existing design workflows could profoundly benefit from reaction yield prediction, as precious material waste could be reduced, and a greater number of relevant compounds could be delivered to advance the design, make, test, analyze (DMTA) cycle. In this work, we detail the evaluation of AbbVie's medicinal chemistry library data set to build machine learning models for the prediction of Suzuki coupling reaction yields.


Prepubertal obesity is growing at an alarming rate and is now considered a risk factor for renal injury. Recently, we reported that the early development of renal injury in obese Dahl salt-sensitive (SS) leptin receptor mutant (SS LepR mutant) rats was associated with increased T-cell infiltration and activation before puberty. Therefore, the current study investigated the effect of inhibiting T-cell activation with abatacept on the progression of renal injury in young obese SS LepR mutant rats before puberty.


SMARTS is a widely used language in cheminformatics for defining substructural queries for database lookups, reaction templates for chemical transformations, and other applications. As an extension to SMILES, many SMARTS patterns can represent the same query. Despite this, no canonicalization algorithm invariant of the line notation sequence or atomic numbering is publicly available.


Aims: Assess the potential benefits of using PedBotLab, a clinic-based robotic ankle platform with integrated video game software, to improve ankle active and passive range of motion, strength, selective motor control, gait efficiency, and balance.

Methods: Ten participants with static neurological injuries and independent ambulation participated in a 10-week pilot study (Pro00013680) to assess the feasibility and efficacy of PedBotLab as a therapeutic device used twice weekly. Isometric ankle strength, passive and active ankle range of motion, plantarflexor spasticity, selective motor control of the lower extremity, balance, and gait speed were measured pre- and post-trial.


The accurate prediction of tandem mass spectra from molecular structures has the potential to unlock new metabolomic discoveries by augmenting the community's libraries of experimental reference standards. Cheminformatic spectrum prediction strategies use a "bond-breaking" framework to iteratively simulate mass spectrum fragmentations, but these methods are (a) slow due to the need to exhaustively and combinatorially break molecules and (b) inaccurate as they often rely upon heuristics to predict the intensity of each resulting fragment; neural network alternatives mitigate computational cost but are black-box and not inherently more accurate. We introduce a physically grounded neural approach that learns to predict each breakage event and score the most relevant subset of molecular fragments quickly and accurately.


Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most decisive factor in how useful such a model is lies in the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular or reaction representation.

Article Synopsis
  • Scientists are trying to create small molecules that help certain proteins stick together, which can change how cells work.
  • They made about 1 million special compounds using DNA to find out which ones can connect two chosen proteins, specifically VHL and bromodomains.
  • By testing these compounds, they discovered some that could make the bromodomains disappear in cells and even got to see how one of the best compounds interacted with the proteins in a crystal structure.

We unveil a unified view on the effect of side chains on the glass transition temperatures (Tg) in polymer melts by using molecular dynamics simulations, density functional theory calculations, and available experimental data. We use acrylates as a model system and evaluate the effect of n-alkyl side chains on Tg. We find that backbone dihedral angle fluctuations follow established patterns due to sterics, as expected.


Chemical formula annotation for tandem mass spectrometry (MS/MS) data is the first step toward structurally elucidating unknown metabolites. While great strides have been made toward solving this problem, the current state-of-the-art method depends on time-intensive, proprietary, and expert-parametrized fragmentation tree construction and scoring. In this work, we extend our previous spectrum Transformer methodology into an energy-based modeling framework, MIST-CF: Metabolite Inference with Spectrum Transformers for Chemical Formula prediction, for learning to rank chemical formula and adduct assignments given an unannotated MS/MS spectrum.


Diversity-oriented synthesis (DOS) is a powerful strategy to prepare molecules with underrepresented features in commercial screening collections, resulting in the elucidation of novel biological mechanisms. In parallel to the development of DOS, DNA-encoded libraries (DELs) have emerged as an effective, efficient screening strategy to identify protein binders. Despite recent advancements in this field, most DEL syntheses are limited by the presence of sensitive DNA-based constructs.

Article Synopsis
  • Artificial intelligence is revolutionizing scientific discovery by enhancing research processes such as hypothesis generation, experiment design, and data interpretation.
  • Recent advances like self-supervised learning and geometric deep learning are improving model accuracy by utilizing vast amounts of unlabelled data and incorporating the structure of scientific data.
  • While generative AI is helping create innovations like drugs and proteins, challenges like data quality and the need for better understanding among AI developers and users persist, highlighting areas for further progress in AI research.