Publications by Connor W Coley | LitMetric

Publications by authors named "Connor W Coley"

Rapid prediction of conformationally-dependent DFT-level descriptors using graph neural networks for carboxylic acids and alkyl amines.

Brittany C Haas, Melissa A Hardy, Shree Sowndarya S V, Keir Adams, Connor W Coley

Digit Discov

November 2024

Data-driven reaction discovery and development is a growing field that relies on the use of molecular descriptors to capture key information about substrates, ligands, and targets. Broad adaptation of this strategy is hindered by the associated computational cost of descriptor calculation, especially when considering conformational flexibility. Descriptor libraries can be precomputed agnostic of application to reduce the computational burden of data-driven reaction development.

Comment on "Molecular hypergraph neural networks" [J. Chem. Phys. 160, 144307 (2024)].

Nicholas Casetti, Pragnay Nevatia, Junwu Chen, Philippe Schwaller, Connor W Coley

J Chem Phys

November 2024

Sequence-Sensitivity in Functional Synthetic Polymer Properties.

Tianyi Jin, Connor W Coley, Alfredo Alexander-Katz

Angew Chem Int Ed Engl

October 2024

Recently, a new class of synthetic methyl methacrylate-based random heteropolymers (MMA-based RHPs) has displayed protein-like properties. Their function appears to be insensitive to the precise sequence. Here, through atomistic molecular dynamics simulation, we show that there are universal protein-like features of MMA-based RHPs that are insensitive to the sequence, and mostly depend on the overall composition.

Extracting structured data from organic synthesis procedures using a fine-tuned large language model.

Qianxiang Ai, Fanwang Meng, Jiale Shi, Brenden Pelkie, Connor W Coley

Digit Discov

September 2024

The popularity of data-driven approaches and machine learning (ML) techniques in organic chemistry and its subfields has increased the value of structured reaction data. Most data in chemistry is represented by unstructured text, and despite the vastness of the organic chemistry literature (papers, patents), conversion from unstructured text to structured data remains a largely manual endeavor. Software tools for this task would facilitate downstream applications such as reaction prediction and condition recommendation.

Empowering natural product science with AI: leveraging multimodal data and knowledge graphs.

David Meijer, Mehdi A Beniddir, Connor W Coley, Yassine M Mejri, Meltem Öztürk

Nat Prod Rep

August 2024

Artificial intelligence (AI) is accelerating how we conduct science, from folding proteins with AlphaFold and summarizing literature findings with large language models, to annotating genomes and prioritizing newly generated molecules for screening using specialized software. However, the application of AI to emulate human cognition in natural product research and its subsequent impact has so far been limited. One reason for this limited impact is that available natural product data is multimodal, unbalanced, unstandardized, and scattered across many data repositories.

Reproducing Reaction Mechanisms with Machine-Learning Models Trained on a Large-Scale Mechanistic Dataset.

Joonyoung F Joung, Mun Hong Fong, Jihye Roh, Zhengkai Tu, John Bradshaw, Connor W Coley

Angew Chem Int Ed Engl

October 2024

Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and in principle, reaction discovery. While several machine learning models have sought to address the task of predicting reaction products, their extension to predicting reaction mechanisms has been impeded by the lack of a corresponding mechanistic dataset. In this study, we construct such a dataset by imputing intermediates between experimentally reported reactants and products using expert reaction templates and train several machine learning models on the resulting dataset of 5,184,184 elementary steps.

OpenChemIE: An Information Extraction Toolkit for Chemistry Literature.

Vincent Fan, Yujie Qian, Alex Wang, Amber Wang, Connor W Coley

J Chem Inf Model

July 2024

Information extraction from chemistry literature is vital for constructing up-to-date reaction databases for data-driven chemistry. Complete extraction requires combining information across text, tables, and figures, whereas prior work has mainly investigated extracting reactions from single modalities. In this paper, we present OpenChemIE to address this complex challenge and enable the extraction of reaction data at the document level.

An algorithmic framework for synthetic cost-aware decision making in molecular design.

Jenna C Fromer, Connor W Coley

Nat Comput Sci

June 2024

Small molecules exhibiting desirable property profiles are often discovered through an iterative process of designing, synthesizing and testing sets of molecules. The selection of molecules to synthesize from all possible candidates is a complex decision-making process that typically relies on expert chemist intuition. Here we propose a quantitative decision-making framework, SPARROW, that prioritizes molecules for evaluation by balancing expected information gain and synthetic cost.

Data-Efficient, Chemistry-Aware Machine Learning Predictions of Diels-Alder Reaction Outcomes.

Angus Keto, Taicheng Guo, Morgan Underdue, Thijs Stuyver, Connor W Coley

J Am Chem Soc

June 2024

The application of machine learning models to the prediction of reaction outcomes currently needs large and/or highly featurized data sets. We show that a chemistry-aware model, NERF, which mimics the bonding changes that occur during reactions, allows for highly accurate predictions of the outcomes of Diels-Alder reactions using a relatively small training set, with no pretraining and no additional features. We establish a diverse data set of 9537 intramolecular, hetero-, aromatic, and inverse electron demand Diels-Alder reactions.

Incorporating Synthetic Accessibility in Drug Design: Predicting Reaction Yields of Suzuki Cross-Couplings by Leveraging AbbVie's 15-Year Parallel Library Data Set.

Priyanka Raghavan, Alexander J Rago, Pritha Verma, Majdi M Hassan, Gashaw M Goshu, Connor W Coley

J Am Chem Soc

June 2024

Despite the increased use of computational tools to supplement medicinal chemists' expertise and intuition in drug design, predicting synthetic yields in medicinal chemistry endeavors remains an unsolved challenge. Existing design workflows could profoundly benefit from reaction yield prediction, as precious material waste could be reduced, and a greater number of relevant compounds could be delivered to advance the design, make, test, analyze (DMTA) cycle. In this work, we detail the evaluation of AbbVie's medicinal chemistry library data set to build machine learning models for the prediction of Suzuki coupling reaction yields.

Automation of air-free synthesis.

Babak A Mahjour, Connor W Coley

Nat Rev Chem

May 2024

RDCanon: A Python Package for Canonicalizing the Order of Tokens in SMARTS Queries.

Babak A Mahjour, Connor W Coley

J Chem Inf Model

April 2024

SMARTS is a widely used language in cheminformatics for defining substructural queries for database lookups, reaction templates for chemical transformations, and other applications. Because SMARTS is an extension of SMILES, many different SMARTS patterns can represent the same query. Despite this, no canonicalization algorithm that is invariant to the line notation sequence or atomic numbering is publicly available.

Generating Molecular Fragmentation Graphs with Autoregressive Neural Networks.

Samuel Goldman, Janet Li, Connor W Coley

Anal Chem

February 2024

The accurate prediction of tandem mass spectra from molecular structures has the potential to unlock new metabolomic discoveries by augmenting the community's libraries of experimental reference standards. Cheminformatic spectrum prediction strategies use a "bond-breaking" framework to iteratively simulate mass spectrum fragmentations, but these methods are (a) slow due to the need to exhaustively and combinatorially break molecules and (b) inaccurate as they often rely upon heuristics to predict the intensity of each resulting fragment; neural network alternatives mitigate computational cost but are black-box and not inherently more accurate. We introduce a physically grounded neural approach that learns to predict each breakage event and score the most relevant subset of molecular fragments quickly and accurately.

The promise and pitfalls of AI for molecular and materials synthesis.

Nicholas David, Wenhao Sun, Connor W Coley

Nat Comput Sci

May 2023

Dataset Design for Building Models of Chemical Reactivity.

Priyanka Raghavan, Brittany C Haas, Madeline E Ruos, Jules Schleinitz, Abigail G Doyle, Connor W Coley

ACS Cent Sci

December 2023

Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular or reaction representation.

Author Correction: Diversity-oriented synthesis encoded by deoxyoligonucleotides.

Liam Hudson, Jeremy W Mason, Matthias V Westphal, Matthieu J R Richter, Jonathan R Thielman, Connor W Coley

Nat Commun

November 2023

DNA-encoded library-enabled discovery of proximity-inducing small molecules.

Jeremy W Mason, Yuen Ting Chow, Liam Hudson, Antonin Tutter, Gregory Michaud, Connor W Coley

Nat Chem Biol

February 2024

Article Synopsis

Scientists are trying to create small molecules that help certain proteins stick together, which can change how cells work.
They made about 1 million special compounds using DNA to find out which ones can connect two chosen proteins, specifically VHL and bromodomains.
By testing these compounds, they discovered some that could make the bromodomains disappear in cells and even got to see how one of the best compounds interacted with the proteins in a crystal structure.

A Computationally Informed Unified View on the Effect of Polarity and Sterics on the Glass Transition in Vinyl-based Polymer Melts.

Tianyi Jin, Connor W Coley, Alfredo Alexander-Katz

ACS Macro Lett

November 2023

We unveil a unified view on the effect of side chains on the glass transition temperatures (Tg) in polymer melts by using molecular dynamics simulations, density functional theory calculations, and available experimental data. We use acrylates as a model system and evaluate the effect of n-alkyl side chains on Tg. We find that backbone dihedral angle fluctuations follow established patterns due to sterics, as expected.

MIST-CF: Chemical Formula Inference from Tandem Mass Spectra.

Samuel Goldman, Jiayi Xin, Joules Provenzano, Connor W Coley

J Chem Inf Model

April 2024

Chemical formula annotation for tandem mass spectrometry (MS/MS) data is the first step toward structurally elucidating unknown metabolites. While great strides have been made toward solving this problem, the current state-of-the-art method depends on time-intensive, proprietary, and expert-parametrized fragmentation tree construction and scoring. In this work, we extend our previous spectrum Transformer methodology into an energy-based modeling framework, MIST-CF: Metabolite Inference with Spectrum Transformers for Chemical Formula prediction, for learning to rank chemical formula and adduct assignments given an unannotated MS/MS spectrum.

Publisher Correction: Scientific discovery in the age of artificial intelligence.

Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Connor W Coley

Nature

September 2023

Diversity-oriented synthesis encoded by deoxyoligonucleotides.

Liam Hudson, Jeremy W Mason, Matthias V Westphal, Matthieu J R Richter, Jonathan R Thielman, Connor W Coley

Nat Commun

August 2023

Diversity-oriented synthesis (DOS) is a powerful strategy to prepare molecules with underrepresented features in commercial screening collections, resulting in the elucidation of novel biological mechanisms. In parallel to the development of DOS, DNA-encoded libraries (DELs) have emerged as an effective, efficient screening strategy to identify protein binders. Despite recent advancements in this field, most DEL syntheses are limited by the presence of sensitive DNA-based constructs.

Scientific discovery in the age of artificial intelligence.

Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Connor W Coley

Nature

August 2023

Article Synopsis

Artificial intelligence is revolutionizing scientific discovery by enhancing research processes such as hypothesis generation, experiment design, and data interpretation.
Recent advances like self-supervised learning and geometric deep learning are improving model accuracy by utilizing vast amounts of unlabelled data and incorporating the structure of scientific data.
While generative AI is helping create innovations like drugs and proteins, challenges like data quality and the need for better understanding among AI developers and users persist, highlighting areas for further progress in AI research.

Combining Molecular Quantum Mechanical Modeling and Machine Learning for Accelerated Reaction Screening and Discovery.

Nicholas Casetti, Javier E Alfonso-Ramos, Connor W Coley, Thijs Stuyver

Chemistry

October 2023

Molecular quantum mechanical modeling, accelerated by machine learning, has opened the door to in silico high-throughput screening campaigns of complex properties, such as the activation energies of chemical reactions and absorption/emission spectra of materials and molecules. Here, we present an overview of the main principles, concepts, and design considerations involved in such hybrid computational quantum chemistry/machine learning screening workflows, with a special emphasis on some recent examples of their successful application. We end with a brief outlook of further advances that will benefit the field.

Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data.

Rocío Mercado, Steven M Kearnes, Connor W Coley

J Chem Inf Model

July 2023

The past decade has seen a number of impressive developments in predictive chemistry and reaction informatics driven by machine learning applications to computer-aided synthesis planning. While many of these developments have been made even with relatively small, bespoke data sets, in order to advance the role of AI in the field at scale, there must be significant improvements in the reporting of reaction data. Currently, the majority of publicly available data is reported in an unstructured format and heavily imbalanced toward high-yielding reactions, which influences the types of models that can be successfully trained.

RxnScribe: A Sequence Generation Model for Reaction Diagram Parsing.

Yujie Qian, Jiang Guo, Zhengkai Tu, Connor W Coley, Regina Barzilay

J Chem Inf Model

July 2023

Article Synopsis

The paper discusses RxnScribe, a machine learning model designed to extract structured data from complex reaction diagrams found in chemistry literature.
It uses a sequence generation approach to streamline the traditional parsing process into a more efficient end-to-end model, achieving an 80.0% soft match F1 score through cross-validation.
The authors have made their code and dataset available for public access on GitHub, promoting further research in this area.
