Global reactivity models are impactful in industrial synthesis applications.

J Cheminform

In-Silico Discovery and External Innovation (ISDEI), Janssen Research & Development, Janssen Pharmaceutica N.V, Beerse, Belgium.

Published: February 2023

Artificial intelligence is revolutionizing many aspects of the pharmaceutical industry. Deep learning models are now routinely applied to guide drug discovery projects, leading to faster and improved findings, but many tasks still have enormous unrealized potential. One such task is reaction yield prediction. Every year, more than one fifth of all synthesis attempts result in product yields that are either zero or too low. Chemical and human resources are thus spent on activities that ultimately do not progress the programs, a triple loss once the opportunity cost of the wasted time is taken into account. In this work we pre-train a BERT model on more than 16 million reactions from 4 different data sources and fine-tune it to obtain an uncertainty-calibrated global yield prediction model. The model, called BERT Enriched Embedding (BEE), improves upon the state of the art not only through the increase in pre-training data but also by introducing a new embedding layer, which solves a few limitations of SMILES and enables the integration of additional information, such as equivalents and molecule role, into the reaction encoding. Benchmarked on an open-source dataset against a state-of-the-art synthesis-focused BERT, the model shows a near 20-point improvement in R² score. The model is also fine-tuned and tested on an internal company data benchmark, and a prospective study shows that applying it can reduce the total number of negative reactions (yield under 5%) run at Janssen by at least 34%. Lastly, we corroborate these results through experimental validation by deploying the model directly in an ongoing drug discovery project, showing that it can also be used successfully as a reagent recommender thanks to its fast inference speed and reliable confidence estimation, a critical feature for industry application.
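As a rough illustration of the two quantities the abstract mentions, the sketch below computes an R² score and flags reactions that a yield model predicts to be negative (yield under 5%). It is not the authors' code: the `Prediction` record, the `uncertainty` field, and the `max_uncertainty` cutoff are hypothetical names and values chosen for illustration, and only the 5% threshold comes from the abstract.

```python
from dataclasses import dataclass
from typing import List, Sequence

# The abstract defines a negative reaction as one with yield under 5%.
NEGATIVE_YIELD_THRESHOLD = 5.0  # percent


@dataclass
class Prediction:
    """Hypothetical output record of a yield model (not the BEE interface)."""
    reaction_id: str
    predicted_yield: float  # percent, 0 to 100
    uncertainty: float      # assumed calibrated spread of the estimate, in percent


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def flag_likely_negatives(
    predictions: List[Prediction],
    max_uncertainty: float = 10.0,  # assumed cutoff, for illustration only
) -> List[str]:
    """Return ids of reactions predicted below the 5% threshold,
    keeping only predictions whose uncertainty is within the cutoff."""
    return [
        p.reaction_id
        for p in predictions
        if p.predicted_yield < NEGATIVE_YIELD_THRESHOLD
        and p.uncertainty <= max_uncertainty
    ]


if __name__ == "__main__":
    preds = [
        Prediction("rxn-1", 2.0, 3.0),    # low yield, confident: flagged
        Prediction("rxn-2", 1.0, 25.0),   # low yield, too uncertain: kept
        Prediction("rxn-3", 60.0, 4.0),   # good yield: kept
    ]
    print(flag_likely_negatives(preds))
    print(r2_score([10.0, 50.0, 90.0], [12.0, 48.0, 85.0]))
```

Filtering on the uncertainty as well as the predicted yield reflects the abstract's point that reliable confidence estimation matters in practice: a reaction would only be deprioritized when the model is confident in its low-yield prediction.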

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9921076
DOI: http://dx.doi.org/10.1186/s13321-023-00685-0

Similar Publications

Application of biomass carbon dots in food packaging.

Environ Sci Pollut Res Int

January 2025

College of Materials Science and Engineering, Nanjing Forestry University, Nanjing, 210037, China.

Since their discovery, carbon quantum dots (CDs) have been widely applied in cell imaging, drug delivery, biosensing, and photocatalysis owing to their excellent water solubility, chemical stability, fluorescence stability, biocompatibility, low toxicity, and low preparation cost. However, the low fluorescence yield and poor surface structure limit the application of CDs. Heteroatom doping is considered an ideal method to improve the optical and electrical properties of CDs.

Dementia is an umbrella phenotype covering many different underlying pathologies, with Alzheimer's disease (AD) being the most common type. Neuropathological examination remains the gold standard for accurate AD diagnosis; however, most of what we know about AD genetics is based on Genome-Wide Association Studies (GWAS) of clinically defined AD. Such studies have identified multiple AD susceptibility variants, but a significant portion of the heritability remains unexplained, highlighting the phenotypic and genetic heterogeneity of the clinically defined entity.

Background: Clear cell renal cell carcinoma (ccRCC) is the most common histologic type of RCC. However, the spatial and functional heterogeneity of immunosuppressive cells and the mechanisms by which their interactions promote immunosuppression in ccRCC have not been thoroughly investigated.

Methods: To further investigate the cellular and regional heterogeneity of ccRCC, we analyzed single-cell and spatial transcriptome RNA sequencing data from four patients, which were obtained from samples from multiple regions, including the tumor core, tumor-normal interface, and distal normal tissue.

Molecular basis of JAK kinase regulation guiding therapeutic approaches: Evaluating the JAK3 pseudokinase domain as a drug target.

Adv Biol Regul

December 2024

Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpönkatu 34, 33014, Finland; Institute of Biotechnology, HiLIFE, University of Helsinki, P.O. Box 56, 00014, Finland; Department of Microbiology, Fimlab Laboratories, P.O. Box 66, 33013, Tampere, Finland.

Janus kinases (JAK1-3, TYK2) are critical mediators of cytokine signaling, and their role in hematological, inflammatory, and autoimmune diseases has sparked widespread interest in their therapeutic targeting. JAKs have a unique tandem kinase structure consisting of an active tyrosine kinase domain adjacent to a pseudokinase domain that is a hotspot for pathogenic mutations. The development of JAK inhibitors has focused on the active kinase domain, and the resulting drugs have demonstrated good clinical efficacy but, owing to off-target inhibition, also cause side effects and carry a black box warning that limits their use.

FOXM1 is the "Achilles' heel" of cancers and hence a potential therapeutic target for anticancer drug discovery. In this work, we selected high-affinity peptides against the DNA-binding domain of human FOXM1 (FOXM1-DBD) from the disulfide-constrained, phage-displayed random cyclic heptapeptide library Ph.D.
