Publications by Jurgen Bajorath | LitMetric

Publications by authors named "Jurgen Bajorath"

Protocol to generate dual-target compounds using a transformer chemical language model.

Sanjana Srinivasan Jürgen Bajorath

STAR Protoc

January 2025

Here, we present a protocol to generate dual-target compounds (DT-CPDs) interacting with two distinct target proteins using a transformer-based chemical language model. We describe steps for installing software, preparing data, and pre-training the model on pairs of single-target compounds (ST-CPDs), which bind to an individual protein, and DT-CPDs. We then detail procedures for assembling ST- and corresponding DT-CPD data for specific protein pairs and evaluating the model's performance on hold-out test sets.

Rationalizing Predictions of Isoform-Selective Phosphoinositide 3-Kinase Inhibitors Using MolAnchor Analysis.

Alec Lamens Jürgen Bajorath

J Chem Inf Model

January 2025

Explaining the predictions of machine learning models is of critical importance for integrating predictive modeling in drug discovery projects. We have generated a test system for predicting isoform selectivity of phosphoinositide 3-kinase (PI3K) inhibitors and systematically analyzed correct predictions of selective inhibitors using a new methodology termed MolAnchor, which is based on the "anchors" concept from explainable artificial intelligence. The approach is designed to generate chemically intuitive explanations of compound predictions.

Context-dependent similarity analysis of analogue series for structure-activity relationship transfer based on a concept from natural language processing.

Atsushi Yoshimori Jürgen Bajorath

J Cheminform

January 2025

Analogue series (AS) are generated during compound optimization in medicinal chemistry and are the major source of structure-activity relationship (SAR) information. Pairs of active AS consisting of compounds with corresponding substituents and comparable potency progression represent SAR transfer events for the same target or across different targets. We report a new computational approach to systematically search for SAR transfer series that combines an AS alignment algorithm with context-dependent similarity assessment based on vector embeddings adapted from natural language processing.

Influence of Data Curation and Confidence Levels on Compound Predictions Using Machine Learning Models.

Elena Xerxa Martin Vogt Jürgen Bajorath

J Chem Inf Model

December 2024

While data curation principles and practices are a major topic in data science, they are often not explicitly considered in machine learning (ML) applications in chemistry. We have been interested in evaluating the potential effects of data curation on the performance of molecular ML models. Therefore, a sequential curation scheme was developed for compounds and activity data, and different ML classification models were generated at increasing data confidence levels and evaluated.

Combining a Chemical Language Model and the Structure-Activity Relationship Matrix Formalism for Generative Design of Potent Compounds with Core Structure and Substituent Modifications.

Hengwei Chen Jürgen Bajorath

J Chem Inf Model

December 2024

Article Synopsis

- Compound optimization in medicinal chemistry involves creating series of analogues to study structure-activity relationships (SARs), with a focus on improving potency.
- A new computational method integrates a transformer chemical language model (CLM) with a SAR matrix (SARM) to generate potent analogues with modifications at various sites.
- This methodology demonstrated its effectiveness by accurately predicting known potent compounds and producing diverse series through structural and substituent adjustments.

Protocol to calculate and compare exact Shapley values for different kernels in support vector machine models using binary features.

Jannik P Roth Jürgen Bajorath

STAR Protoc

December 2024

The Shapley value formalism from cooperative game theory was adapted to explain predictions of machine learning models. Here, we present a protocol to calculate and compare exact Shapley values for support vector machine models with commonly used kernels and binary input features. We describe steps for installing software, preparing data, and calculating Shapley values with customizable Python scripts.
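
As a rough illustration of what "exact" Shapley values mean, the sketch below enumerates every feature coalition for a small set of binary features. It is a generic brute-force calculation, not the protocol's kernel-specific method. The names `exact_shapley` and `toy_value`, the example feature vector, and the toy value function are all assumptions made for this example. A feature absent from a coalition is assumed to contribute nothing.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions (cost grows as 2^n)."""
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical binary feature vector of one compound (features 0 and 2 are set).
x = [1, 0, 1]


def toy_value(coalition):
    """Toy stand-in for a model output restricted to the features in `coalition`.

    Assumption: two additive terms plus one interaction between features 0 and 2.
    """
    v = 0.0
    if 0 in coalition:
        v += 2.0 * x[0]
    if 2 in coalition:
        v += 0.5 * x[2]
    if 0 in coalition and 2 in coalition:
        v += 1.0 * x[0] * x[2]
    return v


phi = exact_shapley(toy_value, len(x))
print(phi)
```

The values sum to the difference between the output for the full feature set and the output for the empty set, which is the efficiency property of Shapley values. Enumeration is only practical for a handful of features, which is why large feature sets usually require approximation.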

Milestones in chemoinformatics: global view of the field.

Jürgen Bajorath

J Cheminform

November 2024

Over the past ~25 years, chemoinformatics has evolved as a scientific discipline, with a strong foundation in pharmaceutical research and scientific roots that can be traced back to the late 1950s. It covers a wide methodological spectrum and is perhaps best positioned in the greater context of chemical information science. Herein, the chemoinformatics discipline is delineated, characteristic (and partly problematic) features are discussed, and a global view of the field is provided, emphasizing key developments.

Assessing Darkness of the Human Kinome from a Medicinal Chemistry Perspective.

Selina Voßen Elena Xerxa Jürgen Bajorath

J Med Chem

October 2024

In drug discovery, human protein kinases (PKs) represent one of the major target classes due to their central role in cellular signaling, implication in various diseases as a consequence of deregulated signaling, and notable druggability. Individual PKs and their disease biology have been explored to different degrees, giving rise to heterogeneous functional knowledge and disease associations across the human kinome. The U.

Kinase Drug Discovery: Impact of Open Science and Artificial Intelligence.

Filip Miljković Jürgen Bajorath

Mol Pharm

October 2024

Given their central role in signal transduction, protein kinases (PKs) were first implicated in cancer development, caused by aberrant intracellular signaling events. Since then, PKs have become major targets in different therapeutic areas. The preferred approach to therapeutic intervention of PK-dependent diseases is the use of small molecules to inhibit their catalytic phosphate group transfer activity.

Extension of multi-site analogue series with potent compounds using a bidirectional transformer-based chemical language model.

Hengwei Chen Atsushi Yoshimori Jürgen Bajorath

RSC Med Chem

July 2024

Generating potent compounds for evolving analogue series (AS) is a key challenge in medicinal chemistry. The versatility of chemical language models (CLMs) makes it possible to formulate this challenge as an off-the-beaten-path prediction task. In this work, we have devised a coding and tokenization scheme for evolving AS with multiple substitution sites (multi-site AS) and implemented a bidirectional transformer to predict new potent analogues for such series.

Chemical and biological language models in molecular design: opportunities, risks and scientific reasoning.

Jürgen Bajorath

Future Sci OA

May 2024

MAATrica: a measure for assessing consistency and methods in medicinal and nutraceutical chemistry papers.

Giulia Panzarella Alessandro Gallo Sandra Coecke Maddalena Querci Francesco Ortuso Jürgen Bajorath

Eur J Med Chem

July 2024

The growing number of scientific papers and document sources underscores the need for methods capable of evaluating the quality of publications. Researchers who are looking for relevant papers for their studies need ways to assess the scientific value of these documents. One approach involves using semantic search engines that can automatically extract important knowledge from the growing body of text.

Systematic generation and analysis of counterfactuals for compound activity predictions using multi-task models.

Alec Lamens Jürgen Bajorath

RSC Med Chem

May 2024

Most machine learning (ML) methods produce predictions that are hard or impossible to understand. The black box nature of predictive models obscures potential learning bias and makes it difficult to recognize and trace problems. Moreover, the inability to rationalize model decisions causes reluctance to accept predictions for experimental design.

Generative design of compounds with desired potency from target protein sequences using a multimodal biochemical language model.

Hengwei Chen Jürgen Bajorath

J Cheminform

May 2024

Article Synopsis

- This research explores using deep learning models, originally from natural language processing, to predict active compounds by translating sequential molecular data, focusing on chemical language models for compound transformations.
- A unique dual-component language model was created that combines a protein language model to generate sequence embeddings and a conditional transformer to predict new active compounds based on desired potency values.
- The model showed success by reproducing known compounds with various potencies and generated a diverse array of candidate compounds, suggesting its potential for practical applications in compound design and development.

Data-oriented protein kinase drug discovery.

Elena Xerxa Jürgen Bajorath

Eur J Med Chem

May 2024

The continued growth of data from biological screening and medicinal chemistry provides opportunities for data-driven experimental design and decision making in early-phase drug discovery. Approaches adopted from data science help to integrate internal and public domain data and extract knowledge from historical in-house data. Protein kinase (PK) drug discovery is an exemplary area where large amounts of data are accumulating, providing a valuable knowledge base for discovery projects.

Protocol to explain support vector machine predictions via exact Shapley value computation.

Andrea Mastropietro Jürgen Bajorath

STAR Protoc

June 2024

Shapley values from cooperative game theory are adapted for explaining machine learning predictions. For large feature sets used in machine learning, Shapley values are approximated. We present a protocol for two techniques for explaining support vector machine predictions with exact Shapley value computation.

Comprehensive Data-Driven Assessment of Non-Kinase Targets of Inhibitors of the Human Kinome.

Mona Mobasher Martin Vogt Elena Xerxa Jürgen Bajorath

Biomolecules

February 2024

Protein kinases (PKs) are involved in many intracellular signal transduction pathways through phosphorylation cascades and have become intensely investigated pharmaceutical targets over the past two decades. Inhibition of PKs using small-molecular inhibitors is a premier strategy for the treatment of diseases in different therapeutic areas that are caused by uncontrolled PK-mediated phosphorylation and aberrant signaling. Most PK inhibitors (PKIs) are directed against the ATP cofactor binding site that is largely conserved across the human kinome comprising 518 wild-type PKs (and many mutant forms).

Relationship between prediction accuracy and uncertainty in compound potency prediction using deep neural networks and control models.

Jannik P Roth Jürgen Bajorath

Sci Rep

March 2024

The assessment of prediction variance or uncertainty contributes to the evaluation of machine learning models. In molecular machine learning, uncertainty quantification is an evolving area of research where currently no standard approaches or general guidelines are available. We have carried out a detailed analysis of deep neural network variants and simple control models for compound potency prediction to study relationships between prediction accuracy and uncertainty.

Chemical language models for molecular design.

Jürgen Bajorath

Mol Inform

January 2024

In drug discovery, chemical language models (CLMs) originating from natural language processing offer new opportunities for molecular design. CLMs have been developed using recurrent neural network (RNN) or transformer architectures. For the predictive performance of RNN-based encoder-decoder frameworks and transformers, attention mechanisms play a central role.

Generation of Molecular Counterfactuals for Explainable Machine Learning Based on Core-Substituent Recombination.

Alec Lamens Jürgen Bajorath

ChemMedChem

February 2024

The use of black box machine learning models whose decisions cannot be understood limits the acceptance of predictions in interdisciplinary research and camouflages artificial learning characteristics leading to predictions for other than anticipated reasons. Consequently, there is increasing interest in explainable artificial intelligence to rationalize predictions and uncover potential pitfalls. Among others, relevant approaches include feature attribution methods to identify molecular structures determining predictions and counterfactuals (CFs) or contrastive explanations.

Calculation of exact Shapley values for explaining support vector machine models using the radial basis function kernel.

Andrea Mastropietro Christian Feldmann Jürgen Bajorath

Sci Rep

November 2023

Machine learning (ML) algorithms are extensively used in pharmaceutical research. Most ML models have black-box character, thus preventing the interpretation of predictions. However, rationalizing model decisions is of critical importance if predictions should aid in experimental design.

Anatomy of Potency Predictions Focusing on Structural Analogues with Increasing Potency Differences Including Activity Cliffs.

Tiago Janela Jürgen Bajorath

J Chem Inf Model

November 2023

Potency predictions are popular in compound design and optimization but are complicated by intrinsic limitations. Moreover, even for nonlinear methods, activity cliffs (ACs, formed by structural analogues with large potency differences) represent challenging test cases for compound potency predictions. We have devised a new test system for potency predictions, including AC compounds, that is based on partitioned matched molecular pairs (MMP) and makes it possible to monitor prediction accuracy at the level of analogue pairs with increasing potency differences.

Rationalizing general limitations in assessing and comparing methods for compound potency prediction.

Tiago Janela Jürgen Bajorath

Sci Rep

October 2023

Compound potency predictions play a major role in computational drug discovery. Predictive methods are typically evaluated and compared in benchmark calculations that are widely applied. Previous studies have revealed intrinsic limitations of potency prediction benchmarks including very similar performance of increasingly complex machine learning methods and simple controls and narrow error margins separating machine learning from randomized predictions.

Data sets of human and mouse protein kinase inhibitors with curated activity data including covalent inhibitors.

Elena Xerxa Jürgen Bajorath

Future Sci OA

October 2023

Aim: Generation of high-quality data sets of protein kinase inhibitors (PKIs).

Methodology: Publicly available PKIs with reliable activity data were curated. PKIs with very weak activity were classified as inactive.

Meta-learning for transformer-based prediction of potent compounds.

Hengwei Chen Jürgen Bajorath

Sci Rep

September 2023

For many machine learning applications in drug discovery, only limited amounts of training data are available. This typically applies to compound design and activity prediction and often restricts machine learning, especially deep learning. For low-data applications, specialized learning strategies can be considered to limit required training data.
