Publications by Baskin I | LitMetric

Publications by authors named "Baskin I"

Chemoinformatics for corrosion science: Data-driven modeling of corrosion inhibition by organic molecules.

Igor Baskin Yair Ein-Eli

Mol Inform

November 2024

This paper reviews the application of machine learning to the inhibition of corrosion by organic molecules. The methodologies considered include quantitative structure-property relationships (QSPR) and related data-driven approaches. The characteristic features of their key components are considered as applied to corrosion inhibition, including datasets, response properties, molecular descriptors, machine learning methods, and structure-property models.

Conjugated quantitative structure-property relationship models: Prediction of kinetic characteristics linked by the Arrhenius equation.

Dmitry Zankov Timur Madzhidov Igor Baskin Alexandre Varnek

Mol Inform

October 2023

Conjugated QSPR models for reactions integrate fundamental chemical laws expressed by mathematical equations with machine learning algorithms. Herein we present a methodology for building conjugated QSPR models integrated with the Arrhenius equation. Conjugated QSPR models were used to predict kinetic characteristics of cycloaddition reactions related by the Arrhenius equation: rate constant, pre-exponential factor, and activation energy.
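
For reference, the Arrhenius equation links the three quantities as k = A·exp(−Ea/(RT)), or in decimal-log form log k = log A − Ea/(ln(10)·R·T). The sketch below only illustrates that relation; it is not the authors' model. The symbols k, A, Ea are the standard textbook notation (the listing does not show them), and the numeric values are hypothetical placeholders standing in for whatever a model would predict.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)

def log10_rate_constant(log10_A: float, Ea: float, T: float) -> float:
    """Return log10(k) from log10(A) and Ea (J/mol) at temperature T (K),
    using the Arrhenius equation k = A * exp(-Ea / (R * T))."""
    return log10_A - Ea / (math.log(10) * R * T)

# Hypothetical values standing in for model outputs (not from the paper).
predicted_log10_A = 10.0
predicted_Ea = 80_000.0  # J/mol

for T in (298.15, 350.0):
    print(T, log10_rate_constant(predicted_log10_A, predicted_Ea, T))
```

Because log k follows from log A and Ea at any temperature, a model that outputs the latter two yields rate constants that are consistent with the equation by construction.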

Inverse QSAR: Reversing Descriptor-Driven Prediction Pipeline Using Attention-Based Conditional Variational Autoencoder.

William Bort Daniyar Mazitov Dragos Horvath Fanny Bonachera Arkadii Lin

J Chem Inf Model

November 2022

In order to better formalize it, the notorious inverse-QSAR problem (finding structures of given QSAR-predicted properties) is considered in this paper as a two-step process including (i) finding "seed" descriptor vectors corresponding to user-constrained QSAR model output values and (ii) identifying the chemical structures best matching the "seed" vectors. The main development effort here was focused on the latter stage, proposing a new attention-based conditional variational autoencoder neural-network architecture based on recent developments in attention-based methods. The obtained results show that this workflow was capable of generating compounds predicted to display desired activity while being completely novel compared to the training database (ChEMBL).

QSAR Modeling Based on Conformation Ensembles Using a Multi-Instance Learning Approach.

Dmitry V Zankov Mariia Matveieva Aleksandra V Nikonenko Ramil I Nugmanov Igor I Baskin

J Chem Inf Model

October 2021

Modern QSAR approaches have wide practical applications in drug discovery for designing potentially bioactive molecules. If such models are based on the use of 2D descriptors, important information contained in the spatial structures of molecules is lost. The major problem in constructing models using 3D descriptors is the choice of a putative bioactive conformation, which affects the predictive performance.

Multiple Conformer Descriptors for QSAR Modeling.

Aleksandra Nikonenko Dmitry Zankov Igor Baskin Timur Madzhidov Pavel Polishchuk

Mol Inform

November 2021

The most widely used QSAR approaches are mainly based on 2D molecular representation which ignores stereoconfiguration and conformational flexibility of compounds. 3D QSAR uses a single conformer of each compound which is difficult to choose reasonably. 4D QSAR uses multiple conformers to overcome the issues of 2D and 3D methods.

Global COVID-19 lockdown highlights humans as both threats and custodians of the environment.

Amanda E Bates Richard B Primack Brandy S Biggar Tomas J Bird Mary E Clinton

Biol Conserv

November 2021

The global lockdown to mitigate COVID-19 pandemic health risks has altered human interactions with nature. Here, we report immediate impacts of changes in human activities on wildlife and environmental threats during the early lockdown months of 2020, based on 877 qualitative reports and 332 quantitative assessments from 89 different studies. Hundreds of reports of unusual species observations from around the world suggest that animals quickly responded to the reductions in human presence.

Practical constraints with machine learning in drug discovery.

Expert Opin Drug Discov

September 2021

Cross-validation strategies in QSPR modelling of chemical reactions.

A Rakhimbekova T N Akhmetshin G I Minibaeva R I Nugmanov T R Gimadiev

SAR QSAR Environ Res

March 2021

In this article, we consider cross-validation of the quantitative structure-property relationship models for reactions and show that the conventional k-fold cross-validation (CV) procedure gives an 'optimistically' biased assessment of prediction performance. To address this issue, we suggest two strategies of model cross-validation, 'transformation-out' CV, and 'solvent-out' CV. Unlike the conventional k-fold cross-validation approach that does not consider the nature of objects, the proposed procedures provide an unbiased estimation of the predictive performance of the models for novel types of structural transformations in chemical reactions and reactions going under new conditions.
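
One plausible reading of 'transformation-out' CV is a grouped split in which all reactions sharing a transformation label are held out together, so the test fold never contains a transformation seen in training. The sketch below is a generic grouped k-fold written from scratch under that assumption, not the authors' implementation; the function name `group_k_fold` and the example labels are hypothetical.

```python
from collections import defaultdict

def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs such that every group label
    appears in exactly one test fold and is never split across folds."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    if n_splits > len(by_group):
        raise ValueError("n_splits cannot exceed the number of groups")
    # Greedy balancing: largest groups first, each to the smallest fold.
    folds = [[] for _ in range(n_splits)]
    for g in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        min(folds, key=len).extend(by_group[g])
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(groups)) if i not in held_out]
        yield train, sorted(test)

# Hypothetical transformation labels, one per reaction in the dataset.
transformations = ["A", "A", "B", "C", "C", "C", "D"]
for train_idx, test_idx in group_k_fold(transformations, n_splits=3):
    print("train:", train_idx, "test:", test_idx)
```

Passing solvent labels instead of transformation labels to the same splitter would give a 'solvent-out' style split.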

Discovery of novel chemical reactions by deep generative recurrent neural network.

William Bort Igor I Baskin Timur Gimadiev Artem Mukanov Ramil Nugmanov

Sci Rep

February 2021

The "creativity" of Artificial Intelligence (AI) in terms of generating de novo molecular structures opened a novel paradigm in compound design, weaknesses (stability & feasibility issues of such structures) notwithstanding. Here we show that "creative" AI may be as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class.

Comprehensive Analysis of Applicability Domains of QSPR Models for Chemical Reactions.

Assima Rakhimbekova Timur I Madzhidov Ramil I Nugmanov Timur R Gimadiev Igor I Baskin

Int J Mol Sci

August 2020

Nowadays, the definition of a model's applicability domain (AD) is an active research topic in chemoinformatics. Although many AD definitions for models predicting properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models) have been described in the literature, none for chemical reactions (Quantitative Reaction-Property Relationships (QRPR)) has been reported to date. The point is that a chemical reaction is a much more complex object than an individual molecule, and its yield, thermodynamic and kinetic characteristics depend not only on the structures of reactants and products but also on experimental conditions.

Autoignition temperature: comprehensive data analysis and predictive models.

I I Baskin S Lozano M Durot G Marcou D Horvath

SAR QSAR Environ Res

August 2020

Here we report a new predictive model for autoignition temperature (AIT), an important physical parameter widely used to assess potential safety hazards of combustible materials. Available structure-AIT data extracted from different sources were critically analysed. Support vector regression (SVR) models on different data subsets were built in order to identify a reliable compound set on which a realistic model could be built.

Correction: QSAR without borders.

Eugene N Muratov Jürgen Bajorath Robert P Sheridan Igor V Tetko Dmitry Filimonov

Chem Soc Rev

June 2020

Correction for 'QSAR without borders' by Eugene N. Muratov et al., Chem.

QSAR without borders.

Eugene N Muratov Jürgen Bajorath Robert P Sheridan Igor V Tetko Dmitry Filimonov

Chem Soc Rev

June 2020

Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics.

Parallel Generative Topographic Mapping: An Efficient Approach for Big Data Handling.

Arkadii Lin Igor I Baskin Gilles Marcou Dragos Horvath Bernd Beck

Mol Inform

December 2020

Generative Topographic Mapping (GTM) can be efficiently used to visualize, analyze and model large chemical data. The GTM manifold needs to span the chemical space deemed relevant for a given problem. Therefore, the Frame set (FS) of compounds used for the manifold construction must well cover a given chemical space.

The power of deep learning to ligand-based novel drug discovery.

Expert Opin Drug Discov

July 2020

Introduction: Deep discriminative and generative neural-network models are becoming an integral part of the modern approach to ligand-based novel drug discovery. The variety of different architectures of neural networks, the methods of their training, and the procedures of generating new molecules require expert knowledge to choose the most suitable approach.

Areas Covered: Three different approaches to deep learning use in ligand-based drug discovery are considered: virtual screening, neural generative models, and mutation-based structure generation.

Application of the mol2vec Technology to Large-size Data Visualization and Analysis.

Shojiro Shibayama Gilles Marcou Dragos Horvath Igor I Baskin Kimito Funatsu

Mol Inform

June 2020

Generative Topographic Mapping (GTM) is a dimensionality reduction method, which is widely used for both data visualization and structure-activity modeling. Large dimensionality of the initial data space may require significant computational resources and slow down the GTM construction. Therefore, it may be meaningful to reduce the number of descriptors used for encoding molecular structures.

Conjugated Quantitative Structure-Property Relationship Models: Application to Simultaneous Prediction of Tautomeric Equilibrium Constants and Acidity of Molecules.

Dmitry V Zankov Timur I Madzhidov Assima Rakhimbekova Timur R Gimadiev Ramil I Nugmanov

J Chem Inf Model

November 2019

Here, we describe a concept of conjugated models for several properties (activities) linked by a strict mathematical relationship. This relationship can be directly integrated analytically into the ridge regression (RR) algorithm or accounted for in a special case of "twin" neural networks (NN). Developed approaches were applied to the modeling of the logarithm of the prototropic tautomeric constant (logK) which can be expressed as the difference between the acidity constants (pKa) of two related tautomers.
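
To make the idea of a strict relationship between properties concrete, the sketch below assumes a plain linear pKa model and shows that the tautomeric constant then follows from the same weights. This is an illustration under stated assumptions, not the paper's algorithm: the linear form, the sign convention (first tautomer minus second), the descriptor vectors, and the weights are all hypothetical.

```python
def pka(weights, bias, x):
    """Hypothetical linear pKa model: pKa = w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def log_kt(weights, bias, x1, x2):
    """Tautomeric constant as a pKa difference between two tautomers.
    Sign convention (first minus second) is an assumption of this sketch."""
    return pka(weights, bias, x1) - pka(weights, bias, x2)

# Hypothetical descriptor vectors for two tautomers of one compound.
w, b = [0.5, -1.2, 2.0], 4.0
x_taut1 = [1.0, 0.0, 2.0]
x_taut2 = [0.0, 1.0, 2.0]

print(log_kt(w, b, x_taut1, x_taut2))
```

With a shared linear model the bias cancels, so logK depends only on the descriptor difference w·(x1 − x2). That is one way a single set of weights could be fitted against both pKa and logK data at once.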

Continuous molecular fields and the concept of molecular co-fields in structure-activity studies.

Igor I Baskin Nelly I Zhokhova

Future Med Chem

October 2019

The analysis of information on the spatial structure of molecules and the physical fields of their interactions with biological targets is extremely important for solving various problems in drug discovery. This mini-review article surveys the main features of the continuous molecular fields approach and its use for analyzing structure-activity relationships in 3D space, building 3D quantitative structure-activity models and conducting similarity based virtual screening. Particular attention is paid to the consideration of the concept of molecular co-fields and their use for the interpretation of 3D structure-activity models.

Is one-shot learning a viable option in drug discovery?

Expert Opin Drug Discov

July 2019

De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping.

Boris Sattarov Igor I Baskin Dragos Horvath Gilles Marcou Esben Jannik Bjerrum

J Chem Inf Model

March 2019

Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused molecular libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set molecules were achieved (>98%), which are comparable to the ones reported in related publications.

Visualization and Analysis of Complex Reaction Data: The Case of Tautomeric Equilibria.

Marta Glavatskikh Timur Madzhidov Igor I Baskin Dragos Horvath Ramil Nugmanov

Mol Inform

September 2018

The Generative Topographic Mapping (GTM) approach was successfully used to visualize, analyze and model the equilibrium constants (K) of tautomeric transformations as a function of both structure and experimental conditions. The modeling set contained 695 entries corresponding to 350 unique transformations of 10 tautomeric types, for which K values were measured in different solvents and at different temperatures. Two types of GTM-based classification models were trained: first, a "structural" approach focused on separating tautomeric classes, irrespective of reaction conditions, then a "general" approach accounting for both structure and conditions.

Machine Learning Methods in Computational Toxicology.

Methods Mol Biol

February 2019

Various methods of machine learning, supervised and unsupervised, linear and nonlinear, classification and regression, in combination with various types of molecular descriptors, both "handcrafted" and "data-driven," are considered in the context of their use in computational toxicology. The use of multiple linear regression, variants of naïve Bayes classifier, k-nearest neighbors, support vector machine, decision trees, ensemble learning, random forest, several types of neural networks, and deep learning is the focus of attention of this review. The role of fragment descriptors, graph mining, and graph kernels is highlighted.

Assessment of tautomer distribution using the condensed reaction graph approach.

T R Gimadiev T I Madzhidov R I Nugmanov I I Baskin I S Antipin

J Comput Aided Mol Des

March 2018

We report the first direct QSPR modeling of equilibrium constants of tautomeric transformations (logK) in different solvents and at different temperatures, which does not require intermediate assessment of acidity (basicity) constants for all tautomeric forms. The key step of the modeling consisted of merging two tautomers into a single molecular graph ("condensed reaction graph"), which makes it possible to compute molecular descriptors characterizing the entire equilibrium. The support vector regression method was used to build the models.

Predictive cartography of metal binders using generative topographic mapping.

Igor I Baskin Vitaly P Solov'ev Alexander A Bagatur'yants Alexandre Varnek

J Comput Aided Mol Des

August 2017

The generative topographic mapping (GTM) approach is used to visualize the chemical space of organic molecules (L) with respect to binding a wide range of 41 different metal cations (M) and also to build predictive models for stability constants (logK) of 1:1 (M:L) complexes using "density maps," "activity landscapes," and "selectivity landscapes" techniques. A two-dimensional map describing the entire set of 2962 metal binders reveals the selectivity and promiscuity zones with respect to individual metals or groups of metals with similar chemical properties (lanthanides, transition metals, etc.). The GTM-based global (for the entire set) and local (for selected subsets) models demonstrate good predictive performance in the cross-validation procedure.

Energy-based Neural Networks as a Tool for Harmony-based Virtual Screening.

Nelly I Zhokhova Igor I Baskin

Mol Inform

November 2017

In Energy-Based Neural Networks (EBNNs), relationships between variables are captured by means of a scalar function conventionally called "energy". In this article, we introduce a procedure of "harmony search", which looks for compounds providing the lowest energies for the EBNNs trained on active compounds. It can be considered as a special kind of similarity search that takes into account regularities in the structures of active compounds.
