Publications by authors named "Baskin I"

This paper reviews the application of machine learning to the inhibition of corrosion by organic molecules. The methodologies considered include quantitative structure-property relationships (QSPR) and related data-driven approaches. The characteristic features of their key components are considered as applied to corrosion inhibition, including datasets, response properties, molecular descriptors, machine learning methods, and structure-property models.

View Article and Find Full Text PDF

Conjugated QSPR models for reactions integrate fundamental chemical laws expressed by mathematical equations with machine learning algorithms. Herein we present a methodology for building conjugated QSPR models integrated with the Arrhenius equation. Conjugated QSPR models were used to predict kinetic characteristics of cycloaddition reactions related by the Arrhenius equation: rate constant , pre-exponential factor , and activation energy .

View Article and Find Full Text PDF

In order to better foramize it, the notorious inverse-QSAR problem (finding structures of given QSAR-predicted properties) is considered in this paper as a two-step process including (i) finding "seed" descriptor vectors corresponding to user-constrained QSAR model output values and (ii) identifying the chemical structures best matching the "seed" vectors. The main development effort here was focused on the latter stage, proposing a new attention-based conditional variational autoencoder neural-network architecture based on recent developments in attention-based methods. The obtained results show that this workflow was capable of generating compounds predicted to display desired activity while being completely novel compared to the training database (ChEMBL).

View Article and Find Full Text PDF

Modern QSAR approaches have wide practical applications in drug discovery for designing potentially bioactive molecules. If such models are based on the use of 2D descriptors, important information contained in the spatial structures of molecules is lost. The major problem in constructing models using 3D descriptors is the choice of a putative bioactive conformation, which affects the predictive performance.

View Article and Find Full Text PDF

The most widely used QSAR approaches are mainly based on 2D molecular representation which ignores stereoconfiguration and conformational flexibility of compounds. 3D QSAR uses a single conformer of each compound which is difficult to choose reasonably. 4D QSAR uses multiple conformers to overcome the issues of 2D and 3D methods.

View Article and Find Full Text PDF

The global lockdown to mitigate COVID-19 pandemic health risks has altered human interactions with nature. Here, we report immediate impacts of changes in human activities on wildlife and environmental threats during the early lockdown months of 2020, based on 877 qualitative reports and 332 quantitative assessments from 89 different studies. Hundreds of reports of unusual species observations from around the world suggest that animals quickly responded to the reductions in human presence.

View Article and Find Full Text PDF

In this article, we consider cross-validation of the quantitative structure-property relationship models for reactions and show that the conventional k-fold cross-validation (CV) procedure gives an 'optimistically' biased assessment of prediction performance. To address this issue, we suggest two strategies of model cross-validation, 'transformation-out' CV, and 'solvent-out' CV. Unlike the conventional k-fold cross-validation approach that does not consider the nature of objects, the proposed procedures provide an unbiased estimation of the predictive performance of the models for novel types of structural transformations in chemical reactions and reactions going under new conditions.

View Article and Find Full Text PDF

The "creativity" of Artificial Intelligence (AI) in terms of generating de novo molecular structures opened a novel paradigm in compound design, weaknesses (stability & feasibility issues of such structures) notwithstanding. Here we show that "creative" AI may be as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class.

View Article and Find Full Text PDF

Nowadays, the problem of the model's applicability domain (AD) definition is an active research topic in chemoinformatics. Although many various AD definitions for the models predicting properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models) were described in the literature, no one for chemical reactions (Quantitative Reaction-Property Relationships (QRPR)) has been reported to date. The point is that a chemical reaction is a much more complex object than an individual molecule, and its yield, thermodynamic and kinetic characteristics depend not only on the structures of reactants and products but also on experimental conditions.

View Article and Find Full Text PDF

Here we report a new predictive model for autoignition temperature (AIT), an important physical parameter widely used to assess potential safety hazards of combustible materials. Available structure-AIT data extracted from different sources were critically analysed. Support vector regression (SVR) models on different data subsets were built in order to identify a reliable compound set on which a realistic model could be built.

View Article and Find Full Text PDF

Correction for 'QSAR without borders' by Eugene N. Muratov et al., Chem.

View Article and Find Full Text PDF

Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics.

View Article and Find Full Text PDF

Generative Topographic Mapping (GTM) can be efficiently used to visualize, analyze and model large chemical data. The GTM manifold needs to span the chemical space deemed relevant for a given problem. Therefore, the Frame set (FS) of compounds used for the manifold construction must well cover a given chemical space.

View Article and Find Full Text PDF

Introduction: Deep discriminative and generative neural-network models are becoming an integral part of the modern approach to ligand-based novel drug discovery. The variety of different architectures of neural networks, the methods of their training, and the procedures of generating new molecules require expert knowledge to choose the most suitable approach.

Areas Covered: Three different approaches to deep learning use in ligand-based drug discovery are considered: virtual screening, neural generative models, and mutation-based structure generation.

View Article and Find Full Text PDF

Generative Topographic Mapping (GTM) is a dimensionality reduction method, which is widely used for both data visualization and structure-activity modeling. Large dimensionality of the initial data space may require significant computational resources and slow down the GTM construction. Therefore, it may be meaningful to reduce the number of descriptors used for encoding molecular structures.

View Article and Find Full Text PDF

Here, we describe a concept of conjugated models for several properties (activities) linked by a strict mathematical relationship. This relationship can be directly integrated analytically into the ridge regression (RR) algorithm or accounted for in a special case of "twin" neural networks (NN). Developed approaches were applied to the modeling of the logarithm of the prototropic tautomeric constant (logK) which can be expressed as the difference between the acidity constants (pKa) of two related tautomers.

View Article and Find Full Text PDF

The analysis of information on the spatial structure of molecules and the physical fields of their interactions with biological targets is extremely important for solving various problems in drug discovery. This mini-review article surveys the main features of the continuous molecular fields approach and its use for analyzing structure-activity relationships in 3D space, building 3D quantitative structure-activity models and conducting similarity based virtual screening. Particular attention is paid to the consideration of the concept of molecular co-fields and their use for the interpretation of 3D structure-activity models.

View Article and Find Full Text PDF

Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused molecular libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set molecules were achieved (>98%), which are comparable to the ones reported in related publications.

View Article and Find Full Text PDF

Generative Topographic Mapping (GTM) approach was successfully used to visualize, analyze and model the equilibrium constants (K ) of tautomeric transformations as a function of both structure and experimental conditions. The modeling set contained 695 entries corresponding to 350 unique transformations of 10 tautomeric types, for which K values were measured in different solvents and at different temperatures. Two types of GTM-based classification models were trained: first, a "structural" approach focused on separating tautomeric classes, irrespective of reaction conditions, then a "general" approach accounting for both structure and conditions.

View Article and Find Full Text PDF

Various methods of machine learning, supervised and unsupervised, linear and nonlinear, classification and regression, in combination with various types of molecular descriptors, both "handcrafted" and "data-driven," are considered in the context of their use in computational toxicology. The use of multiple linear regression, variants of naïve Bayes classifier, k-nearest neighbors, support vector machine, decision trees, ensemble learning, random forest, several types of neural networks, and deep learning is the focus of attention of this review. The role of fragment descriptors, graph mining, and graph kernels is highlighted.

View Article and Find Full Text PDF

We report the first direct QSPR modeling of equilibrium constants of tautomeric transformations (logK ) in different solvents and at different temperatures, which do not require intermediate assessment of acidity (basicity) constants for all tautomeric forms. The key step of the modeling consisted in the merging of two tautomers in one sole molecular graph ("condensed reaction graph") which enables to compute molecular descriptors characterizing entire equilibrium. The support vector regression method was used to build the models.

View Article and Find Full Text PDF

Generative topographic mapping (GTM) approach is used to visualize the chemical space of organic molecules (L) with respect to binding a wide range of 41 different metal cations (M) and also to build predictive models for stability constants (logK) of 1:1 (M:L) complexes using "density maps," "activity landscapes," and "selectivity landscapes" techniques. A two-dimensional map describing the entire set of 2962 metal binders reveals the selectivity and promiscuity zones with respect to individual metals or groups of metals with similar chemical properties (lanthanides, transition metals, etc). The GTM-based global (for entire set) and local (for selected subsets) models demonstrate a good predictive performance in the cross-validation procedure.

View Article and Find Full Text PDF

In Energy-Based Neural Networks (EBNNs), relationships between variables are captured by means of a scalar function conventionally called "energy". In this article, we introduce a procedure of "harmony search", which looks for compounds providing the lowest energies for the EBNNs trained on active compounds. It can be considered as a special kind of similarity search that takes into account regularities in the structures of active compounds.

View Article and Find Full Text PDF