Publications by authors named "Igor V Tetko"

Hyperparameter optimization is very frequently employed in machine learning. However, an optimization of a large space of parameters could result in overfitting of models. In recent studies on solubility prediction the authors collected seven thermodynamic and kinetic solubility datasets from different data sources.

View Article and Find Full Text PDF

Plasma protein binding (PPB) is closely related to pharmacokinetics, pharmacodynamics and drug toxicity. Existing models for predicting PPB often suffer from low prediction accuracy and poor interpretability, especially for high PPB compounds, and are most often not experimentally validated. Here, we carried out a strict data curation protocol, and applied consensus modeling to obtain a model with a coefficient of determination of 0.

View Article and Find Full Text PDF

Stakeholders of machine learning models desire explainable artificial intelligence (XAI) to produce human-understandable and consistent interpretations. In computational toxicity, augmentation of text-based molecular representations has been used successfully for transfer learning on downstream tasks. Augmentations of molecular representations can also be used at inference to compare differences between multiple representations of the same ground-truth.

View Article and Find Full Text PDF

The EUOS/SLAS challenge aimed to facilitate the development of reliable algorithms to predict the aqueous solubility of small molecules using experimental data from 100 K compounds. In total, hundred teams took part in the challenge to predict low, medium and highly soluble compounds as measured by the nephelometry assay. This article describes the winning model, which was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM) available on the website https://ochem.

View Article and Find Full Text PDF

Machine Learning (ML) techniques face significant challenges when predicting advanced chemical properties, such as yield, feasibility of chemical synthesis, and optimal reaction conditions. These challenges stem from the high-dimensional nature of the prediction task and the myriad essential variables involved, ranging from reactants and reagents to catalysts, temperature, and purification processes. Successfully developing a reliable predictive model not only holds the potential for optimizing high-throughput experiments but can also elevate existing retrosynthetic predictive approaches and bolster a plethora of applications within the field.

View Article and Find Full Text PDF
Article Synopsis
  • Prokineticins (PKs) are small peptides discovered two decades ago, now recognized for their various roles as angiogenic, anorectic, proinflammatory agents, and their involvement in different diseases, making them potential biomarkers.
  • PKs interact with two receptors, PKR1 and PKR2, which have unique therapeutic potentials in treating conditions like cardiovascular, metabolic, neural diseases, pain, and cancer.
  • The article reviews the functions of the PK family, highlights knowledge gaps about their receptors and ligands, and suggests that understanding these proteins better could lead to new, safe treatment strategies.
View Article and Find Full Text PDF

Introduction: Collaborative computing has attracted great interest in the possibility of joining the efforts of researchers worldwide. Its relevance has further increased during the pandemic crisis since it allows for the strengthening of scientific collaborations while avoiding physical interactions. Thus, the E4C consortium presents the MEDIATE initiative which invited researchers to contribute via their virtual screening simulations that will be combined with AI-based consensus approaches to provide robust and method-independent predictions.

View Article and Find Full Text PDF

A previously developed model to predict antibacterial activity of ionic liquids against a resistant strain was used to assess activity of phosphonium ionic liquids. Their antioxidant potential was additionally evaluated with newly developed models, which were based on public data. The accuracy of the models was rigorously evaluated using cross-validation as well as test set prediction.

View Article and Find Full Text PDF

The development of new functional materials based on porphyrins requires fast and accurate prediction of their spectral properties. The available models in the literature for absorption wavelength and extinction coefficient of the Soret band have low accuracy for this class of compounds. We collected spectral data for porphyrins to extend the literature set and compared the performance of global and local models for their modelling using different machine learning methods.

View Article and Find Full Text PDF

A possibility to accurately predict the absorption maximum wavelength of BODIPYs was investigated. We found that previously reported models had a low accuracy (40-57 nm) to predict BODIPYs due to the limited dataset sizes and/or number of BODIPYs (few hundreds). New models developed in this study were based on data of 6000-plus fluorescent dyes (including 4000-plus BODIPYs) and the deep neural network architecture.

View Article and Find Full Text PDF

AlphaScreen is one of the most widely used assay technologies in drug discovery due to its versatility, dynamic range and sensitivity. However, a presence of false positives and frequent hitters contributes to difficulties with an interpretation of measured HTS data. Although filters do exist to identify frequent hitters for AlphaScreen, they are frequently based on privileged scaffolds.

View Article and Find Full Text PDF

S. aureus resistant to methicillin (MRSA) is one of the most-concerned multidrug resistant bacteria, due to its role in life-threatening infections. There is an urgent need to develop new antibiotics against MRSA.

View Article and Find Full Text PDF

Background: Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditional rodent acute toxicity tests.

View Article and Find Full Text PDF

Selecting a model in predictive toxicology often involves a trade-off between prediction performance and explainability: should we sacrifice the model performance to gain explainability or vice versa. Here we present a comprehensive study to assess algorithm and feature influences on model performance in chemical toxicity research. We conducted over 5000 models for a Tox21 bioassay data set of 65 assays and ∼7600 compounds.

View Article and Find Full Text PDF

We present SMILES-embeddings derived from the internal encoder state of a Transformer [1] model trained to canonize SMILES as a Seq2Seq problem. Using a CharNN [2] architecture upon the embeddings results in higher quality interpretable QSAR/QSPR models on diverse benchmark datasets including regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for training and inference, and thus the prognosis is based on an internal consensus.

View Article and Find Full Text PDF

Recurrent neural networks have been widely used to generate millions of de novo molecules in defined chemical spaces. Reported deep generative models are exclusively based on LSTM and/or GRU units and frequently trained using canonical SMILES. In this study, we introduce Generative Examination Networks (GEN) as a new approach to train deep generative networks for SMILES generation.

View Article and Find Full Text PDF

Online Chemical Modeling Environment (OCHEM) was used for QSAR analysis of a set of ionic liquids (ILs) tested against multi-drug resistant (MDR) clinical isolate and strains. The predictive accuracy of regression models has coefficient of determination q = 0.66 - 0.

View Article and Find Full Text PDF

The increasing volume of biomedical data in chemistry and life sciences requires development of new methods and approaches for their analysis. Artificial Intelligence and machine learning, especially neural networks, are increasingly used in the chemical industry, in particular with respect to Big Data. This editorial highlights the main results presented during the special session of the International Conference on Neural Networks organized by "Big Data in Chemistry" project and draws perspectives on the future progress of the field.

View Article and Find Full Text PDF

We investigated the effect of different training scenarios on predicting the (retro)synthesis of chemical compounds using text-like representation of chemical reactions (SMILES) and Natural Language Processing (NLP) neural network Transformer architecture. We showed that data augmentation, which is a powerful method used in image processing, eliminated the effect of data memorization by neural networks and improved their performance for prediction of new sequences. This effect was observed when augmentation was used simultaneously for input and the target data simultaneously.

View Article and Find Full Text PDF