Molecular Property Prediction and Molecular Design Using a Supervised Grammar Variational Autoencoder.

J Chem Inf Model

Institute of Science and Technology, Federal University of São Paulo, 12247-014, São José dos Campos, SP, Brazil.

Published: February 2022

AI Article Synopsis

  • Machine learning (ML) algorithms are commonly used to predict molecular properties and design new molecules with specific characteristics, often requiring distinct approaches for each task.
  • The authors introduce a unified model, the supervised GVAE (SGVAE), which integrates molecular property information into the training process, allowing it to predict properties and generate unique molecules.
  • Results demonstrate that the SGVAE model can generate new molecules with targeted properties and predict molecular properties with a mean absolute error close to chemical accuracy for the dipole moment and atomization energies, outperforming SMILES-based ML models built solely for property prediction.

Article Abstract

Common applications of machine learning (ML) algorithms to small molecules fall within two distinct domains: the prediction of molecular properties and the design of novel molecules with some desirable property. Here we unite these applications under a single molecular representation and ML algorithm by modifying the grammar variational autoencoder (GVAE) model to incorporate property information into its training procedure, thus creating a supervised GVAE (SGVAE). Results indicate that the biased latent space generated by this approach can successfully be used to predict the molecular properties of the input molecules, produce novel and unique molecules with some desired property, and estimate the properties of randomly sampled molecules. We illustrate these possibilities by sampling novel molecules from the latent space with specific values of the lowest unoccupied molecular orbital (LUMO) energy after training the model on the QM9 data set. Furthermore, the trained model is used to predict the properties of a hold-out set, and the resulting mean absolute error (MAE) is close to chemical accuracy for the dipole moment and atomization energies, even outperforming ML models designed exclusively to predict molecular properties from SMILES representations. These results show that the proposed approach is a viable way to provide generative ML models with molecular property information so that the generation of novel molecules is likely to achieve better results, with the added benefit that the properties of these new molecules can also be accurately predicted.
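The abstract does not spell out the training objective, so the following is only a minimal sketch of one common way to add supervision to a VAE: a reconstruction term over grammar production rules, a KL term that regularizes the latent space, and a regression term that ties the latent code to a property such as the LUMO energy. The function name `sgvae_style_loss`, the weights `beta` and `gamma`, and the array shapes are assumptions made for illustration and are not taken from the paper. The decoder-side masking of invalid production rules used by grammar VAEs is omitted for brevity.

```python
import numpy as np


def sgvae_style_loss(x_onehot, x_logits, mu, log_var, y_true, y_pred,
                     beta=1.0, gamma=1.0):
    """Hypothetical supervised-VAE objective (illustrative, not the paper's).

    x_onehot, x_logits: (batch, steps, rules) one-hot targets and decoder logits
    mu, log_var:        (batch, latent_dim) encoder outputs
    y_true, y_pred:     (batch,) reference and predicted property values
    """
    # Reconstruction: cross-entropy over production rules, summed per molecule.
    shifted = x_logits - x_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    recon = -(x_onehot * log_probs).sum(axis=(-1, -2)).mean()

    # KL divergence between N(mu, diag(exp(log_var))) and the prior N(0, I).
    kl = (-0.5 * (1.0 + log_var - mu ** 2 - np.exp(log_var)).sum(axis=-1)).mean()

    # Supervision: mean squared error of the property predicted from the latent code.
    prop = ((y_true - y_pred) ** 2).mean()

    total = recon + beta * kl + gamma * prop
    return total, recon, kl, prop


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, steps, rules, latent_dim = 4, 6, 5, 3
    targets = np.eye(rules)[rng.integers(0, rules, size=(batch, steps))]
    logits = rng.normal(size=(batch, steps, rules))
    mu = rng.normal(size=(batch, latent_dim))
    log_var = rng.normal(size=(batch, latent_dim))
    y_true = rng.normal(size=batch)
    y_pred = rng.normal(size=batch)
    print(sgvae_style_loss(targets, logits, mu, log_var, y_true, y_pred))
```

In a sketch like this, `gamma` controls how strongly the latent space is biased toward the property: with `gamma=0` the objective reduces to an unsupervised VAE loss, while larger values trade reconstruction quality for a latent space that is more predictive of the property.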


Source
http://dx.doi.org/10.1021/acs.jcim.1c01573

Publication Analysis

Top Keywords

molecular properties: 16
novel molecules: 12
molecular: 10
molecular property: 8
prediction molecular: 8
grammar variational: 8
variational autoencoder: 8
molecules: 8
molecular representation: 8
latent space: 8

Similar Publications

Menthol is a naturally occurring cyclic terpene alcohol and is the major component of peppermint and corn mint essential oils extracted from Mentha piperita L. and Mentha arvensis L.


Hierarchical Porous Aggregate-Enabled Chromatography-Inspired Single-Sensor E-Nose for Volatile Monitoring.

ACS Sens

January 2025

School of Chemistry and Molecular Engineering, In Situ Devices Research Center, Shanghai Key Laboratory for Urban Ecological Processes and Eco-Restoration, East China Normal University, Shanghai 200241, China.

Monitoring volatile organic compounds (VOCs) is crucial for ensuring safety and health. In this study, we introduce a strategy to engineer a chromatography-inspired single-sensor (CISS) e-nose tailored for VOC monitoring. This approach overcomes the limitations of traditional methodologies and conventional e-noses.


This study presents T-1-NBAB, a new compound derived from the natural xanthine alkaloid theobromine, aimed at inhibiting VEGFR-2, a crucial protein in angiogenesis. T-1-NBAB's potential to interact with and inhibit VEGFR-2 was indicated by in silico techniques such as molecular docking, MD simulations, MM-GBSA, PLIP, essential dynamics, and bi-dimensional projection experiments. DFT experiments were also used to study the structural and electrostatic properties of T-1-NBAB.


Pancreatic ductal adenocarcinoma (PDAC) is a devastating disease with poor clinical outcomes, mainly because of delayed detection, resistance to chemotherapy, and a lack of specific targeted therapies. The disease's development involves complex interactions among immunological, genetic, and environmental factors, yet its molecular mechanism remains elusive. A major challenge in understanding PDAC etiology lies in unraveling the genetic profile that governs the PDAC network.


The aging population creates a critical need for medical devices, for which polymer-based surface lubrication coatings are essential to optimal functionality. Lubrication and mechanical requirements vary with the service environment of each medical device. So far, a general method for preparing hydrophilic polymer-based lubrication coatings with on-demand mechanics and lubricity is still lacking.

