Explainable Graph Neural Networks with Data Augmentation for Predicting pKa of C-H Acids.

J Chem Inf Model

Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China.

Published: April 2024

The pKa of C-H acids is an important parameter in organic synthesis, drug discovery, and materials science. However, predicting pKa remains a major challenge because experimental data are limited and chemical insight is lacking. Here, a new model for predicting the pKa values of C-H acids is proposed on the basis of graph neural networks (GNNs) and data augmentation. A message passing unit (MPU) was used to extract topological and target-related information from the molecular graph data, and a readout layer was used to retrieve the information on the C atom at the ionization site. The retrieved information was then used to predict pKa with a fully connected network. Furthermore, to increase the diversity of the training data, a knowledge-infused data augmentation technique was established by replacing the H atoms in a molecule with substituents exhibiting different electronic effects. The MPU was pretrained with the augmented data. The efficacy of data augmentation was confirmed by visualizing the distribution of compounds with different substituents and by classifying compounds. The explainability of the model was studied by examining the change in predicted pKa values when a specific atom was masked, and this explainability was used to identify the key substituents that determine pKa. The model was evaluated on two data sets from the iBonD database. Dataset1 includes the experimental pKa values of C-H acids measured in DMSO, while dataset2 comprises the pKa values measured in water. The results show that the knowledge-infused data augmentation technique greatly improves the predictive accuracy of the model, especially when the number of samples is small.
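The pipeline described in the abstract (message passing, readout at the ionization-site atom, and atom masking for explainability) can be illustrated with a toy sketch. This is not the authors' model: the scalar atom features, the fixed mixing weights, the linear prediction head, and all function names below are assumptions chosen only to make the idea concrete, and the resulting numbers carry no chemical meaning.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # atom index -> indices of bonded atoms


def message_pass(features: List[float], neighbors: Graph,
                 steps: int = 2, self_w: float = 0.5,
                 nbr_w: float = 0.5) -> List[float]:
    """Mix each atom's state with the sum of its neighbors' states.

    The fixed weights stand in for learned parameters (assumption).
    """
    h = list(features)
    for _ in range(steps):
        h = [self_w * h[i] + nbr_w * sum(h[j] for j in neighbors[i])
             for i in range(len(h))]
    return h


def predict_pka(features: List[float], neighbors: Graph, site: int,
                weight: float = -2.0, bias: float = 30.0) -> float:
    """Read out only the ionization-site atom, then apply a linear head.

    The linear head is a hypothetical stand-in for a fully connected network.
    """
    h = message_pass(features, neighbors)
    return weight * h[site] + bias


def mask_attribution(features: List[float], neighbors: Graph,
                     site: int) -> Dict[int, float]:
    """Change in predicted pKa when each non-site atom's feature is zeroed."""
    base = predict_pka(features, neighbors, site)
    deltas = {}
    for atom in range(len(features)):
        if atom == site:
            continue
        masked = list(features)
        masked[atom] = 0.0  # "mask" the atom by removing its feature
        deltas[atom] = predict_pka(masked, neighbors, site) - base
    return deltas


# Toy three-atom chain 0-1-2 with atom 0 as the ionization site.
# Feature values are made up for illustration only.
toy_neighbors: Graph = {0: [1], 1: [0, 2], 2: [1]}
toy_features = [1.0, 0.5, 2.0]
toy_deltas = mask_attribution(toy_features, toy_neighbors, site=0)
key_atom = max(toy_deltas, key=lambda a: abs(toy_deltas[a]))
```

In this toy setting, the atom whose masking shifts the prediction the most (`key_atom`) plays the role of the "key substituent"; a real implementation would use learned atom embeddings and a trained network rather than hand-set weights.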

Source
http://dx.doi.org/10.1021/acs.jcim.3c00958
