Improved Chemical Prediction from Scarce Data Sets via Latent Space Enrichment.

J Phys Chem A

Charles D. Davidson School of Chemical Engineering, 480 Stadium Mall Drive, Purdue University, West Lafayette, Indiana 47906, United States.

Published: May 2019

Modern machine learning provides promising methods for accelerating the discovery and characterization of novel chemical species. However, in many areas experimental data remain costly and scarce, and computational models are unavailable for targeted figures of merit. Here we report a promising pathway to address this challenge by using chemical latent space enrichment, whereby disparate data sources are combined in joint prediction tasks to enable improved prediction in data-scarce applications. The approach is demonstrated for pKa prediction of moderately sized molecular species using a combination of experimentally available pKa data and density functional theory-based characterizations of the (de)protonation free energy. A novel autoencoder framework is used to create a continuous chemical latent space that is then used in single and joint training tasks for property prediction. By combining these two data sets in a jointly trained autoencoder framework, we observe mutual improvement in property prediction tasks in the scarce data limit. We also demonstrate an enrichment mechanism that is unique to latent space training, whereby training on excess computational data can mitigate the prediction losses associated with scarce experimental data and advantageously organize the latent space. These results demonstrate that disparate chemical data sources can be advantageously combined in an autoencoder framework with potential general application to data-scarce chemical learning tasks.
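The abstract does not give the training objective, so the following is only a minimal sketch of how a joint objective over two partially labeled data sets might look: a reconstruction term for the autoencoder plus one property term per data source, each averaged only over molecules that carry that label. The function names (masked_mse, joint_loss), the weights w_pka and w_dg, and the use of None for a missing label are all illustrative assumptions, not the authors' implementation.

```python
from typing import Optional, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def masked_mse(preds: Sequence[float],
               labels: Sequence[Optional[float]]) -> float:
    """Mean squared error over entries whose label is present (not None).

    Returns 0.0 when no molecule in the batch carries this label, so the
    term simply drops out of the joint objective.
    """
    pairs = [(p, y) for p, y in zip(preds, labels) if y is not None]
    if not pairs:
        return 0.0
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


def joint_loss(x, x_recon, pka_pred, pka_label, dg_pred, dg_label,
               w_pka: float = 1.0, w_dg: float = 1.0) -> float:
    """Hypothetical joint objective: reconstruction + two masked property terms.

    x, x_recon : per-molecule input vectors and their reconstructions
    pka_*      : predicted / experimental pKa (label may be None)
    dg_*       : predicted / computed (de)protonation free energy (may be None)
    """
    recon = sum(mse(r, v) for r, v in zip(x_recon, x)) / len(x)
    return (recon
            + w_pka * masked_mse(pka_pred, pka_label)
            + w_dg * masked_mse(dg_pred, dg_label))


# Toy batch of three molecules: only one has an experimental pKa,
# two have a computed free energy (all numbers are made up).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss = joint_loss(
    x, x,                                   # perfect reconstruction
    [4.0, 7.0, 9.0], [5.0, None, None],     # scarce experimental labels
    [10.0, 12.0, 14.0], [10.0, 14.0, None]  # more plentiful computed labels
)
print(loss)
```

Masking lets every molecule contribute to the reconstruction term while each property term sees only its own labeled subset, which is one plausible way for the larger computational data set to shape a latent space that the scarce experimental task then shares.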

DOI: http://dx.doi.org/10.1021/acs.jpca.9b01398
