Simpler is Better: How Linear Prediction Tasks Improve Transfer Learning in Chemical Autoencoders.

J Phys Chem A

Charles D. Davidson School of Chemical Engineering, Purdue University, 480 Stadium Mall Drive, West Lafayette, Indiana 47906, United States.

Published: May 2020

Transfer learning is a subfield of machine learning that leverages proficiency in one or more prediction tasks to improve proficiency in a related task. For chemical property prediction, transfer learning models are a promising way to address the data scarcity that limits many properties, because they can draw on potentially abundant data from one or more adjacent applications. These models typically use a latent variable that is common to several prediction tasks and provides a mechanism for information exchange between them. For chemical applications, it is still largely unknown how correlation between the prediction tasks affects performance, how many tasks can be trained simultaneously before performance degrades, and whether transfer learning positively or negatively affects ancillary model properties. Here we investigate these questions using an autoencoder latent space as the latent variable in transfer learning models that predict properties from the QM9 data set, which has been supplemented with semiempirical quantum chemistry calculations. We demonstrate that property prediction can, counterintuitively, be improved by using a simpler linear predictor model, which forces the latent space to organize linearly with respect to each property. In data-scarce prediction tasks, the improvement from transfer learning is dramatic, whereas in data-rich prediction tasks, transfer learning appears to have little adverse impact on prediction performance. The transfer learning approach demonstrated here is thus a highly advantageous supplement to property prediction models, with no downside in implementation.
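
The idea of attaching linear property predictors to a shared autoencoder latent space can be illustrated with a minimal sketch. This is not the authors' implementation: the encoder and decoder are omitted, the function names (`linear_heads`, `masked_mse`, `joint_loss`), the `task_weight` parameter, and the synthetic data are all hypothetical, and the loss form (reconstruction error plus a sum of per-property errors) is an assumption about how such a model might be trained. The boolean mask shows one way a data-scarce property could share a latent space with data-rich ones.

```python
import numpy as np

def linear_heads(z, W, b):
    """Predict every property as an affine function of the shared latent vector."""
    return z @ W + b  # shape: (n_samples, n_tasks)

def masked_mse(pred, target, has_label):
    """Per-task mean squared error over only the samples that carry a label."""
    sq_err = np.where(has_label, (pred - target) ** 2, 0.0)
    n_labeled = np.maximum(has_label.sum(axis=0), 1)  # guard against empty tasks
    return sq_err.sum(axis=0) / n_labeled

def joint_loss(x, x_recon, z, W, b, y, has_label, task_weight=1.0):
    """Assumed objective: reconstruction error plus weighted sum of task errors."""
    recon = np.mean((x - x_recon) ** 2)
    per_task = masked_mse(linear_heads(z, W, b), y, has_label)
    return recon + task_weight * per_task.sum(), per_task

# Synthetic stand-ins (hypothetical sizes): latent vectors and three properties
# that are exactly linear in the latent space.
rng = np.random.default_rng(0)
n, d_latent, n_tasks = 200, 8, 3
z = rng.normal(size=(n, d_latent))
W_true = rng.normal(size=(d_latent, n_tasks))
b_true = rng.normal(size=n_tasks)
y = linear_heads(z, W_true, b_true)

# Make the third property data scarce: only the first 20 samples are labeled.
has_label = np.ones((n, n_tasks), dtype=bool)
has_label[20:, 2] = False
y[~has_label] = np.nan  # missing labels never enter the loss

x = rng.normal(size=(n, 16))  # placeholder inputs; perfect reconstruction assumed
total, per_task = joint_loss(x, x, z, W_true, b_true, y, has_label)
```

Because each head is linear, the only way to lower a property's error is to arrange the latent vectors so that the property varies linearly across them, which is the organizing pressure the abstract describes. In a real model the gradient of this loss would flow back into the encoder; here `z` is fixed for brevity.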

Source
http://dx.doi.org/10.1021/acs.jpca.0c00042
