Enhanced sampling of robust molecular datasets with uncertainty-based collective variables.

J Chem Phys

Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

Published: January 2025

Generating a dataset that is representative of the accessible configuration space of a molecular system is crucial for the robustness of machine-learned interatomic potentials. However, the complexity of molecular systems, whose intricate potential energy surfaces contain numerous local minima and energy barriers, presents a significant challenge. Traditional methods of data generation, such as random sampling or exhaustive exploration, are either intractable or fail to capture rare but highly informative configurations. In this study, we propose a method that uses uncertainty as the collective variable (CV) to guide the acquisition of chemically relevant data points, focusing on regions of configuration space where the predictions of the machine learning (ML) model are most uncertain. The approach employs a Gaussian Mixture Model-based uncertainty metric from a single model as the CV for biased molecular dynamics simulations. We demonstrate on alanine dipeptide and bulk silica that the approach overcomes energy barriers and explores unseen energy minima, thereby enhancing the dataset in an active learning framework.
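
The abstract does not give the functional form of the Gaussian Mixture Model-based uncertainty metric. As a minimal illustration only, the sketch below assumes one common choice: the negative log-likelihood (NLL) of a configuration's feature vector under a mixture fitted to the training data, so that configurations far from the training set score higher. The function name `gmm_nll`, the diagonal covariances, and the toy mixture parameters are all hypothetical and are not taken from the paper.

```python
import math


def gmm_nll(x, weights, means, variances):
    """Negative log-likelihood of feature vector x under a diagonal-covariance GMM.

    Assumption: the uncertainty is taken to be this NLL; the abstract does not
    state the exact definition used by the authors.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # log of the Gaussian normalization constant for a diagonal covariance
        log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
        # log of the exponential term (scaled squared distance to the mean)
        log_exp = -0.5 * sum((xi - m) ** 2 / v for xi, m, v in zip(x, mu, var))
        log_terms.append(math.log(w) + log_norm + log_exp)
    # log-sum-exp over components for numerical stability
    top = max(log_terms)
    log_p = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return -log_p


# Hypothetical two-component mixture standing in for a fit to training features.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

u_seen = gmm_nll([0.1, -0.1], weights, means, variances)     # near a training mode
u_unseen = gmm_nll([10.0, -8.0], weights, means, variances)  # far from both modes
print(u_seen < u_unseen)
```

In a biased simulation of the kind the abstract describes, a scalar like this would play the role of the CV, with the bias pushing the system toward higher values. How the bias is constructed and applied is not specified in the abstract and is not sketched here.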

Source
http://dx.doi.org/10.1063/5.0246178
