Tutorial on Molecular Latent Space Simulators (LSSs): Spatially and Temporally Continuous Data-Driven Surrogate Dynamical Models of Molecular Systems.

Michael S Jones, Kirill Shmilovich, Andrew L Ferguson

J Phys Chem A

Pritzker School of Molecular Engineering, The University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, United States.

Published: November 2024

Molecular dynamics (MD) simulations are inherently serial and require short integration time steps, which strongly limits the accessible simulation time scales and the statistical precision attainable when sampling slowly relaxing dynamical modes and rare events. Molecular latent space simulators (LSSs) are a data-driven approach that learns a surrogate dynamical model of the molecular system from modest MD training trajectories and then generates synthetic trajectories at a fraction of the computational cost. The training data may comprise single long trajectories or multiple short, discontinuous trajectories collected over, for example, distributed computing resources. Provided that the training data sample the relevant thermodynamic states and dynamical transitions well enough to robustly learn the underlying microscopic propagator, an LSS furnishes a global model of the dynamics capable of producing temporally and spatially continuous molecular trajectories. Trained LSS models have produced simulation trajectories at up to 6 orders of magnitude lower cost than standard MD, enabling dense sampling of molecular phase space and large reductions in the statistical errors of structural, thermodynamic, and kinetic observables. The LSS employs three deep learning architectures to solve three independent learning problems over the training data: (i) an encoding of the high-dimensional MD configurations into a low-dimensional slow latent space using state-free reversible VAMPnets (SRVs), (ii) a propagator of the microscopic dynamics within the low-dimensional latent space using mixture density networks (MDNs), and (iii) a generative decoding of the low-dimensional latent coordinates back to the original high-dimensional molecular configuration space using conditional Wasserstein generative adversarial networks (cWGANs) or denoising diffusion probabilistic models (DDPMs). In this software tutorial, we introduce the mathematical and numerical background and theory of LSSs and present example applications of a user-friendly Python package to alanine dipeptide and a 28-residue beta-beta-alpha (BBA) protein within simple Google Colab notebooks.
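The encode, propagate, decode workflow described in the abstract can be illustrated with a toy sketch. This is not the authors' package or its API: the functions `encode`, `mdn_params`, `propagate`, `decode`, and `generate_trajectory` are hypothetical stand-ins, the "networks" are replaced by a fixed random linear map and hand-picked Gaussian mixture parameters, and the dimensions are arbitrary. The sketch only shows the control flow: one encoding of a starting configuration, many cheap stochastic steps taken entirely in the latent space, and a decoding of the latent trajectory back to configuration space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions): D-dimensional configurations, d-dimensional latent space.
D, d = 6, 2
W = rng.standard_normal((D, d)) / np.sqrt(D)  # fixed random map standing in for trained weights


def encode(x):
    """Hypothetical stand-in for the SRV encoder: configuration (D,) -> latent (d,)."""
    return x @ W


def mdn_params(z):
    """Hypothetical stand-in for the MDN: latent state -> Gaussian mixture parameters."""
    weights = np.array([0.75, 0.25])        # mixture weights, sum to 1
    means = np.stack([0.9 * z, -0.9 * z])   # (K, d) component means
    sigmas = np.array([0.1, 0.2])           # isotropic standard deviation per component
    return weights, means, sigmas


def propagate(z, rng):
    """Draw the next latent state from the mixture conditioned on the current one."""
    weights, means, sigmas = mdn_params(z)
    k = rng.choice(len(weights), p=weights)  # pick a mixture component
    return means[k] + sigmas[k] * rng.standard_normal(z.shape)


def decode(z, rng):
    """Hypothetical stand-in for the generative decoder: latent (d,) -> configuration (D,)."""
    return W @ z + 0.01 * rng.standard_normal(D)


def generate_trajectory(x0, n_steps, rng):
    """Encode once, step repeatedly in latent space, then decode every latent frame."""
    z = encode(x0)
    latents = [z]
    for _ in range(n_steps):
        z = propagate(z, rng)
        latents.append(z)
    latents = np.array(latents)
    configs = np.array([decode(z_t, rng) for z_t in latents])
    return latents, configs


x0 = rng.standard_normal(D)
latents, configs = generate_trajectory(x0, n_steps=100, rng=rng)
print(latents.shape, configs.shape)
```

In this arrangement the expensive high-dimensional space is touched only at the start and at decoding, which is the structural reason a trained surrogate of this kind can be much cheaper per step than integrating the full equations of motion.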

DOI: http://dx.doi.org/10.1021/acs.jpca.4c05389
