k-Means NANI: an improved clustering algorithm for Molecular Dynamics simulations.

Lexin Chen, Daniel R. Roe, Matthew Kochert, Carlos Simmerling, Ramón Alain Miranda-Quintana

bioRxiv

Department of Chemistry, University of Florida, FL, USA.

Published: March 2024

One of the key challenges of k-means clustering is the seed selection or the initial centroid estimation since the clustering result depends heavily on this choice. Alternatives such as k-means++ have mitigated this limitation by estimating the centroids using an empirical probability distribution. However, with high-dimensional and complex datasets such as those obtained from molecular simulation, k-means++ fails to partition the data in an optimal manner. Furthermore, stochastic elements in all flavors of k-means++ will lead to a lack of reproducibility. k-means N-Ary Natural Initiation (NANI) is presented as an alternative to tackle this challenge by using efficient n-ary comparisons to both identify high-density regions in the data and select a diverse set of initial conformations. Centroids generated from NANI are not only representative of the data and different from one another, helping k-means to partition the data accurately, but also deterministic, providing consistent cluster populations across replicates. From peptide and protein folding molecular simulations, NANI was able to create compact and well-separated clusters as well as accurately find the metastable states that agree with the literature. NANI can cluster diverse datasets and be used as a standalone tool or as part of our MDANCE clustering package.
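The general idea of deterministic, density-aware seeding can be illustrated with a small sketch. This is not the authors' NANI implementation and does not use the MDANCE API or n-ary comparisons. It swaps in two simple stand-ins: the distance to the m-th nearest neighbour as a density proxy, and a farthest-point (max-min) rule for diversity. The function names `deterministic_seeds` and `kmeans`, the parameters `dense_fraction` and `m`, and the toy data are all hypothetical and chosen only for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def deterministic_seeds(points, k, dense_fraction=0.5, m=3):
    """Pick k seed indices with no randomness (assumes len(points) > k >= 1).

    Assumption: density is approximated by the squared distance to the
    m-th nearest neighbour (smaller means denser). This is a stand-in,
    not the n-ary comparison used by NANI.
    """
    n = len(points)

    def mth_neighbor(i):
        d = sorted(sq_dist(points[i], points[j]) for j in range(n) if j != i)
        return d[min(m, len(d)) - 1]

    # Rank by density; ties are broken by index so the order is reproducible.
    ranked = sorted(range(n), key=lambda i: (mth_neighbor(i), i))
    candidates = ranked[:max(k, int(n * dense_fraction))]

    # Start from the densest point, then repeatedly add the dense candidate
    # whose nearest already-chosen seed is farthest away (max-min diversity).
    seeds = [candidates[0]]
    while len(seeds) < k:
        best = max(
            (c for c in candidates if c not in seeds),
            key=lambda c: (min(sq_dist(points[c], points[s]) for s in seeds), -c),
        )
        seeds.append(best)
    return seeds


def kmeans(points, seed_idx, max_iter=100):
    """Plain Lloyd iterations starting from the given seed indices."""
    centroids = [list(points[i]) for i in seed_idx]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: three well-separated 3x3 grids of points (27 points in total).
offsets = (-0.5, 0.0, 0.5)
points = [
    (cx + dx, cy + dy)
    for cx, cy in ((0, 0), (10, 0), (0, 10))
    for dx in offsets
    for dy in offsets
]
seeds = deterministic_seeds(points, k=3)
labels, centroids = kmeans(points, seeds)
```

Because nothing in the seeding step is random, repeated runs on the same data return the same seeds and therefore the same cluster populations, which is the reproducibility property the abstract contrasts with k-means++. The all-pairs distance computation here is quadratic in the number of points and is only suitable for small examples.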

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10942464
DOI: http://dx.doi.org/10.1101/2024.03.07.583975
