UMAP-assisted K-means clustering of large-scale SARS-CoV-2 mutation datasets.

Comput Biol Med

Department of Mathematics, Michigan State University, MI, 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, MI, 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, MI, 48824, USA.

Published: April 2021

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has had a devastating worldwide effect. Understanding the evolution and transmission of SARS-CoV-2 is of paramount importance for controlling, combating, and preventing COVID-19. Due to the rapid growth in both the number of SARS-CoV-2 genome sequences and the number of unique mutations, the phylogenetic analysis of SARS-CoV-2 genome isolates faces an emerging large-data challenge. We introduce a dimension-reduced K-means clustering strategy to tackle this challenge. We examine the performance and effectiveness of three dimension-reduction algorithms: principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Using four benchmark datasets, we found that UMAP is the best-suited technique due to its stable, reliable, and efficient performance; its ability to improve clustering accuracy, especially for large Jaccard distance-based datasets; and its superior clustering visualization. UMAP-assisted K-means clustering enables us to shed light on increasingly large datasets from SARS-CoV-2 genome isolates.
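
To make the Jaccard distance-based representation and the K-means step concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each genome isolate is described by a set of mutation positions, represents each isolate by its row of Jaccard distances to all isolates, and clusters those rows with plain Lloyd's K-means. The function names, the toy mutation sets, and the choice of k=2 are all hypothetical. The dimension-reduction step (PCA, t-SNE, or UMAP) is deliberately left out so that the sketch needs only the Python standard library.

```python
import random


def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty mutation sets count as identical
    return 1.0 - len(a & b) / len(union)


def jaccard_features(samples):
    """Represent each sample by its row of Jaccard distances to all samples."""
    return [[jaccard_distance(s, t) for t in samples] for s in samples]


def squared_euclidean(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: each isolate is a set of mutation positions.
samples = [
    {241, 3037, 14408, 23403},
    {241, 3037, 14408, 23403, 25563},
    {241, 3037, 23403},
    {8782, 28144},
    {8782, 28144, 17747},
    {8782, 28144, 17858, 18060},
]
features = jaccard_features(samples)
labels = kmeans(features, k=2)
print(labels)
```

In a dimension-reduced variant, a reducer would be applied to `features` before calling `kmeans`, for example the `UMAP` class from the umap-learn package with its `fit_transform` method. That step is omitted here only to keep the sketch free of third-party dependencies.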

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7897976
DOI: http://dx.doi.org/10.1016/j.compbiomed.2021.104264

Publication Analysis

Top Keywords

k-means clustering: 12
sars-cov-2 genome: 12
umap-assisted k-means: 8
genome isolates: 8
sars-cov-2: 6
clustering: 5
clustering large-scale: 4
large-scale sars-cov-2: 4
sars-cov-2 mutation: 4
datasets: 4

Similar Publications

Mapping the landscape of Hospital at home (HaH) care: a validated taxonomy for HaH care model classification.

BMC Health Serv Res

January 2025

Institute Patient-Centered Digital Health, Bern University of Applied Sciences, Quellgasse 21, Biel, 2502, Switzerland.

Background: Hospital at home (HaH) care models have gained significant attention due to their potential to reduce healthcare costs, improve patient satisfaction, and lower readmission rates. However, the lack of a standardized classification system has hindered systematic evaluation and comparison of these models. Taxonomies serve as classification systems that simplify complexity and enhance understanding within a specific domain.

Sparse kernel k-means clustering.

J Appl Stat

June 2024

Graduate School, Department of Urban Big Data Convergence, University of Seoul, Seoul, South Korea.

Clustering is an essential technique that groups similar data points to uncover the underlying structure and features of the data. Although traditional clustering methods such as k-means are widely used, they have limitations in identifying nonlinear clusters. Thus, alternative techniques, such as kernel k-means and spectral clustering, have been developed to address this issue.

Variational graph autoencoder for reconstructed transcriptomic data associated with NLRP3 mediated pyroptosis in periodontitis.

Sci Rep

January 2025

Department of Basic Sciences, Faculty of Dentistry, Universidad de Antioquia U de A, Medellín, 050010, Colombia.

The NLRP3 inflammasome, regulated by TLR4, plays a pivotal role in periodontitis by mediating inflammatory cytokine release and bone loss induced by Porphyromonas gingivalis. Periodontal disease creates a hypoxic environment, favoring anaerobic bacteria survival and exacerbating inflammation. The NLRP3 inflammasome triggers pyroptosis, a programmed cell death that amplifies inflammation and tissue damage.

Static and dynamic connectivity structure of white-matter functional networks across the adult lifespan.

Prog Neuropsychopharmacol Biol Psychiatry

January 2025

MOE-LCSM, School of Mathematics and Statistics, Hunan Normal University, Changsha 410006, PR China; Key Laboratory of Applied Statistics and Data Science, Hunan Normal University, College of Hunan Province, Changsha 410006, PR China.

Aging of the human brain involves intricate biological processes, resulting in complex changes in structure and function. While the effects of aging on gray matter (GM) connectivity are extensively studied, white matter (WM) functional changes have received comparatively less attention. This study examines age-related WM functional dynamics using resting-state fMRI across the adult lifespan.

Molecular arrangement in the chiral smectic phases of the glassforming (S)-4'-(1-methylheptylcarbonyl)biphenyl-4-yl 4-[7-(2,2,3,3,4,4,4-heptafluorobutoxy) heptyl-1-oxy]benzoate is investigated by X-ray diffraction. An increased correlation length of the positional short-range order in the supercooled state agrees with the previous assumption of the hexatic smectic phase. However, the registered X-ray diffraction patterns are not typical for the hexatic phases.
