From Alpha to Zeta: Identifying Variants and Subtypes of SARS-CoV-2 Via Clustering.

Andrew Melnyk, Fatemeh Mohebbi, Sergey Knyazev, Bikram Sahoo, Roya Hosseini, Pavel Skums, Alex Zelikovsky, Murray Patterson

J Comput Biol

Department of Computer Science, Georgia State University, Atlanta, Georgia, USA.

Published: November 2021

The availability of millions of SARS-CoV-2 (Severe Acute Respiratory Syndrome-Coronavirus-2) sequences in public databases such as GISAID (Global Initiative on Sharing All Influenza Data) and EMBL-EBI (European Molecular Biology Laboratory-European Bioinformatics Institute, in the United Kingdom) allows the evolution, genomic diversity, and dynamics of a virus to be studied in unprecedented detail. Here, we identify novel variants and subtypes of SARS-CoV-2 by clustering sequences, adapting methods originally designed for haplotyping intrahost viral populations. We assess our results using clustering entropy, a measure that has not previously been used in this context. Our clustering approach reaches lower entropies than other methods, and we lower them further through gap filling and Monte Carlo-based entropy minimization. Moreover, our method clearly identifies the well-known Alpha variant in the U.K. and GISAID data sets, and also detects the much less represented (<1% of the sequences) Beta (South Africa), Epsilon (California), and Gamma and Zeta (Brazil) variants in the GISAID data set. Finally, we show that each variant identified has high selective fitness, based on the growth rate of its cluster over time. This demonstrates that our clustering approach is a viable alternative for detecting even rare subtypes in very large data sets.
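
The abstract does not define clustering entropy, so the following is only an illustrative sketch under one common assumption: the entropy of a cluster is the sum of per-position Shannon entropies over its aligned sequences, and the entropy of a clustering is the size-weighted average over its clusters. The function names and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def cluster_entropy(sequences):
    """Sum of per-position entropies for equal-length aligned sequences."""
    return sum(column_entropy(col) for col in zip(*sequences))


def clustering_entropy(clusters):
    """Size-weighted average of the cluster entropies (assumed definition)."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * cluster_entropy(c) for c in clusters)


# Toy data: two hypothetical subtypes that differ at the first position.
subtype_a = ["ACGT", "ACGT", "ACGA"]
subtype_b = ["TCGT", "TCGT"]

# A clustering that separates the subtypes versus one that mixes them.
separated = clustering_entropy([subtype_a, subtype_b])
mixed = clustering_entropy([["ACGT", "TCGT", "ACGA"], ["ACGT", "TCGT"]])

print(f"separated: {separated:.3f} bits, mixed: {mixed:.3f} bits")
```

Under this assumed definition, a clustering that groups similar sequences together scores lower than one that mixes subtypes, which is the sense in which lower entropy indicates a better clustering.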

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8819513
DOI: http://dx.doi.org/10.1089/cmb.2021.0302