Similarity Studies of Corona Viruses through Chaos Game Representation.

Comput Mol Biosci

Department of Natural Sciences, Elizabeth City State University, Elizabeth City, North Carolina, USA.

Published: September 2020

The novel coronavirus (SARS-CoV-2), generally referred to as the Covid-19 virus, has spread to 213 countries, with nearly 7 million confirmed cases and nearly 400,000 deaths. Such major outbreaks demand classification of the virus genomic sequence and identification of its origin for planning, containment, and treatment. Motivated by this need, we report two alignment-free methods that combine with chaos game representation (CGR) to perform clustering analysis and create a phylogenetic tree based on it. To each DNA sequence we associate a matrix, then define the distance between two DNA sequences to be the distance between their associated matrices. These methods are used for phylogenetic analysis of coronavirus sequences. Our approach provides a powerful tool for analyzing and annotating genomes and their phylogenetic relationships. We also compare our tool to the ClustalX algorithm, one of the most popular alignment methods. Our alignment-free methods are shown to be capable of finding the closest genetic relatives of coronaviruses.
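The abstract does not say which matrix or which distance the authors use, so the following is only a minimal sketch of the general idea under stated assumptions: a frequency CGR matrix on a 2^k x 2^k grid, compared with the Euclidean distance. The corner layout, the grid resolution k, and the function names cgr_points, fcgr_matrix, and matrix_distance are illustrative choices and are not taken from the paper.

```python
import math

# Assumed corner convention (one common choice): A=(0,0), C=(0,1), G=(1,1), T=(1,0).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the chaos game trajectory of a DNA sequence in the unit square."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0  # move halfway towards the corner
        points.append((x, y))
    return points


def fcgr_matrix(seq, k=3):
    """Bin CGR points into a 2^k x 2^k grid and normalise counts to frequencies."""
    n = 2 ** k
    grid = [[0.0] * n for _ in range(n)]
    # The first k-1 points still depend on the arbitrary start, so drop them.
    points = cgr_points(seq)[k - 1:]
    for x, y in points:
        col = min(int(x * n), n - 1)  # clamp guards against rounding up to 1.0
        row = min(int(y * n), n - 1)
        grid[row][col] += 1.0
    total = len(points)
    if total:
        grid = [[value / total for value in row] for row in grid]
    return grid


def matrix_distance(a, b):
    """Euclidean (Frobenius) distance between two matrices of equal shape."""
    return math.sqrt(
        sum((u - v) ** 2 for row_a, row_b in zip(a, b) for u, v in zip(row_a, row_b))
    )
```

With such a distance in hand, a pairwise distance table over a set of genomes could be passed to any standard clustering or tree-building routine; that step is left out here because the abstract does not specify it.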


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7497811
DOI: http://dx.doi.org/10.4236/cmb.2020.103004

Similar Publications

STAIG: Spatial transcriptomics analysis via image-aided graph contrastive learning for domain exploration and alignment-free integration.

Nat Commun

January 2025

Department of Computational Biology and Medical Science, Graduate School of Frontier Sciences, the University of Tokyo, Tokyo, Japan.

Spatial transcriptomics is an essential application for investigating cellular structures and interactions and requires multimodal information to precisely study spatial domains. Here, we propose STAIG, a deep-learning model that integrates gene expression, spatial coordinates, and histological images using graph-contrastive learning coupled with high-performance feature extraction. STAIG can integrate tissue slices without prealignment and remove batch effects.


Alignment-Free Viral Sequence Classification at Scale.

bioRxiv

December 2024

Centre for Epidemic Response and Innovation (CERI), School of Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, South Africa.

Background: The rapid increase in nucleotide sequence data generated by next-generation sequencing (NGS) technologies demands efficient computational tools for sequence comparison. Alignment-based methods, such as BLAST, are increasingly overwhelmed by the scale of contemporary datasets due to their high computational demands for classification. This study evaluates alignment-free (AF) methods as scalable and rapid alternatives for viral sequence classification, focusing on identifying techniques that maintain high accuracy and efficiency when applied to extremely large datasets.


Background: Traditional supervised learning methods applied to DNA sequence taxonomic classification rely on the labor-intensive and time-consuming step of labelling the primary DNA sequences. Additionally, standard DNA classification/clustering methods involve time-intensive multiple sequence alignments, which impacts their applicability to large genomic datasets or distantly related organisms. These limitations indicate a need for robust, efficient, and scalable unsupervised DNA sequence clustering methods that do not depend on sequence labels or alignment.


Methods for rapidly inferring the evolutionary history of species or populations with genome-wide data are progressing, but computational constraints still limit our abilities in this area. We developed an alignment-free method to infer genome-wide phylogenies and implemented it in the Python package TopicContml. The method uses probabilistic topic modeling (specifically, Latent Dirichlet Allocation or LDA) to extract 'topic' frequencies from k-mers, which are derived from multilocus DNA sequences.
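As a rough illustration of the input such a topic model would need, the sketch below turns the loci of one sample into a bag of k-mers (fixed-length overlapping substrings). It is not the package's own code: the function names kmer_counts and sample_document and the default k=4 are hypothetical, and the LDA step itself is not shown.

```python
from collections import Counter


def kmer_counts(seq, k=4):
    """Count overlapping k-mers, skipping windows that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def sample_document(loci, k=4):
    """Merge the k-mer counts of all loci of one sample into a single bag of words."""
    doc = Counter()
    for locus_seq in loci:
        doc.update(kmer_counts(locus_seq, k))
    return doc
```

Each resulting counter plays the role of one "document"; a topic model fitted over all samples would then yield per-sample topic frequencies.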


Background: Genomic sequence similarity comparison is a crucial research area in bioinformatics. Multiple Sequence Alignment (MSA) is the basic technique used to identify regions of similarity between sequences. Although MSA tools are widely used and highly accurate, they are often limited by computational complexity and by inaccuracies when handling highly divergent sequences, which has led to the development of alignment-free (AF) algorithms.

Results: This paper presents TreeWave, a novel AF approach based on frequency chaos game representation and discrete wavelet transform of sequences for phylogeny inference.

