The world is grappling with the COVID-19 pandemic caused by the novel coronavirus SARS-CoV-2. To better understand this virus and its relationship with other pathogens, new methods for analyzing the genome are required. In this study, intrinsic dinucleotide genomic signatures were analyzed for whole-genome sequence data of eight pathogenic species, including SARS-CoV-2. The genome sequences were transformed into dinucleotide relative frequencies and classified using the extreme gradient boosting (XGBoost) model. The classification models were trained to a) distinguish between the sequences of all eight species and b) distinguish between sequences of SARS-CoV-2 that originate from different geographic regions. Our method attained 100% in all performance metrics for all tasks in the eight-species classification problem. Moreover, the models achieved 67% balanced accuracy when classifying the SARS-CoV-2 sequences into the six continental regions and 86% balanced accuracy when classifying SARS-CoV-2 samples as either originating from Asia or not. Analysis of the dinucleotide genomic profiles of the eight species revealed a similarity between the SARS-CoV-2 and MERS-CoV viral sequences. Further analysis of SARS-CoV-2 viral sequences from the six continents revealed that samples from Oceania had the highest frequency of TT dinucleotides as well as the lowest CG frequency compared to the other continents. The dinucleotide signatures of AC, AG, CA, CT, GA, GT, TC, and TG were well conserved across most genomes, while the frequencies of the other dinucleotide signatures varied considerably. Altogether, the results from this study demonstrate the utility of dinucleotide relative frequencies for discriminating and identifying similar species.
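
To make the feature transformation and the reported metric concrete, here is a minimal Python sketch. It is not the authors' code: the abstract does not define "dinucleotide relative frequencies" precisely, so the sketch assumes two common readings, the plain fraction of each of the 16 dinucleotides and the odds ratio f_xy / (f_x * f_y). All function names are hypothetical, and balanced accuracy is assumed to mean the average of per-class recall.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# Fixed ordering of the 16 dinucleotides, usable as feature columns.
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]


def dinucleotide_frequencies(seq):
    """Fraction of each of the 16 dinucleotides in seq.

    Overlapping windows are counted; windows containing any symbol
    other than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    valid = {d: counts.get(d, 0) for d in DINUCLEOTIDES}
    total = sum(valid.values())
    if total == 0:
        raise ValueError("sequence contains no valid dinucleotides")
    return {d: n / total for d, n in valid.items()}


def dinucleotide_odds_ratios(seq):
    """Assumed definition: rho_xy = f_xy / (f_x * f_y)."""
    di_freq = dinucleotide_frequencies(seq)  # raises on empty input
    seq = seq.upper()
    base_counts = Counter(b for b in seq if b in BASES)
    n_bases = sum(base_counts.values())
    base_freq = {b: base_counts[b] / n_bases for b in BASES}
    ratios = {}
    for d in DINUCLEOTIDES:
        expected = base_freq[d[0]] * base_freq[d[1]]
        ratios[d] = di_freq[d] / expected if expected > 0 else 0.0
    return ratios


def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall over the classes present in y_true."""
    recalls = []
    for label in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)
```

Each genome would yield one 16-element feature vector (the values in `DINUCLEOTIDES` order), which could then be passed to a gradient-boosting classifier such as XGBoost. Balanced accuracy is a sensible metric for the regional tasks because it is not inflated when one region contributes far more sequences than the others.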

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8675546
DOI: http://dx.doi.org/10.1109/ACCESS.2020.3031387

Similar Publications

Gap junction intercellular communications regulates activation of SARM1 and protects against axonal degeneration.

Cell Death Dis

January 2025

State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Genomics, Peking University Shenzhen Graduate School, Shenzhen, 518055, China.

Sterile alpha and Toll/interleukin-1 receptor motif containing 1 (SARM1), a nicotinamide adenine dinucleotide (NAD)-utilizing enzyme, mediates axon degeneration (AxD) in various neurodegenerative diseases. It is activated by nicotinamide mononucleotide (NMN) to produce a calcium messenger, cyclic ADP-ribose (cADPR). This activity is blocked by elevated NAD level.

Eugenol, a phenolic natural product with diverse pharmacological activities, remains unexplored in liver cancer. Using network pharmacology, we investigated eugenol's therapeutic mechanisms in liver cancer. We obtained eugenol's molecular structure from PubChem and screened its targets using the similarity ensemble approach in the Swiss Target Prediction databases.

Dinucleases of the DEDD superfamily, such as oligoribonuclease, Rexo2 and nanoRNase C, catalyze the essential final step of RNA degradation, the conversion of di- to mononucleotides. The active sites of these enzymes are optimized for substrates that are two nucleotides long, and do not discriminate between RNA and DNA. Here, we identified a novel DEDD subfamily, members of which function as dedicated deoxydinucleases (diDNases) that specifically hydrolyze single-stranded DNA dinucleotides in a sequence-independent manner.

Genome-Wide Microsatellites in : Development, Distribution, Characterization, and Polymorphism.

Animals (Basel)

December 2024

Guangdong Provincial Key Laboratory of Fishery Ecology and Environment, South China Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou 510300, China.

The yellowfin seabream () is an economically important commercial mariculture fish in China and Southeast Asia. Only a few simple sequence repeats (SSRs) of have been isolated and reported, which has hindered breeding progress. A total of 318,862 SSRs were isolated and characterized from the genome in this study.

Genome-Wide In Silico Analysis of Microsatellite Loci in Rabbits.

Animals (Basel)

December 2024

Department of Poultry Breeding, Animal Production Research Institute, Agriculture Research Center, Dokki, Giza 12618, Egypt.

This study aimed to characterize microsatellites in the rabbit genome using an in silico approach and to develop and validate microsatellite markers. Blood samples were collected from 15 Baladi rabbits and 18 New Zealand White (NZW) rabbits. The GMATA software was used to define SSRs in the extracted sequences.
