Virus discovery by genomics and metagenomics has empowered studies of viromes, facilitated characterization of pathogen epidemiology, and redefined our understanding of the natural genetic diversity of viruses, with profound functional and structural implications. Here we employed a data-driven virus discovery approach that directly queries unprocessed sequencing data in a highly parallelized way and involves a targeted viral genome assembly strategy across a wide range of sequence similarity. By screening more than 269,000 datasets from numerous authors in the Sequence Read Archive and using two metrics that quantitatively assess assembly quality, we discovered 40 nidoviruses from six virus families whose members infect vertebrate hosts. They form 13 and 32 putative viral subfamilies and genera, respectively, and include 11 coronaviruses with bisegmented genomes from fishes and amphibians, a giant 36.1 kilobase coronavirus genome with a duplicated spike glycoprotein (S) gene, 11 tobaniviruses and 17 additional corona-, arteri-, cremega-, nanhypo- and nangoshaviruses. Genome segmentation emerged in a single evolutionary event in the monophyletic lineage encompassing the subfamily Pitovirinae. We recovered the bisegmented genome sequences of two coronaviruses from RNA samples of 69 infected fishes and validated the presence of poly(A) tails on both segments using 3'RACE PCR and subsequent Sanger sequencing. We report a genetic linkage between accessory and structural proteins whose phylogenetic relationships and evolutionary distances are incongruent with the phylogeny of replicase proteins. We rationalize these observations in a model of inter-family S recombination involving at least five ancestral corona- and tobaniviruses of aquatic hosts. In support of this model, we describe an individual fish co-infected with members of the families Coronaviridae and Tobaniviridae.
Our results expand the scale of the known extraordinary evolutionary plasticity in nidoviral genome architecture and call for revisiting fundamentals of genome expression, virus particle biology, host range and ecology of vertebrate nidoviruses.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11065284
DOI: http://dx.doi.org/10.1371/journal.ppat.1012163
