The massive amount of genomic data generated for SARS-CoV-2 since the beginning of the COVID-19 pandemic has challenged traditional methods for studying its dynamics. As a result, new methods such as Pangolin, which can scale to the millions of SARS-CoV-2 samples currently available, have appeared. Pangolin, however, is tailored to take as input assembled, aligned, and curated full-length sequences, such as those found in the GISAID database. As high-throughput sequencing technologies continue to advance, assembly, alignment, and curation may become a bottleneck, creating a need for methods that can process raw sequencing reads directly. In this article, we propose Reads2Vec, an alignment-free embedding approach that generates a fixed-length feature vector directly from raw sequencing reads, without requiring assembly. Because such an embedding is a numerical representation, it can be passed directly to highly optimized classification and clustering algorithms. Experiments on simulated data show that our proposed embedding obtains better classification results and better clustering properties than existing alignment-free baselines. In a study on real data, we show that alignment-free embeddings have better clustering properties than the Pangolin tool and that the spike region of the SARS-CoV-2 genome heavily informs the alignment-free clusterings, which is consistent with current biological knowledge of SARS-CoV-2.
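To make the idea of a fixed-length, alignment-free representation of raw reads concrete, here is a minimal sketch of one generic approach: a normalized k-mer frequency vector pooled over all reads of a sample. This is not the Reads2Vec algorithm itself. The function names `kmer_index` and `reads_to_vector`, the default k, and the choice to skip k-mers containing ambiguous bases (such as N) are assumptions made for illustration only.

```python
from itertools import product


def kmer_index(k):
    """Hypothetical helper: map every k-mer over ACGT to a fixed column (4**k columns)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def reads_to_vector(reads, k=3):
    """Hypothetical sketch (not Reads2Vec): pool k-mer counts over all raw reads
    of one sample and normalize them into a fixed-length frequency vector."""
    index = kmer_index(k)
    counts = [0.0] * len(index)
    for read in reads:
        read = read.upper()
        # Slide a window of length k along the read; no assembly or alignment needed.
        for start in range(len(read) - k + 1):
            col = index.get(read[start:start + k])
            if col is not None:  # assumption: skip k-mers containing N or other ambiguity codes
                counts[col] += 1
    total = sum(counts)
    # Normalize so samples with different sequencing depth remain comparable.
    return [c / total for c in counts] if total else counts


reads = ["ACGTAC", "GTNAC"]
vec = reads_to_vector(reads, k=2)
print(len(vec))  # 16 columns, one per possible 2-mer
```

Because every sample maps to a vector of 4^k entries regardless of how many reads it has or how long they are, vectors from different samples can be stacked into a matrix and handed to any standard classifier or clustering algorithm.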
DOI: http://dx.doi.org/10.1089/cmb.2022.0424
Arch Virol
January 2025
Institute for Sustainable Plant Protection, CNR, Strada delle Cacce 73, 10135, Torino, Italy.
Here, we report the complete genome sequence of a new carlavirus causing mosaic on mint plants in Italy, which we have tentatively named "mint virus C" (MVC). Flexuous particles of around 600 nm were observed using transmission electron microscopy, and next-generation sequencing was performed to determine the nucleotide sequence of the MVC genome, which was found to be 8558 nt long, excluding the poly(A) tail, and has the typical genome organization of a carlavirus. The putative proteins encoded by MVC are 44-56% identical to the closest matches in the NCBI database, suggesting that MVC should be considered a member of a new species in the genus Carlavirus.
Curr Microbiol
January 2025
Coastar Therapeutics, San Diego, CA, 92126, USA.
Staphylococcus epidermidis (S. epidermidis) lives at various sites on the human body and in natural environments. For ribotyping S.
Heliyon
March 2024
Department of Microbiology, University of Dhaka, Dhaka, 1000, Bangladesh.
Foot-and-mouth disease virus (FMDV), the causative agent of foot-and-mouth disease in cattle, evolves rapidly. In Bangladesh, our laboratory reported the first circulation of the O/ME-SA/SA-2018 lineage as a novel sublineage, MYMBD21. This study characterizes and presents the first whole-genome sequence of an isolate of the SA-2018 lineage, BAN/MY/My-466/2021 (My-466 for short).
Am J Med Genet A
January 2025
Genetic Health Queensland, Royal Brisbane and Women's Hospital, Herston, Australia.
We describe the phenotypic and genotypic spectrum of patients with vascular anomalies (VA) in a paediatric multidisciplinary VA clinic. We measured the clinical utility of genotyping by comparing pre- and post-test diagnosis and management. A 46-month retrospective analysis was performed for 250 patients offered genetic testing in the VA clinic.
Nat Cancer
January 2025
Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK.
Human tumors are diverse in their natural history and response to treatment, which in part results from genetic and transcriptomic heterogeneity. In clinical practice, single-site needle biopsies are used to sample this diversity, but cancer biomarkers may be confounded by spatiogenomic heterogeneity within individual tumors. Here we investigate clonally expressed genes as a solution to the sampling bias problem by analyzing multiregion whole-exome and RNA sequencing data for 450 tumor regions from 184 patients with lung adenocarcinoma in the TRACERx study.