The massive amount of genomic data that has appeared for SARS-CoV-2 since the beginning of the COVID-19 pandemic has challenged traditional methods for studying its dynamics. As a result, new methods such as Pangolin, which can scale to the millions of SARS-CoV-2 samples currently available, have appeared. Such a tool is tailored to take as input assembled, aligned, and curated full-length sequences, such as those found in the GISAID database. As high-throughput sequencing technologies continue to advance, this assembly, alignment, and curation may become a bottleneck, creating a need for methods that can process raw sequencing reads directly. In this article, we propose Reads2Vec, an alignment-free embedding approach that generates a fixed-length feature vector representation directly from the raw sequencing reads, without requiring assembly. Furthermore, since such an embedding is a numerical representation, it can be used as input to highly optimized classification and clustering algorithms. Experiments on simulated data show that our proposed embedding obtains better classification results and better clustering properties than existing alignment-free baselines. In a study on real data, we show that alignment-free embeddings have better clustering properties than the Pangolin tool and that the spike region of the SARS-CoV-2 genome heavily informs the alignment-free clusterings, which is consistent with current biological knowledge of SARS-CoV-2.
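
To make the idea of an alignment-free, fixed-length embedding concrete, the following is a minimal sketch of a plain k-mer spectrum computed directly from unassembled reads. It is a generic technique of the kind that could serve as a baseline, not the Reads2Vec method itself, whose details the abstract does not give. The function name `kmer_spectrum`, the default `k`, and the choice to skip k-mers containing ambiguous bases are all assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_spectrum(reads, k=3):
    """Return a normalized k-mer frequency vector of length 4**k.

    Hypothetical helper for illustration only: counts k-mers over all
    reads without assembling or aligning them.
    """
    # Fix a lexicographic ordering of all possible k-mers so that every
    # sample maps to a vector with the same length and the same layout.
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    vec = [0.0] * len(index)

    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            j = index.get(read[i:i + k])
            # Assumption: k-mers containing N or other ambiguous symbols
            # are skipped rather than imputed.
            if j is not None:
                vec[j] += 1.0

    # Normalize so that samples with different read depth are comparable.
    total = sum(vec)
    if total > 0:
        vec = [v / total for v in vec]
    return vec
```

Because the output length depends only on `k` and not on the number or length of the reads, vectors from different samples can be passed to any standard classifier or clustering routine.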


Source: http://dx.doi.org/10.1089/cmb.2022.0424


