Objectives: Genomic signatures such as k-mers have become one of the most prominent ways to describe genomic data. Many real-world applications, such as the construction of de Bruijn graphs in genome assembly, have benefited from recognizing genomic signatures. Efficient genomic signature profiling is therefore essential for handling high-throughput sequencing reads. However, most existing approaches recognize only fixed-size k-mers, although many studies have shown the importance of considering variable-length k-mers.

Methods: In this paper, we present a novel genomic signature profiling approach, TahcoRoll, which extends the Aho-Corasick (AC) algorithm to profile variable-length k-mers. We first group nucleotides into two clusters and represent each cluster with a bit. A rolling hash is then used to encode signatures and read patterns for efficient matching.
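
The two ideas in the Methods paragraph, a one-bit code per nucleotide cluster and a rolling hash over each read, can be illustrated with a minimal Python sketch. It is not the TahcoRoll implementation. The abstract does not say which nucleotides share a cluster, so the grouping in `BIT` is an assumption, as are the names `roll` and `profile`, the hash base and the modulus. The sketch also replaces the Aho-Corasick automaton with a plain dictionary lookup per signature length, and it assumes reads contain only A, C, G and T.

```python
from collections import defaultdict

# Assumption: this two-cluster grouping is illustrative only; the abstract
# does not state which nucleotides are grouped together.
BIT = {"A": 0, "C": 0, "G": 1, "T": 1}
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = 4                # hypothetical polynomial hash base
MOD = (1 << 61) - 1     # hypothetical large prime modulus


def roll(read, k):
    """Yield (start, bit_pattern, hash) for every k-mer of `read`."""
    if len(read) < k:
        return
    mask = (1 << k) - 1
    top = pow(BASE, k - 1, MOD)  # weight of the character leaving the window
    bits = h = 0
    for i, c in enumerate(read):
        if i >= k:
            # Remove the contribution of the character that slides out.
            h = (h - CODE[read[i - k]] * top) % MOD
        bits = ((bits << 1) | BIT[c]) & mask
        h = (h * BASE + CODE[c]) % MOD
        if i >= k - 1:
            yield i - k + 1, bits, h


def profile(signatures, reads):
    """Count occurrences of variable-length signatures in the reads."""
    index = defaultdict(list)  # (length, bit pattern, hash) -> signatures
    for s in set(signatures):
        bits = h = 0
        for c in s:
            bits = (bits << 1) | BIT[c]
            h = (h * BASE + CODE[c]) % MOD
        index[(len(s), bits, h)].append(s)
    lengths = {len(s) for s in signatures}
    counts = {s: 0 for s in signatures}
    for read in reads:
        for k in lengths:
            for start, bits, h in roll(read, k):
                for s in index.get((k, bits, h), ()):
                    # Verify the match to rule out hash collisions.
                    if read[start:start + k] == s:
                        counts[s] += 1
    return counts


if __name__ == "__main__":
    print(profile(["ACG", "CGTA", "TT"], ["ACGTACG", "TTTT"]))
```

Each window update costs constant time regardless of k, which is what makes a rolling hash attractive when signatures of many different lengths must be checked against the same read.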

Results: In extensive experiments, TahcoRoll significantly outperforms state-of-the-art k-mer counters and can process reads from different sequencing platforms on a budget desktop computer.

Conclusions: The single-thread version of TahcoRoll is as efficient as the eight-thread version of the state-of-the-art counter, JellyFish, while the eight-thread TahcoRoll outperforms the eight-thread JellyFish by a factor of at least four.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9027990
DOI: http://dx.doi.org/10.1515/mr-2021-0016

Similar Publications

Cardiovascular diseases (CVDs) include atherosclerosis, which is an inflammatory disease of large and medium vessels that leads to atherosclerotic plaque formation. The key factors contributing to the onset and progression of atherosclerosis include the pro-inflammatory cytokines interferon (IFN)α and IFNγ and the pattern recognition receptor (PRR) Toll-like receptor 4 (TLR4). Together, they trigger the activation of IFN regulatory factors (IRFs) and signal transducer and activator of transcription (STAT)s.

Birth cohort studies involve repeated surveys of large numbers of individuals from birth and throughout their lives. They collect information useful for a wide range of life course research domains, and biological samples which can be used to derive data from an increasing collection of omic technologies. This rich source of longitudinal data, when combined with genomic data, offers the scientific community valuable insights ranging from population genetics to applications across the social sciences.

analysis of lncRNA-miRNA-mRNA signatures related to Sorafenib effectiveness in liver cancer cells.

World J Gastroenterol

January 2025

Department of Oncology Surgery, Cell Therapy and Organ Transplantation, Institute of Biomedicine of Seville, Virgen del Rocio University Hospital, Seville 41013, Spain.

Background: Hepatocellular carcinoma (HCC) is the most common subtype of primary liver cancer with varied incidence and epidemiology worldwide. Sorafenib is still a recommended treatment for a large proportion of patients with advanced HCC. Different patterns of treatment responsiveness have been identified in differentiated hepatoblastoma HepG2 cells and metastatic HCC SNU449 cells.

Expanding the genomic diversity of human anelloviruses.

Virus Evol

January 2025

MRC-University of Glasgow Centre for Virus Research, The University of Glasgow, Glasgow G61 1QH, United Kingdom.

Anelloviruses are a group of small, circular, single-stranded DNA viruses that are found ubiquitously across mammalian hosts. Here, we explored a large number of publicly available human microbiome datasets and retrieved a total of 829 anellovirus genomes, substantially expanding the known diversity of these viruses. The majority of new genomes fall within the three major human anellovirus genera: , and , while we also present new genomes of the under-sampled , and genera.

The lysosome-related characteristics affects the prognosis and tumor microenvironment of lung adenocarcinoma.

Front Med (Lausanne)

January 2025

Department of Thoracic Surgery, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China.

Background: The lysosome plays a vitally crucial role in tumor development and is a major participant in the cell death process, involving aberrant functional and structural changes. However, there are few studies on lysosome-associated genes (LAGs) in lung adenocarcinoma (LUAD).

Methods: Bulk RNA-seq of LUAD was downloaded from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO).
