Computational Performance Assessment of k-mer Counting Algorithms.

J Comput Biol

GICOGE, Universidad Distrital Francisco José de Caldas, Bogotá, Colombia.

Published: April 2016

This article assesses several k-mer counting tools in order to give bioinformatics researchers a reference framework for identifying the computational requirements, degree of parallelization, advantages, disadvantages, and bottlenecks of the algorithm behind each tool. The k-mer counters evaluated were BFCounter, DSK, Jellyfish, KAnalyze, KHMer, KMC2, MSPKmerCounter, Tallymer, and Turtle. The measured parameters were RAM usage, processing time, parallelization, and disk read and write access. The dataset consisted of 36,504,800 reads from human chromosome 14. The assessment was performed for two k-mer lengths, 31 and 55. The results were as follows: tools based purely on Bloom filters and tools using disk-partitioning techniques used less RAM. The tools with the shortest execution times were those using disk-partitioning techniques. The tools that achieved the greatest parallelization were those using disk partitioning, lock-free hash tables, or multiple hash tables.
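
For readers unfamiliar with the task being benchmarked, the following is a minimal sketch of k-mer counting with a single in-memory hash table. It is an illustration only and does not reproduce the algorithm of any tool named in the abstract; the function name `count_kmers`, the toy reads, and the choice to skip windows containing `N` are assumptions made for this example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads (hypothetical helper).

    Uses one in-memory hash table, so memory grows with the number of
    distinct k-mers. The evaluated tools use Bloom filters, disk
    partitioning, or lock-free tables to reduce that cost.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of width k over the read; reads shorter than k
        # yield an empty range and contribute nothing.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:  # assumption: skip windows with ambiguous bases
                continue
            counts[kmer] += 1
    return counts


# Toy input, not the chromosome 14 dataset from the article.
reads = ["ACGTACGT", "CGTACG"]
counts = count_kmers(reads, 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

A read of length L contributes L - k + 1 windows, so the two toy reads yield 6 + 4 = 10 counted k-mers in total. The article evaluated k = 31 and k = 55 on real reads, where the number of distinct k-mers is what drives the RAM and disk behavior it measures.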

Source
http://dx.doi.org/10.1089/cmb.2015.0199

Publication Analysis

Top Keywords

k-mer counting: 8
tools k-mer: 8
disk-partitioning techniques: 8
hash tables: 8
computational performance: 4
performance assessment: 4
k-mer: 4
assessment k-mer: 4
counting algorithms: 4
algorithms article: 4

Similar Publications

Plant cells have two major organelles with their own genomes: chloroplasts and mitochondria. While chloroplast genomes tend to be structurally conserved, the mitochondrial genomes of plants, which are much larger than those of animals, are characterized by complex structural variation. We introduce TIPPo, a user-friendly, reference-free assembly tool that uses PacBio high-fidelity long-read data and that does not rely on genomes from related species or nuclear genome information for the assembly of organellar genomes.


Typical high-throughput single-cell RNA-sequencing (scRNA-seq) analyses are primarily conducted by (pseudo)alignment, through the lens of annotated gene models, and aimed at detecting differential gene expression. This misses diversity generated by other mechanisms that diversify the transcriptome such as splicing and V(D)J recombination, and is blind to sequences missing from imperfect reference genomes. Here, we present sc-SPLASH, a highly efficient pipeline that extends our SPLASH framework for statistics-first, reference-free discovery to barcoded scRNA-seq (10x Chromium) and spatial transcriptomics (10x Visium); we also provide its optimized module for preprocessing and k-mer counting in barcoded data, BKC, as a standalone tool.


kMetaShot: a fast and reliable taxonomy classifier for metagenome-assembled genomes.

Brief Bioinform

November 2024

Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Via E. Orabona 4, 70126, Bari, Italy.

The advent of high-throughput sequencing (HTS) technologies unlocked the complexity of the microbial world through the development of metagenomics, which now provides an unprecedented and comprehensive overview of its taxonomic and functional contribution in a huge variety of macro- and micro-ecosystems. In particular, shotgun metagenomics allows the reconstruction of microbial genomes, through the assembly of reads into MAGs (metagenome-assembled genomes). In fact, MAGs represent an information-rich proxy for inferring the taxonomic composition and the functional contribution of microbiomes, even if the relevant analytical approaches are not trivial and still improvable.


[Ploidy and whole genome Survey of Pinellia ternata].

Zhongguo Zhong Yao Za Zhi

October 2024

Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences Beijing 100700, China.

Pinellia ternata, a widely distributed species in China, has been used as a herbal medicine for centuries, with the effects of drying dampness and resolving phlegm. However, its complex ploidy and lack of whole-genome map limit in-depth research on molecular-assisted breeding and multi-omics. In this study, flow cytometry was employed to evaluate the genome sizes of 144 P.

Article Synopsis
  • The rise of large biological datasets has made DNA and protein sequence alignments crucial for studying evolutionary relationships and sequence conservation.
  • A new tool called MAFcounter has been created specifically for counting k-mer occurrences in alignment files, making it the first of its kind.
  • MAFcounter is designed to be fast, multithreaded, and memory efficient, and is available for download on GitHub under a GPL license.
