Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters.

J Comput Biol

Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.

Published: June 2017

Using a sequence's k-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data. Because k-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical for k-mer set storage, and Bloom filters (BFs) and their variants are used instead. BFs reduce the memory footprint required to store millions of k-mers while allowing fast set containment queries, at the cost of a low false positive rate (FPR). We show that, because k-mers are derived from sequencing reads, the information about k-mer overlap in the original sequence can be used to reduce the FPR by up to 30× with little or no additional memory and with set containment queries that are only 1.3 to 1.6 times slower. Alternatively, we can leverage k-mer overlap information to store k-mer sets in about half the space while maintaining the original FPR. We consider several variants of such k-mer Bloom filters (kBFs), derive theoretical upper bounds for their FPR, and discuss their range of applications and limitations.
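The idea of using k-mer overlap to filter out false positives can be illustrated with a minimal sketch. The abstract does not spell out the kBF variants, so everything below is an assumption made for illustration: the `BloomFilter` class, the double-hashing scheme, the parameter defaults, and the rule in `overlap_query` (accept a k-mer only if the filter also reports at least one k-mer that overlaps it by k-1 bases) are hypothetical and are not taken from the paper.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter with m bits and h hash positions per item.

    Hash positions come from double hashing over a SHA-256 digest;
    this choice is an illustrative assumption, not the paper's scheme.
    """

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.h = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        a = int.from_bytes(digest[:8], "big")
        b = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(a + i * b) % self.m for i in range(self.h)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))


def kmers(seq, k):
    """Return all length-k substrings of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_filter(reads, k, num_bits=1 << 16, num_hashes=3):
    """Insert every k-mer of every read into a fresh Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for km in kmers(read, k):
            bf.add(km)
    return bf


def has_neighbor(bf, kmer):
    """True if the filter reports any k-mer overlapping kmer by k-1 bases."""
    suffix, prefix = kmer[1:], kmer[:-1]
    return any(suffix + c in bf or c + prefix in bf for c in "ACGT")


def overlap_query(bf, kmer):
    """Hypothetical overlap-aware query: membership plus a neighbor check."""
    return kmer in bf and has_neighbor(bf, kmer)


if __name__ == "__main__":
    reads = ["ACGTACGGTCA"]
    bf = build_filter(reads, k=5)
    print(all(overlap_query(bf, km) for km in kmers(reads[0], 5)))
```

Under this rule a false positive must coincide with a second positive among its eight possible neighbors, which is why the extra lookups can lower the FPR at the cost of a slower query. The sketch assumes every stored k-mer comes from a read containing at least two k-mers; a k-mer with no stored neighbor would be rejected, so this simple rule can introduce false negatives at sequence boundaries.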


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467106
DOI: http://dx.doi.org/10.1089/cmb.2016.0155

Publication Analysis

Top Keywords (occurrence count)

bloom filters: 12
k-mer: 8
data k-mer: 8
k-mer bloom: 8
k-mer sets: 8
set containment: 8
containment queries: 8
k-mer overlap: 8
improving bloom: 4
bloom filter: 4

Similar Publications

Algal organic matter alters protistan community structure and assembly processes in coastal sediments.

Eur J Protistol

January 2025

Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, PR China.

Diatom blooms are a global ecological perturbation that releases algal organic matter (AOM), significantly affecting coastal ecosystems by altering microbial community dynamics. AOM, derived from algal cell lysis, may serve as a nutrient source influencing protistan communities. However, the effects of AOM on protistan ecology, including the community structure and assembly processes, remain largely unexplored in coastal sediments.


The cyanobacterium causes harmful algal blooms that pose a major threat to human health and ecosystem services, particularly due to the prevalence of the potent hepatotoxin microcystin (MC). With their pronounced EPS layer, colonies also serve as a hub for heterotrophic phycosphere bacteria. Here, we tested the hypothesis that the genotypic plasticity in its ability to produce MC influences the composition and assembly of the phycosphere microbiome.


The release of algal organic matter (AOM) during seasonal algal blooms increases the complexity and heterogeneity of natural organic matter (NOM) in water sources, altering its hydrophilic-hydrophobic balance and posing significant challenges to conventional water treatment processes. This study aims to verify whether the granular activated carbon (GAC) selected for the adsorption of NOM in sand filtration effluent can adapt to water quality fluctuations caused by AOM release, and to identify the criteria influencing GAC adsorption performance. Results indicated that external surface area, mesopore volume, pore size, and surface functional groups were key indicators of GAC adsorption performance.


Promoted growth with dynamic cellular stoichiometry driven by utilization of in-situ dissolved organic matter: Insights from bloom-forming dinoflagellate Prorocentrum donghaiense.

Mar Environ Res

December 2024

State Key Laboratory of Marine Environmental Science, Xiamen University, Xiamen, China; College of Ocean and Earth Sciences, Xiamen University, Xiamen, China.

Mixotrophic dinoflagellates frequently cause harmful algal blooms (HABs) in eutrophic waters that contain diverse dissolved organic matter (DOM), especially intensive mariculture areas. Compared to the extensive investigation of phagotrophy and single organic molecule uptake by causative species, we have limited knowledge about the capability of mixotrophic dinoflagellates to utilize in-situ DOM in mariculture waters and its contribution to HABs. Here we use filtered in-situ mariculture water as the sole medium to examine the physiological response of Prorocentrum donghaiense to the natural mariculture DOM.


Marine microorganisms play a critical role in regulating atmospheric CO₂ concentration via the biological carbon pump. Deposition of continental mineral dust on the sea surface increases carbon sequestration, but the interaction between minerals and marine microorganisms is not well understood. We discovered that the interaction of clay minerals with dissolved organic matter and a γ-proteobacterium in seawater increases Transparent Exopolymer Particle (TEP) concentration, leading to organoclay floc formation.

