Publications by authors named "Enrico Seiler"

We present a novel data structure for searching sequences in large databases: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space efficient, yet so general that it could serve as the underlying engine for many applications. We show that the HIBF is superior in build time, index size, and search time while achieving a comparable or better accuracy compared to other state-of-the-art tools.

View Article and Find Full Text PDF

Motivation: The ever-growing size of sequencing data is a major bottleneck in bioinformatics as the advances of hardware development cannot keep up with the data growth. Therefore, an enormous amount of data is collected but rarely ever reused, because it is nearly impossible to find meaningful experiments in the stream of raw data.

Results: As a solution, we propose Needle, a fast and space-efficient index which can be built for thousands of experiments in <2 h and can estimate the quantification of a transcript in these experiments in seconds, thereby outperforming its competitors.

View Article and Find Full Text PDF
Article Synopsis
  • Evaluating metagenomic software is crucial for enhancing the interpretation of metagenomes, and the CAMI II challenge focused on this by using complex datasets from numerous genomes and plasmids.
  • The analysis of 5,002 results from 76 software versions showed significant advancements in assembly, especially with long-read data, although challenges remained with related strains and genome recovery.
  • Findings indicated that while taxon profilers improved, they struggled with viruses and Archaea, highlighting the need for better reproducibility in clinical pathogen detection and guiding researchers in method selection based on efficiency and performance metrics.
View Article and Find Full Text PDF

We present Raptor, a system for approximately searching many queries such as next-generation sequencing reads or transcripts in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative -mers, an extension of the interleaved Bloom filters (IBFs) as a set membership data structure and probabilistic thresholding for minimizers. Our approach allows compression and partitioning of the IBF to enable the effective use of secondary memory.

View Article and Find Full Text PDF

Motivation: The exponential growth of assembled genome sequences greatly benefits metagenomics studies. However, currently available methods struggle to manage the increasing amount of sequences and their frequent updates. Indexing the current RefSeq can take days and hundreds of GB of memory on large servers.

View Article and Find Full Text PDF

Horizontal gene transfer (HGT) has changed the way we regard evolution. Instead of waiting for the next generation to establish new traits, especially bacteria are able to take a shortcut via HGT that enables them to pass on genes from one individual to another, even across species boundaries. The tool Daisy offers the first HGT detection approach based on read mapping that provides complementary evidence compared to existing methods.

View Article and Find Full Text PDF

Motivation: Mapping-based approaches have become limited in their application to very large sets of references since computing an FM-index for very large databases (e.g. >10 GB) has become a bottleneck.

View Article and Find Full Text PDF