Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index.

Cell Syst

Computer Science Department, Stony Brook University, 100 Nicolls Rd, Stony Brook, NY 11794, USA.

Published: August 2018

Sequence-level searches on large collections of RNA sequencing experiments, such as the NCBI Sequence Read Archive (SRA), would enable one to ask many questions about the expression or variation of a given transcript in a population. Existing approaches, such as the sequence Bloom tree, suffer from fundamental limitations of the Bloom filter, resulting in slow build and query times, suboptimal space usage, and potentially large numbers of false positives. This paper introduces Mantis, a space-efficient system that uses new data structures to index thousands of raw-read experiments and facilitates large-scale sequence searches. In our evaluation, index construction with Mantis is 6× faster and yields a 20% smaller index than the state-of-the-art split sequence Bloom tree (SSBT). For queries, Mantis is 6× to 108× faster than SSBT and has no false positives or false negatives. For example, Mantis was able to search for all 200,400 known human transcripts in an index of 2,652 RNA sequencing experiments in 82 min; SSBT took close to 4 days.
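
To make the kind of query described above concrete, the following is a minimal sketch of an exact sequence-search index. It is not the Mantis data structure, which the abstract does not detail; it is a plain in-memory inverted index from k-mers to experiment IDs, chosen only because it is exact by construction. The function names, the k-mer size, and the `theta` threshold (the fraction of a query's k-mers an experiment must contain to be reported) are illustrative assumptions, and the sketch ignores reverse complements.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(experiments, k):
    """Map each k-mer to the set of experiment IDs whose reads contain it.

    `experiments` is a dict of {experiment_id: [read, ...]} (hypothetical input shape).
    """
    index = defaultdict(set)
    for exp_id, reads in experiments.items():
        for read in reads:
            for km in kmers(read, k):
                index[km].add(exp_id)
    return index


def query(index, transcript, k, theta=0.8):
    """Return experiments containing at least a fraction `theta`
    of the distinct k-mers of `transcript` (assumed query convention)."""
    query_kmers = set(kmers(transcript, k))
    if not query_kmers:
        return set()
    hits = defaultdict(int)
    for km in query_kmers:
        # .get avoids inserting empty entries into the defaultdict
        for exp_id in index.get(km, ()):
            hits[exp_id] += 1
    return {e for e, n in hits.items() if n >= theta * len(query_kmers)}


if __name__ == "__main__":
    toy = {"exp_a": ["ACGTACGT"], "exp_b": ["TTTTGGGG"]}
    idx = build_index(toy, k=4)
    print(query(idx, "ACGTAC", k=4))
```

Because every k-mer is stored explicitly, a lookup can never report an experiment that lacks the k-mer, which is the property a Bloom filter cannot guarantee. A dictionary of Python sets is far too large for thousands of experiments; the sketch illustrates only the query semantics, not the space efficiency the paper reports.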

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10964368
DOI: http://dx.doi.org/10.1016/j.cels.2018.05.021
