Relative Suffix Trees.

Andrea Farruggia, Travis Gagie, Gonzalo Navarro, Simon J Puglisi, Jouni Sirén

Comput J

Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK.

Published: May 2018

Suffix trees are one of the most versatile data structures in stringology, with many applications in bioinformatics. Their main drawback is their size, which can be tens of times larger than the input sequence. Much effort has been put into reducing the space usage, leading ultimately to compressed suffix trees. These compressed data structures can efficiently simulate the suffix tree, while using space proportional to a compressed representation of the sequence. In this work, we take a new approach to compressed suffix trees for repetitive sequence collections, such as collections of individual genomes. We compress the suffix trees of individual sequences relative to the suffix tree of a reference sequence. These relative data structures provide competitive time/space trade-offs, being almost as small as the smallest compressed suffix trees for repetitive collections, and competitive in time with the largest and fastest compressed suffix trees.
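
The abstract does not spell out how the relative structures are built, so the following is only a toy illustration, not the authors' method. It builds the components commonly used to simulate a suffix tree (suffix array, LCP array, Burrows-Wheeler transform) with naive algorithms, then uses Python's difflib to estimate how much of one sequence's BWT lines up with that of a near-identical reference. All function names and the example sequences are assumptions made for this sketch, and the difflib alignment is a heuristic rather than an exact longest common subsequence.

```python
from difflib import SequenceMatcher


def suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[k] = length of the longest common prefix of the suffixes starting
    at sa[k-1] and sa[k]; lcp[0] is 0 by convention."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = text[sa[k - 1]:], text[sa[k]:]
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        lcp[k] = n
    return lcp


def bwt(text):
    """Burrows-Wheeler transform; text must end with a unique '$' sentinel."""
    return "".join(text[i - 1] for i in suffix_array(text))


def shared_fraction(ref, target):
    """Fraction of the target's BWT covered by blocks aligned to the
    reference's BWT (difflib heuristic, not an exact LCS)."""
    a, b = bwt(ref), bwt(target)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(b)


if __name__ == "__main__":
    # Hypothetical sequences: the "individual" differs by one substitution.
    reference = "GATTACAGATTACAGATTACA$"
    individual = "GATTACAGATCACAGATTACA$"
    print(suffix_array("banana$"))
    print(lcp_array("banana$", suffix_array("banana$")))
    print(shared_fraction(reference, individual))
```

The naive construction takes quadratic time or worse and is meant only for small inputs. When most of the individual's BWT aligns with the reference's, only the unaligned remainder would need to be stored explicitly, which is the intuition behind compressing one structure relative to another.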

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5956352
DOI: http://dx.doi.org/10.1093/comjnl/bxx108
