Linear-time computation of minimal absent words using suffix array.

BMC Bioinformatics

Department of Informatics, King's College London, The Strand, WC2R 2LS, London, UK.

Published: December 2014

Background: An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their computation also provides a fast alternative for measuring approximation in sequence comparison. There exists an [Formula: see text]-time and [Formula: see text]-space algorithm for computing all minimal absent words on a fixed-sized alphabet based on the construction of suffix automata (Crochemore et al., 1998). No implementation of this algorithm is publicly available. There also exists an [Formula: see text]-time and [Formula: see text]-space algorithm for the same problem based on the construction of suffix arrays (Pinho et al., 2009). An implementation of this algorithm was also provided by the authors and is currently the fastest available.

Results: Our contribution in this article is twofold: first, we bridge this unpleasant gap by presenting an [Formula: see text]-time and [Formula: see text]-space algorithm for computing all minimal absent words based on the construction of suffix arrays; and second, we provide the respective implementation of this algorithm. Experimental results, using real and synthetic data, show that this implementation outperforms the one by Pinho et al. The open-source code of our implementation is freely available at http://github.com/solonas13/maw .

Conclusions: Classical notions for sequence comparison are increasingly being replaced by other similarity measures that refer to the composition of sequences in terms of their constituent patterns. One such measure is the minimal absent words. In this article, we present a new linear-time and linear-space algorithm for the computation of minimal absent words based on the suffix array.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4297395PMC
http://dx.doi.org/10.1186/s12859-014-0388-9DOI Listing

Publication Analysis

Top Keywords

minimal absent
28
[formula text]-time
12
text]-time [formula
12
[formula text]-space
12
text]-space algorithm
12
based construction
12
construction suffix
12
implementation algorithm
12
computation minimal
8
absent
8

Similar Publications

Introduction: The Friedreich Ataxia Rating Scale-Activities of Daily Living (FARS-ADL) is a valid, highly utilized measure for assessing ADL impacts in patients with Friedreich ataxia. We provide evidence of the psychometric validity of the FARS-ADL in two cohorts of patients with spinocerebellar ataxia (SCA).

Methods: Using data from a cohort of real-world subjects with SCA (recruited at Massachusetts General Hospital [MGH]; n = 33) and a phase 3 trial of troriluzole in adults with SCA (NCT03701399 [Study 206]; n = 217), comprising a subset of patients with the SCA3 genotype (n = 89), the psychometric measurement properties and minimal change thresholds of the FARS-ADL were examined.

View Article and Find Full Text PDF

Opening of the cardiac voltage-gated Na+ channel (Nav1.5) is responsible for robust depolarization of the cardiac action potential, while inactivation, which rapidly follows, allows for repolarization. Regulation of both the voltage- and time-dependent kinetics of Nav1.

View Article and Find Full Text PDF

A variety of minimal clinically important difference (MCID) estimates are available to distinguish subgroups with differing outcomes. When a true gold standard is absent, latent class growth curve analysis (LCGC) has been proposed as a suitable alternative for important change. Our purpose was to evaluate the performance of individual and baseline quartile-stratified MCIDs.

View Article and Find Full Text PDF

Background: Isocitrate dehydrogenase (IDH) wild-type (IDH) glioblastomas (GB) are more aggressive and have a poorer prognosis than IDH mutant (IDH) tumors, emphasizing the need for accurate preoperative differentiation. However, a distinct imaging biomarker for differentiation mostly lacking. Intratumoral thrombosis has been reported as a histopathological biomarker for GB.

View Article and Find Full Text PDF

Background: Bladder injury during cesarean delivery (CD) in pregnant women with severe placenta accreta spectrum (PAS) disorders mostly occurs in the dissection of vesico-uterine space. Placental MRI may help to assess the risk of bladder injury preoperatively.

Purpose: To identify the high-risk MRI signs of bladder injury during CD in women with severe PAS.

View Article and Find Full Text PDF

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!