Publications by Nicola Prezza

Publications by authors named "Nicola Prezza"

Page 1 of 1

Space-time Trade-offs for the LCP Array of Wheeler DFAs.

Nicola Cotumaccio Travis Gagie Dominik Köppl Nicola Prezza

Int Symp String Process Inf Retr

September 2023

Recently, Conte et al. generalized the longest-common prefix (LCP) array from strings to Wheeler DFAs, and they showed that it can be used to efficiently determine matching statistics on a Wheeler DFA [DCC 2023]. However, storing the LCP array requires bits, being the number of states, while the compact representation of Wheeler DFAs often requires much less space.

View Article and Find Full Text PDF

Computing matching statistics on Wheeler DFAs.

Alessio Conte Nicola Cotumaccio Travis Gagie Giovanni Manzini Nicola Prezza

Proc Data Compress Conf

March 2023

Matching statistics were introduced to solve the approximate string matching problem, which is a recurrent subroutine in bioinformatics applications. In 2010, Ohlebusch et al. [SPIRE 2010] proposed a time and space efficient algorithm for computing matching statistics which relies on some components of a compressed suffix tree - notably, the longest common prefix (LCP) array.

View Article and Find Full Text PDF

gsufsort: constructing suffix arrays, LCP arrays and BWTs for string collections.

Felipe A Louza Guilherme P Telles Simon Gog Nicola Prezza Giovanna Rosone

Algorithms Mol Biol

September 2020

Background: The construction of a suffix array for a collection of strings is a fundamental task in Bioinformatics and in many other applications that process strings. Related data structures, as the Longest Common Prefix array, the Burrows-Wheeler transform, and the document array, are often needed to accompany the suffix array to efficiently solve a wide variety of problems. While several algorithms have been proposed to construct the suffix array for a single string, less emphasis has been put on algorithms to construct suffix arrays for string collections.

View Article and Find Full Text PDF

Variable-order reference-free variant discovery with the Burrows-Wheeler Transform.

Nicola Prezza Nadia Pisanti Marinella Sciortino Giovanna Rosone

BMC Bioinformatics

September 2020

Background: In [Prezza et al., AMB 2019], a new reference-free and alignment-free framework for the detection of SNPs was suggested and tested. The framework, based on the Burrows-Wheeler Transform (BWT), significantly improves sensitivity and precision of previous de Bruijn graphs based tools by overcoming several of their limitations, namely: (i) the need to establish a fixed value, usually small, for the order k, (ii) the loss of important information such as k-mer coverage and adjacency of k-mers within the same read, and (iii) bad performance in repeated regions longer than k bases.

View Article and Find Full Text PDF

SNPs detection by eBWT positional clustering.

Nicola Prezza Nadia Pisanti Marinella Sciortino Giovanna Rosone

Algorithms Mol Biol

February 2019

Background: Sequencing technologies keep on turning cheaper and faster, thus putting a growing pressure for data structures designed to efficiently store raw data, and possibly perform analysis therein. In this view, there is a growing interest in alignment-free and reference-free variants calling methods that only make use of (suitably indexed) raw reads data.

Results: We develop the theory that (i) describes how the extended Burrows-Wheeler Transform (eBWT) of a collection of reads tends to cluster together bases that cover the same genome position (ii) predicts the size of such clusters, and (iii) exhibits an elegant and precise LCP array based procedure to locate such clusters in the eBWT.

View Article and Find Full Text PDF

Transcriptomics and methylomics of CD4-positive T cells in arsenic-exposed women.

Karin Engström Tomasz K Wojdacz Francesco Marabita Philip Ewels Max Käller Nicola Prezza

Arch Toxicol

May 2017

Arsenic, a carcinogen with immunotoxic effects, is a common contaminant of drinking water and certain food worldwide. We hypothesized that chronic arsenic exposure alters gene expression, potentially by altering DNA methylation of genes encoding central components of the immune system. We therefore analyzed the transcriptomes (by RNA sequencing) and methylomes (by target-enrichment next-generation sequencing) of primary CD4-positive T cells from matched groups of four women each in the Argentinean Andes, with fivefold differences in urinary arsenic concentrations (median concentrations of urinary arsenic in the lower- and high-arsenic groups: 65 and 276 μg/l, respectively).

View Article and Find Full Text PDF

Fast, accurate, and lightweight analysis of BS-treated reads with ERNE 2.

Nicola Prezza Francesco Vezzi Max Käller Alberto Policriti

BMC Bioinformatics

March 2016

Background: Bisulfite treatment of DNA followed by sequencing (BS-seq) has become a standard technique in epigenetic studies, providing researchers with tools for generating single-base resolution maps of whole methylomes. Aligning bisulfite-treated reads, however, is a computationally difficult task: bisulfite treatment decreases the (lexical) complexity of low-methylated genomic regions, and C-to-T mismatches may reflect cytosine unmethylation rather than SNPs or sequencing errors. Further challenges arise both during and after the alignment phase: data structures used by the aligner should be fast and should fit into main memory, and the methylation-caller output should be somehow compressed, due to its significant size.

View Article and Find Full Text PDF

Fast randomized approximate string matching with succinct hash data structures.

Alberto Policriti Nicola Prezza

BMC Bioinformatics

September 2015

Background: The high throughput of modern NGS sequencers coupled with the huge sizes of genomes currently analysed, poses always higher algorithmic challenges to align short reads quickly and accurately against a reference sequence. A crucial, additional, requirement is that the data structures used should be light. The available modern solutions usually are a compromise between the mentioned constraints: in particular, indexes based on the Burrows-Wheeler transform offer reduced memory requirements at the price of lower sensitivity, while hash-based text indexes guarantee high sensitivity at the price of significant memory consumption.

View Article and Find Full Text PDF