Publications by Fostier J | LitMetric

Publications by authors named "Fostier J"

b-move: Faster Lossless Approximate Pattern Matching in a Run-Length Compressed Index.

Lore Depuydt Luca Renders Simon Van de Vyver Lennart Veys Travis Gagie

Res Sq

November 2024

Background: Due to the increasing availability of high-quality genome sequences, pan-genomes are gradually replacing single consensus reference genomes in many bioinformatics pipelines to better capture genetic diversity. Traditional bioinformatics tools using the FM-index face memory limitations with such large genome collections. Recent advancements in run-length compressed indices like Gagie et al.

Lossless Approximate Pattern Matching: Automated Design of Efficient Search Schemes.

Luca Renders Lore Depuydt Sven Rahmann Jan Fostier

J Comput Biol

October 2024

This study introduces a pioneering approach to automate the creation of search schemes for lossless approximate pattern matching. Search schemes are combinatorial structures that define a series of searches over a partitioned pattern. Each search specifies the processing order of these parts and the cumulative lower and upper bounds on the number of errors in each part of the pattern.
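
As a rough illustration of the definition above, the following Python sketch models a search as a processing order plus cumulative lower and upper error bounds, and checks which error distributions a scheme covers. The `Search` type, the helper names, and the two-part scheme for at most one error are illustrative assumptions, not taken from the paper.

```python
from itertools import product
from typing import List, NamedTuple, Tuple


class Search(NamedTuple):
    """One search of a scheme (hypothetical representation)."""
    order: Tuple[int, ...]  # processing order of the pattern parts (0-based)
    lower: Tuple[int, ...]  # cumulative lower bound on errors after each step
    upper: Tuple[int, ...]  # cumulative upper bound on errors after each step


def covers(search: Search, errors: Tuple[int, ...]) -> bool:
    """Return True if the per-part error counts satisfy the search's bounds."""
    total = 0
    for step, part in enumerate(search.order):
        total += errors[part]
        if not (search.lower[step] <= total <= search.upper[step]):
            return False
    return True


def uncovered(scheme: List[Search], parts: int, k: int) -> List[Tuple[int, ...]]:
    """List all distributions of at most k errors that no search covers."""
    return [
        dist
        for dist in product(range(k + 1), repeat=parts)
        if sum(dist) <= k and not any(covers(s, dist) for s in scheme)
    ]


# Assumed example: two parts, at most one error in total.
SCHEME_K1 = [
    Search(order=(0, 1), lower=(0, 0), upper=(0, 1)),  # part 0 exact, then part 1
    Search(order=(1, 0), lower=(0, 1), upper=(0, 1)),  # part 1 exact, error in part 0
]
```

Under these assumptions, `uncovered(SCHEME_K1, parts=2, k=1)` returns an empty list, meaning every distribution of at most one error over the two parts is handled by at least one search.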

Faster Maximal Exact Matches with Lazy LCP Evaluation.

Adrián Goga Lore Depuydt Nathaniel K Brown Jan Fostier Travis Gagie

Proc Data Compress Conf

March 2024

MONI (Rossi et al., 2022) is a BWT-based compressed index for computing the matching statistics and maximal exact matches (MEMs) of a pattern (usually a DNA read) with respect to a highly repetitive text (usually a database of genomes) using two operations: LF-steps and longest common extension (LCE) queries on a grammar-compressed representation of the text. In practice, most of the operations are constant-time LF-steps but most of the time is spent evaluating LCE queries.

b-move: faster bidirectional character extensions in a run-length compressed index.

Lore Depuydt Luca Renders Simon Van de Vyver Lennart Veys Travis Gagie

bioRxiv

June 2024

Due to the increasing availability of high-quality genome sequences, pan-genomes are gradually replacing single consensus reference genomes in many bioinformatics pipelines to better capture genetic diversity. Traditional bioinformatics tools using the FM-index face memory limitations with such large genome collections. Recent advancements in run-length compressed indices like Gagie et al.

Pan-genome de Bruijn graph using the bidirectional FM-index.

Lore Depuydt Luca Renders Thomas Abeel Jan Fostier

BMC Bioinformatics

October 2023

Background: Pan-genome graphs are gaining importance in the field of bioinformatics as data structures to represent and jointly analyze multiple genomes. Compacted de Bruijn graphs are inherently suited for this purpose, as their graph topology naturally reveals similarity and divergence within the pan-genome. Most state-of-the-art pan-genome graphs are represented explicitly in terms of nodes and edges.

Oracle selection provides insight into how far off practice is from Utopia in plant breeding.

David Vanavermaete Steven Maenhout Jan Fostier Bernard De Baets

Front Plant Sci

July 2023

Since the introduction of genomic selection in plant breeding, high genetic gains have been realized in different plant breeding programs. Various methods based on genomic estimated breeding values (GEBVs) for selecting parental lines that maximize the genetic gain as well as methods for improving the predictive performance of genomic selection have been proposed. Unfortunately, it remains difficult to measure to what extent these methods really maximize long-term genetic values.

Improved Node and Arc Multiplicity Estimation in De Bruijn Graphs Using Approximate Inference in Conditional Random Fields.

Aranka Steyaert Pieter Audenaert Jan Fostier

IEEE/ACM Trans Comput Biol Bioinform

June 2023

In de novo genome assembly using short Illumina reads, the accurate determination of node and arc multiplicities in a de Bruijn graph has a large impact on the quality and contiguity of the assembly. The multiplicity estimates of nodes and arcs guide the cleaning of the de Bruijn graph by identifying spurious nodes and arcs that correspond to sequencing errors. Additionally, they can be used to guide repeat resolution.

BLSSpeller to discover novel regulatory motifs in maize.

Razgar Seyed Rahmani Dries Decap Jan Fostier Kathleen Marchal

DNA Res

June 2022

With the decreasing cost of sequencing and availability of larger numbers of sequenced genomes, comparative genomics is becoming increasingly attractive to complement experimental techniques for the task of transcription factor (TF) binding site identification. In this study, we redesigned BLSSpeller, a motif discovery algorithm, to cope with larger sequence datasets. BLSSpeller was used to identify novel motifs in Zea mays in a comparative genomics setting with 16 monocot lineages.

Halvade somatic: Somatic variant calling with Apache Spark.

Dries Decap Louise de Schaetzen van Brienen Maarten Larmuseau Pascal Costanza Charlotte Herzeel

Gigascience

January 2022

Background: The accurate detection of somatic variants from sequencing data is of key importance for cancer treatment and research. Somatic variant calling requires a high sequencing depth of the tumor sample, especially when the detection of low-frequency variants is also desired. In turn, this leads to large volumes of raw sequencing data to process and hence, large computational requirements.

Deep scoping: a breeding strategy to preserve, reintroduce and exploit genetic variation.

David Vanavermaete Jan Fostier Steven Maenhout Bernard De Baets

Theor Appl Genet

December 2021

The deep scoping method incorporates the use of a gene bank together with different population layers to reintroduce genetic variation into the breeding population, thus maximizing the long-term genetic gain without reducing the short-term genetic gain or increasing the total financial cost. Genomic prediction is often combined with truncation selection to identify superior parental individuals that can pass on favorable quantitative trait locus (QTL) alleles to their offspring. However, truncation selection reduces genetic variation within the breeding population, causing a premature convergence to a sub-optimal genetic value.

Dynamic partitioning of search patterns for approximate pattern matching using search schemes.

Luca Renders Kathleen Marchal Jan Fostier

iScience

July 2021

Search schemes constitute a flexible and generic framework to describe how all approximate occurrences of a search pattern in a text can be found efficiently. We propose an algorithm for the dynamic partitioning of search patterns which can be universally applied to any kind of search scheme and demonstrate that this technique significantly reduces the search space. We present Columba, a software tool written in C++, in which a multitude of search schemes are implemented.

Multithreaded variant calling in elPrep 5.

Charlotte Herzeel Pascal Costanza Dries Decap Jan Fostier Roel Wuyts

PLoS One

July 2021

We present elPrep 5, which updates the elPrep framework for processing sequencing alignment/map files with variant calling. elPrep 5 can now execute the full pipeline described by the GATK Best Practices for variant calling, which consists of PCR and optical duplicate marking, sorting by coordinate order, base quality score recalibration, and variant calling using the haplotype caller algorithm. elPrep 5 produces BAM and VCF output identical to that of GATK4 while significantly reducing the runtime by parallelizing and merging the execution of the pipeline steps.

Accurate determination of node and arc multiplicities in de Bruijn graphs using conditional random fields.

Aranka Steyaert Pieter Audenaert Jan Fostier

BMC Bioinformatics

September 2020

Background: De Bruijn graphs are key data structures for the analysis of next-generation sequencing data. They efficiently represent the overlap between reads and hence, also the underlying genome sequence. However, sequencing errors and repeated subsequences render the identification of the true underlying sequence difficult.
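
To make the data structure concrete, here is a minimal Python sketch that builds a small de Bruijn graph from reads and records how often each k-mer was observed. It assumes a node-centric graph (nodes are k-mers, arcs join k-mers that overlap by k-1 characters) and ignores reverse complements. The function name `build_de_bruijn` is hypothetical and the code is not taken from the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def build_de_bruijn(reads: List[str], k: int) -> Tuple[Counter, Dict[str, Set[str]]]:
    """Return (k-mer counts, arcs) for a node-centric de Bruijn graph."""
    # Count every k-mer occurrence across all reads.
    kmer_counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1

    # Connect two observed k-mers when the (k-1)-suffix of the first
    # equals the (k-1)-prefix of the second.
    arcs: Dict[str, Set[str]] = defaultdict(set)
    for kmer in kmer_counts:
        for base in "ACGT":
            successor = kmer[1:] + base
            if successor in kmer_counts:
                arcs[kmer].add(successor)
    return kmer_counts, dict(arcs)


counts, arcs = build_de_bruijn(["ACGT", "CGTA"], k=3)
```

In this toy input the k-mer `CGT` occurs in both reads, so its count is 2. Raw counts like this are the kind of evidence from which node multiplicities would be estimated, though sequencing errors and repeats make the raw counts alone unreliable.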

Computational assessment of the feasibility of protonation-based protein sequencing.

Giles Miclotte Koen Martens Jan Fostier

PLoS One

October 2020

Recent advances in DNA sequencing methods have revolutionized biology by providing highly accurate reads, with high throughput or high read length. These read data are being used in many biological and medical applications. Modern DNA sequencing methods have no equivalent in protein sequencing, severely limiting the widespread application of protein data.

Comparative analysis of somatic variant calling on matched FF and FFPE WGS samples.

Louise de Schaetzen van Brienen Maarten Larmuseau Kim Van der Eecken Frederic De Ryck Pauline Robbe

BMC Med Genomics

July 2020

Background: Research grade Fresh Frozen (FF) DNA material is not yet routinely collected in clinical practice. Many hospitals, however, collect and store Formalin Fixed Paraffin Embedded (FFPE) tumor samples. Consequently, the sample size of whole genome cancer cohort studies could be increased tremendously by including FFPE samples, although the presence of artefacts might obfuscate the variant calling.

Preservation of Genetic Variation in a Breeding Population for Long-Term Genetic Gain.

David Vanavermaete Jan Fostier Steven Maenhout Bernard De Baets

G3 (Bethesda)

August 2020

Genomic selection has been successfully implemented in plant and animal breeding. The transition from parental selection based on phenotypic characteristics to genomic selection (GS) has reduced breeding time and cost while accelerating the rate of genetic progression. Although breeding methods have been adapted to include genomic selection, parental selection often relies on truncation selection, that is, choosing the individuals with the highest genomic estimated breeding values (GEBVs) in the hope that favorable properties will be passed to their offspring.

BLAMM: BLAS-based algorithm for finding position weight matrix occurrences in DNA sequences on CPUs and GPUs.

BMC Bioinformatics

March 2020

Background: The identification of all matches of a large set of position weight matrices (PWMs) in long DNA sequences requires significant computational resources for which a number of efficient yet complex algorithms have been proposed.

Results: We propose BLAMM, a simple and efficient tool inspired by high performance computing techniques. The workload is expressed in terms of matrix-matrix products that are evaluated with high efficiency using optimized BLAS library implementations.
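
One possible way to phrase PWM scanning as a matrix-matrix product is sketched below in plain Python. Each sequence window is one-hot encoded into a row, each PWM is flattened into a column, and the product gives one score per window and PWM. This is an illustrative assumption about the general idea, not BLAMM's actual code. All names are hypothetical, all PWMs are assumed to have the same length, and the naive `matmul` stands in for an optimized BLAS routine.

```python
from typing import Dict, List

BASES = "ACGT"
PWM = List[Dict[str, float]]  # one base -> score mapping per motif position


def one_hot_window(window: str) -> List[float]:
    """Encode a window as 4 indicator values per position (order A, C, G, T)."""
    return [1.0 if base == b else 0.0 for base in window for b in BASES]


def flatten_pwm(pwm: PWM) -> List[float]:
    """Flatten a PWM in the same position-major, base-minor order."""
    return [position[b] for position in pwm for b in BASES]


def matmul(rows: List[List[float]], cols: List[List[float]]) -> List[List[float]]:
    """Naive product; `cols` holds the columns of the right-hand matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in rows]


def score_all(sequence: str, pwms: List[PWM]) -> List[List[float]]:
    """Return scores[i][j]: score of PWM j at sequence offset i."""
    m = len(pwms[0])  # assumed identical for all PWMs
    windows = [one_hot_window(sequence[i:i + m])
               for i in range(len(sequence) - m + 1)]
    return matmul(windows, [flatten_pwm(p) for p in pwms])


# Toy PWM of length 2 that rewards "A" followed by "C".
toy_pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 2.0, "G": 0.0, "T": 0.0},
]
scores = score_all("ACGA", [toy_pwm])
```

Because every window is scored against every PWM in a single product, the work maps onto the dense matrix kernels that BLAS libraries optimize heavily.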

GABAC: an arithmetic coding solution for genomic data.

Jan Voges Tom Paridaens Fabian Müntefering Liudmila S Mainzer Brian Bliss

Bioinformatics

April 2020

Motivation: In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data.

Illumina error correction near highly repetitive DNA regions improves de novo genome assembly.

Mahdi Heydari Giles Miclotte Yves Van de Peer Jan Fostier

BMC Bioinformatics

June 2019

Background: Several standalone error correction tools have been proposed to correct sequencing errors in Illumina data in order to facilitate de novo genome assembly. However, in a recent survey, we showed that state-of-the-art assemblers often did not benefit from this pre-correction step. We found that many error correction tools introduce new errors in reads that overlap highly repetitive DNA regions such as low-complexity patterns or short homopolymers, ultimately leading to a more fragmented assembly.

elPrep 4: A multithreaded framework for sequence analysis.

Charlotte Herzeel Pascal Costanza Dries Decap Jan Fostier Wilfried Verachtert

PLoS One

November 2019

We present elPrep 4, a reimplementation from scratch of the elPrep framework for processing sequence alignment map files in the Go programming language. elPrep 4 includes multiple new features allowing us to process all of the preparation steps defined by the GATK Best Practice pipelines for variant calling. This includes new and improved functionality for sorting, (optical) duplicate marking, base quality score recalibration, BED and VCF parsing, and various filtering options.

Dynamical anchoring of distant arrhythmia sources by fibrotic regions via restructuring of the activation pattern.

Nele Vandersickel Masaya Watanabe Qian Tao Jan Fostier Katja Zeppenfeld

PLoS Comput Biol

December 2018

Rotors are functional reentry sources identified in clinically relevant cardiac arrhythmias, such as ventricular and atrial fibrillation. Ablation targeting rotor sites has resulted in arrhythmia termination. Recent clinical, experimental and modelling studies demonstrate that rotors are often anchored around fibrotic scars or regions with increased fibrosis.

BrownieAligner: accurate alignment of Illumina sequencing data to de Bruijn graphs.

Mahdi Heydari Giles Miclotte Yves Van de Peer Jan Fostier

BMC Bioinformatics

September 2018

Background: Aligning short reads to a reference genome is an important task in many genome analysis pipelines. This task is computationally more complex when the reference genome is provided in the form of a de Bruijn graph instead of a linear sequence string.

Results: We present a branch and bound alignment algorithm that uses the seed-and-extend paradigm to accurately align short Illumina reads to a graph.

Evaluation of the impact of Illumina error correction tools on de novo genome assembly.

Mahdi Heydari Giles Miclotte Piet Demeester Yves Van de Peer Jan Fostier

BMC Bioinformatics

August 2017

Background: Recently, many standalone applications have been proposed to correct sequencing errors in Illumina data. The key idea is that downstream analysis tools such as de novo genome assemblers benefit from a reduced error rate in the input data. Surprisingly, a systematic validation of this assumption using state-of-the-art assembly methods is lacking, even for recently published methods.

OMSim: a simulator for optical map data.

Giles Miclotte Stéphane Plaisance Stephane Rombauts Yves Van de Peer Pieter Audenaert

Bioinformatics

September 2017

Motivation: The Bionano Genomics platform allows for the optical detection of short sequence patterns in very long DNA molecules (up to 2.5 Mbp). Molecules with overlapping patterns can be assembled to generate a consensus optical map of the entire genome.

Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce.

Dries Decap Joke Reumers Charlotte Herzeel Pascal Costanza Jan Fostier

PLoS One

August 2017

Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets.
