Succinct dynamic de Bruijn graphs.

Bioinformatics

Department of Computer and Information Science and Engineering, College of Engineering, University of Florida, Gainesville, FL 32611, USA.

Published: August 2021

Motivation: The de Bruijn graph is one of the fundamental data structures for analysis of high throughput sequencing data. In order to be applicable to population-scale studies, it is essential to build and store the graph in a space- and time-efficient manner. In addition, due to the ever-changing nature of population studies, it has become essential to update the graph after construction, e.g. add and remove nodes and edges. Although there has been substantial effort on making the construction and storage of the graph efficient, there is a limited amount of work in building the graph in an efficient and mutable manner. Hence, most space efficient data structures require complete reconstruction of the graph in order to add or remove edges or nodes.

Results: In this article, we present DynamicBOSS, a succinct representation of the de Bruijn graph that allows for an unlimited number of additions and deletions of nodes and edges. We compare our method with other competing methods and demonstrate that DynamicBOSS is the only method that supports both addition and deletion and is applicable to very large samples (e.g. greater than 15 billion k-mers). Competing dynamic methods, e.g. FDBG cannot be constructed on large scale datasets, or cannot support both addition and deletion, e.g. BiFrost.

Availability And Implementation: DynamicBOSS is publicly available at https://github.com/baharpan/dynboss.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8337006PMC
http://dx.doi.org/10.1093/bioinformatics/btaa546DOI Listing

Publication Analysis

Top Keywords

bruijn graph
8
data structures
8
studies essential
8
add remove
8
nodes edges
8
graph efficient
8
addition deletion
8
graph
7
succinct dynamic
4
dynamic bruijn
4

Similar Publications

Anchorage Accurately Assembles Anchor-Flanked Synthetic Long Reads.

Lebniz Int Proc Inform

August 2024

Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA.

Modern sequencing technologies allow for the addition of short-sequence tags, known as anchors, to both ends of a captured molecule. Anchors are useful in assembling the full-length sequence of a captured molecule as they can be used to accurately determine the endpoints. One representative of such anchor-enabled technology is LoopSeq Solo, a synthetic long read (SLR) sequencing protocol.

View Article and Find Full Text PDF

Pediatric diabetes I is an endemic and an especially difficult disease; indeed, at this point, there does not exist a cure, but only careful management that relies on anticipating hypoglycemia. The changing physiology of children producing unique blood glucose signatures, coupled with inconsistent activities, e.g.

View Article and Find Full Text PDF

Error self-correction is crucial for analyzing long-read sequencing data, but existing methods often struggle with noisy data or are tailored to technologies like PacBio HiFi. There is a gap in methods optimized for Nanopore R10 simplex reads, which typically have error rates below 2%. We introduce DeChat, a novel approach designed specifically for these reads.

View Article and Find Full Text PDF

Evaluating Sequence Alignment Tools for Antimicrobial Resistance Gene Detection in Assembly Graphs.

Microorganisms

October 2024

Department of Mathematics and Computing Science, Saint Mary's University, Halifax, NS B3H 3C3, Canada.

Antimicrobial resistance (AMR) is an escalating global health threat, often driven by the horizontal gene transfer (HGT) of resistance genes. Detecting AMR genes and understanding their genomic context within bacterial populations is crucial for mitigating the spread of resistance. In this study, we evaluate the performance of three sequence alignment tools-Bandage, SPAligner, and GraphAligner-in identifying AMR gene sequences from assembly and de Bruijn graphs, which are commonly used in microbial genome assembly.

View Article and Find Full Text PDF

Noncoding RNA refers to RNA that does not encode proteins. The lncRNA and miRNA it contains play crucial regulatory roles in organisms, and their aberrant expression is closely related to various diseases. Traditional experimental methods for validating the interactions of these RNAs have limitations, and existing prediction models exhibit relatively limited functionality, relying on isolated feature extraction and performing poorly in handling various types of small sample tasks.

View Article and Find Full Text PDF

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!