Publications by authors named "David Pellow"

Plasmids are pivotal in driving bacterial evolution through horizontal gene transfer. Here, we investigated 3467 human gut microbiome samples across continents and disease states, analyzing 11,086 plasmids. Our analyses reveal that plasmid dispersal is predominantly stochastic, indicating neutral processes as the primary driver of their wide distribution.

Minimizers are ubiquitously used in data structures and algorithms for efficient searching, mapping, and indexing of high-throughput DNA sequencing data. Minimizer schemes select a minimum k-mer in every L-long subsequence of the target sequence, where minimality is with respect to a predefined k-mer order. Commonly used minimizer orders select more k-mers than necessary and therefore provide limited improvement in runtime and memory usage of downstream analysis tasks.
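
The selection rule described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `minimizers`, the window-length parameter `L`, and the default lexicographic order are assumptions made here for the example, whereas the paper itself concerns better-performing orders.

```python
def minimizers(seq, k, L, order=None):
    """Return sorted (position, k-mer) pairs selected as the minimum
    k-mer of each L-long window of seq.

    `order` maps a k-mer to a sortable key. Lexicographic order is used
    by default; this is an illustrative assumption only.
    """
    if order is None:
        order = lambda kmer: kmer
    if not (0 < k <= L <= len(seq)):
        raise ValueError("require 0 < k <= L <= len(seq)")

    selected = set()
    for start in range(len(seq) - L + 1):
        # Every k-mer fully contained in the window seq[start:start + L].
        candidates = [
            (order(seq[i:i + k]), i)
            for i in range(start, start + L - k + 1)
        ]
        # Ties on the key are broken by the leftmost position.
        _, pos = min(candidates)
        selected.add(pos)
    return [(pos, seq[pos:pos + k]) for pos in sorted(selected)]


if __name__ == "__main__":
    print(minimizers("ACGTACGT", k=3, L=5))
```

Adjacent windows often share the same minimum k-mer, so fewer positions are selected than there are windows. How many are selected depends on the order passed in.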

Motivation: Sequencing long reads presents novel challenges to mapping. One such challenge is low sequence similarity between the reads and the reference, due to high sequencing error and mutation rates. This occurs, e.

The rapid continuous growth of deep sequencing experiments requires development and improvement of many bioinformatic applications for analysis of large sequencing data sets, including k-mer counting and assembly. Several applications reduce memory usage by binning sequences. Binning is done by using minimizer schemes, which rely on a specific order of the minimizers.
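
As a rough sketch of minimizer-based binning, the snippet below groups the k-mers of a sequence by their smallest m-mer. The function name `bin_kmers_by_minimizer`, the parameter `m`, and the lexicographic order are hypothetical choices for illustration and do not come from the article, which studies how the choice of order affects the bins.

```python
from collections import defaultdict


def bin_kmers_by_minimizer(seq, k, m):
    """Group the k-mers of seq into bins keyed by their minimum m-mer.

    Lexicographic order on m-mers is assumed here for illustration.
    """
    if not (0 < m <= k <= len(seq)):
        raise ValueError("require 0 < m <= k <= len(seq)")

    bins = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # The minimizer is the smallest m-mer contained in this k-mer.
        minimizer = min(kmer[j:j + m] for j in range(k - m + 1))
        bins[minimizer].append(kmer)
    return dict(bins)


if __name__ == "__main__":
    print(bin_kmers_by_minimizer("ACGTAC", k=4, m=2))
```

Overlapping k-mers frequently share a minimizer and therefore land in the same bin, which lets each bin be processed separately with a smaller memory footprint.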

Article Synopsis
  • Metagenomic sequencing is revealing many new bacterial genomes and their associated plasmids, which are small DNA molecules that can spread antibiotic resistance but are less understood than bacteria.
  • The SCAPP tool was developed to improve plasmid sequence assembly from metagenomic data by using biological insights, and it showed better performance compared to existing tools in various tests.
  • SCAPP is a user-friendly, open-source Python package that successfully assembles full plasmid sequences and has identified new, clinically relevant plasmids, making it a valuable resource for researchers.

Many bacteria contain plasmids, but separating between contigs that originate on the plasmid and those that are part of the bacterial genome can be difficult. This is especially true in metagenomic assembly, which yields many contigs of unknown origin. Existing tools for classifying sequences of plasmid origin give less reliable results for shorter sequences, are trained using a fraction of the known plasmids, and can be difficult to use in practice.

With the rapidly increasing volume of deep sequencing data, more efficient algorithms and data structures are needed. Minimizers are a central recent paradigm that has improved various sequence analysis tasks, including hashing for faster read overlap detection, sparse suffix arrays for creating smaller indexes, and Bloom filters for speeding up sequence search. Here, we propose an alternative paradigm that can lead to substantial further improvement in these and other tasks.

Motivation: The minimizers scheme is a method for selecting k-mers from sequences. It is used in many bioinformatics software tools to bin comparable sequences or to sample a sequence in a deterministic fashion at approximately regular intervals, in order to reduce memory consumption and processing time. Although very useful, the minimizers selection procedure has undesirable behaviors (e.

Using a sequence's k-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data. As k-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical for k-mer set storage, and Bloom filters (BFs) and their variants are used instead. BFs reduce the memory footprint required to store millions of k-mers while allowing for fast set containment queries, at the cost of a low false positive rate (FPR).
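
To make the trade-off concrete, here is a minimal sketch of a Bloom filter storing k-mers. It is a generic textbook construction, not the method proposed in the article. The class name `KmerBloomFilter`, its parameters, and the use of salted BLAKE2b hashes are assumptions made for this example.

```python
import hashlib


class KmerBloomFilter:
    """A basic Bloom filter over k-mer strings (illustrative sketch)."""

    def __init__(self, num_bits, num_hashes):
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive num_hashes bit positions from differently salted hashes.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode("ascii"),
                digest_size=8,
                salt=seed.to_bytes(8, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(kmer)
        )


if __name__ == "__main__":
    seq, k = "ACGTACGTTGCA", 5
    bf = KmerBloomFilter(num_bits=1024, num_hashes=3)
    for i in range(len(seq) - k + 1):
        bf.add(seq[i:i + k])
    print("ACGTA" in bf)
```

A query for a stored k-mer always returns `True`. A query for an absent k-mer usually returns `False`, but can return `True` when all of its bit positions happen to have been set by other k-mers, which is the false positive rate mentioned in the abstract.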

In this review, we provide an introduction to the topics of environmental justice and environmental inequality. We provide an overview of the dimensions of unequal exposures to environmental pollution (environmental inequality), followed by a discussion of the theoretical literature that seeks to explain the origins of this phenomenon. We also consider the impact of the environmental justice movement in the United States and the role that federal and state governments have developed to address environmental inequalities.
