Publications by authors named "Jim Shaw"

Profiling metagenomes against databases allows for the detection and quantification of microorganisms, even at low abundances where assembly is not possible. We introduce sylph, a species-level metagenome profiler that estimates genome-to-metagenome containment average nucleotide identity (ANI) through zero-inflated Poisson k-mer statistics, enabling ANI-based taxa detection. On the Critical Assessment of Metagenome Interpretation II (CAMI2) Marine dataset, sylph was the most accurate profiling method of seven tested.


Motivation: Mobile genetic elements (MGEs) are as ubiquitous in nature as they are varied in type, ranging from viral insertions to transposons to incorporated plasmids. Horizontal transfer of MGEs across bacterial species may also pose a significant threat to global health due to their capability to harbor antibiotic resistance genes. However, despite cheap and rapid whole-genome sequencing, the varied nature of MGEs makes it difficult to fully characterize them, and existing methods for detecting MGEs often do not agree on what should count.


Background: Metagenomic binning, the clustering of assembled contigs that belong to the same genome, is a crucial step for recovering metagenome-assembled genomes (MAGs). Contigs are linked by exploiting consistent signatures along a genome, such as read coverage patterns. Using coverage from multiple samples leads to higher-quality MAGs; however, standard pipelines require all-to-all read alignments for multiple samples to compute coverage, becoming a key computational bottleneck.


Summary: Shotgun metagenomics allows for direct analysis of microbial community genetics, but scalable computational methods for the recovery of bacterial strain genomes from microbiomes remain a key challenge. We introduce Floria, a novel method designed for rapid and accurate recovery of strain haplotypes from short and long-read metagenome sequencing data, based on minimum error correction (MEC) read clustering and a strain-preserving network flow model. Floria can function as a standalone haplotyping method, outputting alleles and reads that co-occur on the same strain, as well as an end-to-end read-to-assembly pipeline (Floria-PL) for strain-level assembly.


Background: Taxonomic classification of reads obtained by metagenomic sequencing is often a first step for understanding a microbial community, but correctly assigning sequencing reads to the strain or sub-species level has remained a challenging computational problem.

Results: We introduce Mora, a MetagenOmic read Re-Assignment algorithm capable of assigning short and long metagenomic reads with high precision, even at the strain level. Mora is able to accurately re-assign reads by first estimating abundances through an expectation-maximization algorithm and then utilizing abundance information to re-assign query reads.
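The two-step idea described above (estimate abundances with expectation-maximization, then use them to re-assign reads) can be illustrated with a toy sketch. This is not Mora's implementation: the function names `em_abundances` and `reassign`, the input format (each read lists the reference indices it matches equally well), and the fixed iteration count are all assumptions made for illustration.

```python
def em_abundances(candidates, n_refs, iters=50):
    """Estimate reference abundances with a simple EM loop.

    candidates[i] is the non-empty list of reference indices that read i
    matches equally well (a hypothetical, simplified input format).
    """
    abund = [1.0 / n_refs] * n_refs  # start from a uniform guess
    for _ in range(iters):
        counts = [0.0] * n_refs
        # E-step: split each read across its candidates in proportion
        # to the current abundance estimates.
        for refs in candidates:
            total = sum(abund[r] for r in refs)
            for r in refs:
                counts[r] += abund[r] / total
        # M-step: abundances are the normalized fractional read counts.
        abund = [c / len(candidates) for c in counts]
    return abund


def reassign(candidates, abund):
    """Assign each read to its most abundant candidate reference."""
    return [max(refs, key=lambda r: abund[r]) for refs in candidates]


# Three reads are unique to reference 0, one is unique to reference 1,
# and one read is ambiguous between the two.
reads = [[0], [0], [0], [0, 1], [1]]
abund = em_abundances(reads, n_refs=2)
assignments = reassign(reads, abund)
```

In this toy input the ambiguous read is assigned to reference 0, because the uniquely mapped reads make that reference the more abundant one.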


Sequence comparison tools for metagenome-assembled genomes (MAGs) struggle with high-volume or low-quality data. We present skani ( https://github.com/bluenote-1577/skani ), a method for determining average nucleotide identity (ANI) via sparse approximate alignments.


Objectives: Patient-reported outcome (PRO) data are critical in understanding treatments from the patient perspective in cancer clinical trials. The potential benefits and methodological approaches to the collection of PRO data after treatment discontinuation (eg, because of progressive disease or unacceptable drug toxicity) are less clear. The purpose of this article is to describe the Food and Drug Administration's Oncology Center of Excellence and the Critical Path Institute cosponsored 2-hour virtual roundtable, held in 2020, to discuss this specific issue.


Seed-chain-extend with k-mer seeds is a powerful heuristic technique for sequence alignment used by modern sequence aligners. Although effective in practice for both runtime and accuracy, theoretical guarantees on the resulting alignment do not exist for seed-chain-extend. In this work, we give the first rigorous bounds for the efficacy of seed-chain-extend with k-mers. Assume we are given a random nucleotide sequence of length ∼n that is indexed (or seeded) and a mutated substring of length ∼m ≤ n with mutation rate θ < 0.


Motivation: We face an increasing flood of genetic sequence data, from diverse sources, requiring rapid computational analysis. Rapid analysis can be achieved by sampling a subset of positions in each sequence. Previous sequence-sampling methods, such as minimizers, syncmers and minimally overlapping words, were developed by heuristic intuition, and are not optimal.


Motivation: Selecting a subset of k-mers in a string in a local manner is a common task in bioinformatics tools for speeding up computation. Arguably the most well-known and common method is the minimizer technique, which selects the 'lowest-ordered' k-mer in a sliding window. Recently, it has been shown that minimizers may be a sub-optimal method for selecting subsets of k-mers when mutations are present.
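As a minimal sketch of the minimizer technique mentioned above, the following selects the smallest k-mer in each sliding window of w consecutive k-mers. The function name `minimizers`, the use of plain lexicographic order as the "lowest-ordered" criterion, and the leftmost tie-breaking rule are assumptions for illustration; practical tools typically order k-mers by a hash value instead.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return the distinct (position, k-mer) pairs chosen as window minimizers.

    Each window covers w consecutive k-mers; the lexicographically smallest
    k-mer is selected, with ties broken by the leftmost position.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first smallest index, giving leftmost tie-breaking.
        offset = min(range(w), key=lambda j: window[j])
        pick = (start + offset, window[offset])
        # Consecutive windows often share a minimizer; record it only once.
        if not selected or selected[-1] != pick:
            selected.append(pick)
    return selected


picked = minimizers("ACGTACGT", k=3, w=2)
```

Because adjacent windows overlap, the same k-mer position is frequently chosen several times in a row, which is why the selected set is much smaller than the full set of k-mers.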


Resolving haplotypes in polyploid genomes using phase information from sequencing reads is an important and challenging problem. We introduce two new mathematical formulations of polyploid haplotype phasing: (1) the min-sum max tree partition problem, which is a more flexible graphical metric compared with the standard minimum error correction (MEC) model in the polyploid setting, and (2) the uniform probabilistic error minimization model, which is a probabilistic analogue of the MEC model. We incorporate both formulations into a long-read based polyploid haplotype phasing method called .

Article Synopsis
  • High-throughput metagenomic sequencing has advanced microbiome characterization, but current methods struggle to merge short- and long-read technologies effectively.
  • The new hybrid assembler OPERA-MS improves assembly accuracy and contiguity and reduces errors compared to existing long- and short-read assemblers, enabling better strain resolution and genome assembly for complex microbial communities.
  • Using OPERA-MS, researchers produced high-quality assemblies from gut metagenomes of antibiotic-treated patients, significantly improving assembly quality and revealing detailed insights into the gut resistome, including new phage sequences.

Physical employment standards evaluate whether a worker possesses the physical abilities to safely and efficiently perform all critical on-the-job tasks. Initial Attack (IA) wildland fire fighters (WFF) must perform such critical tasks in all terrains. Following a physical demands analysis, IA WFF (n = 946 out of a possible 965) from all fire jurisdictions ranked the most demanding tasks and identified mountains, muskeg and rolling hills as the most challenging terrains.


The purpose of this investigation was to identify the critical tasks encountered by correctional officers (COs) on the job and to conduct a comprehensive assessment and characterization of the physical demands of these tasks. These are the first steps in developing a fitness screening test for COs in compliance with recent legislation. The most important, physically demanding, and frequently occurring tasks were identified using Delphi methodology, focus groups, and questionnaire responses from 190 experienced front-line COs.


Introduction: The purpose of this study was to characterize the physiological demands of recreational off-road vehicle riding under typical riding conditions using habitual recreational off-road vehicle riders (n = 128).

Methods: The physical demands of off-road vehicle riding were compared between vehicle types (all-terrain vehicle (ATV) and off-road motorcycle (ORM)) and against the demands of common recreational activities. Habitual riders (ATV = 56, ORM = 72) performed strength assessments before and after a representative trail ride (48 +/- 24.


The papers in presentation group 2 of Genetic Analysis Workshop 15 (GAW15) conducted association analyses of rheumatoid arthritis data. The analyses were carried out primarily in the data provided by the North American Rheumatoid Arthritis Consortium (NARAC). One group conducted analyses in the data provided by the Canadian Rheumatoid Arthritis Genetics Study (CRAGS).
