Motivation: Estimation of bacterial community composition from high-throughput sequenced 16S rRNA gene amplicons is a key task in microbial ecology. Because the sequence data from each sample typically consist of a large number of reads and are degraded by varying levels of biological and technical noise, accurate analysis of such large datasets is challenging.

Results: There has been a recent surge of interest in using compressed-sensing-inspired and convex-optimization-based methods to estimate bacterial community composition. These methods typically summarize the sequence data by the frequencies of low-order k-mers and match this information statistically against a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach prior to the estimation phase. The aggregation is a pre-processing step in which a standard K-means clustering algorithm partitions a large set of reads into subsets at reasonable computational cost, yielding several vectors of first-order statistics instead of a single statistical summary in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method is called Aggregation of Reads by K-means (ARK), and it is based on a statistical argument via a mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity.
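The aggregation step described above can be illustrated with a minimal sketch. It is not the authors' Julia or Matlab implementation: the function names (`kmer_frequencies`, `kmeans`, `aggregate_reads`), the choice of k = 2, the random initialization, and the use of cluster-size fractions as mixture weights are all assumptions made for illustration. The sketch computes one k-mer frequency vector per read, partitions the vectors with a plain K-means loop, and returns one mean vector (a first-order statistic) and one weight per cluster. The downstream estimator that maps each cluster summary to a composition estimate is deliberately left out.

```python
import random
from itertools import product


def kmer_frequencies(read, k=2, alphabet="ACGT"):
    """Return the normalized k-mer frequency vector of a single read."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0.0] * len(index)
    total = 0
    for i in range(len(read) - k + 1):
        j = index.get(read[i:i + k])
        if j is not None:  # skip k-mers with ambiguous bases such as N
            counts[j] += 1.0
            total += 1
    return [c / total for c in counts] if total else counts


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, n_clusters)  # assumed initialization
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(n_clusters), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = mean_vector(members)
    return labels, centers


def aggregate_reads(reads, n_clusters, k=2):
    """Cluster reads and return per-cluster mean k-mer vectors and weights."""
    vectors = [kmer_frequencies(r, k) for r in reads]
    labels, centers = kmeans(vectors, n_clusters)
    # Assumed mixture weights: fraction of reads assigned to each cluster.
    weights = [labels.count(c) / len(reads) for c in range(n_clusters)]
    return centers, weights


if __name__ == "__main__":
    # Hypothetical toy reads, not real 16S data.
    reads = ["ACGTACGTAC", "ACGTACGTTT", "GGGGCCCCGG", "GGGCCCGGGC", "ACGTACGAAC"]
    centers, weights = aggregate_reads(reads, n_clusters=2)
    print(weights)
```

One property of this sketch is worth noting: the weighted sum of the cluster means equals the single global mean k-mer vector of the sample, so the conventional summary is recoverable from the cluster summaries while the clusters additionally retain within-sample structure.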

Availability: An open source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4619776
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0140644
