Scalable metagenomic taxonomy classification using a reference genome database.

Bioinformatics

Center for Applied Scientific Computing, Lawrence Livermore National Laboratory and Global Security Directorate, P. O. Box 808, Livermore, CA 94551, USA.

Published: September 2013

Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge.

Results: A method is presented to shift computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data to show accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset took <20 h on a single-node, 40-core, large-memory machine and provided new insights into the metagenomic contents of the sample.
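The idea of moving cost into an off-line index can be illustrated with a minimal sketch. It is not the published implementation: it assumes the index is keyed by fixed-length k-mers, that a k-mer shared by several genomes is stored under their lowest common ancestor (LCA), and that a read is assigned by simple majority vote. The names `Taxonomy`, `KmerIndex`, `add_genome` and `classify_read` are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using TaxId = std::uint32_t;

// Hypothetical toy taxonomy: child -> parent, with the root mapped to itself.
struct Taxonomy {
    std::unordered_map<TaxId, TaxId> parent;

    std::vector<TaxId> path_to_root(TaxId t) const {
        std::vector<TaxId> path{t};
        while (parent.at(t) != t) {
            t = parent.at(t);
            path.push_back(t);
        }
        return path;
    }

    // Lowest common ancestor: walk b upward until it reaches an ancestor of a.
    TaxId lca(TaxId a, TaxId b) const {
        const std::vector<TaxId> pa = path_to_root(a);
        const std::unordered_set<TaxId> ancestors(pa.begin(), pa.end());
        while (ancestors.count(b) == 0) b = parent.at(b);
        return b;
    }
};

// Assumed index layout: k-mer -> most specific taxon shared by all genomes
// that contain it.
using KmerIndex = std::unordered_map<std::string, TaxId>;

// Off-line step: fold every k-mer of one reference genome into the index.
void add_genome(KmerIndex& index, const Taxonomy& tax,
                const std::string& genome, TaxId taxon, std::size_t k) {
    assert(k > 0);
    for (std::size_t i = 0; i + k <= genome.size(); ++i) {
        auto [it, inserted] = index.emplace(genome.substr(i, k), taxon);
        if (!inserted) it->second = tax.lca(it->second, taxon);
    }
}

// On-line step: one hash lookup per k-mer, then a majority vote.
// Returns `unclassified` when no k-mer of the read is in the index.
TaxId classify_read(const KmerIndex& index, const std::string& read,
                    std::size_t k, TaxId unclassified = 0) {
    assert(k > 0);
    std::map<TaxId, int> votes;
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
        auto it = index.find(read.substr(i, k));
        if (it != index.end()) ++votes[it->second];
    }
    TaxId best = unclassified;
    int best_votes = 0;
    for (const auto& [taxon, n] : votes) {
        if (n > best_votes) {
            best = taxon;
            best_votes = n;
        }
    }
    return best;
}
```

In this sketch all taxonomy work (the LCA computation) happens while the index is built, so classifying a read costs only hash lookups proportional to its length. A production tool would need a far more compact k-mer encoding and a more careful scoring rule than a plain vote.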

Availability: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat

Contact: allen99@llnl.gov

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3753567
DOI: http://dx.doi.org/10.1093/bioinformatics/btt389

Publication Analysis

Top Keywords

scalable metagenomic: 8
biological samples: 8
taxonomic classification: 8
classification: 6
metagenomic taxonomy: 4
taxonomy classification: 4
classification reference: 4
reference genome: 4
genome database: 4
database motivation: 4

Similar Publications

mettannotator: a comprehensive and scalable Nextflow annotation pipeline for prokaryotic assemblies.

Bioinformatics

January 2025

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom.

Summary: In recent years there has been a surge in prokaryotic genome assemblies, coming from both isolated organisms and environmental samples. These assemblies often include novel species that are poorly represented in reference databases, creating a need for a tool that can annotate both well-described and novel taxa and can run at scale. Here, we present mettannotator, a comprehensive, scalable Nextflow pipeline for prokaryotic genome annotation that identifies coding and non-coding regions, predicts protein functions, including antimicrobial resistance, and delineates gene clusters.

The gut microbiome significantly impacts human health, yet cultivation challenges hinder its exploration. Here, we combine deep whole-metagenome sequencing with culturomics to selectively enrich for taxa and functional capabilities of interest. Using a modified commercial base medium, 50 growth modifications were evaluated, spanning antibiotics, physico-chemical conditions, and bioactive compounds.

Metagenomic exploration and computational prediction of novel enzymes for polyethylene terephthalate degradation.

Ecotoxicol Environ Saf

January 2025

Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran.

As a global environmental challenge, plastic pollution raises serious ecological and health concerns owing to the excessive accumulation of plastic waste, which disrupts ecosystems, harms wildlife, and threatens human health. Polyethylene terephthalate (PET), one of the most commonly used plastics, has contributed significantly to this growing crisis. This study offers a solution for plastic pollution by identifying novel PET-degrading enzymes.

Harnessing microbes for heavy metal remediation: mechanisms and prospects.

Environ Monit Assess

December 2024

Department of Plant Pathology and Entomology, VIT-School of Agricultural Innovation and Advanced Learning, Vellore Institute of Technology, 632014, Vellore, Tamil Nadu, India.

Contamination by heavy metals (HMs) poses a significant threat to the ecosystem and its associated micro- and macroorganisms, leading to ill effects on humans and necessitating effective remediation strategies. Microbial remediation leverages the natural metabolic abilities of microbes to overcome heavy metal pollution effectively. Some of the mechanisms that aid in the removal of heavy metals include bioaccumulation, biosorption, and biomineralization.

Genomic data sequencing is crucial for understanding biological systems. As genomic databases like the European Nucleotide Archive expand exponentially, efficient data manipulation is essential. A key challenge is querying these databases to determine the presence or absence of specific sequences and their abundance within datasets.
