Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and to accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge.
Results: A method is presented that shifts computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data, showing accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset was found to take <20 h on a single-node, 40-core, large-memory machine and to provide new insights into the metagenomic contents of the sample.
Availability: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat
Contact: allen99@llnl.gov
Supplementary Information: Supplementary data are available at Bioinformatics online.
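The idea in the Results of paying the indexing cost once, off-line, and then classifying reads by lookup can be illustrated with a minimal sketch. It assumes a k-mer index in which each k-mer is labeled with the lowest common ancestor (LCA) of the genomes containing it, and a simple majority vote per read; the abstract does not specify these details, so this is not the published implementation. The toy taxonomy, sequences, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); "root" is its own parent.
PARENT = {
    "root": "root",
    "Bacteria": "root",
    "Escherichia": "Bacteria",
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
}

def lineage(taxon):
    """Return the path from a taxon up to the root, inclusive."""
    path = [taxon]
    while PARENT[taxon] != taxon:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path

def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors = set(lineage(a))
    for t in lineage(b):
        if t in ancestors:
            return t
    return "root"

def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_index(genomes, k):
    """Off-line step: map each k-mer to the LCA of all genomes containing it."""
    index = {}
    for taxon, seq in genomes.items():
        for km in kmers(seq, k):
            index[km] = lca(index[km], taxon) if km in index else taxon
    return index

def classify(read, index, k):
    """On-line step: look up each k-mer of the read and return the most
    frequent label (an assumed voting rule), or None if no k-mer is indexed."""
    votes = Counter(index[km] for km in kmers(read, k) if km in index)
    return votes.most_common(1)[0][0] if votes else None

# Made-up reference sequences, for illustration only.
genomes = {"E. coli": "ACGTACGGTA", "E. albertii": "ACGTTTGGCA"}
index = build_index(genomes, k=4)
```

With this index, a k-mer shared by both toy genomes is labeled at the genus level, while a k-mer unique to one genome keeps the species label. Classification then costs one dictionary lookup per k-mer, independent of the number of reference genomes.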
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3753567 | PMC |
http://dx.doi.org/10.1093/bioinformatics/btt389 | DOI Listing |
Bioinformatics
January 2025
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom.
Summary: In recent years there has been a surge in prokaryotic genome assemblies, coming from both isolated organisms and environmental samples. These assemblies often include novel species that are poorly represented in reference databases, creating a need for a tool that can annotate both well-described and novel taxa and can run at scale. Here, we present mettannotator, a comprehensive, scalable Nextflow pipeline for prokaryotic genome annotation that identifies coding and non-coding regions, predicts protein functions, including antimicrobial resistance, and delineates gene clusters.
Nat Commun
January 2025
Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, Denmark.
The gut microbiome significantly impacts human health, yet cultivation challenges hinder its exploration. Here, we combine deep whole-metagenome sequencing with culturomics to selectively enrich for taxa and functional capabilities of interest. Using a modified commercial base medium, 50 growth modifications were evaluated, spanning antibiotics, physico-chemical conditions, and bioactive compounds.
Ecotoxicol Environ Saf
January 2025
Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran.
As a global environmental challenge, plastic pollution raises serious ecological and health concerns owing to the excessive accumulation of plastic waste, which disrupts ecosystems, harms wildlife, and threatens human health. Polyethylene terephthalate (PET), one of the most commonly used plastics, has contributed significantly to this growing crisis. This study offers a solution for plastic pollution by identifying novel PET-degrading enzymes.
Environ Monit Assess
December 2024
Department of Plant Pathology and Entomology, VIT-School of Agricultural Innovation and Advanced Learning, Vellore Institute of Technology, 632014, Vellore, Tamil Nadu, India.
Contamination by heavy metals (HMs) poses a significant threat to ecosystems and their associated micro- and macroorganisms and has harmful effects on humans, which necessitates effective remediation strategies. Microbial remediation leverages the natural metabolic abilities of microbes to overcome heavy metal pollution effectively. Mechanisms that aid in the removal of heavy metals include bioaccumulation, biosorption, and biomineralization.
iScience
December 2024
University Rennes, Inria, CNRS, IRISA - UMR 6074, 35000 Rennes, France.
Genomic data sequencing is crucial for understanding biological systems. As genomic databases like the European Nucleotide Archive expand exponentially, efficient data manipulation is essential. A key challenge is querying these databases to determine the presence or absence of specific sequences and their abundance within datasets.