Clustering metagenomic sequences with interpolated Markov models.

BMC Bioinformatics

Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, College Park, MD 20742, USA.

Published: November 2010

Article Abstract

Background: Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects.

Results: We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest PHYSCIMM, a hybrid of SCIMM and the supervised learning method Phymm, which performs better when evolutionarily close training genomes are available.

Conclusions: SCIMM and PHYSCIMM are highly accurate methods to cluster metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHYSCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHYSCIMM are available open source from http://www.cbcb.umd.edu/software/scimm.
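The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea of clustering reads with per-cluster sequence models. It assumes a k-means-style loop (train one model per cluster, reassign each read to the model that scores it highest, repeat) and substitutes a fixed-order Markov model with pseudocounts for a true interpolated Markov model. The names `train_markov`, `score`, and `cluster_reads`, and all parameter defaults, are hypothetical and are not taken from the SCIMM software.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(reads, order=2, pseudocount=1.0):
    """Return a log-probability function for a smoothed fixed-order Markov model.

    Assumption: this is a simplified stand-in for an interpolated Markov
    model, which would blend several context lengths instead of using one.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for read in reads:
        for i in range(order, len(read)):
            counts[read[i - order:i]][read[i]] += 1.0

    def log_prob(context, base):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return math.log((ctx.get(base, 0.0) + pseudocount) / total)

    return log_prob


def score(read, log_prob, order=2):
    """Log-likelihood of a read under a model, ignoring the first `order` bases."""
    return sum(log_prob(read[i - order:i], read[i])
               for i in range(order, len(read)))


def cluster_reads(reads, k=2, order=2, iterations=10, seed=0):
    """Hypothetical k-means-style loop: retrain models, then reassign reads."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in reads]  # random initial partition
    for _ in range(iterations):
        models = [
            train_markov([r for r, lab in zip(reads, labels) if lab == c], order)
            for c in range(k)
        ]
        new_labels = [
            max(range(k), key=lambda c: score(r, models[c], order))
            for r in reads
        ]
        if new_labels == labels:  # assignments stable, so stop early
            break
        labels = new_labels
    return labels
```

A call such as `cluster_reads(reads, k=3)` returns one cluster index per read. With a random starting partition the result depends on the seed, and a loop of this kind can settle in a poor local optimum, so the initial partition matters in practice.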

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098094
DOI: http://dx.doi.org/10.1186/1471-2105-11-544
