High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to the different organisms present in environmental samples. Analysis of microbial diversity typically starts with pre-processing, followed by clustering the 16S rRNA reads into a much smaller number of operational taxonomic units (OTUs). OTUs are reliable indicators of microbial diversity and greatly reduce downstream analysis time. However, existing hierarchical clustering algorithms, which are generally more accurate than greedy heuristic algorithms, struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology, a distributed data structure that stores all data in the main memory of multiple computing nodes. IMDG technology allows CLUSTOM-CLOUD to handle larger datasets and to scale better computationally than its ancestor, CLUSTOM, while maintaining high accuracy. The clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using a small laboratory cluster (10 nodes) and the Amazon EC2 cloud-computing environment. In the laboratory environment, it required only ~3 hours to process a dataset of 200 K reads, regardless of the complexity of the human microbiome data. On Amazon EC2, one million reads were processed in approximately 20, 14, and 11 hours when using 20, 30, and 40 nodes, respectively. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and that it is a scalable distributed processing system. A comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm.
CLUSTOM-CLOUD is written in Java and is freely available at http://clustomcloud.kopri.re.kr.
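
The scalability claim can be checked with simple arithmetic on the running times quoted above for one million reads on Amazon EC2. The sketch below is not part of CLUSTOM-CLOUD; the function name `relative_scaling` and the choice of the 20-node run as the baseline are assumptions made for illustration, and the input hours are the abstract's rounded figures, so the results are only approximate.

```python
# Running times quoted in the abstract for one million reads on Amazon EC2.
# The abstract gives them as approximate values, so treat the output as rough.
reported_hours = {20: 20.0, 30: 14.0, 40: 11.0}  # nodes -> hours


def relative_scaling(hours_by_nodes):
    """Return (nodes, speedup, efficiency) tuples relative to the smallest run.

    speedup    = baseline time / time at this node count
    efficiency = speedup / (this node count / baseline node count)
    """
    base_nodes = min(hours_by_nodes)
    base_hours = hours_by_nodes[base_nodes]
    rows = []
    for nodes in sorted(hours_by_nodes):
        speedup = base_hours / hours_by_nodes[nodes]
        ideal_speedup = nodes / base_nodes
        rows.append((nodes, speedup, speedup / ideal_speedup))
    return rows


for nodes, speedup, efficiency in relative_scaling(reported_hours):
    print(f"{nodes} nodes: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these rounded inputs, going from 20 to 30 nodes gives a speedup of about 1.43x (roughly 95% of ideal linear scaling), and going from 20 to 40 nodes gives about 1.82x (roughly 91% of ideal).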

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4783016
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0151064

Publication Analysis

Top Keywords

16s rrna: 20
sequence datasets: 12
in-memory data: 8
clustering 16s: 8
rrna sequence: 8
microbial diversity: 8
human microbiome: 8
nodes amazon: 8
amazon ec2: 8
ec2 cloud-computing: 8

Similar Publications

Identification and Characterization of a Protease Producing Strain From Tannery Waste for Efficient Dehairing of Goat Skin.

Biomed Res Int

January 2025

Center for Personalized Nanomedicine, Australian Institute for Bioengineering & Nanotechnology (AIBN), The University of Queensland, Brisbane, Queensland, Australia.

Environmental pollution has been a significant concern in recent years. The leather industry contributes significantly to the economy but is one of Bangladesh's most prominent polluting industries. It is also responsible for several severe diseases among leather workers, such as cancer, lung disease, and heart disease, because the bleaching agents and chemicals they use have numerous adverse effects on human health.

Explore Alteration of Lung and Gut Microbiota in a Murine Model of OVA-Induced Asthma Treated by CpG Oligodeoxynucleotides.

J Inflamm Res

January 2025

Department of Geriatric Respiratory and Critical Care, The First Affiliated Hospital of Anhui Medical University, Anhui Geriatric Institute, Hefei, Anhui, People's Republic of China.

Aim: We sought to investigate the impact of CpG oligodeoxynucleotides (CpG-ODN) administration on the lung and gut microbiota in asthmatic mice, specifically focusing on changes in composition, diversity, and abundance, and to elucidate the microbial mechanisms underlying the therapeutic effects of CpG-ODN and identify potential beneficial bacteria indicative of its efficacy.

Methods: HE staining was used to analyze inflammation in lung, colon and small intestine tissues. High-throughput sequencing technology targeting 16S rRNA was employed to analyze the composition, diversity, and correlation of the microbiome in the lung, colon and small intestine of the control, model and CpG-ODN administration groups.

Building a reliable 16S mini-barcode library of wild bees from Occitania, south-west of France.

Biodivers Data J

January 2025

Dynafor, INRAE, INP, ENSAT, 31326 Castanet Tolosan, France.

Background: DNA barcoding and metabarcoding are now powerful tools for studying biodiversity, especially for the accurate identification of large sample collections belonging to diverse taxonomic groups. Their success depends largely on the taxonomic resolution of the DNA sequences used as barcodes and on the reliability of the reference databases. For wild bees, barcode sequence coverage is growing steadily, but some incorrect species annotations need to be addressed.

Objective: Our research has pinpointed the gut microbiome's role in the progression of various pathological types of non-small cell lung cancer (NSCLC). Nonetheless, the characteristics of the gut microbiome and its metabolites across different clinical stages of NSCLC are yet to be fully understood. The current study seeks to explore the distinctive gut flora and metabolite profiles of NSCLC patients across varying TNM stages.

The presence of antibiotic-resistant bacteria at four Norwegian wastewater treatment plants: seasonal and wastewater-source effects.

Front Antibiot

February 2024

Department of Chemistry, Bioscience and Environmental Engineering, Faculty of Science and Technology, University of Stavanger, Stavanger, Norway.

Wastewater treatment plants receive low concentrations of antibiotics. Residual concentrations of antibiotics in the effluent may accelerate the development of antibiotic resistance in the receiving environments. Monitoring of antimicrobial resistance genes (ARGs) in countries with strict regulation of antibiotic use is important in gaining knowledge of how effective these policies are in preventing the emergence of ARGs or whether other strategies are required, for example, at-source treatment of hospital effluents.
