De novo clustering is a popular technique for taxonomic profiling of a microbial community: it groups 16S rRNA amplicon reads into operational taxonomic units (OTUs). In this work, we introduce a new dendrogram-based OTU clustering pipeline called CRiSPy. To improve clustering accuracy, CRiSPy applies an anomaly detection technique to obtain a dynamic distance cutoff, instead of the de facto value of 97 percent sequence similarity used in most existing OTU clustering pipelines. The technique works by detecting an abrupt change in the merging heights of a dendrogram. To produce the dendrograms, CRiSPy performs hierarchical clustering on a genetic distance matrix derived from an all-against-all read comparison by pairwise sequence alignment. However, most existing dendrogram-based tools have difficulty processing datasets larger than 10,000 unique reads because of their high computational complexity. We address this difficulty with two efficient algorithms for CRiSPy: a compute-efficient, GPU-accelerated parallel algorithm for pairwise distance matrix computation and a memory-efficient hierarchical clustering algorithm. Our experiments on various datasets with distinct attributes show that CRiSPy produces more accurate OTU groupings than most OTU clustering applications.
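
The idea of a dynamic cutoff taken from dendrogram merging heights can be illustrated with a minimal sketch. The abstract does not specify the anomaly detector or the linkage criterion, so the code below makes two loud assumptions: it uses naive average linkage, and it treats the single largest gap between consecutive merging heights as the "abrupt change". The function names and the toy distance matrix are hypothetical, and this plain-Python version is neither the GPU-accelerated distance computation nor the memory-efficient clustering algorithm described above.

```python
def agglomerate(dist, cutoff=float("inf")):
    """Naive average-linkage clustering on a symmetric distance matrix.

    Merging stops once the closest pair of clusters is farther apart than
    `cutoff`. Returns (merging heights so far, clusters as sorted lists).
    """
    clusters = {i: [i] for i in range(len(dist))}
    heights = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        best = None
        # Find the pair of clusters with the smallest average distance.
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cutoff:
            break
        heights.append(d)
        clusters[a] = clusters[a] + clusters.pop(b)
    return heights, sorted(clusters.values())


def dynamic_cutoff(heights):
    """Assumed rule: midpoint of the largest jump between sorted heights."""
    if len(heights) < 2:
        raise ValueError("need at least two merging heights")
    ordered = sorted(heights)
    gaps = [(ordered[k + 1] - ordered[k], k) for k in range(len(ordered) - 1)]
    _, k = max(gaps)
    return (ordered[k] + ordered[k + 1]) / 2


# Hypothetical genetic distances for five reads: reads 0-2 are close to
# each other, reads 3-4 are close to each other, and the groups are far apart.
toy_dist = [
    [0.00, 0.01, 0.02, 0.20, 0.20],
    [0.01, 0.00, 0.02, 0.20, 0.20],
    [0.02, 0.02, 0.00, 0.20, 0.20],
    [0.20, 0.20, 0.20, 0.00, 0.01],
    [0.20, 0.20, 0.20, 0.01, 0.00],
]

all_heights, _ = agglomerate(toy_dist)       # full dendrogram heights
cutoff = dynamic_cutoff(all_heights)         # data-driven cutoff
_, otus = agglomerate(toy_dist, cutoff)      # cut the tree at that height
print(cutoff, otus)
```

On this toy input the cutoff falls between the small within-group heights and the large between-group height, so the tree is cut into two groups rather than at a fixed distance of 0.03 (97 percent similarity). A real detector would need to be more robust than a single largest-gap rule.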

Source
http://dx.doi.org/10.1109/TCBB.2015.2407574

Publication Analysis

Top Keywords

otu clustering: 16
accurate otu: 8
clustering: 8
sequence alignment: 8
hierarchical clustering: 8
distance matrix: 8
otu: 6
crispy: 5
efficient accurate: 4
clustering gpu-based: 4

Similar Publications

Rapid advancements in long-read sequencing have facilitated species-level microbial profiling through full-length 16S rRNA sequencing (~1500 bp), and more notably, by the newer 16S-ITS-23S ribosomal RNA operon (RRN) sequencing (~4500 bp). RRN sequencing is emerging as a superior method for species resolution, exceeding the capabilities of short-read and full-length 16S rRNA sequencing. However, being in its early stages of development, RRN sequencing has several underexplored or understudied elements, highlighting the need for a critical and thorough examination of its methodologies.

OTUD6B regulates KIFC1-dependent centrosome clustering and breast cancer cell survival.

EMBO Rep

January 2025

Cellular and Molecular Physiology, Institute of Systems Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 3BX, UK.

Cancer cells often display centrosome amplification, requiring the kinesin KIFC1/HSET for centrosome clustering to prevent multipolar spindles and cell death. In parallel siRNA screens of deubiquitinase enzymes, we identify OTUD6B as a positive regulator of KIFC1 expression that is required for centrosome clustering in triple-negative breast cancer (TNBC) cells. OTUD6B can localise to centrosomes and the mitotic spindle and interacts with KIFC1.

Lysosomes are the major cellular organelles responsible for nutrient recycling and degradation of cellular material. Maintenance of lysosomal integrity is essential for cellular homeostasis and lysosomal membrane permeabilization (LMP) sensitizes toward cell death. Damaged lysosomes are repaired or degraded via lysophagy, during which glycans, exposed on ruptured lysosomal membranes, are recognized by galectins leading to K48- and K63-linked poly-ubiquitination (poly-Ub) of lysosomal proteins followed by recruitment of the macroautophagic/autophagic machinery and degradation.

The diversity of bacteria associated with lichens has received increasing attention. However, studies based on next-generation sequencing of microbiomes have not yet been conducted in the Arctic and Subarctic regions. In this study, rock-dwelling lichens belonging to the Umbilicariaceae family were sampled from the Arctic and Subarctic biological zones.

Article Synopsis
  • Metabarcoding of the ITS region is widely used to study fungal communities, but the lack of standardized bioinformatic pipelines leads to varying results.
  • This study compared DADA2, which infers ASVs, and mothur, which clusters sequences into OTUs, revealing that mothur identified greater fungal richness and produced more consistent results across multiple samples.
  • The findings suggest that using a 97% similarity threshold for OTU clustering may be the best method for analyzing fungal metabarcoding data to reduce potential bias.