Selecting single cell clustering parameter values using subsampling-based robustness metrics.

BMC Bioinformatics

Department of Neurology, Center for Translational and Computational Neuroimmunology, Columbia University, New York City, NY, USA.

Published: February 2021

Background: Generating and analysing single-cell data has become a widespread approach to examine tissue heterogeneity, and numerous algorithms exist for clustering these datasets to identify putative cell types with shared transcriptomic signatures. However, many of these clustering workflows rely on user-tuned parameter values, tailored to each dataset, to identify a set of biologically relevant clusters. Although users often develop their own intuition as to the optimal range of parameters for clustering each dataset, the lack of systematic approaches to identify this range can be daunting to new users of any given workflow. In addition, an optimal parameter set does not guarantee that all clusters are equally well-resolved, given the heterogeneity in transcriptomic signatures in most biological systems.

Results: Here, we illustrate a subsampling-based approach (chooseR) that simultaneously guides parameter selection and characterizes cluster robustness. Through bootstrapped iterative clustering across a range of parameters, chooseR was used to select parameter values for two distinct clustering workflows (Seurat and scVI). In each case, chooseR identified parameters that produced biologically relevant clusters from both well-characterized (human PBMC) and complex (mouse spinal cord) datasets. Moreover, it provided a simple "robustness score" for each of these clusters, facilitating the assessment of cluster quality.
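
The general idea of subsampling-based robustness can be illustrated with a minimal Python sketch. This is not the chooseR implementation: the abstract does not state the subsampling fraction, the number of iterations, or how the robustness score is computed, so every such choice below is an assumption made for illustration. The sketch subsamples 80% of cells without replacement, uses a toy k-means (the hypothetical helper `kmeans`) as a stand-in for a real workflow such as Seurat or scVI, and scores each cluster by the mean frequency with which pairs of its cells co-cluster across subsamples.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_iter=20):
    """Toy k-means, a stand-in for a real clustering workflow (assumption)."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def co_clustering_frequency(points, k, n_rounds=30, frac=0.8, seed=0):
    """For each pair of cells, the fraction of subsamples containing both
    in which they were assigned to the same cluster."""
    rng = random.Random(seed)
    n = len(points)
    together, sampled = {}, {}
    for _ in range(n_rounds):
        # Subsample without replacement; fraction and round count are assumed.
        idx = sorted(rng.sample(range(n), int(frac * n)))
        labels = kmeans([points[i] for i in idx], k, rng)
        for (a, la), (b, lb) in combinations(zip(idx, labels), 2):
            sampled[(a, b)] = sampled.get((a, b), 0) + 1
            if la == lb:
                together[(a, b)] = together.get((a, b), 0) + 1
    return {pair: together.get(pair, 0) / n_seen
            for pair, n_seen in sampled.items()}


def cluster_robustness(freq, labels):
    """Illustrative per-cluster score: mean co-clustering frequency over
    all pairs of cells inside each cluster of a reference labelling."""
    scores = {}
    for c in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == c]
        vals = [freq[p] for p in combinations(members, 2) if p in freq]
        scores[c] = sum(vals) / len(vals) if vals else float("nan")
    return scores


# Synthetic "cells": two well-separated groups in two dimensions.
rng = random.Random(1)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)]
blob_b = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(20)]
cells = blob_a + blob_b

reference = kmeans(cells, 2, random.Random(2))  # clustering of the full data
freq = co_clustering_frequency(cells, k=2)
print(cluster_robustness(freq, reference))
```

Repeating this for several candidate parameter values (here, different `k`) and comparing the resulting scores is one way such a procedure could guide parameter selection. The selection rule that chooseR actually applies is not described in the abstract.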

Conclusion: chooseR is a simple, conceptually understandable tool that can be used flexibly across clustering algorithms, workflows, and datasets to guide clustering parameter selection and characterize cluster robustness.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7852188
DOI: http://dx.doi.org/10.1186/s12859-021-03957-4
