Model-based deep embedding for constrained clustering analysis of single cell RNA-seq data.

Tian Tian, Jie Zhang, Xiang Lin, Zhi Wei, Hakon Hakonarson

Nat Commun

Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA.

Published: March 2021

Clustering is a critical step in single-cell-based studies. Most existing methods support unsupervised clustering without the a priori exploitation of any domain knowledge. When confronted by the high dimensionality and pervasive dropout events of scRNA-seq data, purely unsupervised clustering methods may not produce biologically interpretable clusters, which complicates cell type assignment. In such cases, the only recourse is for the user to manually and repeatedly tweak clustering parameters until acceptable clusters are found. Consequently, the path to obtaining biologically meaningful clusters can be ad hoc and laborious. Here we report a principled clustering method, named scDCC, that integrates domain knowledge into the clustering step. Experiments on various scRNA-seq datasets from thousands to tens of thousands of cells show that scDCC can significantly improve clustering performance, facilitating the interpretability of clusters and downstream analyses, such as cell type assignment.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7994574
DOI: http://dx.doi.org/10.1038/s41467-021-22008-3
