SCHNEL: scalable clustering of high dimensional single-cell data.

Tamim Abdelaal, Paul de Raadt, Boudewijn P F Lelieveldt, Marcel J T Reinders, Ahmed Mahfouz

Bioinformatics

Delft Bioinformatics Lab, Delft University of Technology, 2628 XE Delft, The Netherlands.

Published: December 2020

Motivation: Single-cell data measure multiple cellular markers at the single-cell level for thousands to millions of cells. Identifying distinct cell populations is a key step toward further biological understanding and is usually performed by clustering these data. Clustering tools based on dimensionality reduction are either not scalable to large datasets containing millions of cells or not fully automated, requiring an initial manual estimate of the number of clusters. Graph clustering tools provide automated and reliable clustering for single-cell data but scale poorly to large datasets.

Results: We developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data into a hierarchy of datasets containing subsets of data points following the original data manifold. The novel approach of SCHNEL combines this hierarchical representation of the data with graph clustering, making graph clustering scalable to millions of cells. Using seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data and produced meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable time frames. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data, as well as to MNIST, a popular machine learning benchmark dataset.
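
The idea of clustering a reduced subset and carrying the result back to all cells can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SCHNELpy implementation: the function names are hypothetical, a fixed-stride subsample stands in for the manifold-following hierarchy, connected components of a k-nearest-neighbour graph stand in for the graph clustering step, and labels are propagated by nearest landmark.

```python
import math
import random


def nearest(point, candidates, k=1):
    """Return indices of the k candidates closest to point (Euclidean)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(point, candidates[i]))
    return order[:k]


def cluster_via_landmarks(points, step=10, k=3):
    """Cluster a landmark subset, then propagate labels to every point."""
    # Assumption: every step-th point serves as a landmark. This stands in
    # for the manifold-following subsets described in the abstract.
    landmarks = points[::step]
    n = len(landmarks)

    # Build an undirected kNN graph over the landmarks only.
    adj = {i: set() for i in range(n)}
    for i, p in enumerate(landmarks):
        for j in nearest(p, landmarks, k + 1):  # k + 1: the point finds itself
            if j != i:
                adj[i].add(j)
                adj[j].add(i)

    # Stand-in graph clustering: label each connected component.
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1

    # Propagate: each point takes the label of its nearest landmark.
    return [labels[nearest(p, landmarks)[0]] for p in points]


# Two well-separated synthetic groups of 200 points each (toy data).
rng = random.Random(1)
group_a = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(200)]
group_b = [(rng.uniform(50, 51), rng.uniform(50, 51)) for _ in range(200)]
labels = cluster_via_landmarks(group_a + group_b)

# The two groups never share a cluster label.
print(set(labels[:200]).isdisjoint(labels[200:]))  # True
```

Only the 40 landmarks enter the graph construction and clustering here, so the expensive step works on a tenth of the data. The sketch may split a group into more than one cluster, since connected components are a much cruder criterion than a real community-detection method.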

Availability and Implementation: The implementation is available on GitHub (https://github.com/biovault/SCHNELpy). All datasets used in this study are publicly available.

Supplementary Information: Supplementary data are available at Bioinformatics online.

DOI: http://dx.doi.org/10.1093/bioinformatics/btaa816
