Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis.

BMC Bioinformatics

Charles Perkins Centre, School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Sydney, NSW 2006, Australia.

Published: December 2019

Background: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e. the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification.

Results: Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metric used.
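The three steps described above (random subspace projection, autoencoder compression, ensemble clustering) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' scCCESS implementation. Every function name, the one-hidden-layer tanh autoencoder, the hyperparameters (`n_runs`, `n_genes`, `latent_dim`, `epochs`, `lr`) and the co-association consensus step are assumptions chosen for brevity; the abstract does not specify the network architecture or the consensus function.

```python
import numpy as np


def random_subspace(X, n_genes, rng):
    """Pick a random subset of gene columns (one random subspace projection)."""
    cols = rng.choice(X.shape[1], size=n_genes, replace=False)
    return X[:, cols]


def encode(X, latent_dim, rng, epochs=200, lr=0.01):
    """Train a tiny one-hidden-layer autoencoder and return the latent codes."""
    n, d = X.shape
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)  # standardise each gene
    W1 = rng.normal(scale=0.1, size=(d, latent_dim))   # encoder weights
    W2 = rng.normal(scale=0.1, size=(latent_dim, d))   # decoder weights
    for _ in range(epochs):
        H = np.tanh(X @ W1)                    # encode
        err = H @ W2 - X                       # reconstruction error
        grad_W2 = H.T @ err / n
        grad_H = err @ W2.T * (1.0 - H ** 2)   # backpropagate through tanh
        grad_W1 = X.T @ grad_H / n
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return np.tanh(X @ W1)


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one integer label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):            # leave empty clusters untouched
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def ensemble_cluster(X, k, n_runs=10, n_genes=None, latent_dim=8, seed=0):
    """Cluster cells (rows of X) by consensus over encoded random subspaces."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_genes = n_genes or max(1, X.shape[1] // 2)
    coassoc = np.zeros((n, n))
    for _ in range(n_runs):
        Z = encode(random_subspace(X, n_genes, rng), latent_dim, rng)
        labels = kmeans(Z, k, rng)
        # Count how often each pair of cells lands in the same cluster.
        coassoc += labels[:, None] == labels[None, :]
    coassoc /= n_runs
    # Assumed consensus step: cluster the rows of the co-association matrix.
    return kmeans(coassoc, k, rng), coassoc
```

Calling `ensemble_cluster(X, k=2)` on a cells-by-genes matrix `X` returns one consensus label per cell together with the co-association matrix. In practice the base clusterer could be swapped for SIMLR or another algorithm, as the abstract describes.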

Conclusions: Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses. The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/scCCESS.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929272
DOI: http://dx.doi.org/10.1186/s12859-019-3179-5

Publication Analysis

Top Keywords

autoencoder-based cluster: 16
cell type: 12
cell type-specific: 12
cluster ensemble: 12
data analysis: 8
scrna-seq data: 8
cell: 8
type identification: 8
genes cell: 8
type-specific clusters: 8

Similar Publications

Multimorbidity, the co-occurrence of multiple chronic conditions within the same individual, is increasing globally. This is a challenge for individual patients, who are subject to a heavy disease and treatment burden, yet the epidemiology and consequences of multimorbidity remain underexplored. Historically, studies aiming to understand multimorbidity patterns predominantly utilized cross-sectional data, neglecting the essential temporal dynamics which shape multimorbidity progression.

Hepatocellular carcinoma (HCC) ranks fourth in cancer-related mortality worldwide. This study aims to uncover the genes and pathways involved in HCC through network pharmacology (NP) and to discover potential drugs via machine learning (ML)-based ligand screening. Additionally, toxicity prediction, molecular docking, and molecular dynamics (MD) simulations were conducted.

scSFCL: Deep clustering of scRNA-seq data with subspace feature confidence learning.

Comput Biol Chem

February 2025

School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong 266520, China.

The rapid development of single-cell RNA sequencing (scRNA-seq) technology has spawned a variety of single-cell clustering methods. These methods combine statistics and bioinformatics to reveal differences in gene expression between cells and the diversity of cell types. Deep exploration of single-cell data is more challenging due to the high dimensionality, sparsity and noise of scRNA-seq data.

Article Synopsis
  • Cell-cell communication is essential for normal biological functions, development, and immune responses, and advancements in single-cell RNA sequencing and spatial transcriptomics have enhanced analysis in this area, despite challenges like incomplete data.
  • Current methods often overlook communication across different tissue layers and don’t fully capture the complexity of three-dimensional tissues.
  • To overcome these limitations, the study introduces VGAE-CCI, a deep learning framework that accurately identifies cell-cell communication in complex tissues, exhibiting superior performance compared to existing methods across several datasets.

Dimensionality reduction is a critical step for the efficacy and efficiency of clustering analysis. Despite the multiple available methods, biomechanists have often defaulted to Principal Component Analysis (PCA). We evaluated two PCA-based and one autoencoder-based dimensionality reduction methods for their data compression and reconstruction capability, assessed their effect on the output of clustering runners based on kinematics, and discussed their implications for the biomechanical assessment of running technique.
