Background: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e. the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification.
Results: Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metric used.
Conclusions: Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses. The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/scCCESS.
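The three steps described in the Results (random subspace projection, autoencoder compression, ensemble clustering) can be illustrated with a minimal NumPy sketch. This is not the scCCESS implementation. It rests on several assumptions made only for illustration: a "random subspace projection" is taken to mean a random subset of genes, the autoencoder is a tiny one-hidden-layer network trained by plain gradient descent, and the consensus step is a simple co-association matrix re-clustered with k-means. All function names and parameters (`autoencoder_cluster_ensemble`, `subspace_frac`, `n_latent`, and so on) are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one integer label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing copies
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def train_autoencoder(X, n_latent, rng, n_epochs=200, lr=0.01):
    """Tiny autoencoder (tanh encoder, linear decoder); returns the encoded data."""
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_latent)); b1 = np.zeros(n_latent)
    W2 = rng.normal(0.0, 0.1, (n_latent, d)); b2 = np.zeros(d)
    for _ in range(n_epochs):
        H = np.tanh(X @ W1 + b1)           # encode
        R = H @ W2 + b2                    # decode (reconstruction)
        err = (R - X) / n                  # gradient of 0.5 * mean squared error
        dH = (err @ W2.T) * (1.0 - H ** 2) # backpropagate through tanh
        W2 -= lr * (H.T @ err); b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ dH);  b1 -= lr * dH.sum(axis=0)
    return np.tanh(X @ W1 + b1)

def autoencoder_cluster_ensemble(X, k, n_members=10, subspace_frac=0.8,
                                 n_latent=8, seed=0):
    """Cluster cells (rows of X) by consensus over autoencoded random subspaces."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    coassoc = np.zeros((n, n))
    for _ in range(n_members):
        # Assumption: a random subspace is a random subset of genes (columns).
        genes = rng.choice(d, size=max(1, int(subspace_frac * d)), replace=False)
        Z = train_autoencoder(X[:, genes], n_latent, rng)
        labels = kmeans(Z, k, rng)
        coassoc += labels[:, None] == labels[None, :]
    coassoc /= n_members  # fraction of runs in which two cells co-cluster
    # Assumption: consensus obtained by clustering rows of the co-association matrix.
    return kmeans(coassoc, k, rng), coassoc
```

Calling `autoencoder_cluster_ensemble(X, k=3)` on a cells-by-genes matrix `X` returns one consensus label per cell together with the co-association matrix. In practice the base clusterer could be swapped for another algorithm, in the same way the abstract describes applying the ensemble with both k-means and SIMLR.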
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929272 | PMC |
| http://dx.doi.org/10.1186/s12859-019-3179-5 | DOI Listing |
Comput Biol Med
January 2025
Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia.
Multimorbidity, the co-occurrence of multiple chronic conditions within the same individual, is increasing globally. This poses a challenge for individual patients, who bear a heavy disease and treatment burden, yet the epidemiology and consequences of multimorbidity remain underexplored. Historically, studies aiming to understand multimorbidity patterns predominantly utilized cross-sectional data, neglecting the essential temporal dynamics which shape multimorbidity progression.
SAR QSAR Environ Res
November 2024
Research and Development Center, Bioinnov Solutions LLP, Salem, India.
Hepatocellular carcinoma (HCC) ranks fourth in cancer-related mortality worldwide. This study aims to uncover the genes and pathways involved in HCC through network pharmacology (NP) and to discover potential drugs via machine learning (ML)-based ligand screening. Additionally, toxicity prediction, molecular docking, and molecular dynamics (MD) simulations were conducted.
Comput Biol Chem
February 2025
School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong 266520, China.
The rapid development of single-cell RNA sequencing (scRNA-seq) technology has spawned a variety of single-cell clustering methods. These methods combine statistics and bioinformatics to reveal differences in gene expression between cells and the diversity of cell types. Deep exploration of single-cell data is more challenging due to the high dimensionality, sparsity and noise of scRNA-seq data.
Brief Bioinform
November 2024
School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China.
J Biomech
December 2024
Department for Health, University of Bath, Bath, UK.
Dimensionality reduction is a critical step for the efficacy and efficiency of clustering analysis. Despite the multiple available methods, biomechanists have often defaulted to Principal Component Analysis (PCA). We evaluated two PCA-based and one autoencoder-based dimensionality reduction methods for their data compression and reconstruction capability, assessed their effect on the output of clustering runners based on kinematics, and discussed their implications for the biomechanical assessment of running technique.