SAE-Impute: imputation for single-cell data via subspace regression and auto-encoders.

BMC Bioinformatics

College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China.

Published: October 2024

Background: Single-cell RNA sequencing (scRNA-seq) technology has emerged as a crucial tool for studying cellular heterogeneity. However, dropout events are inherent to the sequencing process and pose challenges for downstream analysis and interpretation. Imputing dropout values is therefore a critical concern in scRNA-seq data analysis. Current imputation methods predominantly rely on statistical or machine learning approaches and often overlook inter-sample correlations.

Results: To address this limitation, we introduce SAE-Impute, a new computational method for imputing single-cell data that combines subspace regression and autoencoders to enhance the accuracy and reliability of the imputation process. Specifically, SAE-Impute assesses sample correlations via subspace regression, predicts potential dropout values, and then leverages these predictions within an autoencoder framework for interpolation. To validate the performance of SAE-Impute, we systematically conducted experiments on both simulated and real scRNA-seq datasets. The results show that SAE-Impute effectively reduces false negative signals in single-cell data and improves the recovery of dropout values as well as gene-gene and cell-cell correlations. Finally, we conducted several downstream analyses on the imputed scRNA-seq data, including identification of differentially expressed genes, cell clustering and visualization, and cell trajectory construction.

Conclusions: These results demonstrate that SAE-Impute effectively reduces dropouts in single-cell datasets, thereby improving the functional interpretability of the data.
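
The two-stage idea described in the Results (predict dropout values from correlated cells, then refine them with a reconstruction model) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so every choice here is an assumption. Stage 1 uses a per-cell ridge regression on the other cells as a stand-in for subspace regression, stage 2 uses a truncated SVD (the subspace a linear autoencoder would learn) in place of a trained autoencoder, and the function names, the simulated data, and the parameters `lam` and `k` are all hypothetical. The sketch also treats every zero as a candidate dropout, which real data would not justify.

```python
import numpy as np


def simulate(n_cells=40, n_genes=200, rank=3, dropout_rate=0.3, seed=0):
    """Toy low-rank expression matrix with random dropouts (illustrative only)."""
    rng = np.random.default_rng(seed)
    truth = rng.gamma(2.0, 1.0, (n_cells, rank)) @ rng.gamma(2.0, 1.0, (rank, n_genes))
    dropped = rng.random(truth.shape) < dropout_rate  # True where a value is zeroed
    return truth, np.where(dropped, 0.0, truth), dropped


def subspace_regression_fill(X, lam=1.0):
    """Stage 1 (assumed form): express each cell as a ridge-regularized linear
    combination of the other cells, fitted on its nonzero genes only, and use
    that combination to predict its zero entries. X is cells x genes."""
    n_cells = X.shape[0]
    filled = X.copy()
    for j in range(n_cells):
        obs = X[j] > 0
        if obs.all() or not obs.any():
            continue  # nothing to impute, or nothing to fit on
        others = np.delete(X, j, axis=0)          # (n_cells - 1, n_genes)
        A = others[:, obs].T                      # (n_observed_genes, n_cells - 1)
        coef = np.linalg.solve(A.T @ A + lam * np.eye(n_cells - 1), A.T @ X[j, obs])
        filled[j, ~obs] = np.clip(others[:, ~obs].T @ coef, 0.0, None)
    return filled


def low_rank_refine(filled, observed, k=3):
    """Stage 2 (stand-in for an autoencoder): rank-k reconstruction of the
    pre-filled matrix; observed nonzero values are kept unchanged."""
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    recon = (u[:, :k] * s[:k]) @ vt[:k]
    return np.where(observed > 0, observed, np.clip(recon, 0.0, None))


def rmse(a, b, mask):
    """Root-mean-square error restricted to the masked entries."""
    return float(np.sqrt(np.mean((a[mask] - b[mask]) ** 2)))


if __name__ == "__main__":
    truth, observed, dropped = simulate()
    stage1 = subspace_regression_fill(observed)
    stage2 = low_rank_refine(stage1, observed)
    print("RMSE on dropped entries, zeros left as-is:", rmse(truth, observed, dropped))
    print("RMSE after stage 1:", rmse(truth, stage1, dropped))
    print("RMSE after stage 2:", rmse(truth, stage2, dropped))
```

Fitting the regression only on a cell's nonzero genes keeps the model from learning to reproduce the zeros it is meant to replace. A real pipeline would also need normalization and a way to separate true biological zeros from dropouts.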


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11443887
DOI: http://dx.doi.org/10.1186/s12859-024-05944-x


Similar Publications

Introduction: Macrophages exhibit marked phenotypic heterogeneity within and across disease states, with lipid metabolic reprogramming contributing to macrophage activation and heterogeneity. Chronic inflammation has been observed in human benign prostatic hyperplasia (BPH) tissues; however, macrophage activation states and their contributions to this hyperplastic disease have not been defined. We postulated that a shift in macrophage phenotypes with increasing prostate size could involve metabolic alterations resulting in prostatic epithelial or stromal hyperplasia.


MITIGATING OVER-SATURATED FLUORESCENCE IMAGES THROUGH A SEMI-SUPERVISED GENERATIVE ADVERSARIAL NETWORK.

Proc IEEE Int Symp Biomed Imaging

May 2024

Department of Electrical and Computer Engineering, Nashville, TN, USA.

Multiplex immunofluorescence (MxIF) imaging is a critical tool in biomedical research, offering detailed insights into cell composition and spatial context. As an example, DAPI staining identifies cell nuclei, while CD20 staining helps segment cell membranes in MxIF. However, a persistent challenge in MxIF is saturation artifacts, which hinder single-cell level analysis in areas with over-saturated pixels.


Motivation: Since their introduction about 10 years ago, methylation clocks have provided broad insights into the biological age of different species and tissues, and into aging in the context of several diseases. However, their application to single-cell methylation data remains a major challenge because such data are inherently sparse, with many CpG sites not covered. A methylation clock applicable at the single-cell level could help to further disentangle the processes that drive the ticking of epigenetic clocks.


Attention-deficit/hyperactivity disorder (ADHD) is a highly heritable neurodevelopmental disorder, but its genetic architecture remains incompletely characterized. Rare coding variants, which can profoundly impact gene function, represent an underexplored dimension of ADHD risk. In this study, we analyzed large-scale DNA sequencing datasets from ancestrally diverse cohorts and observed significant enrichment of rare protein-truncating and deleterious missense variants in highly evolutionarily constrained genes.


While naïve CD4+ T cells have historically been considered a homogenous population, recent studies have provided evidence that functional heterogeneity exists within this population. Using single cell RNA sequencing (scRNAseq), we identify five transcriptionally distinct naïve CD4+ T cell subsets that emerge within the single positive stage in the thymus: a quiescence cluster (TQ), a memory-like cluster (TMEM), a TCR reactive cluster (TTCR), an IFN responsive cluster (TIFN), and an undifferentiated cluster (TUND). Elevated expression of transcription factors KLF2, Mx1, and Nur77 within the TQ, TIFN, and TMEM clusters, respectively, allowed enrichment of these subsets for further analyses.
