FastRNA: An efficient solution for PCA of single-cell RNA-sequencing data based on a batch-accounting count model.

Am J Hum Genet

Department of Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea; Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea; Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea; Genealogy Inc., Seoul, Republic of Korea.

Published: November 2022

The analysis of single-cell RNA-sequencing (scRNA-seq) data almost always begins with generating a low-dimensional embedding of the data by principal-component analysis (PCA). Because scRNA-seq data are count data, a log transformation is routinely applied to correct skewness prior to PCA, although this step is often argued to introduce bias. Alternatively, studies have proposed methods that directly assume a count model and use approximately normally distributed count residuals for PCA. Despite their theoretical advantage of directly modeling count data, these methods are extremely slow for large datasets. In fact, as the data size grows, even standard log normalization becomes inefficient. Here, we present FastRNA, a highly efficient solution for PCA of scRNA-seq data based on a count model accounting for both batches and cell size factors. Although we assume the same general count model as previous methods, our method uses two orders of magnitude less time and memory than the other count-based methods and an order of magnitude less time and memory than standard log normalization. This gain results from an algebraic optimization that completely avoids forming the large dense residual matrix in memory. In addition, batch effects are eliminated from the data prior to PCA. With our method, generating batch-accounted PCs of an atlas-scale dataset with 2 million cells takes less than a minute and 1 GB of memory.
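To illustrate how a residual-based PCA can avoid the dense residual matrix, the following is a minimal sketch under simplifying assumptions. It is not the FastRNA algorithm: it ignores batches and assumes a plain Poisson-style null model with expected count s_i * p_j, where s_i is the total count of cell i and p_j is the share of gene j in the grand total N. Under that assumption, the gene-by-gene Gram matrix of Pearson residuals expands to sum_i x_ij * x_ik / s_i, scaled by 1 / sqrt(p_j * p_k), minus N * sqrt(p_j * p_k), so only the nonzero counts are ever visited. The function names `sparse_gram` and `dense_gram` and the toy data are hypothetical.

```python
import math


def sparse_gram(cells, n_genes):
    """Gram matrix R^T R of Pearson residuals, built from nonzero counts only.

    cells: list of dicts {gene_index: count}, one dict per cell (sparse rows).
    Assumes every cell and every gene has a positive total count.
    """
    size = [sum(c.values()) for c in cells]   # s_i: total count per cell
    total = sum(size)                         # N: grand total of all counts
    gene_sum = [0.0] * n_genes
    for c in cells:
        for j, x in c.items():
            gene_sum[j] += x
    p = [g / total for g in gene_sum]         # p_j: gene share of all counts

    # Accumulate sum_i x_ij * x_ik / s_i over nonzero entries only.
    gram = [[0.0] * n_genes for _ in range(n_genes)]
    for c, s in zip(cells, size):
        items = list(c.items())
        for j, xj in items:
            for k, xk in items:
                gram[j][k] += xj * xk / s

    # Scale, then subtract the rank-one term N * sqrt(p_j * p_k).
    for j in range(n_genes):
        for k in range(n_genes):
            root = math.sqrt(p[j] * p[k])
            gram[j][k] = gram[j][k] / root - total * root
    return gram


def dense_gram(cells, n_genes):
    """Reference version: forms the full cells-by-genes residual matrix."""
    size = [sum(c.values()) for c in cells]
    total = sum(size)
    p = [sum(c.get(j, 0) for c in cells) / total for j in range(n_genes)]
    resid = [
        [(c.get(j, 0) - s * p[j]) / math.sqrt(s * p[j]) for j in range(n_genes)]
        for c, s in zip(cells, size)
    ]
    return [
        [sum(r[j] * r[k] for r in resid) for k in range(n_genes)]
        for j in range(n_genes)
    ]


# Toy data: 4 cells, 3 genes, stored as sparse rows.
cells = [{0: 3, 2: 1}, {1: 2}, {0: 1, 1: 1, 2: 4}, {2: 2}]
fast = sparse_gram(cells, 3)
slow = dense_gram(cells, 3)
max_diff = max(abs(fast[j][k] - slow[j][k]) for j in range(3) for k in range(3))
print(max_diff < 1e-9)
```

The principal axes would then come from an eigendecomposition of the small genes-by-genes matrix. In practice a sparse matrix library would replace the nested Python loops; the sketch only shows that the dense cells-by-genes residual matrix is not needed to obtain the same result.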

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9674949
DOI: http://dx.doi.org/10.1016/j.ajhg.2022.09.008
