Inference after latent variable estimation for single-cell RNA sequencing data.

Biostatistics

Department of Statistics, University of Washington, Seattle, WA 98195, USA and Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.

Published: December 2023

In the analysis of single-cell RNA sequencing data, researchers often characterize the variation between cells by estimating a latent variable, such as cell type or pseudotime, representing some aspect of the cell's state. They then test each gene for association with the estimated latent variable. If the same data are used for both of these steps, then standard methods for computing p-values in the second step will fail to achieve statistical guarantees such as Type 1 error control. Furthermore, approaches such as sample splitting that can be applied to solve similar problems in other settings are not applicable in this context. In this article, we introduce count splitting, a flexible framework that allows us to carry out valid inference in this setting, for virtually any latent variable estimation technique and inference approach, under a Poisson assumption. We demonstrate the Type 1 error control and power of count splitting in a simulation study and apply count splitting to a data set of pluripotent stem cells differentiating to cardiomyocytes.
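The abstract does not spell out how the counts are split. One way to sketch the idea, assuming it rests on the standard binomial thinning property of the Poisson distribution, is shown below: if a count X is Poisson with mean λ and X_train given X is Binomial(X, ε), then X_train and X_test = X − X_train are independent Poisson counts with means ελ and (1 − ε)λ. The function name `count_split`, the parameter `eps`, and the simulated matrix are illustrative assumptions, not the authors' software.

```python
import numpy as np


def count_split(X, eps=0.5, seed=None):
    """Split a matrix of counts into two parts by binomial thinning.

    Hypothetical helper for illustration. Each entry X[i, j] is divided
    into X_train[i, j] ~ Binomial(X[i, j], eps) and the remainder
    X_test[i, j] = X[i, j] - X_train[i, j]. If the entries of X are
    Poisson, the two parts are independent Poisson counts.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    X_train = rng.binomial(X, eps)  # thinned counts, elementwise
    X_test = X - X_train            # what is left over
    return X_train, X_test


# Simulated cells-by-genes count matrix (assumed data, for illustration only).
rng = np.random.default_rng(0)
X = rng.poisson(lam=5.0, size=(200, 50))

X_train, X_test = count_split(X, eps=0.5, seed=1)
```

In a workflow of the kind the abstract describes, the latent variable (for example, clusters or pseudotime) would be estimated from `X_train` only, and each gene would then be tested for association with that estimate using `X_test` only, so the two steps no longer reuse the same data.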


Source
http://dx.doi.org/10.1093/biostatistics/kxac047

