Feature selection followed by a novel residuals-based normalization that includes variance stabilization simplifies and improves single-cell gene expression analysis.

BMC Bioinformatics

Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA.

Published: July 2024

AI Article Synopsis

  • Normalization in single-cell RNA-sequencing (scRNA-seq) is vital for reducing technical biases in data analysis and preparing counts for statistical evaluation.
  • Researchers propose a new workflow that allows feature selection to happen before normalization, which helps identify both highly variable genes (HVGs) and stable genes.
  • The study introduces a new method of normalization based on stable genes, showing significant improvements in clustering analyses, and this approach is available in the R package Piccolo.

Article Abstract

Normalization is a crucial step in the analysis of single-cell RNA-sequencing (scRNA-seq) count data. Its principal objectives are to reduce systematic biases, which are introduced primarily through technical sources, and to transform the counts so that they are more amenable to established statistical frameworks. In the standard workflows, normalization is followed by feature selection to identify highly variable genes (HVGs) that capture most of the biologically meaningful variation across the cells. Here, we make the case for a revised workflow by proposing a simple feature selection method and showing that we can perform feature selection before normalization by relying on observed counts. We highlight that the feature selection step can be used not only to select HVGs but also to identify stable genes. We further propose a novel residuals-based normalization method that includes a variance stabilization transformation and relies on the stable genes to inform the reduction of systematic biases. We demonstrate significant improvements in downstream clustering analyses by applying our proposed methods to count datasets with known biological ground truth as well as to simulated count datasets. We have implemented this novel workflow for analyzing high-throughput scRNA-seq data in an R package called Piccolo.
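The revised order of operations described in the abstract (feature selection on observed counts first, then a residuals-based normalization informed by stable genes) can be illustrated with a minimal Python sketch. This is not the Piccolo implementation and does not reproduce the paper's statistics: the function names `select_features` and `stabilized_residuals` are hypothetical, genes are ranked by a simple variance-to-mean ratio, and variance stabilization is assumed to be a square-root transform under a Poisson model, which may differ from the method in the article.

```python
import numpy as np


def select_features(counts, n_hvg, n_stable):
    """Rank genes on raw counts (cells x genes) before any normalization.

    Assumption for illustration: the variance-to-mean ratio is the ranking
    statistic. The highest ratios are taken as HVGs, the lowest as stable genes.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)
    expressed = np.flatnonzero(mean > 0)  # skip genes with no counts at all
    if n_hvg + n_stable > expressed.size:
        raise ValueError("n_hvg + n_stable exceeds the number of expressed genes")
    ratio = var[expressed] / mean[expressed]
    order = expressed[np.argsort(ratio, kind="stable")]  # ascending ratio
    stable = order[:n_stable]
    hvg = order[::-1][:n_hvg]
    return hvg, stable


def stabilized_residuals(counts, stable):
    """Residuals on a variance-stabilized scale, with stable genes setting depth.

    Assumption for illustration: counts are roughly Poisson, so sqrt(x) has
    variance close to 1/4 and 2 * (sqrt(x) - sqrt(mu)) has variance close to 1.
    """
    counts = np.asarray(counts, dtype=float)
    stable_totals = counts[:, stable].sum(axis=1)
    if np.any(stable_totals == 0):
        raise ValueError("every cell needs at least one count in the stable genes")
    # Per-cell size factors estimated from stable genes only, scaled to mean 1.
    size_factors = stable_totals / stable_totals.mean()
    # Depth-adjusted gene means, then the expected count for each cell and gene.
    gene_means = (counts / size_factors[:, None]).mean(axis=0)
    expected = size_factors[:, None] * gene_means[None, :]
    residuals = 2.0 * (np.sqrt(counts) - np.sqrt(expected))
    return residuals, size_factors


# Toy matrix: 4 cells x 4 genes; gene 1 switches on and off between cells.
toy_counts = np.array([
    [10, 0, 5, 5],
    [10, 20, 5, 5],
    [10, 0, 5, 6],
    [10, 20, 5, 4],
])
hvg, stable = select_features(toy_counts, n_hvg=1, n_stable=2)
residuals, size_factors = stabilized_residuals(toy_counts, stable)
```

In this toy matrix the gene that alternates between 0 and 20 is picked as the single HVG, while the two constant genes are picked as stable and yield identical size factors for all cells. Using only stable genes to estimate sequencing depth is the design choice the abstract motivates: genes with genuine biological variability are kept out of the estimate of technical bias.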


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11290295
DOI: http://dx.doi.org/10.1186/s12859-024-05872-w

