scDiffusion: conditional generation of high-quality single-cell data using diffusion model.

Bioinformatics

MOE Key Lab of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China.

Published: September 2024

Motivation: Single-cell RNA sequencing (scRNA-seq) data are important for studying the laws of life at the single-cell level. However, obtaining enough high-quality scRNA-seq data remains challenging. To mitigate the limited availability of data, generative models have been proposed to computationally generate synthetic scRNA-seq data. Nevertheless, the data generated by current models are not yet very realistic, especially when data must be generated under controlled conditions. Meanwhile, diffusion models have shown their power in generating high-fidelity data, providing a new opportunity for scRNA-seq generation.

Results: In this study, we developed scDiffusion, a generative model that combines a diffusion model with a foundation model to generate high-quality scRNA-seq data under controlled conditions. We designed multiple classifiers to guide the diffusion process simultaneously, enabling scDiffusion to generate data under combinations of multiple conditions. We also proposed a new control strategy called Gradient Interpolation, which allows the model to generate continuous trajectories of cell development from a given cell state. Experiments showed that scDiffusion could generate single-cell gene expression data closely resembling real scRNA-seq data. scDiffusion can also conditionally produce data for specific cell types, including rare ones. Furthermore, we used the multiple-condition generation of scDiffusion to generate a cell type that was absent from the training data. Leveraging the Gradient Interpolation strategy, we generated a continuous developmental trajectory of mouse embryonic cells. These experiments demonstrate that scDiffusion is a powerful tool for augmenting real scRNA-seq data and can provide insights into cell fate research.
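The abstract does not give formulas, so the following is only a minimal sketch of the general classifier-guidance idea it describes, not the scDiffusion implementation. It assumes toy linear-softmax classifiers over a hypothetical 8-dimensional latent vector, sums the gradients of several classifiers to steer one denoising step toward a combination of conditions, and blends the gradients of two target labels as one plausible reading of Gradient Interpolation. All function names, dimensions, and scale values are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def log_prob_grad(W, b, x, y):
    """Gradient of log p(y | x) w.r.t. x for a linear-softmax classifier."""
    p = softmax(W @ x + b)
    return W[y] - p @ W

def combined_guidance(x, classifiers, targets, scales):
    """Sum the scaled gradients of several classifiers (one per condition)."""
    g = np.zeros_like(x)
    for (W, b), y, s in zip(classifiers, targets, scales):
        g += s * log_prob_grad(W, b, x, y)
    return g

def interpolated_guidance(x, classifier, y_start, y_end, alpha):
    """Blend the gradients toward two labels; alpha in [0, 1] (assumed form)."""
    W, b = classifier
    g_start = log_prob_grad(W, b, x, y_start)
    g_end = log_prob_grad(W, b, x, y_end)
    return (1.0 - alpha) * g_start + alpha * g_end

def guided_mean(mean, variance, guidance):
    """Standard classifier guidance: shift the denoising mean along the gradient."""
    return mean + variance * guidance

# Hypothetical setup: random toy classifiers stand in for trained ones.
rng = np.random.default_rng(0)
n_latent = 8
cell_type_clf = (rng.normal(size=(3, n_latent)), np.zeros(3))  # 3 toy cell types
organ_clf = (rng.normal(size=(2, n_latent)), np.zeros(2))      # 2 toy organs
x = rng.normal(size=n_latent)                                  # current noisy latent

# Guide toward (cell type 1, organ 0) at the same time.
g_multi = combined_guidance(x, [cell_type_clf, organ_clf],
                            targets=[1, 0], scales=[2.0, 2.0])
# Guide toward a state halfway between cell type 0 and cell type 2.
g_interp = interpolated_guidance(x, cell_type_clf, 0, 2, alpha=0.5)
# Placeholder mean and variance; a real sampler would take these from the denoiser.
x_next_mean = guided_mean(mean=x, variance=0.1, guidance=g_multi)
```

In this sketch, sweeping `alpha` from 0 to 1 across successive generations would move the guidance smoothly from one label to the other, which is the intuition behind generating intermediate states along a trajectory. A real model would use trained neural classifiers and automatic differentiation rather than the closed-form gradient used here.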

Availability and Implementation: scDiffusion is openly available in the GitHub repository https://github.com/EperLuo/scDiffusion and on Zenodo at https://zenodo.org/doi/10.5281/zenodo.13268742.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368386 (PMC)
http://dx.doi.org/10.1093/bioinformatics/btae518 (DOI)
