Matrix prior for data transfer between single cell data types in latent Dirichlet allocation.

PLoS Comput Biol

Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America.

Published: May 2023

AI Article Synopsis

  • Single cell ATAC-seq (scATAC-seq) is a technique that maps regulatory elements in specific cell types, but the resulting data are challenging to analyze, and large-scale datasets are difficult and costly to generate.
  • The study proposes using latent Dirichlet allocation (LDA), a Bayesian algorithm designed for textual data, to improve scATAC-seq analysis by treating cells like documents and accessible sites like words.
  • The research tests the effectiveness of using nonuniform matrix priors from existing LDA models, showing that this approach enhances the detection of cell types in smaller scATAC-seq datasets from both C. elegans and mouse skin cells.

Article Abstract

Single cell ATAC-seq (scATAC-seq) enables the mapping of regulatory elements in fine-grained cell types. Despite this advance, analysis of the resulting data is challenging, and large scale scATAC-seq data are difficult to obtain and expensive to generate. This motivates a method to leverage information from previously generated large scale scATAC-seq or scRNA-seq data to guide our analysis of new scATAC-seq datasets. We analyze scATAC-seq data using latent Dirichlet allocation (LDA), a Bayesian algorithm that was developed to model text corpora, summarizing documents as mixtures of topics defined based on the words that distinguish the documents. When applied to scATAC-seq, LDA treats cells as documents and their accessible sites as words, identifying "topics" based on the cell type-specific accessible sites in those cells. Previous work used uniform symmetric priors in LDA, but we hypothesized that nonuniform matrix priors generated from LDA models trained on existing data sets may enable improved detection of cell types in new data sets, especially if they have relatively few cells. In this work, we test this hypothesis in scATAC-seq data from whole C. elegans nematodes and SHARE-seq data from mouse skin cells. We show that nonsymmetric matrix priors for LDA improve our ability to capture cell type information from small scATAC-seq datasets.
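
The cells-as-documents, sites-as-words framing with a nonuniform matrix prior can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal collapsed Gibbs sampler written under stated assumptions. The function names, the toy reference counts, the `strength` and `floor` parameters, and the way the prior is derived from a reference model are all hypothetical choices made for illustration, since the abstract does not specify them. The only difference from LDA with a uniform symmetric prior is that the topic-by-site pseudo-count `eta[k][s]` varies per topic and per site.

```python
import random


def matrix_prior_from_reference(ref_counts, strength=30.0, floor=0.01):
    """Hypothetical helper: convert topic-by-site counts from a reference
    model into Dirichlet pseudo-counts (one row per topic)."""
    prior = []
    for row in ref_counts:
        total = float(sum(row)) or 1.0
        prior.append([floor + strength * c / total for c in row])
    return prior


def gibbs_lda_matrix_prior(cells, n_topics, n_sites, eta,
                           alpha=0.1, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA with a topic-by-site prior matrix.

    cells: list of lists; each inner list holds the indices of the
           accessible sites observed in one cell (the cell's "words").
    eta:   n_topics x n_sites matrix of pseudo-counts (the matrix prior).
    Returns (theta, n_ks): per-cell topic proportions and the final
    topic-by-site assignment counts.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in cells]            # cell-by-topic counts
    n_ks = [[0] * n_sites for _ in range(n_topics)]   # topic-by-site counts
    n_k = [0] * n_topics                              # tokens per topic
    eta_sum = [sum(row) for row in eta]

    def bump(d, k, s, delta):
        n_dk[d][k] += delta
        n_ks[k][s] += delta
        n_k[k] += delta

    # Random initial topic assignment for every (cell, site) token.
    z = []
    for d, sites in enumerate(cells):
        z.append([rng.randrange(n_topics) for _ in sites])
        for s, k in zip(sites, z[d]):
            bump(d, k, s, +1)

    topics = list(range(n_topics))
    for _ in range(n_iter):
        for d, sites in enumerate(cells):
            for i, s in enumerate(sites):
                bump(d, z[d][i], s, -1)  # remove the token's current assignment
                weights = [
                    (n_dk[d][k] + alpha)
                    * (n_ks[k][s] + eta[k][s]) / (n_k[k] + eta_sum[k])
                    for k in topics
                ]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                bump(d, z[d][i], s, +1)

    theta = [
        [(c + alpha) / (len(sites) + n_topics * alpha) for c in row]
        for row, sites in zip(n_dk, cells)
    ]
    return theta, n_ks


# Toy data (made up): six cells, six sites, two underlying cell types.
cells = [[0, 1, 2], [0, 1, 2], [0, 2], [3, 4, 5], [3, 4, 5], [4, 5]]
# Made-up topic-by-site counts standing in for a model trained on a large
# reference dataset.
ref_counts = [[40, 35, 45, 0, 1, 0], [1, 0, 0, 50, 30, 40]]
eta = matrix_prior_from_reference(ref_counts)
theta, site_counts = gibbs_lda_matrix_prior(cells, n_topics=2, n_sites=6, eta=eta)
```

Replacing `eta` with a constant matrix recovers the uniform symmetric prior, which makes it straightforward to compare the two settings on the same small dataset.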

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10191269
DOI: http://dx.doi.org/10.1371/journal.pcbi.1011049

Publication Analysis

Top Keywords

scatac-seq data: 12
data: 10
single cell: 8
latent dirichlet: 8
dirichlet allocation: 8
scatac-seq: 8
cell types: 8
large scale: 8
scale scatac-seq: 8
scatac-seq datasets: 8

Similar Publications

scATAC-seq generates more accurate and complete regulatory maps than bulk ATAC-seq.

Sci Rep

January 2025

MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK.

Bulk ATAC-seq assays have been used to map and profile the chromatin accessibility of regulatory elements such as enhancers, promoters, and insulators. This has provided great insight into the regulation of gene expression in many cell types in a variety of organisms. To date, ATAC-seq has most often been used to provide an average evaluation of chromatin accessibility in populations of cells.

The mammalian nervous system controls complex functions through highly specialized and interacting structures. Single-cell sequencing can provide information on cell-type-specific chromatin structure and regulatory elements, revealing differences in chromatin organization between cell types and the potential roles of these differences in brain function. Here, we generated a chromatin accessibility dataset through single-cell ATAC-seq of 174,593 high-quality nuclei from 16 adult rat brain regions.

Total proctocolectomy with ileal pouch anal anastomosis is the standard of care for patients with severe ulcerative colitis. We generated a cell-type-resolved transcriptional and epigenetic atlas of ileal pouches using scRNA-seq and scATAC-seq data from paired biopsy samples of the ileal pouch and the ileal segment above the pouch (pre-pouch) from patients (male=4, female=2), and paired biopsies of the terminal ileum and ascending colon from healthy individuals (male=3, female=3) serving as reference. Our study finds an additional population of absorptive and secretory epithelial cells within the pouch but not the pre-pouch.

The rapid advance of large-scale, atlas-level single-cell RNA sequencing and single-cell chromatin accessibility data provides extraordinary avenues to broad and deep insight into complex biological mechanisms. Leveraging these datasets and transferring labels from scRNA-seq to scATAC-seq will empower the exploration of single-cell omics data. However, current label transfer methods have limited performance, largely due to their limited ability to preserve fine-grained cell populations and to intrinsic or extrinsic heterogeneity between datasets.

Modular organization of enhancer network provides transcriptional robustness in mammalian development.

Nucleic Acids Res

January 2025

State Key Laboratory of Cellular Stress Biology, Xiang'an Hospital, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, No. 4221, Xiang'an South Road, Xiamen, Fujian 361102, China.

Enhancer clusters, pivotal in mammalian development and diseases, can organize as enhancer networks to control cell identity and disease genes; however, the underlying mechanism remains largely unexplored. Here, we introduce eNet 2.0, a comprehensive tool for enhancer network analysis in development and disease, based on single-cell chromatin accessibility data.
