Spatially mapping the transcriptome and proteome in the same tissue section can significantly advance our understanding of heterogeneous cellular processes and connect cell type to function. Here, we present Deterministic Barcoding in Tissue sequencing plus (DBiTplus), an integrative multi-modality spatial omics approach that combines sequencing-based spatial transcriptomics and image-based spatial protein profiling on the same tissue section to enable both single-cell resolution cell typing and genome-scale interrogation of biological pathways. DBiTplus begins with reverse transcription for cDNA synthesis, followed by microfluidic delivery of DNA oligos for spatial barcoding and retrieval of barcoded cDNA using RNaseH, an enzyme that selectively degrades RNA in an RNA-DNA hybrid; this preserves the intact tissue section for high-plex protein imaging with CODEX.
Data integration to align cells across batches has become a cornerstone of single-cell data analysis, critically affecting downstream results. Currently, there are no guidelines for when the biological differences between samples are separable from batch effects. Here we show that current paradigms for single-cell data integration remove biologically meaningful variation and introduce distortion.
Refractoriness to initial chemotherapy and relapse after remission are the main obstacles to curing T cell acute lymphoblastic leukemia (T-ALL). While tumor heterogeneity has been implicated in treatment failure, the cellular and genetic factors contributing to resistance and relapse remain unknown. Here we linked tumor subpopulations with clinical outcome, created an atlas of healthy pediatric hematopoiesis and applied single-cell multiomic analysis to a diverse cohort of 40 T-ALL cases.
Persistent inflammation driven by cytokines such as type-one interferon (IFN-I) can cause immunosuppression. We show that administration of the Janus kinase 1 (JAK1) inhibitor itacitinib after anti-PD-1 (programmed cell death protein 1) immunotherapy improves immune function and antitumor responses in mice and results in high response rates (67%) in a phase 2 clinical trial for metastatic non-small cell lung cancer. Patients who failed to respond to initial anti-PD-1 immunotherapy but responded after addition of itacitinib had multiple features of poor immune function to anti-PD-1 alone that improved after JAK inhibition.
Detecting structural variants (SVs) in whole-genome sequencing poses significant challenges. We present a protocol for variant calling, merging, genotyping, sensitivity analysis, and laboratory validation for generating a high-quality SV call set in whole-genome sequencing from the Alzheimer's Disease Sequencing Project comprising 578 individuals from 111 families. Employing two complementary pipelines, Scalpel and Parliament, for SV/indel calling, we assessed sensitivity through sample replicates (N = 9) with in silico variant spike-ins.
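As a rough illustration of what a spike-in sensitivity check measures, the sketch below computes the fraction of in silico spiked-in variants that a caller recovers. It is not the protocol's implementation: the function name `spike_in_sensitivity`, the `(chromosome, position, type)` variant keys, and the exact-match rule are all assumptions made for this example; real SV comparison typically tolerates breakpoint offsets.

```python
def spike_in_sensitivity(spiked, called):
    """Fraction of spiked-in variants that appear in the call set.

    Variants are hypothetical (chromosome, position, type) tuples and are
    matched exactly, which is a simplification for illustration only.
    """
    spiked_set = set(spiked)
    if not spiked_set:
        raise ValueError("at least one spiked-in variant is required")
    recovered = spiked_set & set(called)
    return len(recovered) / len(spiked_set)


# Made-up example data: four spiked-in variants, three of them recovered.
spiked_variants = [
    ("chr1", 100, "DEL"),
    ("chr1", 500, "INS"),
    ("chr2", 42, "DEL"),
    ("chr3", 7, "DUP"),
]
called_variants = [
    ("chr1", 100, "DEL"),
    ("chr1", 500, "INS"),
    ("chr2", 42, "DEL"),
    ("chr5", 999, "DEL"),  # a call that was not spiked in; ignored here
]

sensitivity = spike_in_sensitivity(spiked_variants, called_variants)
print(f"sensitivity = {sensitivity:.2f}")
```

Calls that were not spiked in do not affect this number; they would matter for precision, which this sketch does not estimate.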
Although single-cell and spatial sequencing methods enable simultaneous measurement of more than one biological modality, no technology can capture all modalities within the same cell. For current data integration methods, the feasibility of cross-modal integration relies on the existence of highly correlated, a priori 'linked' features. We describe matching X-modality via fuzzy smoothed embedding (MaxFuse), a cross-modal data integration method that, through iterative coembedding, data smoothing and cell matching, uses all information in each modality to obtain high-quality integration even when features are weakly linked.
Proc Natl Acad Sci U S A
August 2023
Multimodal single-cell technologies profile multiple modalities for each cell simultaneously, enabling a more thorough characterization of cell populations. Existing dimension-reduction methods for multimodal data capture the "union of information," producing a lower-dimensional embedding that combines the information across modalities. While these tools are useful, we focus on a fundamentally different task of separating and quantifying the information among cells that is shared between the two modalities as well as unique to only one modality.
The intestine is a complex organ that promotes digestion, extracts nutrients, participates in immune surveillance, maintains critical symbiotic relationships with microbiota and affects overall health. The intestine has a length of over nine metres, along which there are differences in structure and function. The localization of individual cell types, cell type development trajectories and detailed cell transcriptional programs probably drive these differences in function.
Data integration to align cells across batches has become a cornerstone of single cell data analysis, critically affecting downstream results. Yet, how much biological signal is erased during integration? Currently, there are no guidelines for when the biological differences between samples are separable from batch effects, and thus, data integration usually involves a lot of guesswork: cells across batches should be aligned to be "appropriately" mixed, while preserving "main cell type clusters". We show evidence that current paradigms for single cell data integration are unnecessarily aggressive, removing biologically meaningful variation.
Long-read sequencing has become a powerful tool for alternative splicing analysis. However, technical and computational challenges have limited our ability to explore alternative splicing at single cell and spatial resolution. The higher sequencing error of long reads, especially high indel rates, has limited the accuracy of cell barcode and unique molecular identifier (UMI) recovery.
Barrett's esophagus is a common type of metaplasia and a precursor of esophageal adenocarcinoma. However, the cell states and lineage connections underlying the origin, maintenance, and progression of Barrett's esophagus have not been resolved in humans. To address this, we performed single-cell lineage tracing and transcriptional profiling of patient cells isolated from metaplastic and healthy tissue.
Single-cell sequencing methods have enabled the profiling of multiple types of molecular readouts at cellular resolution, and recent developments in spatial barcoding, in situ hybridization, and in situ sequencing allow such molecular readouts to retain their spatial context. Since no technology can provide complete characterization across all layers of biological modalities within the same cell, there is a pervasive need for computational cross-modal integration (also called diagonal integration) of single-cell and spatial omics data. For current methods, the feasibility of cross-modal integration relies on the existence of highly correlated, a priori "linked" features.
Pathway analysis is a key analytical stage in the interpretation of omics data, providing a powerful method for detecting alterations in cellular processes. We recently developed a sensitive and distribution-free statistical framework for multisample distribution testing, which we implement here in the open-source R package single-cell pathway analysis (SCPA). We demonstrate the effectiveness of SCPA over commonly used methods, generate a scRNA-seq T cell dataset, and characterize pathway activity over early cellular activation.
Purpose: The liver is the most frequent metastatic site for colorectal cancer. Its microenvironment is modified to provide a niche that is conducive for colorectal cancer cell growth. This study focused on characterizing the cellular changes in the metastatic colorectal cancer (mCRC) liver tumor microenvironment (TME).
The epigenetic control of gene expression is highly cell-type and context specific. Yet, despite its complexity, gene regulatory logic can be broken down into modular components consisting of a transcription factor (TF) activating or repressing the target gene expression through its binding to a cis-regulatory region. We propose a nonparametric approach, TRIPOD, to detect and characterize the three-way relationships between a TF, its target gene, and the accessibility of the TF's binding site using single-cell RNA and ATAC multiomic data.
Interferon-gamma (IFN-γ) has pleiotropic effects on cancer immune checkpoint blockade (ICB), including roles in ICB resistance. We analyzed gene expression in ICB-sensitive versus ICB-resistant tumor cells and identified a strong association between interferon-mediated resistance and expression of Ripk1, a regulator of tumor necrosis factor (TNF) superfamily receptors. Genetic interaction screening revealed that in cancer cells, RIPK1 diverted TNF signaling through NF-κB and away from its role in cell death.
Single cell biology has the potential to elucidate many critical biological processes and diseases, from development and regeneration to cancer. Single cell analyses are uncovering the molecular diversity of cells, revealing a clearer picture of the variation among and between different cell types. New techniques are beginning to unravel how differences in cell state (transcriptional, epigenetic, and other characteristics) can lead to different cell fates among genetically identical cells, which underlies complex processes such as embryonic development, drug resistance, response to injury, and cellular reprogramming.
Over a decade of genome-wide association studies (GWAS) have led to the finding of extreme polygenicity of complex traits. The phenomenon that "all genes affect every complex trait" complicates Mendelian Randomization (MR) studies, where natural genetic variations are used as instruments to infer the causal effect of heritable risk factors. We reexamine the assumptions of existing MR methods and show how they need to be clarified to allow for pervasive horizontal pleiotropy and heterogeneous effect sizes.
Cancer progression is driven by both somatic copy number aberrations (CNAs) and chromatin remodeling, yet little is known about the interplay between these two classes of events in shaping the clonal diversity of cancers. We present Alleloscope, a method for allele-specific copy number estimation that can be applied to single-cell DNA- and/or transposase-accessible chromatin-sequencing (scDNA-seq, ATAC-seq) data, enabling combined analysis of allele-specific copy number and chromatin accessibility. On scDNA-seq data from gastric, colorectal and breast cancer samples, with validation using matched linked-read sequencing, Alleloscope finds pervasive occurrence of highly complex, multiallelic CNAs, in which cells that carry varying allelic configurations adding to the same total copy number coevolve within a tumor.
Recent genetic data can offer important insights into the roles of lipoprotein subfractions and particle sizes in preventing coronary artery disease (CAD), as previous observational studies have often reported conflicting results. We used the LD score regression to estimate the genetic correlation of 77 subfraction traits with traditional lipid profile and identified 27 traits that may represent distinct genetic mechanisms. We then used Mendelian randomization (MR) to estimate the causal effect of these traits on the risk of CAD.
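For readers unfamiliar with how MR turns per-variant summary statistics into a causal effect estimate, here is a minimal sketch of the standard fixed-effect inverse-variance weighted (IVW) estimator. The abstract does not say which MR estimator the study used, so this is a generic textbook illustration rather than the authors' method; the function name `ivw_estimate` and the input numbers are made up for the example.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the exposure-to-outcome effect.

    Each list holds one value per genetic variant: the variant-exposure
    effect, the variant-outcome effect, and the standard error of the
    variant-outcome effect. Returns (estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have one entry per variant")
    weights = [1.0 / (se ** 2) for se in se_outcome]
    numerator = sum(
        w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    if denominator == 0:
        raise ValueError("variant-exposure effects must not all be zero")
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Illustrative inputs: outcome effects are exactly half the exposure effects,
# so the estimate should recover a causal effect of about 0.5.
bx = [0.1, 0.2, 0.3]
by = [0.5 * b for b in bx]
se = [0.01, 0.01, 0.01]

estimate, std_err = ivw_estimate(bx, by, se)
print(f"IVW estimate = {estimate:.3f} (SE {std_err:.4f})")
```

The fixed-effect IVW estimator assumes every variant is a valid instrument, with no horizontal pleiotropy; it does not by itself address violations of that assumption.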
While single cell RNA sequencing (scRNA-seq) is invaluable for studying cell populations, cell-surface proteins are often integral markers of cellular function and serve as primary targets for therapeutic intervention. Here we propose a transfer learning framework, single cell Transcriptome to Protein prediction with deep neural network (cTP-net), to impute surface protein abundances from scRNA-seq data by learning from existing single-cell multi-omic resources.
Although scRNA-seq is now ubiquitously adopted in studies of intratumor heterogeneity, detection of somatic mutations and inference of clonal membership from scRNA-seq is currently unreliable. We propose DENDRO, an analysis method for scRNA-seq data that clusters single cells into genetically distinct subclones and reconstructs the phylogenetic tree relating the subclones. DENDRO utilizes transcribed point mutations and accounts for technical noise and expression stochasticity.
The functional properties of circulating CD8 T cells have been associated with immune control of HIV. However, viral replication occurs predominantly in secondary lymphoid tissues, such as lymph nodes (LNs). We used an integrated single-cell approach to characterize effective HIV-specific CD8 T cell responses in the LNs of elite controllers (ECs), defined as individuals who suppress viral replication in the absence of antiretroviral therapy (ART).
In data science, determining proximity between observations is critical to many downstream analyses such as clustering, classification, and prediction. However, when the data's underlying probability distribution is unclear, the function used to compute similarity between data points is often arbitrarily chosen. Here, we present a novel definition of proximity, Semblance, that uses the empirical distribution of a feature to inform the pair-wise similarity between observations.