Typical high-throughput single-cell RNA-sequencing (scRNA-seq) analyses are primarily conducted by (pseudo)alignment, through the lens of annotated gene models, and aimed at detecting differential gene expression. This misses diversity generated by other mechanisms that diversify the transcriptome such as splicing and V(D)J recombination, and is blind to sequences missing from imperfect reference genomes. Here, we present sc-SPLASH, a highly efficient pipeline that extends our SPLASH framework for statistics-first, reference-free discovery to barcoded scRNA-seq (10x Chromium) and spatial transcriptomics (10x Visium); we also provide its optimized module for preprocessing and k-mer counting in barcoded data, BKC, as a standalone tool.
Targeted low-throughput studies have previously identified subcellular RNA localization as necessary for cellular functions including polarization and translocation. Furthermore, these studies link localization to RNA isoform expression, especially 3' untranslated region (UTR) regulation. The recent introduction of genome-wide spatial transcriptomics techniques makes it possible to test whether subcellular localization is pervasively regulated in situ.
We introduce SPLASH2, a fast, scalable implementation of SPLASH based on an efficient k-mer counting approach for regulated sequence variation detection in massive datasets from a wide range of sequencing technologies and biological contexts. We demonstrate biological discovery by SPLASH2 in single-cell RNA sequencing (RNA-seq) data and in bulk RNA-seq data from the Cancer Cell Line Encyclopedia, including unannotated alternative splicing in cancer transcriptomes and sensitive detection of circular RNA.
Most plant genomes and their regulation remain unknown. We used SPLASH, a new, reference-genome-free sequence variation detection algorithm, to analyze transcriptional and post-transcriptional regulation from RNA-seq data. We discovered differential homolog expression during maize pollen development, and imbibition-dependent cryptic splicing in Arabidopsis seeds.
Early stages of deadly respiratory diseases including COVID-19 are challenging to elucidate in humans. Here, we define cellular tropism and transcriptomic effects of SARS-CoV-2 virus by productively infecting healthy human lung tissue and using scRNA-seq to reconstruct the transcriptional program in "infection pseudotime" for individual lung cell types. SARS-CoV-2 predominantly infected activated interstitial macrophages (IMs), which can accumulate thousands of viral RNA molecules, taking over 60% of the cell transcriptome and forming dense viral RNA bodies while inducing host profibrotic (TGFB1, SPP1) and inflammatory (early interferon response, CCL2/7/8/13, CXCL10, and IL6/10) programs and destroying host cell architecture.
Proc Natl Acad Sci U S A
April 2024
Contingency tables, data represented as counts matrices, are ubiquitous across quantitative research and data-science applications. Existing statistical tests are insufficient, however, as none are simultaneously computationally efficient and statistically valid for a finite number of observations. In this work, motivated by a recent application in reference-free genomic inference [K.
Today's genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), which directly analyzes raw sequencing data, using a statistical test to detect a signature of regulation: sample-specific sequence variation. SPLASH detects many types of variation and can be efficiently run at scale.
Contingency tables, data represented as counts matrices, are ubiquitous across quantitative research and data-science applications. Existing statistical tests are insufficient, however, as none are simultaneously computationally efficient and statistically valid for a finite number of observations. In this work, motivated by a recent application in reference-free genomic inference (1), we develop OASIS (Optimized Adaptive Statistic for Inferring Structure), a family of statistical tests for contingency tables.
Diversity-generating and mobile genetic elements are key to microbial and viral evolution and can result in evolutionary leaps. State-of-the-art algorithms to detect these elements have limitations. Here, we introduce DIVE, a new reference-free approach to overcome these limitations using information contained in sequencing reads alone.
The authors have withdrawn this manuscript due to a duplicate posting of manuscript number BIORXIV/2022/497555. Therefore, the authors do not wish this work to be cited as reference for the project. If you have any questions, please contact the corresponding author.
Technical advances have led to an explosion in the amount of biological data available in recent years, especially in the field of RNA sequencing. Specifically, spatial transcriptomics (ST) datasets, which allow each RNA molecule to be mapped to the 2D location it originated from within a tissue, have become readily available. Due to computational challenges, ST data has rarely been used to study RNA processing such as splicing or differential UTR usage.
SPLASH is an unsupervised, reference-free, and unifying algorithm that discovers regulated sequence variation through statistical analysis of k-mer composition, subsuming many application-specific methods. Here, we introduce SPLASH2, a fast, scalable implementation of SPLASH based on an efficient k-mer counting approach. SPLASH2 enables rapid analysis of massive datasets from a wide range of sequencing technologies and biological contexts, delivering unparalleled scale and speed.
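As a rough illustration of what "statistical analysis of k-mer composition" can operate on, the sketch below counts, for each sample, which k-mer (called a "target" here) directly follows each k-mer (called an "anchor") in the raw reads. The anchor/target framing, the function name `anchor_target_counts`, the parameters `k` and `gap`, and the toy reads are all assumptions made for this example; this is not the SPLASH2 implementation, and it omits the statistical test entirely.

```python
from collections import Counter, defaultdict


def anchor_target_counts(reads_by_sample, k=4, gap=0):
    """Count (anchor, target) k-mer pairs per sample.

    Hypothetical helper for illustration only. Returns a nested mapping
    {anchor: {sample: Counter({target: count})}}, i.e. one
    target-by-sample count table per anchor.
    """
    tables = defaultdict(lambda: defaultdict(Counter))
    for sample, reads in reads_by_sample.items():
        for read in reads:
            # Last start position that still leaves room for anchor + gap + target.
            last_start = len(read) - (2 * k + gap)
            for i in range(last_start + 1):
                anchor = read[i:i + k]
                target = read[i + k + gap:i + 2 * k + gap]
                tables[anchor][sample][target] += 1
    return tables


# Toy input (made up): two samples share the anchor "ACGT"
# but differ in the sequence that follows it.
reads_by_sample = {
    "sample_A": ["ACGTTTGA", "ACGTTTGA"],
    "sample_B": ["ACGTCCGA"],
}
tables = anchor_target_counts(reads_by_sample, k=4)

for sample, targets in tables["ACGT"].items():
    print(sample, dict(targets))
```

An anchor whose target distribution differs between samples, as "ACGT" does in the toy data, is the kind of sample-specific sequence variation a downstream contingency-table test would be asked to evaluate; no reference genome is consulted at any point.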
RNA processing, including splicing and alternative polyadenylation, is crucial to gene function and regulation, but methods to detect RNA processing from single-cell RNA sequencing data are limited by reliance on pre-existing annotations, peak calling heuristics, and collapsing measurements by cell type. We introduce ReadZS, an annotation-free statistical approach to identify regulated RNA processing in single cells. ReadZS discovers cell type-specific RNA processing in human lung and conserved, developmentally regulated RNA processing in mammalian spermatogenesis, including global 3' UTR shortening in human spermatogenesis.
Background: Nonfatal strangulation has been identified as a common occurrence in intimate partner violence and can be associated with significant injuries and, at times, increased mortality.
Objective: This article describes a county interagency nonfatal strangulation initiative that efficiently disseminated an educational program for police, emergency medical services, emergency department staff, forensic nursing teams, and prosecuting attorneys, along with a forensic nurse response program. Prior to initiation of this program, no educational programs existed and no forensic examinations were being offered to victims of nonfatal strangulation.
Introduction: Key measures in preventing spread of the virus that causes coronavirus disease 2019 (COVID-19) are social distancing and stay-at-home mandates. These measures along with other stressors have the potential to increase incidences of intimate partner violence (IPV), sexual assault, and child maltreatment.
Methods: We performed a retrospective review of county police dispatches, emergency department (ED) visits, Sexual Assault Nurse Examiner (SANE) consults, Domestic Violence Healthcare Project (DVHP) team consults, and Child Protection Team consults at a large, tertiary, Level I trauma center.
Trimethylguanosine synthase 1 (TGS1) is a highly conserved enzyme that converts the 5'-monomethylguanosine cap of small nuclear RNAs (snRNAs) to a trimethylguanosine cap. Here, we show that loss of TGS1 in Caenorhabditis elegans, Drosophila melanogaster and Danio rerio results in neurological phenotypes similar to those caused by survival motor neuron (SMN) deficiency. Importantly, expression of human TGS1 ameliorates the SMN-dependent neurological phenotypes in both flies and worms, revealing that TGS1 can partly counteract the effects of SMN deficiency.
Today's genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a new unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), an approach that directly analyzes raw sequencing data to detect a signature of regulation: sample-specific sequence variation. The approach, which includes a new statistical test, is computationally efficient and can be run at scale.
Int J Environ Res Public Health
April 2022
Background: Peer-support programs in medical school can buffer feelings of inadequacy, anxiety, social isolation, and burnout, drawing upon the benefits of near-peer-support resources. This study examined the effects of providing support to students in a medical school peer-support program.
Methods: Using a pre-post, quasi-experimental study design, the investigators surveyed medical students who were peer supporters in their second through fourth years of medical school with four measures assessing (1) empathy, (2) self-efficacy, (3) mental health stigma, and (4) likelihood to assist peers with mental health problems to examine if serving as a volunteer peer supporter had any effect.
Molecular characterization of cell types using single-cell transcriptome sequencing is revolutionizing cell biology and enabling new insights into the physiology of human organs. We created a human reference atlas comprising nearly 500,000 cells from 24 different tissues and organs, many from the same donor. This atlas enabled molecular characterization of more than 400 cell types, their distribution across tissues, and tissue-specific variation in gene expression.
Dehydration of the upper airways increases risks of respiratory diseases from COVID-19 to asthma and COPD. We find in human volunteer studies involving 464 human subjects in Germany, the US, and India that respiratory droplet generation increases by up to 4 orders of magnitude in dehydration-associated states of advanced age (n = 357), elevated BMI-age (n = 148), strenuous exercise (n = 20) and SARS-CoV-2 infection (n = 87), and falls with hydration of the nose, larynx and trachea by calcium-rich hypertonic salts. We also find in a protocol of exercise-induced airway dehydration that hydration of the airways by calcium-rich salts increases oxygenation relative to a non-treatment control (P < 0.
Detecting single-cell-regulated splicing from droplet-based technologies is challenging. Here, we introduce the splicing Z score (SpliZ), an annotation-free statistical method to detect regulated splicing in single-cell RNA sequencing. We applied the SpliZ to human lung cells, discovering hundreds of genes with cell-type-specific splicing patterns including ones with potential implications for basic and translational biology.
The extent to which splicing is regulated at single-cell resolution has remained controversial due to limitations in both the available data and the methods to interpret it. We apply the SpliZ, a new statistical approach, to detect cell-type-specific splicing in >110K cells from 12 human tissues. Using 10X Chromium data for discovery, 9.