SPLASH is an unsupervised, reference-free, and unifying algorithm that discovers regulated sequence variation through statistical analysis of k-mer composition, subsuming many application-specific methods. Here, we introduce SPLASH2, a fast, scalable implementation of SPLASH based on an efficient k-mer counting approach. SPLASH2 enables rapid analysis of massive datasets from a wide range of sequencing technologies and biological contexts, delivering unparalleled scale and speed. The SPLASH2 algorithm unveils new biology (without tuning) in single-cell RNA-sequencing data from human muscle cells, as well as bulk RNA-seq from the entire Cancer Cell Line Encyclopedia (CCLE), including substantial unannotated alternative splicing in cancer transcriptomes. The same untuned SPLASH2 algorithm recovers the BCR-ABL gene fusion, and detects circRNA sensitively and specifically, underscoring SPLASH2's unmatched precision and scalability across diverse RNA-seq detection tasks.
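The abstract does not describe how SPLASH2 counts k-mers, so the following is only a minimal, generic sketch of k-mer counting across samples, not the SPLASH2 implementation. It shows the kind of per-sample count table that a statistical analysis of k-mer composition could start from. The function names `count_kmers` and `kmer_table` and the toy reads are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across a list of reads."""
    counts = Counter()
    for read in reads:
        # Reads shorter than k contribute nothing (the range is empty).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_table(samples, k):
    """Build {kmer: [count in sample 0, count in sample 1, ...]}.

    `samples` maps a sample name to its list of reads; the columns
    follow the dictionary's insertion order.
    """
    per_sample = [count_kmers(reads, k) for reads in samples.values()]
    all_kmers = set().union(*per_sample)
    return {kmer: [c[kmer] for c in per_sample] for kmer in sorted(all_kmers)}


# Toy input (made up for illustration, not real sequencing data).
samples = {
    "sample_a": ["ACGTAC", "CGTACG"],
    "sample_b": ["ACGTTC"],
}
table = kmer_table(samples, k=3)
for kmer, row in table.items():
    print(kmer, row)
```

A k-mer whose counts differ sharply between samples (here `GTA`, seen only in `sample_a`) is the sort of signal a downstream statistical test would evaluate. A scalable tool would need far more memory-efficient counting than an in-memory `Counter`.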
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055302 | PMC |
http://dx.doi.org/10.1101/2023.03.17.533189 | DOI Listing |
Nat Biotechnol
September 2024
Department of Algorithmics and Software, Silesian University of Technology, Gliwice, Poland.
We introduce SPLASH2, a fast, scalable implementation of SPLASH based on an efficient k-mer counting approach for regulated sequence variation detection in massive datasets from a wide range of sequencing technologies and biological contexts. We demonstrate biological discovery by SPLASH2 in single-cell RNA sequencing (RNA-seq) data and in bulk RNA-seq data from the Cancer Cell Line Encyclopedia, including unannotated alternative splicing in cancer transcriptomes and sensitive detection of circular RNA.
bioRxiv
March 2024
Department of Algorithmics and Software, Silesian University of Technology, Gliwice, Poland.
SPLASH is an unsupervised, reference-free, and unifying algorithm that discovers regulated sequence variation through statistical analysis of k-mer composition, subsuming many application-specific methods. Here, we introduce SPLASH2, a fast, scalable implementation of SPLASH based on an efficient k-mer counting approach. SPLASH2 enables rapid analysis of massive datasets from a wide range of sequencing technologies and biological contexts, delivering unparalleled scale and speed.