The large body of publicly available RNA-sequencing (RNA-seq) libraries is a treasure trove of functional information that makes it possible to quantify the expression of known or novel transcripts in tissues. However, transcript quantification commonly relies on alignment methods that require substantial computational resources and processing time, and therefore do not scale easily to large datasets. k-mer decomposition constitutes a new way to process RNA-seq data for the identification of transcriptional signatures, as k-mers can be used to quantify gene expression accurately in a less resource-intensive way. We present the Kmerator Suite, a set of three tools designed to extract specific k-mer signatures, quantify these k-mers in RNA-seq datasets and quickly visualize the characteristics of large datasets. The core tool, Kmerator, produces specific k-mers for 97% of human genes, enabling gene expression to be measured with high accuracy in simulated datasets. KmerExploR, a direct application of Kmerator, uses a set of predictor gene-specific k-mers to infer metadata, including library protocol, sample features or contamination, from RNA-seq datasets. KmerExploR results are visualized through a user-friendly interface. Moreover, we demonstrate that the Kmerator Suite can be used for advanced queries targeting known or new biomarkers such as mutations, gene fusions or long non-coding RNAs for human health applications.
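The idea of a gene-specific k-mer signature can be illustrated with a minimal Python sketch. It is not the Kmerator implementation: the function names (`kmers`, `specific_kmers`, `count_in_reads`, `mean_count`), the toy sequences and the small value of k are all assumptions chosen for illustration, and the sketch ignores reverse complements, sequencing errors and normalization by library size. It decomposes a target sequence into k-mers, keeps those absent from a set of background sequences, and then counts the retained k-mers in reads without any alignment step.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (upper-cased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def specific_kmers(target, background, k):
    """Return the k-mers of `target` that occur in no background sequence."""
    background_set = set()
    for seq in background:
        background_set.update(kmers(seq, k))
    return {km for km in kmers(target, k) if km not in background_set}


def count_in_reads(signature, reads, k):
    """Count occurrences of each signature k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in signature:
                counts[km] += 1
    return counts


def mean_count(signature, counts):
    """Average count over the signature, a crude expression proxy."""
    if not signature:
        return 0.0
    return sum(counts[km] for km in signature) / len(signature)


if __name__ == "__main__":
    # Hypothetical toy data: one target transcript, one background transcript.
    target = "ACGTACGGTTCA"
    background = ["ACGTACGGAAAA"]
    reads = ["CGGTTCA", "ACGTACG"]
    k = 4  # deliberately small for the toy example

    signature = specific_kmers(target, background, k)
    counts = count_in_reads(signature, reads, k)
    print(sorted(signature))
    print(mean_count(signature, counts))
```

In this toy case only the k-mers covering the 3' end of the target survive the background filter, so the second read, which matches the region shared with the background sequence, contributes nothing to the count.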


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8221386
DOI: http://dx.doi.org/10.1093/nargab/lqab058

