Understanding and evaluating ambiguity in single-cell and single-nucleus RNA-sequencing.

bioRxiv

Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA.

Published: January 2023

Recently, Hjörleifsson and Sullivan et al. proposed a modification to the model used to classify the splicing status of reads (as spliced (mature), unspliced (nascent), or ambiguous) in single-cell and single-nucleus RNA-seq data. Here, we evaluate both the theoretical basis and the practical implementation of the proposed method. The proposed method is highly conservative and is therefore unlikely to mischaracterize reads as spliced (mature) or unspliced (nascent) when they are not. However, we find that it leaves a large fraction of reads classified as ambiguous and, in practice, allocates these ambiguous reads in an all-or-nothing manner, and differently between single-cell and single-nucleus RNA-seq data. Further, as implemented in practice, the ambiguous classification is implicit and based on the index against which the reads are mapped. This leads to several drawbacks compared to methods that consider both spliced (mature) and unspliced (nascent) mapping targets simultaneously; for example, such methods can use confidently assigned reads to rescue ambiguous reads based on shared UMIs and gene targets. Nonetheless, we show that these conservative assignment rules can be obtained directly in existing approaches simply by altering the set of targets that are indexed. To this end, we introduce the reference and show that its use with alevin-fry recapitulates the more conservative proposed classification. We also observe that, on experimental data and under the proposed allocation rules for ambiguous UMIs, the difference between the proposed classification scheme and existing conventions appears much smaller than previously reported. We demonstrate the use of the new piscem index for mapping simultaneously against spliced (mature) and unspliced (nascent) targets, allowing classification against the full nascent and mature transcriptome in human or mouse in less than 3 GB of memory.
Finally, we discuss the potential of incorporating probabilistic evidence into the inference of splicing status, and suggest that it may provide benefits beyond what can be obtained from discrete classification of UMIs as splicing-ambiguous.
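
The three-way classification and the all-or-nothing allocation described above can be illustrated with a minimal sketch. It is not the implementation of the proposed method or of alevin-fry. It assumes that each read comes with the set of target kinds it is compatible with (the labels "mature" and "nascent"), that a read is called spliced or unspliced only when it is compatible with exactly one kind, and that the caller chooses which class receives all ambiguous reads. The names `Status`, `classify`, `count_statuses`, and `allocate_all_or_nothing` are hypothetical.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    SPLICED = "spliced"      # mature
    UNSPLICED = "unspliced"  # nascent
    AMBIGUOUS = "ambiguous"


def classify(compatible_kinds):
    """Conservatively classify one read (assumed rule, for illustration).

    compatible_kinds: iterable of "mature" / "nascent" labels naming the
    kinds of targets the read is compatible with.
    """
    kinds = set(compatible_kinds)
    if not kinds or not kinds <= {"mature", "nascent"}:
        raise ValueError(f"unexpected target kinds: {kinds!r}")
    if kinds == {"mature"}:
        return Status.SPLICED
    if kinds == {"nascent"}:
        return Status.UNSPLICED
    # Compatible with both kinds: no confident call is made.
    return Status.AMBIGUOUS


def count_statuses(reads):
    """Count how many reads fall into each status."""
    return Counter(classify(kinds) for kinds in reads)


def allocate_all_or_nothing(counts, ambiguous_to):
    """Move every ambiguous read into one class chosen by the caller.

    Which class is appropriate for single-cell versus single-nucleus data
    is not specified here; it is passed in explicitly.
    """
    if ambiguous_to is Status.AMBIGUOUS:
        raise ValueError("ambiguous reads must go to SPLICED or UNSPLICED")
    result = {
        Status.SPLICED: counts.get(Status.SPLICED, 0),
        Status.UNSPLICED: counts.get(Status.UNSPLICED, 0),
    }
    result[ambiguous_to] += counts.get(Status.AMBIGUOUS, 0)
    return result


if __name__ == "__main__":
    reads = [{"mature"}, {"nascent"}, {"mature", "nascent"}, {"mature", "nascent"}]
    counts = count_statuses(reads)
    print(allocate_all_or_nothing(counts, Status.SPLICED))
    print(allocate_all_or_nothing(counts, Status.UNSPLICED))
```

In this toy input, half of the reads are ambiguous, so the choice of allocation target determines the final spliced-to-unspliced ratio. A probabilistic approach would instead split the ambiguous mass according to the available evidence.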

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881993
DOI: http://dx.doi.org/10.1101/2023.01.04.522742