Motivation: Estimates of gene expression from single-cell RNA-seq (scRNA-seq) data carry inherent uncertainty because some reads map to multiple genes. Many existing scRNA-seq quantification pipelines ignore multi-mapping reads and therefore underestimate expected read counts for many genes. alevin accounts for multi-mapping reads and can generate 'inferential replicates', which reflect quantification uncertainty. Previous methods have shown improved performance when incorporating these replicates into statistical analyses, but storing and using the replicates increase computation time and memory requirements.

Results: We demonstrate that storing only the mean and variance of a set of inferential replicates ('compression') is sufficient to capture gene-level quantification uncertainty, while reducing disk storage to as low as 9% of the original and memory usage when loading data to as low as 6%. Using these values, we generate 'pseudo-inferential' replicates from a negative binomial distribution and propose a general procedure for incorporating these replicates into a statistical testing framework. When applying this procedure to trajectory-based differential expression analyses, we show that false positives are reduced by more than a third for genes with high levels of quantification uncertainty. We additionally extend the Swish method to incorporate pseudo-inferential replicates and demonstrate improvements in computation time and memory usage without any loss in performance. Lastly, we show that discarding multi-mapping reads can result in significant underestimation of counts for functionally important genes in a real dataset.
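The compression and regeneration steps described above can be illustrated with a minimal Python sketch. This is not the fishpond implementation of makeInfReps: the function names, the method-of-moments parameterization, and the Poisson fallback for entries whose variance does not exceed the mean are all assumptions made for illustration.

```python
import numpy as np


def compress_replicates(reps):
    """Reduce replicates (axis 0) to a per-entry mean and sample variance."""
    reps = np.asarray(reps, dtype=float)
    return reps.mean(axis=0), reps.var(axis=0, ddof=1)


def pseudo_inferential_replicates(mean, var, n_reps=20, seed=None):
    """Draw pseudo-replicates matching a stored mean and variance.

    Hypothetical helper: uses a method-of-moments negative binomial,
    var = mean + mean**2 / size, so size = mean**2 / (var - mean).
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    reps = np.zeros((n_reps,) + mean.shape, dtype=np.int64)

    # Negative binomial only where the stored variance shows overdispersion.
    over = (mean > 0) & (var > mean)
    # Assumed fallback: Poisson where variance <= mean; zero mean stays zero.
    pois = (mean > 0) & ~over

    if over.any():
        size = mean[over] ** 2 / (var[over] - mean[over])
        prob = mean[over] / var[over]  # equals size / (size + mean)
        reps[:, over] = rng.negative_binomial(
            size, prob, size=(n_reps, int(over.sum()))
        )
    if pois.any():
        reps[:, pois] = rng.poisson(mean[pois], size=(n_reps, int(pois.sum())))
    return reps
```

In this sketch, only two numbers per gene and cell need to be kept on disk, and any number of replicates can be redrawn when an analysis needs them. The regenerated draws are not the original inferential replicates; they only share the stored first two moments.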

Availability and Implementation: makeInfReps and splitSwish are implemented in the R/Bioconductor fishpond package available at https://bioconductor.org/packages/fishpond. Analyses and simulated datasets can be found in the paper's GitHub repo at https://github.com/skvanburen/scUncertaintyPaperCode.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8289386
DOI: http://dx.doi.org/10.1093/bioinformatics/btab001
