One of the fundamental computational problems in cancer genomics is the identification of single nucleotide variants (SNVs) from DNA sequencing data. Many statistical models and software implementations for SNV calling have been developed in the literature, yet they still disagree widely on real datasets. Based on an empirical Bayesian approach, we introduce a local false discovery rate (LFDR) estimator for germline SNV calling.
Motivation: The rapid single-cell transcriptomic technology developments have led to an increasing interest in cellular heterogeneity within cell populations. Although cell-type proportions can be obtained directly from single-cell RNA sequencing (scRNA-seq), it is costly and not feasible in every study. Alternatively, with fewer experimental complications, cell-type compositions are characterized from bulk RNA-seq data.
Background: Treating cancer depends in part on identifying the mutations driving each patient's disease. Many clinical laboratories are adopting high-throughput sequencing for assaying patients' tumours, applying targeted panels to formalin-fixed paraffin-embedded tumour tissues to detect clinically-relevant mutations. While there have been some benchmarking and best practices studies of this scenario, much variant calling work focuses on whole-genome or whole-exome studies, with fresh or fresh-frozen tissue.
In this paper, we assume that allele frequencies are random variables and follow certain statistical distributions. However, specifying an appropriate informative prior distribution with specific hyperparameters seems to be a major issue. Assuming that prior information varies over some classes of priors, we extend the concept of robust Bayes estimation to the context of allele frequency estimation.
IEEE/ACM Trans Comput Biol Bioinform
March 2021
In a genome-wide association study (GWAS), the probability that a single nucleotide polymorphism (SNP) is not associated with a disease is its local false discovery rate (LFDR). The LFDR for each SNP is relative to a reference class of SNPs. For example, the LFDR of an exonic SNP can vary widely depending on whether it is considered relative to the separate reference class of other exonic SNPs or relative to the combined reference class of all SNPs in the data set.
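The dependence on the reference class can be illustrated with a minimal sketch. It assumes the standard two-group model, in which each test statistic comes from either a null or a non-null density; this is not necessarily the estimator used in the article. The function names, the alternative mean of 3, and the two null proportions are hypothetical values chosen only for illustration.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def lfdr(z, pi0, alt_mean=3.0):
    """Two-group LFDR: posterior probability that a SNP is null given z.

    pi0 is the assumed proportion of unassociated SNPs in the reference class.
    alt_mean is a hypothetical mean of z for associated SNPs (an assumption).
    """
    null_part = pi0 * normal_pdf(z)
    alt_part = (1.0 - pi0) * normal_pdf(z, mean=alt_mean)
    return null_part / (null_part + alt_part)


# Same test statistic, two reference classes with different null proportions.
z = 3.0
lfdr_exonic = lfdr(z, pi0=0.90)    # hypothetical: exonic SNPs only
lfdr_combined = lfdr(z, pi0=0.99)  # hypothetical: all SNPs in the data set
print(f"exonic class: {lfdr_exonic:.3f}, combined class: {lfdr_combined:.3f}")
```

Under these assumed values, the same SNP receives a lower LFDR in the class with the smaller null proportion, which is the sense in which the LFDR is relative to the reference class.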
We argue that making accept/reject decisions on scientific hypotheses, including a recent call for changing the canonical alpha level from 0.05 to 0.005, is deleterious for the finding of new discoveries and the progress of science.
The maximum entropy (ME) method is a recently-developed approach for estimating local false discovery rates (LFDR) that incorporates external information allowing assignment of a subset of tests to a category with a different prior probability of following the null hypothesis. Using this ME method, we have reanalyzed the findings from a recent large genome-wide association study of coronary artery disease (CAD), incorporating biologic annotations. Our revised LFDR estimates show many large reductions in LFDR, particularly among the genetic variants belonging to annotation categories that were known to be of particular interest for CAD.