Latent factor models for count data are widely used to deconvolve mixed signals in biological data, as exemplified by sequencing data from transcriptome or microbiome studies. When pure samples such as single-cell transcriptome data are available, the accuracy of the estimates can be much improved. However, this advantage quickly disappears in the presence of excessive zeros. To correctly account for this phenomenon in both mixed and pure samples, we propose a zero-inflated non-negative matrix factorization and derive an effective multiplicative parameter updating rule. In simulation studies, our method yielded the smallest bias. We applied our approach to brain gene expression and fecal microbiome datasets, illustrating its superior performance. Our method is implemented as a publicly available R package, iNMF.
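
To make the idea of a zero-inflated factorization with multiplicative updates concrete, the following is a minimal Python sketch. It is not the update rule derived in the paper and not the iNMF package API. It assumes a Poisson count model with a single global zero-inflation probability `pi`, fitted by a generic EM scheme: the E-step estimates how likely each observed zero is a structural zero, and the M-step applies the standard weighted Kullback-Leibler multiplicative NMF updates. The function name `zi_poisson_nmf` and all parameters are hypothetical.

```python
import numpy as np

def zi_poisson_nmf(X, rank, n_iter=200, seed=0, eps=1e-10):
    """Illustrative zero-inflated Poisson NMF (assumed model, not the paper's).

    X is approximated by W @ H, where each entry is either a structural zero
    (probability pi) or a Poisson count with mean (W @ H)[i, j].
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive starting values keep the multiplicative updates valid.
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    is_zero = (X == 0)
    pi = 0.5 * is_zero.mean()  # crude initial guess for the zero-inflation rate

    for _ in range(n_iter):
        mu = W @ H + eps
        # E-step: posterior probability that an observed zero is structural.
        z = np.where(is_zero, pi / (pi + (1.0 - pi) * np.exp(-mu) + eps), 0.0)
        A = 1.0 - z  # weight of each entry in the Poisson part of the model

        # M-step: weighted KL multiplicative update for W.
        R = A * X / mu
        W *= (R @ H.T) / (A @ H.T + eps)

        # Weighted KL multiplicative update for H, using the refreshed W.
        mu = W @ H + eps
        R = A * X / mu
        H *= (W.T @ R) / (W.T @ A + eps)

        # Update the global zero-inflation probability.
        pi = z.mean()

    return W, H, pi
```

Because every update multiplies the current factors by a ratio of non-negative terms, `W` and `H` stay non-negative throughout. Entries judged likely to be structural zeros receive a small weight and therefore pull the fitted means toward zero less strongly than they would in an unweighted factorization.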

Source
http://dx.doi.org/10.1515/ijb-2020-0039

Similar Publications

Background: Two characteristics of commonly used outcomes in medical research are zero inflation and non-negative integers; examples include the number of hospital admissions or emergency department visits, where the majority of patients will have zero counts. Zero-inflated regression models were devised to analyze this type of data. However, the performance of zero-inflated regression models or the properties of data best suited for these analyses have not been thoroughly investigated.

Nonparametric scanning tests of homogeneity for hierarchical models with continuous covariates.

Biometrics

September 2023

Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA.

In many applications of hierarchical models, there is often interest in evaluating the inherent heterogeneity in view of observed data. When the underlying hypothesis involves parameters resting on the boundary of their support space such as variances and mixture proportions, it is a usual practice to entertain testing procedures that rely on common heterogeneity assumptions. Such procedures, albeit omnibus for general alternatives, may entail a substantial loss of power for specific alternatives such as heterogeneity varying with covariates.

A comparison of statistical methods for modeling count data with an application to hospital length of stay.

BMC Med Res Methodol

August 2022

School of Mathematical and Statistical Sciences, University of Texas Rio Grande Valley, One West University Boulevard, Brownsville Campus, Brownsville, TX, 78520, USA.

Background: Hospital length of stay (LOS) is a key indicator of hospital care management efficiency, cost of care, and hospital planning. Hospital LOS is often used as a measure of a post-medical procedure outcome, as a guide to the benefit of a treatment of interest, or as an important risk factor for adverse events. Therefore, understanding hospital LOS variability is always an important healthcare focus.

Dependent variables in health psychology are often counts, for example, of a behaviour or number of engagements with an intervention. These counts can be very strongly skewed, and/or contain large numbers of zeros as well as extreme outliers. For example, 'How many cigarettes do you smoke on an average day?' The modal answer may be zero but may range from 0 to 40+.

Survival models with a frailty term are presented as an extension of Cox's proportional hazard model, in which a random effect is introduced in the hazard function in a multiplicative form with the aim of modeling the unobserved heterogeneity in the population. Candidates for the frailty distribution are assumed to be continuous and non-negative. However, this assumption may not be true in some situations.
