We consider the problem of clustering grouped data with possibly non-exchangeable groups whose dependencies can be characterized by a known directed acyclic graph. To allow the sharing of clusters among the non-exchangeable groups, we propose a Bayesian nonparametric approach, termed graphical Dirichlet process, that jointly models the dependent group-specific random measures by assuming each random measure to be distributed as a Dirichlet process whose concentration parameter and base probability measure depend on those of its parent groups. The resulting joint stochastic process respects the Markov property of the directed acyclic graph that links the groups.
In a traditional Gaussian graphical model, data homogeneity is routinely assumed with no extra variables affecting the conditional independence. In modern genomic datasets, there is an abundance of auxiliary information, which often gets under-utilized in determining the joint dependency structure. In this article, we consider a Bayesian approach to model undirected graphs underlying heterogeneous multivariate observations with additional assistance from covariates.
Recent technologies, such as spatial transcriptomics, enable the measurement of gene expression at the single-cell level along with the spatial locations of these cells in the tissue. Spatial clustering of the cells provides valuable insights into the functional organization of the tissue. However, most such clustering methods involve some dimension reduction that leads to a loss of the inherent dependency structure among genes at any spatial location in the tissue.
The COVID-19 pandemic has profoundly reshaped human life. The development of COVID-19 vaccines has offered a semblance of normalcy. However, obstacles to vaccination have led to substantial loss of life and economic burdens.
Epidemiological studies have shown that circadian rhythm disruption (CRD) is associated with the risk of breast cancer. However, the role of CRD in mammary gland morphology and aggressive basal mammary tumorigenesis and the molecular mechanisms underlying CRD and cancer risk remain unknown. To investigate the effect of CRD on aggressive tumorigenesis, a genetically engineered mouse model that recapitulates the human basal type of breast cancer was used for this study.
Survival models are used to analyze time-to-event data in a variety of disciplines. Proportional hazard models provide interpretable parameter estimates, but proportional hazard assumptions are not always appropriate. Non-parametric models are more flexible but often lack a clear inferential framework.
Bacteriophages are the natural predators of bacteria and are abundant everywhere in nature. Lytic phages can specifically infect their bacterial host (through attachment to the receptor) and use their host's replication machinery to replicate rapidly, a feature that enables them to kill disease-causing bacteria. Hence, phage attachment to the host bacteria is the first important step of the infection process.
Background: Salmonella enterica serotype Typhi is one of the major pathogens causing typhoid fever and a public health burden worldwide. Recently, the increasing number of multidrug-resistant strains of Salmonella spp. has made it necessary to consider bacteriophages as a potential alternative to antibiotics for S.
The clustering of proteins is of interest in cancer cell biology. This article proposes a hierarchical Bayesian model for protein (variable) clustering hinging on correlation structure. Starting from a multivariate normal likelihood, we enforce the clustering through prior modeling using angle-based unconstrained reparameterization of correlations and assume a truncated Poisson distribution (to penalize a large number of clusters) as prior on the number of clusters.
Salmonella is one of the common causal agents of bacterial gastroenteritis-related morbidity and mortality among children below 5 years and the elderly populations. Salmonellosis in humans is caused mainly by consuming contaminated food originating from animals. The genus has several serovars, and many of them have recently been reported to be resistant to multiple drugs.
Background: The endogenous circadian clock, which controls daily rhythms in the expression of at least half of the mammalian genome, has a major influence on cell physiology. Consequently, disruption of the circadian system is associated with a wide range of diseases including cancer. While several circadian clock genes have been associated with cancer progression, little is known about the survival when two or more platforms are considered together.
Estimating the marginal and joint densities of the long-term average intakes of different dietary components is an important problem in nutritional epidemiology. Since these variables cannot be directly measured, data are usually collected in the form of 24-hour recalls of the intakes, which show marked patterns of conditional heteroscedasticity. Significantly compounding the challenges, the recalls for episodically consumed dietary components also include exact zeros.
Shigellosis, caused by Shigella bacterial spp., is one of the leading causes of diarrheal morbidity and mortality. An increasing prevalence of multidrug-resistant Shigella species has revived the importance of bacteriophages as an alternative therapy to antibiotics.
Bernoulli (Andover)
February 2021
Gaussian graphical models are a popular tool to learn the dependence structure in the form of a graph among variables of interest. Bayesian methods have gained in popularity in the last two decades due to their ability to simultaneously learn the covariance and the graph. There is a wide variety of model-based methods to learn the underlying graph assuming various forms of the graphical structure.
Measuring usual dietary intake in freely living humans is difficult to accomplish. As a part of our recent study, a food frequency questionnaire was completed by healthy adult men and women at days 0 and 90 of the study. Data from the food questionnaire were analyzed with a nutrient analysis program ( www.
J R Stat Soc Ser C Appl Stat
November 2019
We consider the problem where the data consist of a survival time and a binary outcome measurement for each individual, as well as corresponding predictors. The goal is to select the common set of predictors that affect both responses, and not just one of them. In addition, we develop a survival prediction model based on data integration.
We develop a Bayesian methodology aimed at simultaneously estimating low-rank and row-sparse matrices in a high-dimensional multiple-response linear regression model. We consider a carefully devised shrinkage prior on the matrix of regression coefficients which obviates the need to specify a prior on the rank, and shrinks the regression matrix towards low-rank and row-sparse structures. We provide theoretical support to the proposed methodology by proving minimax optimality of the posterior mean under the prediction risk in ultra-high dimensional settings where the number of predictors can grow sub-exponentially relative to the sample size.
Recent developments in high-throughput methods have resulted in the collection of high-dimensional data types from multiple sources and technologies that measure distinct yet complementary information. Integrated clustering of such multiple data types or multi-view clustering is critical for revealing pathological insights. However, multi-view clustering is challenging due to the complex dependence structure between multiple data types, including directional dependency.
Currently, novel coronavirus disease 2019 (COVID-19) is a major threat to global health. The rapid spread of the virus has created a pandemic, and countries all over the world are struggling with a surge in COVID-19 infected cases. There are no drugs or other therapeutics approved by the US Food and Drug Administration to prevent or treat COVID-19: information on the disease is very limited and scattered even if it exists.
Motivation: It is well known that the integration of different data sources is valuable because of its potential to unveil new functionalities of the genomic expressions, which might be dormant in a single-source analysis. Moreover, different studies have justified the more powerful analyses of multi-platform data. Toward this, in this study, we consider the circadian genes' omics profile, such as copy number changes and RNA-sequence data along with their survival response.
Graphical models are ubiquitous tools to describe the interdependence between variables measured simultaneously such as large-scale gene or protein expression data. Gaussian graphical models (GGMs) are well-established tools for probabilistic exploration of dependence structures using precision matrices and they are generated under a multivariate normal joint distribution. However, they suffer from several shortcomings since they are based on Gaussian distribution assumptions.
Long non-coding RNAs (lncRNAs) are a large and diverse class of transcribed RNAs, which have been shown to play a significant role in developing cancer. In this study, we apply an integrative modeling framework to integrate the DNA copy number variation (CNV), lncRNA expression, and downstream target protein expression to predict patient survival in breast cancer. We develop a 3-stage model combining a mechanical model (lncRNA regressed on CNV and target proteins regressed on lncRNA) and a clinical model (survival regressed on estimated effects from the mechanical models).
Accurate prognostic prediction using molecular information is a challenging area of research, which is essential to develop precision medicine. In this paper, we develop translational models to identify major actionable proteins that are associated with clinical outcomes, like the survival time of patients. There are considerable statistical and computational challenges due to the large dimension of the problems.
There has been an intense development in the Bayesian graphical model literature over the past decade; however, most of the existing methods are restricted to moderate dimensions. We propose a novel graphical model selection approach for large dimensional settings where the dimension increases with the sample size, by decoupling model fitting and covariance selection. First, a full model based on a complete graph is fit under a novel class of mixtures of inverse-Wishart priors, which induce shrinkage on the precision matrix under an equivalence with Cholesky-based regularization, while enabling conjugate updates.
We develop a Bayes factor based testing procedure for comparing two population means in high dimensional settings. In 'large-p-small-n' settings, Bayes factors based on proper priors require eliciting a large and complex p × p covariance matrix, whereas Bayes factors based on Jeffreys' prior suffer the same impediment as the classical Hotelling test statistic as they involve inversion of ill-formed sample covariance matrices. To circumvent this limitation, we propose that the Bayes factor be based on lower dimensional random projections of the high dimensional data vectors.