This paper advocates proximal Markov Chain Monte Carlo (ProxMCMC) as a flexible and general Bayesian inference framework for constrained or regularized estimation. Originally introduced in the Bayesian imaging literature, ProxMCMC employs the Moreau-Yosida envelope as a smooth approximation of the total-variation regularization term, fixes the variance and regularization strength parameters as constants, and uses the Langevin algorithm for posterior sampling. We extend ProxMCMC to be fully Bayesian by providing data-adaptive estimation of all parameters, including the regularization strength.
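For intuition, here is a minimal Python sketch of a Moreau-Yosida regularized Langevin (MYULA-style) update, using a Gaussian likelihood and the l1 penalty's proximal map as a stand-in for a generic nonsmooth prior; the function names and toy data are mine, and this is an illustration of the idea rather than the authors' implementation.

    import numpy as np

    def soft_threshold(x, t):
        # Proximal map of t * ||x||_1, used here as a stand-in nonsmooth prior term.
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def myula_step(x, grad_log_lik, prox_g, step, lam, rng):
        # One Moreau-Yosida regularized Langevin step: the nonsmooth log-prior term g is
        # replaced by its Moreau-Yosida envelope, whose gradient is (x - prox_{lam g}(x)) / lam.
        drift = grad_log_lik(x) - (x - prox_g(x, lam)) / lam
        return x + step * drift + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)

    # Toy run: Gaussian likelihood y ~ N(x, I) with an l1 (Laplace-type) prior on x.
    rng = np.random.default_rng(0)
    y = rng.standard_normal(5)
    x = np.zeros(5)
    for _ in range(1000):
        x = myula_step(x, lambda z: -(z - y), soft_threshold, step=0.1, lam=0.1, rng=rng)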
The growing prevalence of tensor data, or multiway arrays, in science and engineering applications motivates the need for tensor decompositions that are robust against outliers. In this paper, we present a robust Tucker decomposition estimator based on the L2 criterion, called the Tucker-L2E. Our numerical experiments demonstrate that Tucker-L2E has empirically stronger recovery performance in more challenging high-rank scenarios compared with existing alternatives.
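As a point of reference, a plain (non-robust) Tucker decomposition can be computed with a higher-order SVD. The NumPy sketch below illustrates what a Tucker factorization returns; it is not the robust Tucker-L2E estimator itself, and the function names and toy tensor are mine.

    import numpy as np

    def unfold(T, mode):
        # Mode-n unfolding of a tensor into a matrix.
        return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

    def hosvd(T, ranks):
        # Plain higher-order SVD: factor matrices from the leading left singular
        # vectors of each unfolding, then the core tensor by multilinear projection.
        U = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
             for n, r in enumerate(ranks)]
        G = T
        for n, Un in enumerate(U):
            G = np.moveaxis(np.tensordot(Un.T, np.moveaxis(G, n, 0), axes=1), 0, n)
        return G, U

    T = np.random.default_rng(0).standard_normal((6, 5, 4))
    core, factors = hosvd(T, (3, 3, 2))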
Proximal Markov Chain Monte Carlo is a novel construct that lies at the intersection of Bayesian computation and convex optimization, which helped popularize the use of nondifferentiable priors in Bayesian statistics. Existing formulations of proximal MCMC, however, require hyperparameters and regularization parameters to be prespecified. In this work, we extend the paradigm of proximal MCMC by introducing a new class of nondifferentiable priors called epigraph priors.
Building on previous research of Chi and Chi (2022), the current paper revisits estimation in robust structured regression under the L2E criterion. We adopt the majorization-minimization (MM) principle to design a new algorithm for updating the vector of regression coefficients. Our sharp majorization achieves faster convergence than the previous alternating proximal gradient descent algorithm (Chi and Chi, 2022).
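For readers unfamiliar with the MM principle, the toy sketch below applies it to least absolute deviations regression: the absolute value is majorized by a quadratic, so each update reduces to a weighted least squares solve. This is a textbook illustration with my own toy data, not the sharp majorization derived in the paper.

    import numpy as np

    def mm_lad(X, y, iters=50, eps=1e-8):
        # MM for least absolute deviations: majorize |r| at the current residual r_k
        # by r^2 / (2|r_k|) + |r_k|/2, so each update is a weighted least squares solve.
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        for _ in range(iters):
            w = 1.0 / np.maximum(np.abs(y - X @ beta), eps)
            sw = np.sqrt(w)
            beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
        return beta

    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(100)
    beta_hat = mm_lad(X, y)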
J Comput Graph Stat
March 2022
We introduce a user-friendly computational framework for implementing robust versions of a wide variety of structured regression methods with the L2 criterion. In addition to introducing an algorithm for performing L2E regression, our framework enables robust regression with the L2 criterion for additional structural constraints, works without requiring complex tuning procedures on the precision parameter, can be used to identify heterogeneous subpopulations, and can incorporate readily available non-robust structured regression solvers. We provide convergence guarantees for the framework and demonstrate its flexibility with some examples.
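For concreteness, one standard way to write the L2E criterion for Gaussian-error regression with a precision parameter tau is sketched below; the notation and function are my reconstruction rather than a quote from the paper.

    import numpy as np

    def l2e_loss(beta, tau, X, y):
        # L2E criterion for regression with Gaussian errors of precision tau:
        # tau / (2*sqrt(pi)) - (2/n) * sum_i tau * phi(tau * (y_i - x_i' beta)),
        # where phi is the standard normal density; minimized jointly over (beta, tau).
        r = y - X @ beta
        phi = np.exp(-0.5 * (tau * r) ** 2) / np.sqrt(2.0 * np.pi)
        return tau / (2.0 * np.sqrt(np.pi)) - 2.0 * np.mean(tau * phi)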
Modern technologies produce a deluge of complicated data. In neuroscience, for example, minimally invasive experimental methods can take recordings of large populations of neurons at high resolution under a multitude of conditions. Such data arrays possess non-trivial interdependencies along each of their axes.
Many machine learning algorithms depend on weights that quantify row and column similarities of a data matrix. The choice of weights can dramatically impact the effectiveness of the algorithm. Nonetheless, the problem of choosing weights has arguably not been given enough study.
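A common off-the-shelf choice, shown below purely for illustration, is to build Gaussian kernel weights from pairwise distances between rows (and, by transposition, columns); such default schemes are exactly the kind of choice the paper argues deserves more careful study. The function and its phi parameter are mine.

    import numpy as np

    def gaussian_weights(X, phi=1.0):
        # Gaussian kernel weights between the rows of a data matrix:
        # w_ij = exp(-phi * ||x_i - x_j||^2).  Apply to X.T for column weights.
        sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        return np.exp(-phi * sq_dists)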
Annu Int Conf IEEE Eng Med Biol Soc
November 2021
Coronary bifurcation lesions are a leading cause of coronary artery disease (CAD). Despite their prevalence, coronary bifurcation lesions remain difficult to treat due to our incomplete understanding of how various features of lesion anatomy synergistically disrupt normal hemodynamic flow. In this work, we employ an interpretable machine learning algorithm, the Classification and Regression Tree (CART), to model the impact of these geometric features on local hemodynamic quantities.
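As a schematic of what fitting a CART model involves, the snippet below fits a scikit-learn regression tree to synthetic features and prints its decision rules; the feature names and data are hypothetical placeholders, not the study's lesion data or pipeline.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor, export_text

    rng = np.random.default_rng(2)
    # Hypothetical geometric features (placeholder names) and a synthetic response.
    X = rng.uniform(size=(200, 2))
    y = 2.0 * X[:, 0] + np.where(X[:, 1] > 0.5, 1.0, 0.0) + 0.1 * rng.standard_normal(200)

    tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
    print(export_text(tree, feature_names=["bifurcation_angle", "diameter_ratio"]))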
Joint models are popular for analyzing data with multivariate responses. We propose a sparse multivariate single index model, where responses and predictors are linked by unspecified smooth functions and multiple matrix level penalties are employed to select predictors and induce low-rank structures across responses. An alternating direction method of multipliers (ADMM) based algorithm is proposed for model estimation.
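To show the alternating-update structure of ADMM in its simplest form, here is a generic ADMM solver for the lasso; the updates for the sparse single index model are more involved, so treat this only as an illustration of the algorithmic template, with variable names of my choosing.

    import numpy as np

    def admm_lasso(X, y, lam, rho=1.0, iters=200):
        # Generic ADMM split: minimize 0.5*||y - X b||^2 + lam*||z||_1 subject to b = z.
        p = X.shape[1]
        b, z, u = np.zeros(p), np.zeros(p), np.zeros(p)
        XtX_rho = X.T @ X + rho * np.eye(p)
        Xty = X.T @ y
        for _ in range(iters):
            b = np.linalg.solve(XtX_rho, Xty + rho * (z - u))                # quadratic subproblem
            z = np.sign(b + u) * np.maximum(np.abs(b + u) - lam / rho, 0.0)  # soft threshold
            u = u + b - z                                                    # dual update
        return z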
Summary: Biclustering is a generalization of clustering used to identify simultaneous grouping patterns in observations (rows) and features (columns) of a data matrix. Recently, the biclustering task has been formulated as a convex optimization problem. While this convex recasting of the problem has attractive properties, existing algorithms do not scale well.
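In its usual convex form (written here from memory, so treat the notation as a reconstruction rather than a quotation), the task is to estimate a matrix U of the same size as the data X by solving

    minimize_U  (1/2) * ||X - U||_F^2
                + gamma * [ sum_{i<j} w_ij * ||U_i. - U_j.||_2 + sum_{k<l} v_kl * ||U_.k - U_.l||_2 ],

where the row and column fusion penalties encourage rows and columns of U to coalesce into biclusters as gamma grows.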
Conventional invasive diagnostic imaging techniques do not adequately resolve complex Type B and C coronary lesions, which present unique challenges, require personalized treatment, and result in worsened patient outcomes. These lesions are often excluded from large-scale non-invasive clinical trials, and no validated approach exists to characterize hemodynamic quantities and guide percutaneous intervention for such lesions. This work identifies key biomarkers that differentiate complex Type B and C lesions from simple Type A lesions by introducing and validating a coronary angiography-based computational fluid dynamics (CFD-CA) framework for intracoronary assessment in complex lesions at ultrahigh resolution.
IEEE Signal Process Mag
November 2020
Cluster analysis is a fundamental tool for pattern discovery of complex heterogeneous data. Prevalent clustering methods mainly focus on vector or matrix-variate data and are not applicable to general-order tensors, which arise frequently in modern scientific and business applications. Moreover, there is a gap between statistical guarantees and computational efficiency for existing tensor clustering solutions due to the nature of their non-convex formulations.
Canonical correlation analysis (CCA) is a multivariate analysis technique for estimating a linear relationship between two sets of measurements. Modern acquisition technologies, for example, those arising in neuroimaging and remote sensing, produce data in the form of multidimensional arrays or tensors. Classic CCA is not appropriate for dealing with tensor data due to the multidimensional structure and ultrahigh dimensionality of such modern data.
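For contrast with the tensor setting, classic CCA can be computed by whitening each block of variables and taking an SVD of the whitened cross-covariance, assuming full-rank covariance matrices; a minimal NumPy sketch with my own function name follows.

    import numpy as np

    def classic_cca(X, Y):
        # Classic (non-tensor) CCA: whiten each block, then take an SVD of the
        # whitened cross-covariance.  Assumes full-rank covariance matrices.
        Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
        n = X.shape[0]
        Sxx, Syy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n
        Wx = np.linalg.inv(np.linalg.cholesky(Sxx)).T
        Wy = np.linalg.inv(np.linalg.cholesky(Syy)).T
        U, corrs, Vt = np.linalg.svd(Wx.T @ Sxy @ Wy)
        return Wx @ U, Wy @ Vt.T, corrs   # canonical directions and correlations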
A biologic is a product made from living organisms. A biosimilar is a new version of an already approved branded biologic. Regulatory guidelines recommend a totality-of-the-evidence approach with stepwise development for a new biosimilar.
Tocopherols and tocotrienols, collectively known as vitamin E, have received a great deal of attention because of their interesting biological activities. In the present study, we reexamined and improved previous methods of sample preparation and the conditions of high-performance liquid chromatography for more accurate quantification of tocopherols, tocotrienols and their major chain-degradation metabolites. For the analysis of serum tocopherols/tocotrienols, we reconfirmed our method of mixing serum with ethanol followed by hexane extraction.
To improve patients' access to safe and effective biological medicines, abbreviated licensure pathways for biosimilar and interchangeable biological products have been established in the US, Europe, and other countries around the world. The US Food and Drug Administration and European Medicines Agency have published various guidance documents on the development and approval of biosimilars, which recommend a "totality-of-the-evidence" approach with a stepwise process to demonstrate biosimilarity. The approach relies on comprehensive comparability studies ranging from analytical and nonclinical studies to clinical pharmacokinetic/pharmacodynamic (PK/PD) and efficacy studies.
Obesity is associated with an increased risk of cancer. To study the promotion of dietary carcinogen-induced gastrointestinal cancer by obesity, we employed 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (PhIP) to induce intestinal tumorigenesis in CYP1A-humanized (hCYP1A) mice, in which mouse Cyp1a1/1a2 was replaced with human CYP1A1/1A2. Obesity was introduced in hCYP1A mice by breeding with Lepr(db/+) mice to establish the genetically induced obese hCYP1A-Lepr(db/db) mice, or by feeding hCYP1A mice a high-fat diet. PhIP induced the formation of small intestinal tumors at 28-40 weeks of age in obese hCYP1A mice, but not in lean hCYP1A mice.
Tocopherols, the major forms of vitamin E, are a family of fat-soluble compounds that exist in alpha (α-T), beta (β-T), gamma (γ-T), and delta (δ-T) variants. A cancer preventive effect of vitamin E is suggested by epidemiological studies. However, past animal studies and human intervention trials with α-T, the most active vitamin E form, have yielded disappointing results.
In the biclustering problem, we seek to simultaneously group observations and features. While biclustering has applications in a wide array of domains, ranging from text mining to collaborative filtering, the problem of identifying structure in high-dimensional genomic data motivates this work. In this context, biclustering enables us to identify subsets of genes that are co-expressed only within a subset of experimental conditions.
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics.
Clustering is a fundamental problem in many scientific applications. Standard methods such as k-means, Gaussian mixture models, and hierarchical clustering, however, are beset by local minima, which are sometimes drastically suboptimal. Recently introduced convex relaxations of k-means and hierarchical clustering shrink cluster centroids toward one another and ensure a unique global minimizer.
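In standard notation (a reconstruction, not a quotation from the paper), convex clustering estimates a centroid u_i for each observation x_i by solving

    minimize_U  (1/2) * sum_i ||x_i - u_i||^2  +  gamma * sum_{i<j} w_ij * ||u_i - u_j||_2,

and increasing gamma fuses centroids together, tracing out a clustering path with a unique global minimizer at each value of gamma.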
The problem of minimizing a continuously differentiable convex function over an intersection of closed convex sets is ubiquitous in applied mathematics. It is particularly interesting when it is easy to project onto each separate set, but nontrivial to project onto their intersection. Algorithms based on Newton's method such as the interior point method are viable for small to medium-scale problems.
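One standard tool for this setting is Dykstra's algorithm, which recovers the projection onto an intersection from the projections onto the individual sets; the sketch below runs it on a toy intersection (unit ball and nonnegative orthant) and is a generic illustration, not the specific algorithm developed in the paper.

    import numpy as np

    def dykstra(x0, proj_a, proj_b, iters=100):
        # Dykstra's algorithm: computes the projection of x0 onto the intersection
        # of two closed convex sets from the projections onto each set separately.
        x, p, q = x0.copy(), np.zeros_like(x0), np.zeros_like(x0)
        for _ in range(iters):
            y = proj_a(x + p)
            p = x + p - y
            x = proj_b(y + q)
            q = y + q - x
        return x

    # Toy intersection: the unit ball and the nonnegative orthant.
    proj_ball = lambda z: z / max(1.0, np.linalg.norm(z))
    proj_orthant = lambda z: np.maximum(z, 0.0)
    x_star = dykstra(np.array([2.0, -1.0, 0.5]), proj_ball, proj_orthant)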
Modern computational statistics is turning more and more to high-dimensional optimization to handle the deluge of big data. Once a model is formulated, its parameters can be estimated by optimization. Because model parsimony is important, models routinely include nondifferentiable penalty terms such as the lasso.
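The lasso is tractable in this setting because its proximal map has a closed form (soft thresholding), the building block of proximal gradient methods; a minimal sketch under my own naming, not tied to any particular model in the paper, follows.

    import numpy as np

    def prox_l1(v, t):
        # Proximal map of t*||.||_1: elementwise soft thresholding.
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def ista_step(beta, X, y, lam, step):
        # One proximal gradient step for 0.5*||y - X beta||^2 + lam*||beta||_1:
        # gradient step on the smooth part, then the proximal map of the penalty.
        grad = X.T @ (X @ beta - y)
        return prox_l1(beta - step * grad, step * lam)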