Publications by authors named "Vincent Y F Tan"

Mortality risk is a major concern for patients who have just been discharged from the intensive care unit (ICU). Many studies have constructed machine learning models to predict this risk. Although these models are highly accurate, they are difficult to interpret, so clinicians typically gain little insight into patients' health conditions or the underlying factors that influence their mortality risk.

Nonnegative matrix factorization (NMF) is a linear dimensionality reduction technique for analyzing nonnegative data. A key aspect of NMF is the choice of the objective function that depends on the noise model (or statistics of the noise) assumed on the data. In many applications, the noise model is unknown and difficult to estimate.

We revisit the distributed hypothesis testing (or hypothesis testing with communication constraints) problem from the viewpoint of privacy. Instead of observing the raw data directly, the transmitter observes a sanitized or randomized version of it. We impose an upper bound on the mutual information between the raw and randomized data.
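To make the mutual-information constraint concrete, here is a minimal sketch that computes the mutual information (in bits) between a binary raw symbol and its randomized version. The binary randomized-response mechanism and the names `randomized_response_joint` and `mutual_information` are illustrative assumptions; the abstract does not specify the privacy mechanism used in the paper.

```python
import math

def randomized_response_joint(prior, flip):
    """Joint distribution P(X, Y) when a binary X is flipped with probability `flip`.

    Hypothetical mechanism chosen for illustration only.
    """
    return [
        [prior[0] * (1 - flip), prior[0] * flip],
        [prior[1] * flip, prior[1] * (1 - flip)],
    ]

def mutual_information(joint):
    """Mutual information I(X; Y) in bits for a joint distribution given as a nested list."""
    px = [sum(row) for row in joint]          # marginal of the raw data
    py = [sum(col) for col in zip(*joint)]    # marginal of the randomized data
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi

# More randomization (flip closer to 0.5) leaks less about the raw data.
for flip in (0.0, 0.1, 0.3, 0.5):
    joint = randomized_response_joint([0.5, 0.5], flip)
    print(flip, round(mutual_information(joint), 4))
```

An upper bound on this quantity then restricts how small `flip` may be in this toy setting: the closer `flip` is to 0.5, the less the sanitized data reveals.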

Human leukocyte antigen class I (HLA)-restricted CD8+ T lymphocyte (CTL) responses are crucial to HIV-1 control. Although HIV can evade these responses, the longer-term impact of viral escape mutants remains unclear, as these variants can also reduce intrinsic viral fitness. To address this, we developed a metric to determine the degree of HIV adaptation to an HLA profile.

We propose a novel framework that uses a parsimonious statistical model, known as a mixture of Gaussian trees, to model the possibly multimodal minority class and thereby solve the problem of imbalanced time-series classification. By exploiting the fact that close-by time points are highly correlated due to the smoothness of the time-series, our model significantly reduces the number of covariance parameters to be estimated from O(d^2) to O(Ld), where L is the number of mixture components and d is the dimensionality. Thus, our model is particularly effective for modeling high-dimensional time-series with a limited number of instances in the minority positive class.
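The parameter savings can be illustrated with a rough count. This sketch assumes that a full covariance matrix has d(d+1)/2 free entries and that each tree-structured Gaussian component is described by d variances plus d-1 edge covariances; the exact parameterization in the paper is not given in the abstract, and the function names are hypothetical.

```python
def full_covariance_params(d):
    """Free entries of one symmetric d x d covariance matrix: O(d^2)."""
    return d * (d + 1) // 2

def tree_mixture_covariance_params(d, num_components):
    """Covariance parameters for a mixture of tree-structured Gaussians: O(Ld).

    Assumption: each component needs d variances and d - 1 edge covariances.
    """
    return num_components * (d + (d - 1))

d, L = 100, 3
print(full_covariance_params(d))             # 5050
print(tree_mixture_covariance_params(d, L))  # 597
```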

This paper addresses the estimation of the latent dimensionality in nonnegative matrix factorization (NMF) with the β-divergence. The β-divergence is a family of cost functions that includes the squared Euclidean distance, Kullback-Leibler (KL) and Itakura-Saito (IS) divergences as special cases. Learning the model order is important as it is necessary to strike the right balance between data fidelity and overfitting.
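As a minimal sketch of how the special cases arise, the function below evaluates the β-divergence between two vectors of strictly positive values using its commonly stated definition. The function name `beta_divergence` is an assumption for illustration, and under this convention β = 2 gives half the squared Euclidean distance.

```python
import math

def beta_divergence(x, y, beta):
    """Sum of element-wise beta-divergences d_beta(x_i | y_i) for strictly positive inputs."""
    total = 0.0
    for xi, yi in zip(x, y):
        if beta == 1:
            # Generalized Kullback-Leibler divergence (limit as beta -> 1)
            total += xi * math.log(xi / yi) - xi + yi
        elif beta == 0:
            # Itakura-Saito divergence (limit as beta -> 0)
            total += xi / yi - math.log(xi / yi) - 1
        else:
            # General case; beta = 2 reduces to (xi - yi)^2 / 2
            total += (xi**beta + (beta - 1) * yi**beta
                      - beta * xi * yi**(beta - 1)) / (beta * (beta - 1))
    return total

x = [1.0, 2.0, 3.0]
y = [1.5, 2.0, 2.5]
print(beta_divergence(x, y, 2))  # half the squared Euclidean distance
print(beta_divergence(x, y, 1))  # generalized KL divergence
print(beta_divergence(x, y, 0))  # Itakura-Saito divergence
```

In an NMF setting, `x` would hold entries of the data matrix and `y` the corresponding entries of the low-rank reconstruction.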

Rationale: The pattern of IgE response (over time or to specific allergens) may reflect different atopic vulnerabilities that are related to the presence of asthma in a fundamentally different way from the current definition of atopy.

Objectives: To redefine the atopic phenotype by identifying latent structure within a complex dataset, taking into account the timing and type of sensitization to specific allergens, and relating these novel phenotypes to asthma.

Methods: In a population-based birth cohort in which multiple skin and IgE tests have been taken throughout childhood, we used a machine learning approach to cluster children into multiple atopic classes in an unsupervised way.
