Publications by authors named "Joshua Z Huang"

Many companies now store their data across multiple data centers with replication, for several reasons. Data that spans multiple data centers ensures the fastest possible response time for customers and staff who are geographically separated. It also protects the information from loss if a single data center experiences a disaster.

High-quality data are crucial for decision-making support and evidence-based healthcare, particularly when the knowledge needed is lacking. For public health practitioners and researchers, reported COVID-19 data need to be accurate and easily available. Each nation has a system in place for reporting COVID-19 data, although the efficacy of these systems has not been thoroughly evaluated.

An obvious defect of the extreme learning machine (ELM) is that its prediction performance is sensitive to the random initialization of input-layer weights and hidden-layer biases. To make ELM insensitive to random initialization, GPRELM adopts the simple and effective strategy of integrating Gaussian process regression into ELM. However, there is a serious overfitting problem in kernel-based GPRELM (GPRELM).

The optimization methods for solving the normalized cut model usually involve three steps, i.e., problem relaxation, problem solving and post-processing.

Event-based social networks (EBSNs) are widely used to create online social groups and organize offline events for users. Activeness and loyalty are crucial characteristics of these online social groups, because they determine whether a group grows or becomes inactive in a specific time frame. However, little research has examined these concepts to clarify the existence of groups in event-based social networks.

Although many spectral clustering algorithms have been proposed during the past decades, they do not scale to large data sets because of their high computational complexity. In this paper, we propose a novel spectral clustering method for large-scale data, namely, large-scale balanced min cut (LABIN). A new model is proposed to extend the self-balanced min-cut (SBMC) model with the anchor-based strategy, and a fast spectral rotation with linear time complexity is proposed to solve the new model.

Most feature selection methods first compute a similarity matrix by assigning a fixed value to pairs of objects in the whole data or to pairs of objects in a class or by computing the similarity between two objects from the original data. The similarity matrix is fixed as a constant in the subsequent feature selection process. However, the similarities computed from the original data may be unreliable, because they are affected by noise features.

In data mining, objects are often represented by a set of features, where each feature of an object has only one value. However, in reality, some features can take on multiple values, for instance, a person with several job titles, hobbies, and email addresses. These features can be referred to as set-valued features and are often treated with dummy features when using existing data mining algorithms to analyze data with set-valued features.
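The dummy-feature treatment mentioned above can be sketched as follows. This is a minimal illustration, not the method of the paper: the function name `dummy_encode`, the record layout, and the `hobbies` example are all hypothetical. Each distinct value of a set-valued feature becomes one binary column.

```python
def dummy_encode(records, feature):
    """Expand one set-valued feature into binary dummy columns (illustrative helper)."""
    # Collect every distinct value observed for the feature, in sorted order.
    vocabulary = sorted({value for record in records for value in record[feature]})
    # One row per record: 1 if the record's set contains the value, else 0.
    rows = [
        [1 if value in record[feature] else 0 for value in vocabulary]
        for record in records
    ]
    return vocabulary, rows


# Hypothetical data: each person may have zero, one, or several hobbies.
people = [
    {"hobbies": {"chess", "running"}},
    {"hobbies": {"running"}},
    {"hobbies": set()},
]
vocabulary, rows = dummy_encode(people, "hobbies")
```

The number of dummy columns grows with the number of distinct values, so a feature with many possible values produces a wide, sparse table.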

Microarray technology enables the collection of vast amounts of gene expression data from biological experiments. Clustering algorithms have been successfully applied to exploring the gene expression data. Since a set of genes may be only correlated to a subset of samples, it is useful to use co-clustering to recover co-clusters in the gene expression data.

Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. As a result, RFs have poor accuracy when working with high-dimensional data.

This paper proposes a new analytical process, highlighted by a soft subspace clustering method, a changing window technique, and a series of post-processing strategies, to enhance the identification and characterization of local gene expression patterns. The proposed method can be conducted interactively, facilitating the exploration and analysis of local gene expression patterns in real applications. Experimental results have shown that the proposed method is effective in identifying and characterizing functional gene groups in terms of both local expression similarities and the biological coherence of genes in a cluster.

This correspondence describes extensions to the k-modes algorithm for clustering categorical data. By modifying a simple matching dissimilarity measure for categorical objects, a heuristic approach was developed in [4], [12], which allows the use of the k-modes paradigm to obtain clusters with strong intra-similarity and to efficiently cluster large categorical data sets. The main aim of this paper is to rigorously derive the updating formula of the k-modes clustering algorithm with the new dissimilarity measure and to establish the convergence of the algorithm under the optimization framework.
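For orientation, the baseline that the abstract builds on can be sketched as below. This shows only the classic simple matching dissimilarity and the mode update of one k-modes iteration; the modified measure from [4], [12] is not specified in this excerpt and is not reproduced here. The function names and the toy data are illustrative assumptions.

```python
from collections import Counter


def simple_matching(x, y):
    """Classic simple matching dissimilarity: number of attributes that differ."""
    return sum(a != b for a, b in zip(x, y))


def mode_of(cluster):
    """Attribute-wise most frequent category of the objects in one cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def assign(objects, modes):
    """Assign each object to the index of its nearest mode."""
    return [
        min(range(len(modes)), key=lambda k: simple_matching(obj, modes[k]))
        for obj in objects
    ]


# Hypothetical categorical objects with two attributes each.
objects = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "z"), ("b", "z")]
modes = [("a", "x"), ("b", "z")]

labels = assign(objects, modes)          # assignment step
first_cluster = [o for o, k in zip(objects, labels) if k == 0]
updated_mode = mode_of(first_cluster)    # mode update step for cluster 0
```

Alternating the assignment and mode update steps until the labels stop changing gives the basic k-modes loop.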

This paper proposes a k-means type clustering algorithm that can automatically calculate variable weights. A new step is introduced to the k-means clustering process to iteratively update variable weights based on the current partition of the data, and a formula for weight calculation is proposed. The convergence theorem of the new clustering process is given.
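The excerpt does not state the weight formula, so the sketch below rests on a stated assumption: the objective is the sum over variables of w_j^beta times D_j, where D_j is the within-cluster squared dispersion of variable j, subject to the weights summing to 1 with beta > 1. Under that assumption, a Lagrange multiplier argument gives w_j = 1 / sum_t (D_j / D_t)^(1/(beta-1)). The function name, the parameter `beta`, and the toy data are illustrative.

```python
def update_weights(points, labels, centers, beta=2.0):
    """Weight update for a fixed partition, under the assumed objective above."""
    n_vars = len(centers[0])
    # dispersion[j]: total squared deviation from cluster centers along variable j.
    dispersion = [0.0] * n_vars
    for x, k in zip(points, labels):
        for j in range(n_vars):
            dispersion[j] += (x[j] - centers[k][j]) ** 2
    # This sketch does not handle zero-dispersion variables; it rejects them.
    if any(d == 0.0 for d in dispersion):
        raise ValueError("sketch assumes every variable has positive dispersion")
    # w_j is proportional to D_j ** (-1 / (beta - 1)), normalized to sum to 1.
    exponent = 1.0 / (beta - 1.0)
    inverse = [d ** -exponent for d in dispersion]
    total = sum(inverse)
    return [v / total for v in inverse]


# Hypothetical partition: variable 0 separates the clusters tightly, variable 1 is noisy.
points = [(0.0, 0.0), (1.0, 4.0), (10.0, 1.0), (11.0, 5.0)]
labels = [0, 0, 1, 1]
centers = [(0.5, 2.0), (10.5, 3.0)]
weights = update_weights(points, labels, centers, beta=2.0)
```

Variables with small within-cluster dispersion receive large weights, so noisy variables contribute less to subsequent distance calculations.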
