Principal Component Analysis (PCA) and its nonlinear extension Kernel PCA (KPCA) are widely used across science and industry for data analysis and dimensionality reduction. Modern deep learning tools have achieved great empirical success, but a framework for deep principal component analysis is still lacking. Here we develop a deep kernel PCA methodology (DKPCA) to extract multiple levels of the most informative components of the data.
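For reference, single-level kernel PCA of the kind DKPCA builds on can be run with scikit-learn; the snippet below is only a minimal illustration and does not reproduce the deep, multi-level model described above (the RBF kernel and its parameters are arbitrary choices).

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))     # toy data: 200 samples, 10 features

# Standard (single-level) kernel PCA with an RBF kernel.
kpca = KernelPCA(n_components=3, kernel="rbf", gamma=0.1)
Z = kpca.fit_transform(X)              # the 3 most informative nonlinear components
print(Z.shape)                         # (200, 3)
```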
IEEE Trans Pattern Anal Mach Intell
August 2023
Asymmetric kernels naturally exist in real life, e.g., for conditional probability and directed graphs.
Disentanglement is a useful property in representation learning, which increases the interpretability of generative models such as variational autoencoders (VAE), generative adversarial models, and their many variants. Typically in such models, an increase in disentanglement performance is traded off with generation quality. In the context of latent space models, this work presents a representation learning framework that explicitly promotes disentanglement by encouraging orthogonal directions of variations.
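As a rough illustration of how orthogonal directions of variation can be encouraged in a latent space, the sketch below adds a generic Gram-matrix penalty on a batch of latent codes; this is a stand-in for intuition only, not the paper's actual objective.

```python
import torch

def orthogonality_penalty(z: torch.Tensor) -> torch.Tensor:
    """Generic penalty pushing latent directions towards orthogonality:
    || Z^T Z / n - I ||_F^2 over a batch of latent codes (batch x latent_dim).
    Illustrative only; not the paper's exact formulation."""
    n = z.shape[0]
    gram = z.T @ z / n
    eye = torch.eye(z.shape[1], device=z.device, dtype=z.dtype)
    return ((gram - eye) ** 2).sum()

# Would typically be added, with some weight, to the reconstruction/ELBO
# objective of a latent-variable model to encourage uncorrelated directions.
z = torch.randn(64, 8)
print(orthogonality_penalty(z))
```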
IEEE Trans Neural Netw Learn Syst
February 2024
Supervised learning can be viewed as distilling relevant information from input data into feature representations. This process becomes difficult when supervision is noisy as the distilled information might not be relevant. In fact, recent research shows that networks can easily overfit all labels including those that are corrupted, and hence can hardly generalize to clean datasets.
IEEE Trans Pattern Anal Mach Intell
November 2022
In this paper, we develop a quadrature framework for large-scale kernel machines via a numerical integration representation. Considering that the integration domain and measure of typical kernels, e.g.
We introduce Constr-DRKM, a deep kernel method for the unsupervised learning of disentangled data representations. We propose augmenting the original deep restricted kernel machine formulation for kernel PCA by orthogonality constraints on the latent variables to promote disentanglement and to make it possible to carry out optimization without first defining a stabilized objective. After discussing a number of algorithms for end-to-end training, we quantitatively evaluate the proposed method's effectiveness in disentangled feature learning.
IEEE Trans Pattern Anal Mach Intell
October 2022
The class of random features is one of the most popular techniques to speed up kernel methods in large-scale problems. Related works have been recognized by the NeurIPS Test-of-Time Award in 2017 and as an ICML Best Paper Finalist in 2019. The body of work on random features has grown rapidly, and hence it is desirable to have a comprehensive overview of this topic explaining the connections among various algorithms and theoretical results.
The adaptive hinging hyperplane (AHH) model is a popular piecewise linear representation with a generalized tree structure and has been successfully applied in dynamic system identification. In this article, we aim to construct the deep AHH (DAHH) model to extend and generalize the networking of the AHH model for high-dimensional problems. The network structure of DAHH is determined through forward growth, in which the activity ratio is introduced to select effective neurons and no connecting weights are involved between the layers.
This paper introduces a novel framework for generative models based on Restricted Kernel Machines (RKMs) with joint multi-view generation and uncorrelated feature learning, called Gen-RKM. To enable joint multi-view generation, this mechanism uses a shared representation of data from various views. Furthermore, the model has a primal and dual formulation to incorporate both kernel-based and (deep convolutional) neural network based models within the same setting.
Long Short-Term Memory (LSTM) has shown significant performance on many real-world applications due to its ability to capture long-term dependencies. In this paper, we utilize LSTM to obtain a data-driven forecasting model for an application of weather forecasting. Moreover, we propose Transductive LSTM (T-LSTM), which exploits the local information in time-series prediction.
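A minimal one-step-ahead LSTM forecaster, for orientation only, might look like the PyTorch sketch below; the transductive T-LSTM variant and the weather data are not reproduced here, and all sizes are placeholder values.

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Minimal one-step-ahead forecaster: an LSTM over a window of past
    measurements followed by a linear readout of the last hidden state."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])    # predict the next value

model = LSTMForecaster(n_features=4)
x = torch.randn(8, 24, 4)                  # 8 windows of 24 time steps, 4 features
print(model(x).shape)                      # torch.Size([8, 1])
```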
IEEE Trans Neural Netw Learn Syst
August 2020
Random Fourier features (RFFs) have been successfully employed for kernel approximation in large-scale situations. The rationale behind RFF relies on Bochner's theorem, but the condition is too strict and excludes many widely used kernels, e.g.
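For the classical RBF kernel, to which Bochner's theorem does apply, random Fourier features take the familiar form sketched below; the bandwidth and feature count are arbitrary illustrative choices.

```python
import numpy as np

def rff_features(X, n_features=256, gamma=0.5, seed=0):
    """Random Fourier features approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2), per Bochner's theorem."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.default_rng(1).standard_normal((500, 10))
Z = rff_features(X)
K_approx = Z @ Z.T        # approximates the exact RBF Gram matrix
```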
Kernel regression models have been used as non-parametric methods for fitting experimental data. However, due to their non-parametric nature, they belong to the so-called "black box" models, indicating that the relation between the input variables and the output, depending on the kernel selection, is unknown. In this paper we propose a new methodology to retrieve the relation between each input regressor variable and the output in a least squares support vector machine (LS-SVM) regression model.
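As background for the LS-SVM regression model referred to above, its dual solution reduces to one linear system; the sketch below assumes an RBF kernel and the standard notation (regularization γ, dual weights α, bias b) and is only a minimal illustration, not the interpretability methodology proposed in the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def lssvm_fit(X, y, gam=10.0, gamma=0.5):
    """LS-SVM regression in the dual: solve the linear system
    [[0, 1^T], [1, K + I/gam]] [b; alpha] = [0; y]."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gam
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                               # bias b, dual weights alpha

def lssvm_predict(Xnew, X, alpha, b, gamma=0.5):
    return rbf_kernel(Xnew, X, gamma) @ alpha + b

# Toy usage: fit a noisy sine and predict at one point.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(100)
b, alpha = lssvm_fit(X, y)
print(lssvm_predict(np.array([[0.5]]), X, alpha, b))     # close to sin(0.5)
```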
IEEE Trans Neural Netw Learn Syst
March 2019
In kernel methods, the kernels are often required to be positive definite, which restricts the use of many indefinite kernels. To accommodate such nonpositive definite kernels, in this paper we aim to build an indefinite kernel learning framework for kernel logistic regression (KLR). The proposed indefinite KLR (IKLR) model is analyzed in reproducing kernel Kreĭn spaces and then becomes nonconvex.
IEEE Trans Neural Netw Learn Syst
October 2018
In pattern classification, polynomial classifiers are well-studied methods as they are capable of generating complex decision surfaces. Unfortunately, the use of multivariate polynomials is limited to kernels as in support-vector machines, because polynomials quickly become impractical for high-dimensional problems. In this paper, we effectively overcome the curse of dimensionality by employing the tensor train (TT) format to represent a polynomial classifier.
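To make the idea concrete, the sketch below evaluates a multilinear polynomial score f(x) = ⟨W, ⊗ₖ [1, xₖ]⟩ with the weight tensor W stored as tensor-train cores, so the evaluation cost stays linear in the input dimension; training of the cores (e.g., by alternating least squares or gradient descent) is not shown, and the ranks and scales are arbitrary placeholder values.

```python
import numpy as np

def tt_poly_score(x, cores):
    """Evaluate f(x) = <W, phi(x)> with phi(x) = kron_k [1, x_k] and the weight
    tensor W stored as tensor-train cores of shape (r_{k-1}, 2, r_k)."""
    m = np.ones((1, 1))
    for k, G in enumerate(cores):
        v = np.array([1.0, x[k]])                 # local feature map for dimension k
        m = m @ np.einsum("i,aib->ab", v, G)      # absorb one core at a time
    return float(m[0, 0])

# Toy example: 10 input dimensions, TT-rank 4 throughout.
rng = np.random.default_rng(0)
d, r = 10, 4
ranks = [1] + [r] * (d - 1) + [1]
cores = [0.5 * rng.standard_normal((ranks[k], 2, ranks[k + 1])) for k in range(d)]
print(tt_poly_score(rng.standard_normal(d), cores))
```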
Entropy measures have been of major interest to researchers for measuring the information content of a dynamical system. One of the well-known methodologies is sample entropy, which is a model-free approach and can be deployed to measure the information transfer in time series. Sample entropy is based on the conditional entropy, where a major concern is the number of past delays in the conditional term.
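One common implementation of sample entropy, included here only as a reference point (template length m, tolerance r, and the Chebyshev distance are the usual conventions, though variants exist):

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy SampEn(m, r) = -ln(A / B), where B counts pairs of
    length-m templates within tolerance r (Chebyshev distance) and A counts
    the same for length m+1; self-matches are excluded."""
    x = np.asarray(x, dtype=float)
    r = r * np.std(x)                      # tolerance as a fraction of the signal std

    def count_matches(length):
        templ = np.array([x[i:i + length] for i in range(len(x) - length)])
        count = 0
        for i in range(len(templ)):
            d = np.max(np.abs(templ[i + 1:] - templ[i]), axis=1)
            count += np.sum(d <= r)
        return count

    B = count_matches(m)
    A = count_matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf

print(sample_entropy(np.sin(np.linspace(0, 20, 400))))
```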
This paper studies the matrix completion problem when the entries are contaminated by non-Gaussian noise or outliers. The proposed approach employs a nonconvex loss function induced by the maximum correntropy criterion. With the help of this loss function, we develop both a rank-constrained and a nuclear-norm-regularized model, which are resistant to non-Gaussian noise and outliers.
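As a hypothetical sketch of the ingredients, the snippet below combines a correntropy-induced (Welsch-type) loss with plain gradient descent on a rank-constrained factorization; the paper's actual algorithms and its nuclear-norm variant are not reproduced, and all hyperparameters are placeholders.

```python
import numpy as np

def robust_complete(M, mask, rank=5, sigma=1.0, lr=1e-2, iters=500, seed=0):
    """Hypothetical sketch: gradient descent on a rank-constrained factorization
    U @ V.T with a correntropy-induced loss
    sigma^2 * (1 - exp(-e^2 / (2 sigma^2))) on the observed entries only."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((n, rank))
    for _ in range(iters):
        E = (U @ V.T - M) * mask                     # residuals on observed entries
        W = np.exp(-E ** 2 / (2.0 * sigma ** 2))     # correntropy reweighting: large
        G = W * E                                    # errors (outliers) get down-weighted
        U, V = U - lr * (G @ V), V - lr * (G.T @ U)
    return U @ V.T
```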
IEEE Trans Neural Netw Learn Syst
July 2018
Domain adaptation learning is one of the fundamental research topics in pattern recognition and machine learning. This paper introduces a regularized semipaired kernel canonical correlation analysis formulation for learning a latent space for the domain adaptation problem. The optimization problem is formulated in the primal-dual least squares support vector machine setting where side information can be readily incorporated through regularization terms.
IEEE Trans Neural Netw Learn Syst
August 2018
In this brief, kernel principal component analysis (KPCA) is reinterpreted as the solution to a convex optimization problem. In fact, there is one constrained convex problem for each principal component, and the constraints guarantee that the principal component is indeed a solution and not a mere saddle point. Although these insights do not imply any algorithmic improvement, they can be used to further understand the method, formulate possible extensions, and properly address them.
The aim of this letter is to propose a theory of deep restricted kernel machines offering new foundations for deep learning with kernel machines. From the viewpoint of deep learning, it is partially related to restricted Boltzmann machines, which are characterized by visible and hidden units in a bipartite graph without hidden-to-hidden connections, and to deep learning extensions such as deep belief networks and deep Boltzmann machines. From the viewpoint of kernel machines, it includes least squares support vector machines for classification and regression, kernel principal component analysis (PCA), matrix singular value decomposition, and Parzen-type models.
This brief proposes a truncated distance (TL1) kernel, which results in a classifier that is nonlinear in the global region but is linear in each subregion. With this kernel, the subregion structure can be trained using all the training data and local linear classifiers can be established simultaneously. The TL1 kernel has good adaptiveness to nonlinearity and is suitable for problems which require different nonlinearities in different areas.
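Assuming the TL1 kernel takes the commonly used form k(u, v) = max(ρ − ‖u − v‖₁, 0), it can be plugged into an off-the-shelf SVM as a custom kernel; the sketch below is illustrative only, the value of ρ is an arbitrary choice, and since the kernel is generally indefinite, standard SVM solvers handle it only heuristically.

```python
import numpy as np
from sklearn.svm import SVC

def tl1_kernel(A, B, rho=5.0):
    """Truncated L1-distance kernel: k(u, v) = max(rho - ||u - v||_1, 0).
    rho controls the radius of the locally linear subregions."""
    d1 = np.abs(A[:, None, :] - B[None, :, :]).sum(-1)
    return np.maximum(rho - d1, 0.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)           # a simple nonlinear target

clf = SVC(kernel=lambda A, B: tl1_kernel(A, B, rho=5.0)).fit(X, y)
print(clf.score(X, y))
```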
Communities in directed networks have often been characterized as regions with a high density of links, or as sets of nodes with certain patterns of connection. Our approach for community detection combines the optimization of a quality function and a spectral clustering of a deformation of the combinatorial Laplacian, the so-called magnetic Laplacian. The eigenfunctions of the magnetic Laplacian, which we call magnetic eigenmaps, incorporate structural information.
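A minimal construction of the magnetic Laplacian for a weighted directed graph, under the usual convention L_g = D − W ⊙ exp(i·2πg(A − Aᵀ)) with symmetrized weights W, is sketched below; the community-detection quality function from the paper is not included, and the charge parameter g is a placeholder value.

```python
import numpy as np

def magnetic_laplacian(A, g=0.25):
    """Magnetic Laplacian of a directed graph with adjacency A: symmetrize the
    weights, encode edge direction in a complex phase, and take L = D - W * exp(i*Theta)."""
    W = (A + A.T) / 2.0                              # symmetrized weights
    theta = 2.0 * np.pi * g * (A - A.T)              # antisymmetric phase matrix
    H = W * np.exp(1j * theta)                       # Hermitian "magnetic" adjacency
    D = np.diag(W.sum(axis=1))
    return D - H

# Magnetic eigenmaps: eigenvectors of the (Hermitian) magnetic Laplacian.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)               # a directed 3-cycle
vals, vecs = np.linalg.eigh(magnetic_laplacian(A))
print(np.round(vals, 3))
```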
Problem Setting: Support vector machines (SVMs) are very popular tools for classification, regression and other problems. Due to the large choice of kernels they can be applied with, a large variety of data can be analysed using these tools. Machine learning owes its popularity to the good performance of the resulting models.
This letter investigates the supervised learning problem with observations drawn from certain general stationary stochastic processes. Here by general, we mean that many stationary stochastic processes can be included. We show that when the stochastic processes satisfy a generalized Bernstein-type inequality, a unified treatment on analyzing the learning schemes with various mixing processes can be conducted and a sharp oracle inequality for generic regularized empirical risk minimization schemes can be established.
This letter addresses the robustness problem when learning a large margin classifier in the presence of label noise. In our study, we achieve this purpose by proposing robustified large margin support vector machines. The robustness of the proposed robust support vector classifiers (RSVC), which is interpreted from a weighted viewpoint in this work, is due to the use of nonconvex classification losses.
IEEE Trans Neural Netw Learn Syst
July 2017
Applying the pinball loss in a support vector machine (SVM) classifier results in pin-SVM. The pinball loss is characterized by a parameter τ. Its value is related to the quantile level, and different τ values are suitable for different problems.
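For reference, the pinball loss used in pin-SVM penalizes the margin residual u = 1 − y·f(x) linearly on both sides, with slope 1 for u ≥ 0 and slope τ for u < 0; a minimal sketch:

```python
import numpy as np

def pinball_loss(u, tau=0.5):
    """Pinball loss with parameter tau, applied in pin-SVM to the margin
    residual u = 1 - y * f(x): slope 1 on the positive side, slope tau on
    the negative side."""
    return np.where(u >= 0, u, -tau * u)

u = np.linspace(-2, 2, 5)
print(pinball_loss(u, tau=0.3))   # [0.6 0.3 0.  1.  2.]
```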