In this paper, we propose an adversarial multi-label variational hashing (AMVH) method to learn compact binary codes for efficient image retrieval. Unlike most existing deep hashing methods, which only learn binary codes from specific real samples, our AMVH learns hash functions from both synthetic and real data, which makes our model effective for unseen data. Specifically, we design an end-to-end deep hashing framework that consists of a generator network and a discriminator-hashing network, and we train it with simultaneous adversarial learning and discriminative binary code learning.
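To illustrate why compact binary codes make retrieval efficient, the following is a minimal sketch of Hamming-distance ranking. It is not the AMVH model: the codes are hand-written integers rather than learned hash outputs, and `hamming` and `rank_by_hamming` are hypothetical helper names introduced here.

```python
def hamming(a, b):
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query, database):
    """Return database indices sorted by Hamming distance to the query."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Illustrative 8-bit codes (assumed values, not produced by any network).
database = [0b10110010, 0b10110011, 0b01001101, 0b10100010]
query = 0b10110010
ranking = rank_by_hamming(query, database)
```

Each comparison is a single XOR followed by a bit count, which is why binary codes scale well to large image collections.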
This article presents a generalized collaborative representation-based classification (GCRC) framework, which includes many existing representation-based classification (RC) methods, such as collaborative RC (CRC) and sparse RC (SRC) as special cases. This article also advances the GCRC theory by exploring theoretical conditions on the general regularization matrix. A key drawback of CRC and SRC is that they fail to use the label information of training data and are essentially unsupervised in computing the representation vector.
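As background for the CRC special case mentioned above, here is a minimal sketch of ridge-regularized collaborative representation classification. It assumes the commonly used identity regularizer (the GCRC framework generalizes this to a regularization matrix that is not reproduced here), and `solve` and `crc_classify` are hypothetical names. Note that the labels are used only after the representation vector has been computed, which is the drawback the abstract describes.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def crc_classify(train, labels, y, lam=0.01):
    """Classify y from training vectors `train` with class `labels`."""
    n = len(train)
    # Gram matrix X^T X plus the ridge term lam * I (assumed regularizer).
    G = [[dot(train[i], train[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    # Representation vector: computed without looking at the labels.
    alpha = solve(G, [dot(t, y) for t in train])
    residuals = {}
    for c in set(labels):
        # Reconstruct y using only the coefficients of class c.
        recon = [0.0] * len(y)
        for a, t, l in zip(alpha, train, labels):
            if l == c:
                recon = [r + a * ti for r, ti in zip(recon, t)]
        residuals[c] = sum((yi - ri) ** 2 for yi, ri in zip(y, recon))
    # Predict the class with the smallest reconstruction residual.
    return min(residuals, key=residuals.get)


train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["a", "a", "b", "b"]
```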
IEEE Trans Pattern Anal Mach Intell
August 2020
In this work, we propose a motion-guided cascaded refinement network for video object segmentation. Assuming that foreground objects show different motion patterns from the background, we apply an active contour model on optical flow to coarsely segment the foreground in each video frame. The proposed Cascaded Refinement Network (CRN) then takes the coarse segmentation as guidance to generate an accurate segmentation in full resolution.
In this paper, we propose a deep variational and structural hashing (DVStH) method to learn compact binary codes for multimedia retrieval. Unlike most existing deep hashing methods which use a series of convolution and fully-connected layers to learn binary features, we develop a probabilistic framework to infer latent feature representation inside the network. Then, we design a struct layer rather than a bottleneck hash layer, to obtain binary codes through a simple encoding procedure.
Video contents are inherently heterogeneous. To exploit different feature modalities in a diverse video collection for video summarization, we propose to formulate the task as a multiview representative selection problem. The goal is to select visual elements that are representative of a video consistently across different views (i.
IEEE Trans Pattern Anal Mach Intell
September 2018
This paper presents a sharable and individual multi-view metric learning (MvML) approach for visual recognition. Unlike conventional metric learning methods, which learn a distance metric on either a single type of feature representation or a concatenated representation of multiple types of features, the proposed MvML jointly learns an optimal combination of multiple distance metrics on multi-view representations: it learns an individual distance metric for each view to retain its specific property, and a shared representation for different views in a unified latent subspace to preserve the common properties. The objective function of the MvML is formulated in the large margin learning framework via pairwise constraints, under which the distance of each similar pair is smaller than that of each dissimilar pair by a margin.
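The pairwise margin constraint can be illustrated with a generic hinge penalty. This is a minimal sketch under stated assumptions, not the MvML objective: it uses plain Euclidean distance instead of learned metrics, and `euclidean` and `pairwise_margin_loss` are hypothetical names.

```python
import math


def euclidean(u, v):
    """Euclidean distance (stand-in for a learned distance metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_margin_loss(similar_pairs, dissimilar_pairs, margin=1.0):
    """Average hinge penalty over every (similar, dissimilar) pair combination.

    The penalty is zero when the similar-pair distance is smaller than the
    dissimilar-pair distance by at least `margin`.
    """
    total, count = 0.0, 0
    for a, b in similar_pairs:
        d_sim = euclidean(a, b)
        for c, d in dissimilar_pairs:
            d_dis = euclidean(c, d)
            total += max(0.0, d_sim - d_dis + margin)
            count += 1
    return total / count if count else 0.0
```

A learning method in this family would adjust the metric parameters to drive such a penalty toward zero on the training pairs.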
IEEE Trans Image Process
September 2017
This paper presents a new discriminative deep metric learning (DDML) method for face and kinship verification in wild conditions. While metric learning has achieved reasonably good performance in face and kinship verification, most existing metric learning methods aim to learn a single Mahalanobis distance metric to maximize the inter-class variations and minimize the intra-class variations, which cannot capture the nonlinear manifold on which face images usually lie. To address this, we propose a DDML method that trains a deep neural network to learn a set of hierarchical nonlinear transformations to project face pairs into the same latent feature space, under which the distance of each positive pair is reduced and that of each negative pair is enlarged.
Conventional metric learning methods usually assume that the training and test samples are captured in similar scenarios, so that their distributions are the same. This assumption does not hold in many real visual recognition applications, especially when samples are captured across different data sets. In this paper, we propose a new deep transfer metric learning (DTML) method to learn a set of hierarchical nonlinear transformations for cross-domain visual recognition by transferring discriminative knowledge from the labeled source domain to the unlabeled target domain.
Over the past three decades, a number of face recognition methods have been proposed in computer vision, and most of them use holistic face images for person identification. In many real-world scenarios, especially in unconstrained environments, human faces might be occluded by other objects, and it is difficult to obtain fully holistic face images for recognition. To address this, we propose a new partial face recognition approach to recognize persons of interest from their partial faces.
IEEE Trans Pattern Anal Mach Intell
January 2013
Conventional appearance-based face recognition methods usually assume that there are multiple samples per person (MSPP) available for discriminative feature extraction during the training phase. In many practical face recognition applications such as law enforcement, e-passport, and ID card identification, however, this assumption may not hold because only a single sample per person (SSPP) is enrolled or recorded in these systems. Many popular face recognition methods fail to work well in this scenario because there are not enough samples for discriminant learning.
IEEE Trans Syst Man Cybern B Cybern
June 2010
We propose in this paper a parametric regularized locality preserving projections (LPP) method for face recognition. Our objective is to regulate the LPP space in a parametric manner and extract useful discriminant information from the whole feature space rather than a reduced projection subspace of principal component analysis. This results in better locality preserving power and higher recognition accuracy than the original LPP method.
IEEE Trans Image Process
December 2009
Single-sensor digital cameras capture imagery by covering the sensor surface with a color filter array (CFA) such that each sensor pixel samples only one of three primary color values. To render a full-color image, an interpolation process, commonly referred to as CFA demosaicking, is required to estimate the other two missing color values at each pixel. In this paper, we present two contributions to CFA demosaicking: a new and improved CFA demosaicking method for producing high-quality color images, and new image measures for quantifying the performance of demosaicking methods.
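To make the interpolation step concrete, the following is a minimal sketch of baseline bilinear interpolation of the green channel. It is not the method proposed in the paper: it assumes an RGGB Bayer layout and a mosaic of at least 2x2 pixels, and `bayer_color` and `interpolate_green` are hypothetical names.

```python
def bayer_color(row, col):
    """Color sampled at (row, col) in an RGGB Bayer layout (assumed layout)."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def interpolate_green(mosaic):
    """Return a full green plane from a Bayer mosaic (list of rows).

    Measured green samples are copied; at red and blue sites the green value
    is the average of the in-bounds 4-connected neighbors, which are all
    green sites in a Bayer layout. Assumes the mosaic is at least 2x2.
    """
    h, w = len(mosaic), len(mosaic[0])
    green = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if bayer_color(r, c) == "G":
                green[r][c] = float(mosaic[r][c])
                continue
            neighbors = [
                mosaic[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < h and 0 <= cc < w
            ]
            green[r][c] = sum(neighbors) / len(neighbors)
    return green


# Illustrative 4x4 mosaic (assumed values).
mosaic = [
    [10, 20, 10, 20],
    [30, 40, 30, 40],
    [10, 20, 10, 20],
    [30, 40, 30, 40],
]
green = interpolate_green(mosaic)
```

Plain averaging of this kind tends to blur edges, which is one reason more elaborate demosaicking methods are studied.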
Most digital still cameras acquire imagery with a color filter array (CFA), sampling only one color value for each pixel and interpolating the other two color values afterwards. The interpolation process is commonly known as demosaicking. In general, a good demosaicking method should preserve the high-frequency information of imagery as much as possible, since such information is essential for image visual quality.
IEEE Trans Image Process
November 2006
In the conventional processing chain of single-sensor digital still cameras (DSCs), the images are captured with color filter arrays (CFAs) and the CFA samples are demosaicked into a full color image before compression. To avoid additional data redundancy created by the demosaicking process, an alternative processing chain has been proposed to move the compression process before the demosaicking. Recent empirical studies have shown that the alternative chain can outperform the conventional one in terms of image quality at low compression ratios.
IEEE Trans Image Process
September 2006
Denoising of color images can be done on each color component independently. Recent work has shown that exploiting strong correlation between high-frequency content of different color components can improve the denoising performance. We show that for typical color images high correlation also means similarity, and propose to exploit this strong intercolor dependency using an optimal luminance/color-difference space projection.
Signal Process Image Commun
November 2004
We propose a novel probabilistic approach to recognizing people entering and leaving a closed room in a human workplace or living environment. Specifically, people in the view of a monitoring camera are first tracked and represented using low-level color features. Based on a new color similarity measure, optimal recognition of people leaving and entering the room is carried out by probabilistic reasoning under the constraints imposed by the domain knowledge, e.