Publications by authors named "Siwei Lyu"

Recent works have revealed an essential paradigm in designing loss functions: distinguishing between individual losses and aggregate losses. The individual loss measures the quality of the model on a single sample, while the aggregate loss combines the individual losses/scores over all training samples. Both share a common procedure that aggregates a set of individual values into a single numerical value.

So far, researchers have proposed many forensics tools to protect the authenticity and integrity of digital information. However, with the explosive development of machine learning, existing forensics tools may be compromised by new attacks at any time. Hence, it is always necessary to investigate anti-forensics to expose the vulnerabilities of forensics tools.

Human detection and pose estimation are essential for understanding human activities in images and videos. Mainstream multi-human pose estimation methods take a top-down approach, where human detection is first performed, then each detected person bounding box is fed into a pose estimation network. This top-down approach suffers from the early commitment of initial detections in crowded scenes and other cases with ambiguities or occlusions, leading to pose estimation failures.

Scene parsing, or semantic segmentation, aims at labeling all pixels in an image with the predefined categories of things and stuff. Learning a robust representation for each pixel is crucial for this task. Existing state-of-the-art (SOTA) algorithms employ deep neural networks to learn (discover) the representations needed for parsing from raw data.

Multimodal image registration is a vital initial step in several medical image applications, providing complementary information from different data modalities. Since images from different modalities do not exhibit the same characteristics, finding accurate correspondences between them remains a challenge. For convolutional multimodal registration methods, two components are particularly significant: a descriptive image feature and a suitable similarity metric.

In this work, we introduce the average top-k (AT_k) loss, which is the average of the k largest individual losses over a training dataset, as a new aggregate loss for supervised learning. We show that the AT_k loss is a natural generalization of two widely used aggregate losses, namely the average loss and the maximum loss. Yet, the AT_k loss can better adapt to different data distributions because of the extra flexibility provided by the choice of k.
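
As a rough illustration of the definition above, the average top-k aggregate can be sketched in a few lines of Python. This is only a minimal sketch, not the authors' implementation: the function name `average_top_k` and the sample loss values are made up for the example.

```python
def average_top_k(losses, k):
    """Average of the k largest individual losses (hypothetical helper)."""
    if not 1 <= k <= len(losses):
        raise ValueError("k must be between 1 and the number of losses")
    # Sort in descending order and keep the k largest values.
    largest = sorted(losses, reverse=True)[:k]
    return sum(largest) / k


# Made-up individual losses for four training samples.
losses = [0.5, 2.0, 1.0, 4.0]

max_loss = average_top_k(losses, 1)            # k = 1 gives the maximum loss
avg_loss = average_top_k(losses, len(losses))  # k = n gives the average loss
at2_loss = average_top_k(losses, 2)            # an intermediate choice of k
```

Setting k to 1 or to the number of samples recovers the maximum loss and the average loss respectively, which is the sense in which the average top-k loss generalizes both.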

We propose a fast online video pose estimation method to detect and track human upper-body poses based on a conditional dynamic Bayesian modeling of pose modes without referring to future frames. The estimation of human body poses from videos is an important task with many applications. Our method extends fast image-based pose estimation to live video streams by leveraging the temporal correlation of articulated poses between frames.

To effectively solve the challenges in object tracking, such as large deformation and severe occlusion, many existing methods use graph-based models to capture target part relations, and adopt a sequential scheme of target part selection, part matching, and state estimation. However, such methods have two major drawbacks: 1) inaccurate part selection leads to performance deterioration of part matching and state estimation and 2) there are insufficient effective global constraints for local part selection and matching. In this paper, we propose a new object tracking method based on iterative graph seeking, which integrates target part selection, part matching, and state estimation using a unified energy minimization framework.

Wearable electronics are in high demand, requiring that all the components are flexible. Here we report a facile approach for the fabrication of flexible polypyrrole nanowire (NPPy)/carbon fiber (CF) hybrid electrodes with high electrochemical activity using a low-cost, one-step electrodeposition method. The structure of the NPPy/CF electrodes can be easily controlled by the applied electrical potential and electrodeposition time.

Graph-based representation is widely used in the visual tracking field to find correct correspondences between target parts in different frames. However, most graph-based trackers consider pairwise geometric relations between local parts. They do not make full use of the target's intrinsic structure, thereby making the representation easily disturbed by errors in pairwise affinities when large deformation or occlusion occurs.

Recent advances in online visual tracking focus on designing part-based models to handle the deformation and occlusion challenges. However, previous methods usually consider only the pairwise structural dependences of target parts in two consecutive frames rather than the higher order constraints in multiple frames, making them less effective in handling large deformation and occlusion challenges. This paper describes a new and efficient method for online deformable object tracking.

Similar objects are ubiquitous and abundant in both natural and artificial scenes. Determining the visual importance of several similar objects in a complex photograph is a challenge for image understanding algorithms. This study aims to define the importance of similar objects in an image and to develop a method that can select the most important instances for an input image from multiple similar objects.

Most multi-object tracking algorithms are developed within the tracking-by-detection framework, which considers the pairwise appearance similarities between detection responses or tracklets within a limited temporal window, and are thus less effective in handling long-term occlusions or distinguishing spatially close targets with similar appearance in crowded scenes. In this work, we propose an algorithm that formulates the multi-object tracking task as one of exploiting hierarchical dense structures on an undirected hypergraph constructed based on tracklet affinity. The dense structures indicate a group of vertices that are inter-connected by a set of hyperedges with high affinity values.

Efficient coding transforms that reduce or remove statistical dependencies in natural sensory signals are important for both biology and engineering. In recent years, divisive normalization (DN) has been advocated as a simple and effective nonlinear efficient coding transform. In this work, we first elaborate on the theoretical justification for DN as an efficient coding transform.

The local statistical properties of photographic images, when represented in a multi-scale basis, have been described using Gaussian scale mixtures. Here, we use this local description as a substrate for constructing a global field of Gaussian scale mixtures (FoGSMs). Specifically, we model multi-scale subbands as a product of an exponentiated homogeneous Gaussian Markov random field (hGMRF) and a second independent hGMRF.

We consider the problem of efficiently encoding a signal by transforming it to a new representation whose components are statistically independent. A widely studied linear solution, known as independent component analysis (ICA), exists for the case when the signal is generated as a linear transformation of independent nongaussian sources. Here, we examine a complementary case, in which the source is nongaussian and elliptically symmetric.

Nonlinear Image Representation Using Divisive Normalization.

Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit

January 2008

In this paper, we describe a nonlinear image representation based on divisive normalization that is designed to match the statistical properties of photographic images, as well as the perceptual sensitivity of biological visual systems. We decompose an image using a multi-scale oriented representation, and use Student's t as a model of the dependencies within local clusters of coefficients. We then show that normalization of each coefficient by the square root of a linear combination of the amplitudes of the coefficients in the cluster reduces statistical dependencies.
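
The normalization step described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name `divisive_normalize`, the uniform weights, the small positive `offset` (added here only to avoid division by zero), and the sample coefficients are all hypothetical, and the cluster is assumed to include the coefficient being normalized.

```python
import math


def divisive_normalize(coefficient, cluster, weights, offset=1e-3):
    """Divide a coefficient by the square root of a weighted sum of the
    amplitudes in its cluster (hypothetical helper; offset is an assumption)."""
    # Linear combination of the amplitudes (absolute values) in the cluster.
    pool = offset + sum(w * abs(c) for w, c in zip(weights, cluster))
    return coefficient / math.sqrt(pool)


# Made-up cluster of neighboring coefficients with uniform, made-up weights.
cluster = [3.0, -1.0, 2.0, 0.5]
weights = [0.25] * len(cluster)

normalized = [divisive_normalize(c, cluster, weights) for c in cluster]
```

Each coefficient keeps its sign and is rescaled by the pooled amplitude of its cluster; in practice the weights would be fitted to data rather than fixed by hand as they are here.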

We consider the problem of transforming a signal to a representation in which the components are statistically independent. When the signal is generated as a linear transformation of independent Gaussian or non-Gaussian sources, the solution may be computed using a linear transformation (PCA or ICA, respectively). Here, we consider a complementary case, in which the source is non-Gaussian but elliptically symmetric.

We describe a computational technique for authenticating works of art, specifically paintings and drawings, from high-resolution digital scans of the original works. This approach builds a statistical model of an artist from the scans of a set of authenticated works against which new works then are compared. The statistical model consists of first- and higher-order wavelet statistics.
