Publications by authors named "Sanping Zhou"

Article Synopsis
  • Unsupervised person re-identification (Re-ID) is challenging because labeled data are unavailable; existing methods often derive pseudo labels from inaccurate estimates of the number of clusters, which can hinder performance.
  • The proposed method, meta pairwise relationship distillation (MPRD), uses a graph convolutional network (GCN) to build reliable pairwise relationships that improve feature learning without requiring the number of clusters to be specified.
  • The method also introduces two components: a hard sample deduction (HSD) module that identifies problematic pseudo labels and a positive pair alignment (PPA) module that reduces redundant feature information, resulting in better performance than existing unsupervised approaches on various datasets.

Unsupervised person re-identification (Re-ID) is challenging due to the lack of ground truth labels. Most existing methods employ iterative clustering to generate pseudo labels for unlabeled training data, which then guide the learning process. However, selecting samples that both carry high-confidence pseudo labels and are hard (discriminative) enough remains a critical problem.
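
The pseudo-labeling step mentioned above can be illustrated with a minimal, generic sketch. It is not the method of any paper listed here: it assumes L2-normalized features, links every pair whose cosine similarity reaches a hypothetical `threshold`, takes connected components (found with union-find) as clusters, and marks singletons as `-1` so they can be left out of training. In an iterative pipeline, features would be re-extracted and this step repeated each round.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity of two already-normalized vectors (a dot product)."""
    return sum(x * y for x, y in zip(a, b))


def pseudo_labels(features, threshold=0.8):
    """Hypothetical helper: assign cluster ids via connected components of a
    cosine-similarity graph. Singletons get -1 (treated as unreliable)."""
    feats = [l2_normalize(f) for f in features]
    n = len(feats)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union every pair of samples whose similarity reaches the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(feats[i], feats[j]) >= threshold:
                parent[find(i)] = find(j)

    # Group samples by root, then number clusters 0, 1, 2, ... by first member.
    members = {}
    for i in range(n):
        members.setdefault(find(i), []).append(i)
    labels = [-1] * n
    next_id = 0
    for root in sorted(members, key=lambda r: min(members[r])):
        idx = members[root]
        if len(idx) > 1:
            for i in idx:
                labels[i] = next_id
            next_id += 1
    return labels
```

For example, `pseudo_labels([[1, 0], [0.95, 0.05], [0, 1], [0.05, 0.95], [-1, 0]])` groups the first two and the next two vectors and leaves the last one unlabeled, returning `[0, 0, 1, 1, -1]`. The pairwise loop is O(n²), so this only suits small illustrations.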

In a typical image inpainting task, the location and shape of the damaged or masked area are often random and irregular. The vanilla convolutions widely used in learning-based inpainting models treat all spatial features as valid and share parameters across regions, which makes it difficult for them to cope with such irregular damage; as a result, these models tend to produce inpainting results with color discrepancies and blurriness. In this paper, we propose a novel Context Adaptive Network (CANet) to address this issue.
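
To make the contrast concrete, the following sketch compares a vanilla 1-D convolution with a partial convolution in the style of Liu et al. (2018), an existing masked-convolution technique that is not the CANet method. The 1-D setting, the function names, and the averaging kernel are illustrative assumptions. Values inside the hole (mask 0) are placeholders: the vanilla version mixes them into every output, while the partial version ignores them and rescales by the number of valid inputs in each window.

```python
def vanilla_conv1d(x, w):
    """'Valid' 1-D convolution (cross-correlation): every input value is
    treated as valid, including positions inside the hole."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def partial_conv1d(x, mask, w, bias=0.0):
    """1-D partial convolution (illustrative): only inputs with mask == 1
    contribute, the sum is rescaled by window_size / valid_count, and the
    output mask is 1 wherever the window saw at least one valid input."""
    k = len(w)
    out, new_mask = [], []
    for i in range(len(x) - k + 1):
        valid = sum(mask[i:i + k])
        if valid > 0:
            s = sum(w[j] * x[i + j] * mask[i + j] for j in range(k))
            out.append(s * k / valid + bias)
            new_mask.append(1)
        else:
            # No valid input in this window: output 0 and keep it masked.
            out.append(0.0)
            new_mask.append(0)
    return out, new_mask
```

With `x = [1.0, 1.0, 0.0, 0.0, 1.0]`, `mask = [1, 1, 0, 0, 1]` and `w = [1/3, 1/3, 1/3]`, the vanilla outputs are pulled toward the placeholder value (about `[0.67, 0.33, 0.33]`), whereas every partial-convolution output is about `1.0` and the updated mask marks all three windows as valid.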

Representing multimodal behaviors is a critical challenge for pedestrian trajectory prediction. Previous methods commonly represent this multimodality with multiple latent variables repeatedly sampled from a latent space, which makes interpretable trajectory prediction difficult. Moreover, the latent space is usually built by encoding global interactions into the future trajectory, which inevitably introduces superfluous interactions and thus degrades performance.

Network ensemble aims to obtain better results by aggregating the predictions of multiple weak networks, and maintaining diversity among these networks plays a critical role in the training process. Many existing approaches maintain this diversity simply by using different network initializations or data partitions, which often requires repeated attempts to reach relatively high performance. In this article, we propose a novel inverse adversarial diversity learning (IADL) method to learn a simple yet effective ensemble regime, which can be easily implemented in the following two steps.
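
The aggregation step itself is simple; a common baseline averages the class probabilities of the member networks. The sketch below shows that baseline together with pairwise disagreement, one simple way to quantify the diversity the paragraph refers to. It does not implement IADL, whose two steps are not described here, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def ensemble_predict(logits_per_model):
    """Average the class probabilities of several models for one input and
    return (predicted_class, averaged_probabilities)."""
    probs = [softmax(l) for l in logits_per_model]
    k = len(probs)
    avg = [sum(p[c] for p in probs) / k for c in range(len(probs[0]))]
    return max(range(len(avg)), key=avg.__getitem__), avg


def disagreement(preds_a, preds_b):
    """Fraction of inputs on which two models predict different classes."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)
```

For logits `[[2.0, 0.0], [0.0, 1.0], [1.5, 0.0]]` from three models, the second model prefers class 1, but the averaged probabilities still favor class 0. Two models that predict `[0, 1, 1, 0]` and `[0, 1, 0, 0]` have a disagreement of 0.25.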

Video panoptic segmentation is an important but challenging task in computer vision. It not only performs panoptic segmentation of each frame but also associates the same instance across adjacent frames. Because they lack temporal coherence modeling, most existing approaches suffer from identity switches during instance association and cannot handle ambiguous segmentation boundaries caused by motion blur.

Most existing trackers use bounding boxes for object tracking. However, the background contained in the bounding box inevitably reduces the accuracy of the target model, which degrades tracker performance; this effect is particularly pronounced for non-rigid objects. To address this issue, this paper proposes a novel hybrid level set model that accurately tracks the object contour and can therefore robustly handle topology changes, occlusions, and abrupt motion in non-rigid object tracking.

Intensity inhomogeneity and noise are two common issues in images that pose significant challenges for image segmentation, particularly when both appear in the same image. As a result, most existing level set models perform poorly on such images. To this end, this paper proposes a novel hybrid level set model, the adaptive variational level set model (AVLSM), which integrates an adaptive-scale bias field correction term and a denoising term into a single level set framework and can therefore simultaneously correct severe intensity inhomogeneity and suppress noise during segmentation.

Task-free attention has attracted intense interest in the computer vision community, while relatively few works focus on task-driven attention (TDAttention). This paper therefore addresses the problem of TDAttention prediction in daily scenarios where a human is performing a task. Motivated by the cognitive mechanism whereby human attention allocation is jointly controlled by top-down guidance and bottom-up stimuli, this paper proposes a cognitively explanatory deep neural network model to predict TDAttention.

Most existing Multi-Object Tracking (MOT) approaches follow the Tracking-by-Detection and Data Association paradigm, in which objects are first detected and then associated during tracking. In recent years, deep neural networks have been used to obtain more discriminative appearance features for cross-frame association, and noticeable performance improvements have been reported. However, the Tracking-by-Detection framework is not yet fully end-to-end, which leads to heavy computation and limited performance, especially during inference (tracking).
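
The data-association step of this paradigm can be sketched as matching existing track boxes to new detections by intersection over union (IoU). The version below uses a greedy match and a hypothetical `min_iou` threshold of 0.3; it is a generic illustration, not the method of the paper above, and many trackers instead solve the assignment optimally with the Hungarian algorithm. Boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Greedily match track boxes to detection boxes by descending IoU.
    Returns (matches, unmatched_track_ids, unmatched_detection_ids)."""
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    used_t, used_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if ti in used_t or di in used_d:
            continue  # each track and each detection is matched at most once
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    unmatched_t = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_d = [i for i in range(len(detections)) if i not in used_d]
    return matches, unmatched_t, unmatched_d
```

With tracks `[(0, 0, 10, 10), (20, 20, 30, 30)]` and detections `[(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]`, track 0 is matched to detection 1, track 1 to detection 0, and detection 2 is left unmatched; in a full tracker it would typically start a new track.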

Person reidentification (Re-ID) aims at matching images of the same identity captured by disjoint camera views, which remains a very challenging problem due to large cross-view appearance variations. In practice, mainstream methods usually learn a discriminative feature representation with a deep neural network, which requires a large number of labeled samples for training. In this article, we design a simple yet effective multinetwork collaborative feature learning (MCFL) framework to alleviate the data annotation requirement for person Re-ID, which can confidently estimate the pseudolabels of unlabeled sample pairs and consistently learn discriminative features of input images.

Salient object detection, which is usually taken as an important preprocessing step in various computer vision tasks, has developed very rapidly with the rise of Deep Neural Networks (DNNs). However, down-sampling operations such as pooling and striding always blur the final predictions at edges, which seriously degrades the performance of salient object detection. In this paper, we propose a simple yet effective approach, i.

Salient object detection aims at locating the most conspicuous objects in natural images and usually serves as a very important preprocessing step in many computer vision tasks. In this paper, we propose a simple yet effective Hierarchical U-shape Attention Network (HUAN) to learn a robust mapping function for salient object detection. First, a novel attention mechanism is formulated to improve the well-known U-shape network [1]; in the resulting U-shape Attention Network (UAN), memory consumption is substantially reduced and mask quality is significantly improved.

The performance of person re-identification (Re-ID) is seriously affected by large cross-view appearance variations caused by mutual occlusions and background clutter. Hence, learning a feature representation that adaptively emphasizes the foreground persons is critical to solving the person Re-ID problem. In this paper, we propose a simple yet effective foreground attentive neural network (FANN) to learn a discriminative feature representation for person Re-ID, which can adaptively enhance the positive side of the foreground and weaken the negative side of the background.
