Publications by authors named "Wankou Yang"

Article Synopsis
  • The text discusses the challenges in developing effective appearance models for visual object tracking, particularly within the current Siamese-based methods, which often struggle to distinguish between target and non-target objects.
  • It introduces a new tracking framework called Single-Branch Tracking (SBT), inspired by transformer networks, that enhances feature extraction by embedding cross-image correlations at multiple layers, resulting in a more targeted approach.
  • An improved version, SuperSBT, adopts a hierarchical architecture and includes techniques like masked image modeling and temporal modeling, leading to better performance and increased tracking speed, significantly outperforming the original SBT in various benchmark tests.

Unmanned Aerial Vehicles (UAVs) rely on satellite systems for stable positioning. However, due to limited satellite coverage or communication disruptions, UAVs may lose signals for positioning. In such situations, vision-based techniques can serve as an alternative, ensuring the self-positioning capability of UAVs.


Multiview learning (MVL), which enhances the learners' performance by coordinating complementarity and consistency among different views, has attracted much attention. The multiview generalized eigenvalue proximal support vector machine (MvGSVM) is a recently proposed effective binary classification method, which introduces the concept of MVL into the classical generalized eigenvalue proximal support vector machine (GEPSVM). However, this approach cannot yet guarantee good classification performance and robustness.


Recently, many works on discriminant analysis have promoted the robustness of models against outliers by using the L1- or L2,1-norm as the distance metric. However, both their robustness and their discriminant power are limited. In this article, we present a new robust discriminant subspace (RDS) learning method for feature extraction, with an objective function formulated in a different form.


Eyeglasses removal is challenging because it requires both removing different kinds of eyeglasses, e.g., rimless glasses, full-rim glasses, and sunglasses, and recovering appropriate eyes.


Conventional multi-view re-ranking methods usually perform asymmetrical matching between the region of interest (ROI) in the query image and the whole target image for similarity computation. Due to the inconsistency in visual appearance, this practice tends to degrade retrieval accuracy, particularly when the image ROI, which is usually interpreted as the image objectness, accounts for a smaller region of the image. Since Privileged Information (PI), which can be viewed as the image prior, characterizes the image objectness well, in this paper we aim to leverage PI to further improve the performance of multi-view re-ranking.


Zero-shot learning (ZSL), a type of structured multioutput learning, has attracted much attention due to its requirement of no training data for target classes. Conventional ZSL methods usually project visual features into semantic space and assign labels by finding their nearest prototypes. However, this type of nearest neighbor search (NNS)-based method often suffers from great performance degradation because of the nonuniform variances between different categories.
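
The nearest-prototype assignment that the abstract attributes to conventional ZSL methods can be sketched as follows. This is a minimal illustration, not the authors' method: the function name `nearest_prototype`, the two-dimensional semantic space, and the class prototypes are all hypothetical, and the visual feature is assumed to have already been projected into the semantic space.

```python
import math


def nearest_prototype(projected, prototypes):
    """Return the label whose prototype is closest (Euclidean) to `projected`.

    `projected` is a visual feature already mapped into the semantic space;
    `prototypes` maps each class label to its semantic prototype vector.
    """
    best_label, best_dist = None, math.inf
    for label, proto in prototypes.items():
        dist = math.dist(projected, proto)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical 2-D semantic space with two unseen classes.
prototypes = {"zebra": (1.0, 0.0), "whale": (0.0, 1.0)}
pred = nearest_prototype((0.9, 0.2), prototypes)
```

Because a plain Euclidean search treats every class as equally spread out, a class with large variance loses samples to a tighter neighbouring class, which is the weakness the abstract points to.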


Representation learning is a fundamental but challenging problem, especially when the distribution of data is unknown. In this paper, we propose a new representation learning method, named Structure Transfer Machine (STM), which enables the feature learning process to converge at the representation expectation in a probabilistic way. We theoretically show that such an expected value of the representation (mean) is achievable if the manifold structure can be transferred from the data space to the feature space.


Of late, there have been many studies on robust discriminant analysis that adopt the L1-norm as the distance metric, but their results are not robust enough to gain universal acceptance. To overcome this problem, the authors of this article present a nonpeaked discriminant analysis (NPDA) technique, in which the cutting L1-norm is adopted as the distance metric. As this kind of norm can better eliminate heavy outliers in learning models, the proposed algorithm is expected to be stronger in performing feature extraction tasks for data representation than the existing robust discriminant analysis techniques, which are based on the L1-norm distance metric.


Learning long-term dependences (LTDs) with recurrent neural networks (RNNs) is challenging due to their limited internal memories. In this paper, we propose a new external memory architecture for RNNs called an external addressable long-term and working memory (EALWM)-augmented RNN. This architecture has two distinct advantages over existing neural external memory architectures: the division of the external memory into two parts, long-term memory and working memory, both of which are addressable; and the capability to learn LTDs without suffering from vanishing gradients under necessary assumptions.


In recent years, visual question answering (VQA) has become topical. The premise of VQA's significance as a benchmark in AI is that both the image and the textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps 'understand' less than initially hoped, and instead master the easier task of exploiting cues given away in the question and biases in the answer distribution [1].


Fisher's criterion is one of the most popular discriminant criteria for feature extraction. It is defined as the generalized Rayleigh quotient of the between-class scatter distance to the within-class scatter distance. Consequently, Fisher's criterion does not take advantage of the discriminant information in the class covariance differences, and hence, its discriminant ability largely depends on the class mean differences.
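
The dependence on class mean differences can be illustrated with a small sketch. It assumes the common unweighted two-class form of the criterion, J(w) = (w^T S_b w) / (w^T S_w w), evaluated on one-dimensional projections; the function name `fisher_ratio` and the toy samples are hypothetical and do not come from the article.

```python
def fisher_ratio(class_a, class_b, w):
    """Two-class Fisher criterion along direction w (unweighted form).

    Numerator: squared distance between the projected class means.
    Denominator: summed squared deviations of the projections from their
    own class mean. Assumes the within-class scatter is nonzero.
    """
    def project(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    pa = [project(x) for x in class_a]
    pb = [project(x) for x in class_b]
    ma = sum(pa) / len(pa)
    mb = sum(pb) / len(pb)
    between = (ma - mb) ** 2
    within = sum((p - ma) ** 2 for p in pa) + sum((p - mb) ** 2 for p in pb)
    return between / within


w = (1.0, 0.0)
narrow = [(-1.0, 0.0), (1.0, 0.0)]   # mean 0, small spread
wide = [(-3.0, 0.0), (3.0, 0.0)]     # mean 0, large spread
shifted = [(4.0, 0.0), (6.0, 0.0)]   # mean 5, small spread

same_mean_score = fisher_ratio(narrow, wide, w)
shifted_mean_score = fisher_ratio(narrow, shifted, w)
```

In this toy setup `narrow` and `wide` differ only in spread, so the criterion scores them as inseparable, while `narrow` and `shifted` differ in mean and receive a positive score.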


Given unreliable visual patterns and insufficient query information, content-based image retrieval is often suboptimal and requires image re-ranking using auxiliary information. In this paper, we propose a discriminative multi-view interactive image re-ranking (DMINTIR) method, which integrates user relevance feedback capturing users' intentions with multiple features that sufficiently describe the images. In DMINTIR, heterogeneous property features are incorporated in the multi-view learning scheme to exploit their complementarities.


In this paper, we propose using high-level action units to represent human actions in videos and, based on such units, a novel sparse model is developed for human action recognition. There are three interconnected components in our approach. First, we propose a new context-aware spatial-temporal descriptor, named locally weighted word context, to improve the discriminability of the traditionally used local spatial-temporal descriptors.
