Existing RGBT tracking methods usually localize a target object with a bounding box, so the trackers are often affected by background clutter included in the box. To address this issue, this article presents a novel algorithm, called noise-robust cross-modal ranking, to suppress background effects in target bounding boxes for RGBT tracking. In particular, we handle noise interference in cross-modal fusion and seed labels from two aspects. First, a soft cross-modality consistency is proposed that allows sparse inconsistency when fusing different modalities, aiming to take both the collaboration and the heterogeneity of different modalities into account for more effective fusion. Second, an optimal seed learning scheme is designed to handle label noise in ranking seeds caused by factors such as irregular object shapes and occlusion. In addition, to exploit the complementarity and maintain the structural information of different features within each modality, we perform an individual ranking for each feature and employ a cross-feature consistency to pursue their collaboration. A unified optimization framework with fast convergence is developed to solve the proposed model. Extensive experiments on the GTOT and RGBT234 benchmark datasets demonstrate the effectiveness and efficiency of the proposed approach compared with state-of-the-art tracking methods.
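The core idea sketched in the abstract — ranking candidate regions on a per-modality graph seeded by reliable foreground labels, with a soft consistency term coupling the modalities rather than forcing them to agree — can be illustrated with a toy sketch. This is a minimal, hypothetical interpretation (row-normalized score propagation with a simple soft coupling weight), not the paper's actual objective or solver; all function names and constants below are illustrative assumptions:

```python
# Hypothetical sketch of cross-modal graph ranking with a soft
# consistency term. The affinity matrices, weights alpha/mu, and
# the update rule are illustrative, not the paper's formulation.

def normalize_rows(W):
    """Row-normalize an affinity matrix so each row sums to 1."""
    return [[w / (sum(row) or 1.0) for w in row] for row in W]

def cross_modal_ranking(W_rgb, W_t, y, alpha=0.5, mu=0.3, iters=50):
    """Iterative ranking over two modality graphs.

    Each modality propagates scores on its own graph (preserving
    per-modality structure), a fidelity term anchors scores to the
    seed labels y, and a soft penalty mu pulls the two score
    vectors together without forcing exact agreement.
    """
    S_rgb = normalize_rows(W_rgb)
    S_t = normalize_rows(W_t)
    n = len(y)
    f_rgb = list(y)
    f_t = list(y)
    for _ in range(iters):
        # propagate current scores one step along each modality graph
        prop_rgb = [sum(S_rgb[i][j] * f_rgb[j] for j in range(n)) for i in range(n)]
        prop_t = [sum(S_t[i][j] * f_t[j] for j in range(n)) for i in range(n)]
        # propagation + seed fidelity + soft cross-modal consistency
        f_rgb = [alpha * prop_rgb[i] + (1 - alpha - mu) * y[i] + mu * f_t[i]
                 for i in range(n)]
        f_t = [alpha * prop_t[i] + (1 - alpha - mu) * y[i] + mu * f_rgb[i]
               for i in range(n)]
    return [(a + b) / 2 for a, b in zip(f_rgb, f_t)]
```

On a small symmetric graph with a single seed node, the seed receives the highest fused score and scores decay with graph distance, which is the qualitative behavior a seeded ranking model relies on to separate target from background patches.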

Source: http://dx.doi.org/10.1109/TNNLS.2021.3067107

Similar Publications

Visible-thermal small object detection (RGBT SOD) is a significant yet challenging task with a wide range of applications, including video surveillance, traffic monitoring, and search and rescue. However, existing studies mainly focus on either the visible or the thermal modality, while RGBT SOD is rarely explored. Although some RGBT datasets have been developed, their insufficient quantity, limited diversity, narrow application scope, misaligned images, and large target sizes prevent them from providing an impartial benchmark for evaluating RGBT SOD algorithms.

The goal of RGB-Thermal (RGB-T) tracking is to utilize the synergistic and complementary strengths of the RGB and TIR modalities to enhance tracking in diverse situations, with cross-modal interaction being a crucial element. Earlier methods often simply combine the features of the RGB and TIR search frames, leading to a coarse interaction that also introduces unnecessary background noise. Many other approaches sample candidate boxes from the search frames and apply different fusion techniques to individual pairs of RGB and TIR boxes, which confines cross-modal interaction to local areas and results in insufficient context modeling.

In RGB-T tracking, there exist rich spatial relationships between the target and the background within multi-modal data, as well as strong consistency of these spatial relationships across successive frames, both of which are crucial for boosting tracking performance. However, most existing RGB-T trackers overlook such multi-modal spatial relationships and temporal consistencies within RGB-T videos, hindering them from robust tracking and practical application in complex scenarios. In this paper, we propose a novel Multi-modal Spatial-Temporal Context (MMSTC) network for RGB-T tracking, which employs a Transformer architecture to construct reliable multi-modal spatial context information and effectively propagate temporal context information.

Existing RGB-Thermal trackers usually treat intra-modal feature extraction and inter-modal feature fusion as two separate processes, so the mutual promotion of extraction and fusion is neglected. As a result, the complementary advantages of RGB-T fusion are not fully exploited, and the independent feature extraction is not adaptive to modal-quality fluctuation during tracking. To address these limitations, we design a joint-modality query fusion network, in which intra-modal feature extraction and inter-modal fusion are coupled together and promote each other via joint-modality queries.

RGB and thermal source data suffer from both shared and specific challenges, and how to explore and exploit them plays a critical role in representing the target appearance in RGBT tracking. In this paper, we propose a novel approach, which performs target appearance representation disentanglement and interaction via both modality-shared and modality-specific challenge attributes, for robust RGBT tracking. In particular, we disentangle the target appearance representations via five challenge-based branches with different structures according to their properties, including three parameter-shared branches to model modality-shared challenges and two parameter-independent branches to model modality-specific challenges.
