RGB-T tracking uses images from both the visible and thermal modalities. The primary objective is to adaptively leverage whichever modality is dominant under the current conditions, achieving more robust tracking than single-modality tracking. This paper proposes an RGB-T tracker based on a mixed-attention mechanism that achieves complementary fusion of the two modalities (referred to as MACFT). In the feature extraction stage, different transformer backbone branches extract modality-specific and modality-shared information. Mixed-attention operations in the backbone enable information interaction and self-enhancement between the template and search images, constructing a robust feature representation that better captures the high-level semantics of the target. In the feature fusion stage, a modality shared-specific feature interaction structure is designed based on the mixed-attention mechanism, effectively suppressing noise from the low-quality modality while enhancing information from the dominant one. Evaluation on multiple public RGB-T datasets demonstrates that the proposed tracker outperforms other RGB-T trackers on the standard evaluation metrics and also adapts well to long-term tracking scenarios.
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10384326
DOI: http://dx.doi.org/10.3390/s23146609
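The MACFT abstract above describes fusing modality-shared and modality-specific features with mixed attention but gives no implementation details. The following is only a minimal PyTorch sketch of that general idea, not the authors' code; all module names, shapes, and the gating scheme are assumptions made for illustration.

```python
# Hypothetical sketch of shared-specific mixed-attention fusion (not the
# authors' MACFT code): modality-shared tokens attend to each modality-specific
# branch, and a learned gate weights the enhanced branches before fusion.
import torch
import torch.nn as nn


class SharedSpecificFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # Cross-attention: shared tokens (queries) look at each specific branch.
        self.attn_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_tir = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Per-token gate deciding how much each modality contributes.
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))
        self.out = nn.Linear(dim, dim)

    def forward(self, shared, rgb_specific, tir_specific):
        # shared, rgb_specific, tir_specific: (B, N, dim) token sequences.
        enh_rgb, _ = self.attn_rgb(shared, rgb_specific, rgb_specific)
        enh_tir, _ = self.attn_tir(shared, tir_specific, tir_specific)
        w = self.gate(torch.cat([enh_rgb, enh_tir], dim=-1))  # (B, N, 2)
        fused = w[..., :1] * enh_rgb + w[..., 1:] * enh_tir   # weighted sum
        return self.out(fused + shared)                       # residual


if __name__ == "__main__":
    fuse = SharedSpecificFusion()
    x = torch.randn(2, 64, 256)
    print(fuse(x, x.clone(), x.clone()).shape)  # torch.Size([2, 64, 256])
```

The gate here is a simple per-token softmax over the two modalities; it stands in for whatever quality-aware weighting the paper actually uses to suppress the low-quality modality.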
IEEE Trans Pattern Anal Mach Intell
October 2024
The goal of RGB-Thermal (RGB-T) tracking is to utilize the synergistic and complementary strengths of the RGB and TIR modalities to enhance tracking in diverse situations, with cross-modal interaction being a crucial element. Earlier methods often simply combine the features of the RGB and TIR search frames, leading to a coarse interaction that also introduces unnecessary background noise. Many other approaches sample candidate boxes from the search frames and apply different fusion techniques to individual pairs of RGB and TIR boxes, which confines cross-modal interaction to local areas and results in insufficient context modeling.
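To make the contrast in this abstract concrete, here is a small, hypothetical PyTorch comparison (not taken from the cited paper) between the coarse concatenate-and-convolve fusion it criticizes and a token-level cross-attention in which every RGB location can interact with the whole TIR search frame.

```python
# Hypothetical comparison (not from the cited paper): coarse concatenation
# fusion vs. global token-level cross-attention between RGB and TIR features.
import torch
import torch.nn as nn


class ConcatFusion(nn.Module):
    """Coarse fusion: stack channels and mix with a 1x1 convolution."""
    def __init__(self, c=256):
        super().__init__()
        self.mix = nn.Conv2d(2 * c, c, kernel_size=1)

    def forward(self, rgb, tir):              # (B, C, H, W) each
        return self.mix(torch.cat([rgb, tir], dim=1))


class CrossAttnFusion(nn.Module):
    """Global interaction: every RGB token attends to all TIR tokens."""
    def __init__(self, c=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(c, heads, batch_first=True)

    def forward(self, rgb, tir):              # (B, C, H, W) each
        b, c, h, w = rgb.shape
        q = rgb.flatten(2).transpose(1, 2)    # (B, H*W, C) queries
        kv = tir.flatten(2).transpose(1, 2)   # (B, H*W, C) keys/values
        out, _ = self.attn(q, kv, kv)
        return (q + out).transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    rgb, tir = torch.randn(1, 256, 16, 16), torch.randn(1, 256, 16, 16)
    print(ConcatFusion()(rgb, tir).shape, CrossAttnFusion()(rgb, tir).shape)
```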
IEEE Trans Image Process
July 2024
In RGB-T tracking, there are rich spatial relationships between the target and the background within the multi-modal data, as well as strong consistency of these spatial relationships across successive frames, both of which are crucial for boosting tracking performance. However, most existing RGB-T trackers overlook such multi-modal spatial relationships and temporal consistencies within RGB-T videos, which hinders robust tracking and practical application in complex scenarios. In this paper, we propose a novel Multi-modal Spatial-Temporal Context (MMSTC) network for RGB-T tracking, which employs a Transformer architecture to construct reliable multi-modal spatial context information and to effectively propagate temporal context information.
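The MMSTC architecture itself is not reproduced in this snippet; the code below is only a generic sketch, under assumed shapes and module names, of how a Transformer encoder can model multi-modal spatial context within a frame while carrying a few context tokens forward to the next frame as temporal context.

```python
# Generic sketch (not the authors' MMSTC code): a Transformer encoder jointly
# processes RGB tokens, TIR tokens, and a few temporal-context tokens; the
# updated context tokens are returned so the caller can feed them to the next
# frame, propagating temporal information.
import torch
import torch.nn as nn


class SpatialTemporalContext(nn.Module):
    def __init__(self, dim=256, heads=8, layers=2, num_ctx=4):
        super().__init__()
        self.num_ctx = num_ctx
        self.init_ctx = nn.Parameter(torch.zeros(1, num_ctx, dim))
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, rgb_tokens, tir_tokens, ctx=None):
        # rgb_tokens, tir_tokens: (B, N, dim); ctx: (B, num_ctx, dim) or None.
        if ctx is None:
            ctx = self.init_ctx.expand(rgb_tokens.size(0), -1, -1)
        x = torch.cat([ctx, rgb_tokens, tir_tokens], dim=1)
        x = self.encoder(x)
        new_ctx = x[:, :self.num_ctx]                    # carry to next frame
        n = rgb_tokens.size(1)
        rgb_out = x[:, self.num_ctx:self.num_ctx + n]
        tir_out = x[:, self.num_ctx + n:]
        return rgb_out, tir_out, new_ctx


if __name__ == "__main__":
    m = SpatialTemporalContext()
    rgb, tir = torch.randn(2, 64, 256), torch.randn(2, 64, 256)
    ctx = None
    for _ in range(3):                                   # simulate 3 frames
        rgb_out, tir_out, ctx = m(rgb, tir, ctx)
    print(rgb_out.shape, ctx.shape)                      # (2, 64, 256) (2, 4, 256)
```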
IEEE Trans Image Process
May 2024
Existing RGB-Thermal trackers usually treat intra-modal feature extraction and inter-modal feature fusion as two separate processes, so the mutual promotion between extraction and fusion is neglected. As a result, the complementary advantages of RGB-T fusion are not fully exploited, and the independent feature extraction cannot adapt to fluctuations in modal quality during tracking. To address these limitations, we design a joint-modality query fusion network in which intra-modal feature extraction and inter-modal fusion are coupled together and promote each other via joint-modality queries.
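The exact joint-modality query design is not given in this abstract, so the following is only an assumed, DETR-style illustration of the general idea: a shared set of learnable queries attends to the RGB and TIR features together, so the same queries that perform the fusion also drive what is extracted from each modality.

```python
# Assumed, DETR-style illustration (not the paper's implementation): a shared
# set of learnable queries cross-attends to the concatenated RGB and TIR
# tokens, coupling "what to extract" with "how to fuse" in a single step.
import torch
import torch.nn as nn


class JointModalityQueryFusion(nn.Module):
    def __init__(self, dim=256, heads=8, num_queries=32):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                 nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, tir_tokens):
        # rgb_tokens, tir_tokens: (B, N, dim) flattened backbone features.
        kv = torch.cat([rgb_tokens, tir_tokens], dim=1)   # joint key/value set
        q = self.queries.expand(kv.size(0), -1, -1)
        attn_out, _ = self.cross_attn(q, kv, kv)
        q = self.norm1(q + attn_out)
        q = self.norm2(q + self.ffn(q))
        return q                                          # fused target queries


if __name__ == "__main__":
    fuse = JointModalityQueryFusion()
    rgb, tir = torch.randn(2, 64, 256), torch.randn(2, 64, 256)
    print(fuse(rgb, tir).shape)                           # torch.Size([2, 32, 256])
```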
Sci Rep
August 2023
Silla University, 140, Baekyang-daero 700beon-gil, Sasang-gu, 46958, Busan, Korea.
In recent years, many RGB-Thermal tracking methods have been proposed to meet the needs of single-object tracking under different conditions. However, these trackers rely on anchor-based algorithms and cross-correlation operations on features, making it difficult to improve the tracking success rate. We propose the siamAFTS tracking network, which is anchor-free and uses a fully convolutional network with a Transformer module, suited to RGB-Thermal target tracking.
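siamAFTS itself is not shown here; the sketch below is only a rough, assumed example of what "anchor-free with a Transformer module instead of cross-correlation" typically looks like: search tokens attend to template tokens, and a convolutional head predicts a per-location foreground score plus (l, t, r, b) box offsets, FCOS-style, with no predefined anchors.

```python
# Rough, assumed sketch of an anchor-free Siamese head with attention-based
# template-search fusion (not the siamAFTS code): instead of cross-correlation,
# search tokens attend to template tokens, then per-location classification and
# (l, t, r, b) box-offset maps are predicted without anchors.
import torch
import torch.nn as nn


class AnchorFreeHead(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.fuse = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cls_head = nn.Conv2d(dim, 1, kernel_size=3, padding=1)  # fg score
        self.reg_head = nn.Conv2d(dim, 4, kernel_size=3, padding=1)  # l, t, r, b

    def forward(self, search_feat, template_feat):
        # search_feat: (B, C, H, W); template_feat: (B, C, h, w)
        b, c, h, w = search_feat.shape
        q = search_feat.flatten(2).transpose(1, 2)        # (B, H*W, C)
        kv = template_feat.flatten(2).transpose(1, 2)     # (B, h*w, C)
        fused, _ = self.fuse(q, kv, kv)
        fmap = (q + fused).transpose(1, 2).reshape(b, c, h, w)
        cls = self.cls_head(fmap)                         # (B, 1, H, W)
        ltrb = self.reg_head(fmap).relu()                 # (B, 4, H, W), offsets >= 0
        return cls, ltrb


if __name__ == "__main__":
    head = AnchorFreeHead()
    cls, ltrb = head(torch.randn(1, 256, 16, 16), torch.randn(1, 256, 8, 8))
    print(cls.shape, ltrb.shape)  # torch.Size([1, 1, 16, 16]) torch.Size([1, 4, 16, 16])
```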
Sensors (Basel)
July 2023
Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China.