It is difficult to achieve all-weather visual object tracking in open environments using only single-modality input. Because RGB and thermal infrared (TIR) data are complementary in many complex environments, a more robust tracking framework can be built from video of these two modalities. The method used to fuse RGB and TIR data is the core factor determining the performance of an RGB-T tracker, and existing RGB-T trackers have not solved this problem well. To address the low utilization of information within a single modality in aggregation-based methods and between the two modalities in alignment-based methods, we take DiMP as the baseline tracker and design Channel Exchanging DiMP (CEDiMP), an RGB-T object tracking framework based on channel exchanging. During feature fusion, CEDiMP dynamically exchanges channels between the sub-networks of the two modalities while adding almost no parameters, and the deep features produced by this channel-exchanging fusion have stronger representational power. At the same time, to address the poor generalization and weak long-term tracking ability of existing RGB-T methods, we additionally train CEDiMP on the synthetic dataset LaSOT-RGBT. Extensive experiments demonstrate the effectiveness of the proposed model: CEDiMP achieves the best performance on two RGB-T object tracking benchmarks, GTOT and RGBT234, and performs strongly in generalization testing.
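The fusion step described above can be illustrated with a small sketch. This is a minimal, hypothetical rendering of the general channel-exchanging idea (channels whose batch-norm scaling factor falls below a threshold are deemed uninformative and replaced by the corresponding channel from the other modality); the threshold value, tensor shapes, and function name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def channel_exchange(feat_rgb, feat_tir, gamma_rgb, gamma_tir, thresh=1e-2):
    """Replace low-importance channels of each modality with the other's.

    feat_*  : (C, H, W) feature maps from the two modality sub-networks
    gamma_* : (C,) batch-norm scaling factors used as channel-importance scores
    """
    out_rgb = feat_rgb.copy()
    out_tir = feat_tir.copy()
    # Where an RGB channel is deemed uninformative, take the TIR channel...
    swap_rgb = gamma_rgb < thresh
    out_rgb[swap_rgb] = feat_tir[swap_rgb]
    # ...and symmetrically for TIR.
    swap_tir = gamma_tir < thresh
    out_tir[swap_tir] = feat_rgb[swap_tir]
    return out_rgb, out_tir

rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((4, 8, 8))
f_tir = rng.standard_normal((4, 8, 8))
g_rgb = np.array([0.5, 0.001, 0.3, 0.2])   # channel 1 is "dead" in RGB
g_tir = np.array([0.4, 0.6, 0.0005, 0.7])  # channel 2 is "dead" in TIR

o_rgb, o_tir = channel_exchange(f_rgb, f_tir, g_rgb, g_tir)
```

Note that the exchange itself involves only comparisons and copies, no learned weights, which is consistent with the claim that the fusion adds almost no parameters.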
Full text: PMC http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8434326 | DOI http://dx.doi.org/10.3390/s21175800
Rev Sci Instrum
January 2025
Shanxi Key Laboratory of Intelligent Detection Technology and Equipment, School of Information and Communication Engineering, North University of China, Taiyuan 030051, Shanxi, China.
Real-time moving-target trajectory prediction is highly valuable in applications such as autonomous driving, target tracking, and motion prediction. This paper examines, as an illustrative example, the projection of the three-dimensional random motion of an object in space onto a sensing plane. Historical trajectory data are used to train a reservoir network.
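A reservoir network (echo state network) trains only a linear readout on top of a fixed random recurrent "reservoir". The following is a minimal sketch of one-step-ahead prediction of a 2-D trajectory on a sensing plane; the reservoir size, leak rate, spectral radius, ridge penalty, and toy trajectory are all illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_res = 2, 200            # 2-D point on the sensing plane; reservoir size
leak, rho, ridge = 0.3, 0.9, 1e-6

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= rho / max(abs(np.linalg.eigvals(W)))   # rescale spectral radius to rho

def run_reservoir(inputs):
    """Drive the fixed reservoir with the input sequence, collecting states."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.array(states)

# Toy "historical trajectory": a smooth Lissajous curve on the plane.
t = np.linspace(0, 8 * np.pi, 800)
traj = np.stack([np.cos(t), np.sin(0.9 * t)], axis=1)

X = run_reservoir(traj[:-1])    # reservoir states driven by past points
Y = traj[1:]                    # next-step targets

# Ridge-regression readout: the only trained part of the network.
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y)

pred = X @ W_out
rmse = np.sqrt(np.mean((pred[100:] - Y[100:]) ** 2))  # skip washout period
print(f"one-step RMSE after washout: {rmse:.4f}")
```

Because the recurrent weights stay fixed, training reduces to a single linear solve, which is what makes reservoir approaches attractive for real-time prediction.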
Infancy
January 2025
Institute of Child Development, University of Minnesota, Minneapolis, Minnesota, USA.
East Asians are more likely than North Americans to attend to visual scenes holistically, focusing on the relations between objects and their background rather than isolating components. This cultural difference in context sensitivity (greater attentional allocation to the background of an image or scene) has been attributed to socialization, yet it is unknown how early in development it appears, and whether it is moderated by social information. We employed eye-tracking to investigate context sensitivity in 15-month-olds in Japan (n = 45) and the United States (n = 52).
Sci Rep
January 2025
School of Electrical and Control Engineering, North China University of Technology, Beijing, China.
This paper proposes a new strategy for analysing and detecting abnormal passenger behavior and abnormal objects on buses. First, a library of abnormal passenger behaviors and objects on buses is established. Then, a new mask detection and abnormal object detection and analysis (MD-AODA) algorithm is proposed.
Healthc Technol Lett
December 2024
Robotics and Control Laboratory, Department of Electrical and Computer Engineering The University of British Columbia Vancouver Canada.
The Segment Anything Model (SAM) is a powerful vision foundation model that is revolutionizing the traditional segmentation paradigm. Despite this, a reliance on prompting every frame and a large computational cost limit its usage in robotically assisted surgery. Applications such as augmented reality guidance require little user intervention along with efficient inference to be clinically usable.
Data Brief
February 2025
Department of Electrical and Computer Engineering, University of Michigan-Dearborn, 4901 Evergreen Rd, Dearborn, 48128 MI, USA.
In this data article, we introduce the Multi-Modal Event-based Vehicle Detection and Tracking (MEVDT) dataset. This dataset provides a synchronized stream of event data and grayscale images of traffic scenes, captured using the Dynamic and Active-Pixel Vision Sensor (DAVIS) 240c hybrid event-based camera. MEVDT comprises 63 multi-modal sequences with approximately 13k images, 5M events, 10k object labels, and 85 unique object tracking trajectories.