A Multi-Modal Egocentric Activity Recognition Approach towards Video Domain Generalization.

Sensors (Basel)

Department of Informatics and Telecommunications, University of Thessaly, 35100 Lamia, Greece.

Published: April 2024

Egocentric activity recognition is a prominent computer vision task based on the use of wearable cameras. Since egocentric videos are captured from the perspective of the person wearing the camera, the wearer's body motions severely complicate the video content and impose several challenges. In this work we propose a novel approach for domain-generalized egocentric human activity recognition. Typical approaches use a large amount of training data, aiming to cover all possible variants of each action. Moreover, several recent approaches have attempted to handle discrepancies between domains with a variety of costly and mostly unsupervised domain adaptation methods. We show that through simple manipulation of the available source-domain data, and with minor involvement from the target domain, we are able to produce robust models that adequately predict human activity in egocentric video sequences. To this end, we introduce a novel three-stream deep neural network architecture that combines elements of vision transformers and residual neural networks and is trained on multi-modal data. We evaluate the proposed approach on a challenging egocentric video dataset and demonstrate its superiority over recent state-of-the-art research works.
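The abstract does not state how the three streams are combined, so the following is only an illustrative sketch of one common option for multi-stream classifiers: late fusion, where each stream (for example, one per modality) produces its own class logits, these are converted to probabilities, and the probabilities are averaged. The function names, the equal weighting, and the pure-Python setting are assumptions for illustration, not the authors' method.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one stream's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(
    stream_logits: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Hypothetical late fusion: weighted average of per-stream class probabilities."""
    n_streams = len(stream_logits)
    if weights is None:
        # Assumption: every stream contributes equally.
        weights = [1.0 / n_streams] * n_streams
    probs = [softmax(logits) for logits in stream_logits]
    num_classes = len(probs[0])
    fused = [0.0] * num_classes
    for w, p in zip(weights, probs):
        for c in range(num_classes):
            fused[c] += w * p[c]
    return fused


def predict(stream_logits: Sequence[Sequence[float]]) -> int:
    """Return the index of the activity class with the highest fused probability."""
    fused = late_fusion(stream_logits)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Three hypothetical streams scoring three activity classes.
    logits = [[2.0, 0.1, 0.1], [1.5, 0.2, 0.1], [0.5, 0.4, 0.3]]
    print(predict(logits), late_fusion(logits))
```

Averaging probabilities rather than raw logits keeps streams with differently scaled outputs from dominating the decision; feature-level (early or intermediate) fusion is an equally plausible alternative for an architecture of this kind.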


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11054491
DOI: http://dx.doi.org/10.3390/s24082491

