In this paper, we tackle the problem of egocentric action anticipation, i.e., predicting what actions the camera wearer will perform in the near future and which objects they will interact with. Specifically, we contribute Rolling-Unrolling LSTM, a learning architecture to anticipate actions from egocentric videos. The method is based on three components: 1) an architecture composed of two LSTMs to model the sub-tasks of summarizing the past and inferring the future, 2) a Sequence Completion Pre-Training technique which encourages the LSTMs to focus on the different sub-tasks, and 3) a Modality ATTention (MATT) mechanism to efficiently fuse multi-modal predictions obtained by processing RGB frames, optical flow fields and object-based features. The proposed approach is validated on EPIC-Kitchens, EGTEA Gaze+ and ActivityNet. The experiments show that the proposed architecture is state-of-the-art in the domain of egocentric videos, achieving top performance in the 2019 EPIC-Kitchens egocentric action anticipation challenge. The approach also achieves competitive performance on ActivityNet with respect to methods not based on unsupervised pre-training and generalizes to the tasks of early action recognition and action recognition. To encourage research on this challenging topic, we have made our code, trained models, and pre-extracted features available at our web page: http://iplab.dmi.unict.it/rulstm.
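The fusion step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows the general idea of attention-weighted late fusion, in which each modality branch (RGB, optical flow, objects) yields per-class scores and a softmax over per-modality attention logits decides how much each branch contributes. The function names `softmax` and `fuse_predictions`, the modality keys, and all numeric values are hypothetical; in particular, the attention logits are passed in by hand here, whereas a learned mechanism would have to produce them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(modality_scores, attention_logits):
    """Fuse per-modality class scores with attention weights.

    modality_scores: dict mapping modality name -> list of class scores
    attention_logits: dict mapping modality name -> unnormalized weight
                      (assumed given here; hypothetical values)
    Returns (fused class scores, normalized weight per modality).
    """
    names = sorted(modality_scores)
    weights = softmax([attention_logits[n] for n in names])
    num_classes = len(modality_scores[names[0]])
    fused = [0.0] * num_classes
    for name, w in zip(names, weights):
        # Weighted sum of each modality's class scores.
        for c, s in enumerate(modality_scores[name]):
            fused[c] += w * s
    return fused, dict(zip(names, weights))


# Hypothetical scores for three action classes from three branches.
scores = {
    "rgb": [2.0, 0.5, 0.1],
    "flow": [0.3, 1.5, 0.2],
    "obj": [1.0, 1.0, 3.0],
}
# Equal logits give every modality the same weight (a plain average).
logits = {"rgb": 0.0, "flow": 0.0, "obj": 0.0}
fused, weights = fuse_predictions(scores, logits)
```

Raising the logit of one modality shifts the fused scores toward that branch, which is the behaviour an attention-based fusion relies on when one input stream is more informative than the others.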

Source
http://dx.doi.org/10.1109/TPAMI.2020.2992889
