Analysis of egocentric video has recently drawn the attention of researchers in both the computer vision and multimedia communities. In this paper, we propose a weakly supervised, superpixel-level joint framework for localization, recognition, and summarization of actions in an egocentric video. We first recognize and localize single and multiple actions in each frame of an egocentric video and then construct a summary of these detected actions. Operating at the superpixel level enables precise localization of actions and also improves recognition accuracy. Superpixels are extracted within the central regions of the egocentric video frames; these central regions are determined using a previously developed center-surround model. A sparse spatio-temporal video representation graph is constructed in the deep feature space with the superpixels as nodes. A weakly supervised random-walk procedure on this graph yields an action label for each superpixel. After determining the action label(s) for each frame from its constituent superpixels, we apply a fractional-knapsack-style formulation to obtain a summary of the detected actions. Experimental comparisons on the publicly available ADL, GTEA, EGTEA Gaze+, EgoGesture, and EPIC-Kitchens datasets show the effectiveness of the proposed solution.
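The abstract does not give implementation details, but the weakly supervised random-walk step is in the spirit of classical random-walk label propagation on a graph: a few superpixels carry weak seed labels, and the remaining nodes are labeled by their absorption probabilities into each class. The sketch below is a minimal illustration of that general technique, not the authors' exact method; the inputs (`W`, an affinity matrix built from deep-feature similarities, and `seeds`, a node-to-class mapping) are assumptions for illustration.

```python
# Minimal sketch of random-walk label propagation over a superpixel graph.
# Illustrates the general technique only, not the paper's exact formulation.
import numpy as np

def random_walk_labels(W, seeds, num_classes):
    """W: (n, n) nonnegative affinity matrix; seeds: dict node -> class id."""
    n = W.shape[0]
    labeled = np.array(sorted(seeds))
    unlabeled = np.array([i for i in range(n) if i not in seeds])

    # Row-stochastic transition matrix from the affinity graph.
    D_inv = np.diag(1.0 / np.maximum(W.sum(axis=1), 1e-12))
    P = D_inv @ W

    # One-hot class indicators for the labeled (absorbing) nodes.
    Y = np.zeros((len(labeled), num_classes))
    for row, i in enumerate(labeled):
        Y[row, seeds[i]] = 1.0

    # Absorption probabilities: solve (I - P_uu) X = P_ul Y.
    P_uu = P[np.ix_(unlabeled, unlabeled)]
    P_ul = P[np.ix_(unlabeled, labeled)]
    X = np.linalg.solve(np.eye(len(unlabeled)) - P_uu, P_ul @ Y)

    labels = np.empty(n, dtype=int)
    labels[labeled] = [seeds[i] for i in labeled]
    labels[unlabeled] = X.argmax(axis=1)   # most probable absorbing class
    return labels
```

The linear system has a unique solution whenever every unlabeled superpixel can reach at least one seed through the graph, which a connected affinity graph guarantees.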
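Likewise, the fractional-knapsack-style summarization can be pictured with the textbook greedy algorithm: rank candidate action segments by score per unit duration and fill a fixed time budget, truncating the last segment if necessary. A hedged sketch follows; the segment scores, durations, and budget are hypothetical inputs, and the paper's exact scoring and constraints may differ.

```python
# Textbook fractional-knapsack greedy for assembling a fixed-length summary.
# A sketch under assumed inputs, not the paper's exact objective.
def knapsack_summary(segments, budget):
    """segments: list of (segment_id, score, duration); budget: total seconds."""
    # Greedy by score density (score per second of video).
    ranked = sorted(segments, key=lambda s: s[1] / s[2], reverse=True)
    summary, remaining = [], budget
    for seg_id, score, duration in ranked:
        if remaining <= 0:
            break
        take = min(duration, remaining)   # last segment may be fractional
        summary.append((seg_id, take))
        remaining -= take
    return summary

# Example: three candidate action segments, a 10-second summary budget.
# Picks all of segment 0 (density 1.25) and 6 s of segment 1 (density 1.125).
print(knapsack_summary([(0, 5.0, 4.0), (1, 9.0, 8.0), (2, 2.0, 6.0)], 10.0))
```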
DOI: http://dx.doi.org/10.1109/TIP.2021.3070732