Accurate human action recognition is becoming increasingly important across various fields, including healthcare and self-driving cars. A simple approach to enhancing model performance is to incorporate additional data modalities, such as depth frames, point clouds, and skeleton information. While previous studies have predominantly used late fusion techniques to combine these modalities, our research introduces a multi-level fusion approach that combines information at the early, intermediate, and late stages. Furthermore, recognizing the challenges of collecting multiple data types in real-world applications, our approach seeks to exploit multimodal techniques while relying solely on RGB frames as the single data source.
Temporal action proposal generation is a method for extracting temporal action instances, or proposals, from untrimmed videos. Existing methods often struggle to segment contiguous action proposals, which are groups of action boundaries separated by small temporal gaps. To address this limitation, we propose incorporating an attention mechanism to weigh the importance of each proposal within a contiguous group.
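The idea of weighting proposals within a contiguous group can be illustrated with a minimal sketch. This is not the authors' model: the `(start, end, score)` proposal format, the `max_gap` threshold, and the use of a plain softmax over confidence scores as the "attention" are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def group_contiguous(proposals, max_gap):
    """Group (start, end, score) proposals whose temporal gap to the
    current group is at most max_gap (hypothetical threshold, in seconds)."""
    groups = []
    for p in sorted(proposals):
        if groups:
            # Latest end time seen so far in the current group.
            group_end = max(end for _, end, _ in groups[-1])
            if p[0] - group_end <= max_gap:
                groups[-1].append(p)
                continue
        groups.append([p])
    return groups


def attention_weights(group):
    """Assumed stand-in for attention: softmax over proposal scores,
    so the weights within one contiguous group sum to 1."""
    return softmax([score for _, _, score in group])


proposals = [(0.0, 2.0, 1.0), (2.1, 4.0, 1.0), (10.0, 12.0, 0.5)]
groups = group_contiguous(proposals, max_gap=0.5)
weights = attention_weights(groups[0])
```

Here the first two proposals fall into one group because they are 0.1 s apart, while the third starts a new group. A learned attention module would replace the fixed softmax over scores with weights computed from proposal features.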
Temporal-action proposal generation (TAPG) is a well-known preprocessing step for temporal-action localization and has a major effect on localization performance on untrimmed videos. Interest in proposal generation has grown in recent years, with researchers focusing on anchor- and boundary-based methods for generating action proposals.