Action Quality Assessment (AQA) plays an important role in video analysis, where it is used to evaluate the quality of specific actions, e.g., sports activities. However, it remains challenging: action sequences often differ only by small discrepancies against similar backgrounds, yet current approaches mostly adopt holistic video representations, so fine-grained intra-class variations cannot be captured. To address this challenge, we propose a Fine-grained Spatio-temporal Parsing Network (FSPN), composed of an intra-sequence action parsing module and a spatiotemporal multiscale transformer module, to learn fine-grained spatiotemporal sub-action representations for more reliable AQA. The intra-sequence action parsing module performs semantic sub-action parsing by mining sub-actions at fine-grained levels, which enables an accurate description of the subtle differences between action sequences. The spatiotemporal multiscale transformer module learns motion-oriented action features and captures long-range dependencies among sub-actions at different scales. Furthermore, we design a group contrastive loss to train the model and learn more discriminative feature representations for sub-actions without explicit supervision. We evaluate the proposed approach extensively on the FineDiving, AQA-7, and MTL-AQA datasets. Experimental results demonstrate the effectiveness and feasibility of the approach, which outperforms state-of-the-art methods by a significant margin.
DOI: http://dx.doi.org/10.1109/TIP.2023.3331212
Sensors (Basel)
November 2024
College of Intelligent Transportation, Chongqing Vocational College of Public Transportation, Chongqing 402260, China.
Existing skeleton-based human behavior recognition methods are insensitive to local body movements and recognize similar behaviors inaccurately. To address this, a multi-scale spatio-temporal graph convolution method incorporating multi-granularity features is proposed for human behavior recognition. First, a fine-grained skeleton partitioning strategy is proposed, which initializes the skeleton data into data streams of different granularities. An adaptive cross-scale feature fusion layer is designed using a normalized Gaussian function to fuse features across granularities, guiding the model to focus on discriminative feature representations among similar behaviors through fine-grained features.
Biomimetics (Basel)
November 2024
The Academy for Engineering and Technology, Fudan University, Shanghai 200433, China.
Humans typically make decisions based on past experiences and observations. In robotic manipulation, by contrast, action prediction often relies solely on current observations, which tends to make robots overlook environmental changes or become ineffective when current observations are suboptimal. To address this challenge, and inspired by human cognitive processes, we propose a method that integrates historical learning and multi-view attention to improve the performance of robotic manipulation. Based on a spatio-temporal attention mechanism, our method not only combines observations from current and past steps but also integrates historical actions to better perceive changes in robots' behaviours and their impacts on the environment.
Environ Monit Assess
November 2024
School of Electronic and Electrical Engineering, Wuhan Textile University, Wuhan, 430200, China.
The concentration of PM2.5 is one of the air quality indicators that the public pays the most attention to. Existing methods for PM2.
Environ Int
November 2024
Research Division, California Air Resources Board, Sacramento, CA 95812, the United States of America.
California's diverse geography and meteorological conditions necessitate models capturing fine-grained patterns of air pollution distribution. This study presents the development of high-resolution (100 m) daily land use regression (LUR) models spanning 1989-2021 for nitrogen dioxide (NO2), fine particulate matter (PM2.5), and ozone (O3) across California. These machine learning LUR algorithms integrated comprehensive data sources, including traffic, land use, land cover, meteorological conditions, vegetation dynamics, and satellite data.
Neural Netw
January 2025
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China.