Estimating 3-D hand pose from a single depth image is important for human-computer interaction. Although depth-based 3-D hand pose estimation has made great progress in recent years, it still struggles in complex scenes, especially under severe self-occlusion and high self-similarity among fingers. Inspired by the fact that multipart context is critical for alleviating ambiguity, and that the constraint relations contained in the hand structure are important for robust estimation, we attempt to explicitly model the correlations between different hand parts. In this article, we propose a pose-guided hierarchical graph convolution (PHG) module, which is embedded into a pixelwise regression framework to enhance the convolutional feature maps by exploring the complex dependencies between different hand parts. Specifically, the PHG module first extracts hierarchical fine-grained node features under the guidance of the hand pose and then uses graph convolution to perform hierarchical message passing between nodes according to the hand structure. Finally, the enhanced node features are used to generate dynamic convolution kernels, which produce hierarchical structure-aware feature maps. Our method achieves state-of-the-art or comparable performance on five 3-D hand pose datasets: 1) HANDS 2019; 2) HANDS 2017; 3) NYU; 4) ICVL; and 5) MSRA.
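To make the graph-convolution step concrete, the sketch below shows one generic message-passing layer over hand-joint nodes. This is a minimal illustration, not the paper's PHG module: the toy hand graph, node count, feature dimensions, and weights are all invented for this example, and the normalized-adjacency GCN form is one common choice among many.

```python
import numpy as np

def normalized_adjacency(edges, num_nodes):
    """Build A_hat = D^{-1/2} (A + I) D^{-1/2} from an undirected edge list."""
    A = np.eye(num_nodes)  # self-loops so each node keeps its own feature
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(X, A_hat, W):
    """One graph-convolution layer: ReLU(A_hat @ X @ W)."""
    return np.maximum(A_hat @ X @ W, 0.0)

# Toy hand graph (hypothetical): wrist node 0 linked to five finger nodes 1..5.
edges = [(0, k) for k in range(1, 6)]
A_hat = normalized_adjacency(edges, num_nodes=6)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 8))   # 6 joint nodes, 8-dim features each
W = rng.standard_normal((8, 8))   # learnable layer weights
H = gcn_layer(X, A_hat, W)        # message-passed node features, shape (6, 8)
```

Each output row mixes a node's own feature with its neighbors' features, which is how structural constraints (e.g., wrist-finger connectivity) can propagate into the per-node representations.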
DOI: http://dx.doi.org/10.1109/TCYB.2021.3083637