Training state-of-the-art models for human pose estimation in videos requires datasets with annotations that are difficult and expensive to obtain. Although transformers have recently been used for body pose sequence modeling, related methods rely on pseudo-ground truth to augment the limited training data currently available for learning such models. In this paper, we introduce PoseBERT, a transformer module that is fully trained on 3D Motion Capture (MoCap) data via masked modeling. It is simple, generic and versatile: it can be plugged on top of any image-based model to turn it into a video-based model that leverages temporal information. We showcase variants of PoseBERT with different inputs, ranging from 3D skeleton keypoints to rotations of a 3D parametric model for either the full body (SMPL) or just the hands (MANO). Since PoseBERT training is task agnostic, the model can be applied to several tasks such as pose refinement, future pose prediction or motion completion without finetuning. Our experimental results validate that adding PoseBERT on top of various state-of-the-art pose estimation methods consistently improves their performance, while its low computational cost allows us to use it in a real-time demo for smoothly animating a robotic hand via a webcam. Test code and models are available at https://github.com/naver/posebert.
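
The masked-modeling idea mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 25% mask ratio, the use of `None` as a mask placeholder and the toy 3-value frames are all assumptions made for illustration. The sketch hides a random subset of frames in a non-empty pose sequence and scores a reconstruction only on the hidden frames, which is the general shape of a masked-modeling objective.

```python
import random

# Placeholder for a hidden frame (assumption; a real model would
# typically substitute a learned mask token instead).
MASK = None


def mask_frames(sequence, mask_ratio=0.25, seed=0):
    """Hide a random subset of frames; return (corrupted, masked_indices)."""
    rng = random.Random(seed)
    n_masked = max(1, round(mask_ratio * len(sequence)))
    masked = sorted(rng.sample(range(len(sequence)), n_masked))
    corrupted = [MASK if t in masked else frame
                 for t, frame in enumerate(sequence)]
    return corrupted, masked


def reconstruction_error(predicted, target, masked):
    """Mean squared error computed over the masked frames only."""
    total, count = 0.0, 0
    for t in masked:
        for p, q in zip(predicted[t], target[t]):
            total += (p - q) ** 2
            count += 1
    return total / count


# Toy sequence: 8 frames, each a flat list of 3 pose values (hypothetical).
sequence = [[float(t), 0.0, 1.0] for t in range(8)]
corrupted, masked = mask_frames(sequence)
```

In a training loop, a temporal model would receive `corrupted` and be optimized so that its output minimizes `reconstruction_error` against the original `sequence`.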

Source (DOI): http://dx.doi.org/10.1109/TPAMI.2022.3216899
