Training state-of-the-art models for human pose estimation in videos requires datasets with annotations that are difficult and expensive to obtain. Although transformers have recently been used for body pose sequence modeling, existing methods rely on pseudo-ground truth to augment the limited training data currently available for learning such models. In this paper, we introduce PoseBERT, a transformer module trained entirely on 3D Motion Capture (MoCap) data via masked modeling. It is simple, generic and versatile: it can be plugged on top of any image-based model to turn it into a video-based model that leverages temporal information. We showcase variants of PoseBERT with inputs ranging from 3D skeleton keypoints to rotations of a 3D parametric model for either the full body (SMPL) or just the hands (MANO). Since PoseBERT training is task agnostic, the model can be applied to several tasks, such as pose refinement, future pose prediction or motion completion, without finetuning. Our experimental results validate that adding PoseBERT on top of various state-of-the-art pose estimation methods consistently improves their performance, while its low computational cost allows us to use it in a real-time demo for smoothly animating a robotic hand via a webcam. Test code and models are available at https://github.com/naver/posebert.
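As a rough illustration of what masked modeling over a pose sequence involves, the sketch below hides a random subset of frames and scores a prediction only on the hidden ones. The abstract does not specify the masking strategy or the loss, so the zero-filled frame masking, the 25% ratio, the mean squared error, the toy data, and the names `mask_frames` and `masked_reconstruction_error` are all assumptions made for illustration; this is not PoseBERT's implementation.

```python
import random


def mask_frames(sequence, mask_ratio=0.25, seed=0):
    """Hide a random subset of frames in a pose sequence (hypothetical helper).

    sequence: list of frames, each a list of floats
              (e.g. flattened joint coordinates or rotations).
    Returns (masked_sequence, masked_indices); hidden frames become zeros.
    """
    rng = random.Random(seed)
    n_frames = len(sequence)
    n_masked = max(1, int(round(mask_ratio * n_frames)))
    masked_indices = sorted(rng.sample(range(n_frames), n_masked))
    hidden = set(masked_indices)
    masked_sequence = [
        [0.0] * len(frame) if t in hidden else list(frame)
        for t, frame in enumerate(sequence)
    ]
    return masked_sequence, masked_indices


def masked_reconstruction_error(prediction, target, masked_indices):
    """Mean squared error computed only on the frames that were hidden."""
    total, count = 0.0, 0
    for t in masked_indices:
        for p, y in zip(prediction[t], target[t]):
            total += (p - y) ** 2
            count += 1
    return total / count


# Toy sequence: 8 frames, 3 values per frame (made-up data, not MoCap).
target = [[float(t), float(t) + 0.5, float(t) + 1.0] for t in range(8)]
masked, idx = mask_frames(target, mask_ratio=0.25, seed=0)

# A trained model would predict the hidden frames from the visible ones;
# here the masked input itself stands in for an untrained prediction.
error = masked_reconstruction_error(masked, target, idx)
```

Because the objective only asks the model to fill in hidden frames, the same trained model can in principle be queried with different masking patterns at test time, for example hiding trailing frames for future prediction or interior frames for motion completion.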
DOI: http://dx.doi.org/10.1109/TPAMI.2022.3216899
Med Biol Eng Comput
January 2025
College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China.
Accurately and swiftly segmenting breast tumors is significant for cancer diagnosis and treatment. Ultrasound imaging stands as one of the widely employed methods in clinical practice. However, due to challenges such as low contrast, blurred boundaries, and prevalent shadows in ultrasound images, tumor segmentation remains a daunting task.
Eur J Nucl Med Mol Imaging
January 2025
Department of Nuclear Medicine, West China Hospital, Sichuan University, No.37, Guoxue Alley, Chengdu City, Sichuan Province, 610041, China.
Background: Pathological grade is a critical determinant of clinical outcomes and decision-making of follicular lymphoma (FL). This study aimed to develop a deep learning model as a digital biopsy for the non-invasive identification of FL grade.
Methods: This study retrospectively included 513 FL patients from five independent hospital centers, randomly divided into training, internal validation, and external validation cohorts.
Comput Biol Med
January 2025
State Key Laboratory of Oral Diseases and National Clinical Research Center for Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, 610041, China.
Medical image segmentation is pivotal in disease diagnosis and treatment. This paper presents a novel network architecture for medical image segmentation, termed TransDLNet, which is engineered to enhance the efficiency of multi-scale information utilization. TransDLNet integrates convolutional neural networks and Transformers, facilitating cross-level multi-scale information fusion for complex medical images.
Neural Netw
January 2025
Department of Mechanical Engineering, Politecnico di Milano, Milan, Italy.
Shadow removal remains a challenging visual task aimed at restoring the original brightness of shadow regions in images. Many existing methods overlook the implicit clues within non-shadow regions, leading to inconsistencies in the color, texture, and illumination of the reconstructed shadow-free images. To address these issues, we propose an efficient hybrid model of Transformer and Generative Adversarial Network (GAN), named ShadowGAN-Former, which utilizes information from non-shadow regions to assist in shadow removal.
Sci Rep
January 2025
Department of Computer Science, Xi'an University of Architecture and Technology, Xi'an, 710055, Shaanxi Province, China.
The attention mechanism has significantly progressed in various point cloud tasks. Benefiting from its significant competence in capturing long-range dependencies, research in point cloud completion has achieved promising results. However, the typically disordered point cloud data features complicated non-Euclidean geometric structures and exhibits unpredictable behavior.