Real-Time Multi-Person Video Synthesis with Controllable Prior-Guided Matting.

Sensors (Basel)

School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China.

Published: April 2024

To enhance matting performance in multi-person dynamic scenarios, we introduce a robust, real-time, high-resolution, and controllable human video matting method that achieves state-of-the-art results on all metrics. Unlike most existing methods, which perform video matting frame by frame on independent images, we design a unified architecture built on a controllable generation model to address the lack of global semantic information in multi-person video. Our method, called ControlMatting, uses an independent recurrent architecture to exploit temporal information in videos and significantly improves temporal coherence and detailed matting quality. ControlMatting adopts a mixed training strategy that combines matting and semantic segmentation datasets, which effectively improves the model's semantic understanding. Furthermore, we propose a novel deep learning-based image filter algorithm that strengthens detail enhancement for both the matting and segmentation objectives. Our experiments show that human-body prior information extracted from the image itself effectively mitigates the mask defects caused by complex dynamic scenes with multiple people.
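
The abstract does not spell out how a predicted matte is turned into a synthesized video. The sketch below assumes the standard alpha-compositing equation I = αF + (1 − α)B, which is the usual way a matte α places a person (foreground F) onto a new background B. The function names `composite` and `composite_video` and the grayscale nested-list frame layout are hypothetical choices for illustration, not ControlMatting's implementation.

```python
from typing import List

# Hypothetical frame layout: H x W grayscale intensities in [0, 1].
Frame = List[List[float]]


def composite(fg: Frame, alpha: Frame, bg: Frame) -> Frame:
    """Blend one frame pixel by pixel: I = alpha * F + (1 - alpha) * B."""
    return [
        [a * f + (1.0 - a) * b for f, a, b in zip(f_row, a_row, b_row)]
        for f_row, a_row, b_row in zip(fg, alpha, bg)
    ]


def composite_video(fgs: List[Frame], alphas: List[Frame], bg: Frame) -> List[Frame]:
    """Place the matted people from every frame onto one static new background."""
    if len(fgs) != len(alphas):
        raise ValueError("need one alpha matte per foreground frame")
    return [composite(f, a, bg) for f, a in zip(fgs, alphas)]


if __name__ == "__main__":
    # alpha = 1 keeps the foreground; alpha = 0.5 mixes it evenly with the background.
    print(composite([[1.0, 1.0]], [[1.0, 0.5]], [[0.0, 0.5]]))
```

Because each frame is blended independently in this sketch, any temporal coherence in the output has to come from the mattes themselves, which is what the recurrent architecture described in the abstract is meant to provide.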

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11086136
DOI: http://dx.doi.org/10.3390/s24092795

Publication Analysis

Top Keywords

multi-person video (8)
dynamic scenarios (8)
video matting (8)
matting (7)
real-time multi-person (4)
video (4)
video synthesis (4)
synthesis controllable (4)
controllable prior-guided (4)
prior-guided matting (4)

Similar Publications

Multi-view multi-human association and tracking (MvMHAT) is an emerging yet important problem for video surveillance of multi-person scenes. It aims to track a group of people over time in each view while also identifying the same person across different views at the same time, which distinguishes it from previous MOT and multi-camera MOT tasks that consider only over-time human tracking. As a result, videos for MvMHAT require more complex annotations but also contain more information for self-learning. In this work, we tackle this problem with an end-to-end neural network trained in a self-supervised manner.

Video-based automatic hand hygiene detection for operating rooms using 3D convolutional neural networks.

J Clin Monit Comput

October 2024

Department of Convergence Medicine, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, 05505, Republic of Korea.

Hand hygiene among anesthesia personnel is important for preventing hospital-acquired infections in operating rooms; however, an efficient monitoring system remains elusive. In this study, we leverage a deep learning approach based on operating room videos to detect alcohol-based hand hygiene actions of anesthesia providers. Videos were collected over four months, from November 2018 to February 2019, in a single operating room.

A Systematic Review of Recent Deep Learning Approaches for 3D Human Pose Estimation.

J Imaging

December 2023

Alqualsadi Research Team, Rabat IT Center, ENSIAS, Mohammed V University in Rabat, Rabat 10112, Morocco.

Three-dimensional human pose estimation has made significant advancements through the integration of deep learning techniques. This survey provides a comprehensive review of recent 3D human pose estimation methods, with a focus on monocular images, videos, and multi-view cameras. Our approach stands out through a systematic literature review methodology, ensuring an up-to-date and meticulous overview.

Synchronization of neural activity across brains, known as Interpersonal Neural Synchrony (INS), is emerging as a powerful marker of social interaction that predicts the success of multi-person coordination, communication, and cooperation. Because the origins of INS are poorly understood, we tested whether and how INS might emerge from spontaneous dyadic behavior. We recorded neural activity (EEG) and human behavior (full-body kinematics, eye movements, and facial expressions) while dyads of participants were instructed to look at each other without speaking or making co-verbal gestures.
