The goal of talking face generation is to synthesize a sequence of face images of a specified identity whose mouth movements are synchronized with a given audio clip. Recently, image-based talking face generation has emerged as a popular approach: it can generate talking face images synchronized with the audio from only a single facial image of arbitrary identity and an audio clip. Despite these accessible inputs, existing methods forgo exploiting the emotion carried by the audio, so the generated faces suffer from unsynchronized emotion, inaccurate mouth shapes, and deficient image quality. In this article, we build a two-stage audio emotion-aware talking face generation (AMIGO) framework to generate high-quality talking face videos with cross-modally synchronized emotion. Specifically, in stage one we propose a sequence-to-sequence (seq2seq) cross-modal emotional landmark generation network that produces vivid landmarks whose lip movements and emotion are both synchronized with the input audio; meanwhile, we utilize a coordinated visual emotion representation to improve the extraction of its audio counterpart. In stage two, a feature-adaptive visual translation network is designed to translate the synthesized landmarks into facial images. Concretely, we propose a feature-adaptive transformation module that fuses the high-level representations of landmarks and images, yielding a significant improvement in image quality. We perform extensive experiments on the multi-view emotional audio-visual dataset (MEAD) and the crowd-sourced emotional multimodal actors dataset (CREMA-D), demonstrating that our model outperforms state-of-the-art methods.
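The two-stage data flow described in the abstract can be illustrated with a minimal structural sketch. This is not the authors' implementation: the layer shapes, the 68-point landmark layout, and all function names here are illustrative assumptions, with random weights standing in for trained networks; it only shows how audio features pass through an emotion-conditioned landmark generator (stage one) and then a landmark-to-image translator (stage two).

```python
import numpy as np

# Hypothetical dimensions -- not taken from the paper.
AUDIO_DIM, EMO_DIM, LM_POINTS, HID = 80, 16, 68, 64

rng = np.random.default_rng(0)

def encode_audio(mel_frames):
    """Stage 1a: map per-frame audio features (T, AUDIO_DIM) to hidden states (T, HID)."""
    W = rng.standard_normal((AUDIO_DIM, HID)) * 0.01
    return np.tanh(mel_frames @ W)

def encode_emotion(hidden):
    """Stage 1b: pool a clip-level audio emotion embedding (EMO_DIM,) from the hidden states."""
    W = rng.standard_normal((HID, EMO_DIM)) * 0.01
    return np.tanh(hidden.mean(axis=0) @ W)

def decode_landmarks(hidden, emotion):
    """Stage 1c: seq2seq-style decoder producing a (T, LM_POINTS, 2) landmark
    sequence conditioned on both the content and the emotion features."""
    cond = np.concatenate([hidden, np.tile(emotion, (hidden.shape[0], 1))], axis=1)
    W = rng.standard_normal((HID + EMO_DIM, LM_POINTS * 2)) * 0.01
    return (cond @ W).reshape(-1, LM_POINTS, 2)

def feature_adaptive_translate(landmarks, ref_image):
    """Stage 2: stand-in for the landmark-to-image translation network; here we
    merely modulate the reference image per frame to show the data flow."""
    T = landmarks.shape[0]
    scale = 1.0 + 0.01 * landmarks.reshape(T, -1).mean(axis=1)
    return ref_image[None] * scale[:, None, None, None]

T = 25                                # frames in the clip
mel = rng.standard_normal((T, AUDIO_DIM))
ref = rng.random((128, 128, 3))       # single reference face image

h = encode_audio(mel)
emo = encode_emotion(h)
lms = decode_landmarks(h, emo)        # (25, 68, 2) landmark sequence
frames = feature_adaptive_translate(lms, ref)
print(lms.shape, frames.shape)        # (25, 68, 2) (25, 128, 128, 3)
```

The key design point the sketch preserves is that the emotion embedding conditions landmark generation (stage one), so emotional expression is decided before any pixels are synthesized (stage two).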

Source: http://dx.doi.org/10.1109/TNNLS.2023.3274676

Publication Analysis

Top Keywords: talking face (24), face generation (16), face images (8), synchronized audio (8), image quality (8), audio (7), talking (6), face (6), generation (5), emotion (5)

Similar Publications

Introduction: Infants born very preterm (VPT, <32 weeks' gestation) are at increased risk of neurodevelopmental impairments, including motor, cognitive, and behavioural delay. Parents of infants born VPT also have poorer mental health outcomes compared with parents of infants born at term. We have developed an intervention programme called TEDI-Prem (Telehealth for Early Developmental Intervention in babies born very preterm) based on previous research.

Introduction: Communication disorders are among the most common disorders in childhood and, if left untreated, can cause many social, educational, and psychological problems in adulthood. One technology that can help with these disorders is mobile health (m-Health). This study examines patients' attitudes toward and willingness to use this technology, and compares its advantages and challenges with those of face-to-face treatment from the patients' perspective.

Evaluation of the effectiveness of a serious game titled "Kookism" on the receptive lexicon in 4-9-year-old autistic children.

Heliyon

January 2025

Department of Clinical Psychology, School of Behavioral Sciences and Mental Health (Tehran Institute of Psychiatry), Iran University of Medical Sciences, Tehran, Iran.

Background: Autistic children often face difficulties with semantic skills such as the receptive lexicon. Games based on behavioral principles have been emphasized for treating autistic children, and serious games are a new and effective way to alleviate these deficits.

VPT: Video portraits transformer for realistic talking face generation.

Neural Netw

January 2025

School of Automation Science and Engineering, South China University of Technology, China.

Talking face generation is a promising approach within various domains, such as digital assistants, video editing, and virtual video conferences. Previous work on audio-driven talking faces has focused primarily on the synchronization between audio and video. However, existing methods still have limitations in synthesizing photo-realistic video with high identity preservation, audiovisual synchronization, and facial details such as blink movements.

Two Decades of the Walking While Talking Test: A Narrative Review.

J Am Med Dir Assoc

January 2025

Department of Neurology, Renaissance School of Medicine, Stony Brook, NY, United States.

Objectives: Early research reported that older adults who stopped walking when they began a conversation were more likely to fall in the future. As a systematic measure of dual-task performance, Verghese and colleagues developed the Walking While Talking (WWT) test, in which a person walks at a normal pace while reciting alternate letters of the alphabet. The present paper highlights key findings from two decades of research using the WWT test.
