The goal of talking face generation is to synthesize a sequence of face images of a specified identity whose mouth movements are synchronized with the given audio. Recently, image-based talking face generation has emerged as a popular approach. It can generate talking face images synchronized with the audio from only a facial image of arbitrary identity and an audio clip. Although these inputs are easy to obtain, the approach does not exploit the emotion in the audio, so the generated faces suffer from unsynchronized emotion, inaccurate mouth shapes, and deficient image quality. In this article, we build a two-stage audio emotion-aware talking face generation (AMIGO) framework to generate high-quality talking face videos with cross-modally synced emotion. Specifically, in stage one, we propose a sequence-to-sequence (seq2seq) cross-modal emotional landmark generation network to generate vivid landmarks whose lip movements and emotion are both synchronized with the input audio. Meanwhile, we utilize a coordinated visual emotion representation to improve the extraction of the audio emotion representation. In stage two, a feature-adaptive visual translation network is designed to translate the synthesized landmarks into facial images. Concretely, we propose a feature-adaptive transformation module to fuse the high-level representations of landmarks and images, resulting in a significant improvement in image quality. We perform extensive experiments on the multi-view emotional audio-visual dataset (MEAD) and the crowd-sourced emotional multimodal actors dataset (CREMA-D), demonstrating that our model outperforms state-of-the-art methods.
DOI: http://dx.doi.org/10.1109/TNNLS.2023.3274676
BMJ Open
December 2024
Clinical Sciences, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
Introduction: Infants born very preterm (VPT, <32 weeks' gestation) are at increased risk for neurodevelopmental impairments including motor, cognitive and behavioural delay. Parents of infants born VPT also have poorer mental health outcomes compared with parents of infants born at term. We have developed an intervention programme called TEDI-Prem (Telehealth for Early Developmental Intervention in babies born very preterm) based on previous research.
BMC Health Serv Res
January 2025
Department of Speech and Language Pathology, School of Rehabilitation Sciences, Hamadan University of Medical Sciences, Hamadan, Iran.
Introduction: Communication disorders are one of the most common disorders that, if not treated in childhood, can cause many social, educational, and psychological problems in adulthood. One of the technologies that can be helpful in these disorders is mobile health (m-Health) technology. This study aims to examine the attitude and willingness to use this technology and compare the advantages and challenges of this technology and face-to-face treatment from the perspective of patients.
Heliyon
January 2025
Department of Clinical Psychology, School of Behavioral Sciences and Mental Health (Tehran Institute of Psychiatry), Iran University of Mental Science, Tehran, Iran.
Background: Autistic children often face difficulties with semantic skills such as receptive lexicon. Games based on behavioral principles have been emphasized for treating autistic children. Serious Games are a new and effective way to alleviate deficits in autistic children.
Neural Netw
January 2025
School of Automation Science and Engineering, South China University of Technology, China. Electronic address:
Talking face generation is a promising approach within various domains, such as digital assistants, video editing, and virtual video conferences. Previous works with audio-driven talking faces focused primarily on the synchronization between audio and video. However, existing methods still have certain limitations in synthesizing photo-realistic video with high identity preservation, audiovisual synchronization, and facial details like blink movements.
J Am Med Dir Assoc
January 2025
Department of Neurology, Renaissance School of Medicine, Stony Brook, NY, United States.
Objectives: Early research reported that older adults who stopped walking when they began a conversation were more likely to fall in the future. As a systematic measure of dual-task performance, Verghese and colleagues developed the Walking While Talking (WWT) test, in which a person walks at a normal pace while reciting alternate letters of the alphabet. The present paper highlights key findings from two decades of research using the WWT test.