VPT: Video portraits transformer for realistic talking face generation.

Neural Netw

School of Automation Science and Engineering, South China University of Technology, China.

Published: January 2025

Talking face generation is a promising technique for various domains, such as digital assistants, video editing, and virtual video conferencing. Previous work on audio-driven talking faces focused primarily on the synchronization between audio and video. However, existing methods still fall short of synthesizing photo-realistic video with high identity preservation, audiovisual synchronization, and facial details such as blink movements. To address these problems, a novel talking face generation framework with controllable blink movements, termed the video portraits transformer (VPT), is proposed. It separates video generation into two stages: an audio-to-landmark stage and a landmark-to-face stage. In the audio-to-landmark stage, a transformer encoder serves as the generator, predicting full facial landmarks from the given audio and a continuous eye aspect ratio (EAR) signal. In the landmark-to-face stage, a video-to-video (vid-to-vid) network transfers the landmarks into realistic talking face videos. Moreover, to imitate real blink movements during inference, a transformer-based spontaneous blink generation module is devised to generate the EAR sequence. Extensive experiments demonstrate that VPT produces photo-realistic talking face videos with natural blink movements, and that the spontaneous blink generation module generates blinks whose duration distribution and frequency are close to those of real blinks.
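The abstract uses the eye aspect ratio (EAR) as the per-frame blink signal but does not spell out its definition. As a point of reference, below is a minimal sketch of the widely used six-landmark EAR formulation; the landmark ordering and the open-eye range quoted in the comments are assumptions for illustration, not values taken from the paper.

import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR for one eye from a (6, 2) array of landmarks p1..p6 around the eye contour."""
    v1 = np.linalg.norm(eye[1] - eye[5])  # vertical distance |p2 - p6|
    v2 = np.linalg.norm(eye[2] - eye[4])  # vertical distance |p3 - p5|
    h = np.linalg.norm(eye[0] - eye[3])   # horizontal distance |p1 - p4|
    return float((v1 + v2) / (2.0 * h))

# An open eye typically gives EAR around 0.2-0.35; during a blink the value drops
# sharply toward zero and recovers, so a per-frame EAR sequence is a compact
# control signal for blink timing.
open_eye = np.array([[0, 2], [2, 3], [4, 3], [6, 2], [4, 1], [2, 1]], dtype=float)
print(eye_aspect_ratio(open_eye))  # about 0.33 for this synthetic example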

Source
http://dx.doi.org/10.1016/j.neunet.2025.107122

Publication Analysis

Top Keywords

blink movements (20); talking face (16); face generation (12); video portraits (8); portraits transformer (8); realistic talking (8); talking faces (8); blink (8); stages audio-to-landmark (8); real blink (8)

Similar Publications

Locked-in syndrome is a rare neurological disorder. It is characterized by tetraparesis, paralysis of the facial and masticatory muscles, anarthria, and pseudobulbar syndrome, with possible preservation of vertical eye movements, blinking, and consciousness. A serious problem in locked-in syndrome is the patient's inability to socialize, which causes suffering no less severe than that caused by the physical limitations.

Eye metrics are a marker of visual conscious awareness and neural processing in cerebral blindness.

bioRxiv

January 2025

Laboratory of Brain and Cognition (LBC), National Institute of Mental Health (NIMH), National Institutes of Health (NIH), Bethesda, Maryland (MD), USA.

Damage to the primary visual pathway can cause vision loss. Some cerebrally blind people retain degraded vision or sensations and can perform visually guided behaviors. These cases motivate investigation of, and debate about, conscious awareness in the blind field and the residual neural processing linked to it.

Fast, bioluminescent blinks attract group members of the nocturnal flashlight fish Anomalops katoptron (Bleeker, 1856).

Front Zool

January 2025

Department of General Zoology and Neurobiology, Institute of Biology and Biotechnology, Ruhr-University Bochum, 44801, Bochum, Germany.

Background: During their nighttime shoaling, the flashlight fish Anomalops katoptron produce fascinating bioluminescent blink patterns, which have been related to the localization of food, the determination of nearest-neighbor distance, and the initiation of the shoal's movement direction.

Explicit metrics for implicit emotions: investigating physiological and gaze indices of learner emotions.

Front Psychol

December 2024

Department of Learning, Data-Analytics and Technology, Faculty of Behavioural, Management and Social Sciences, University of Twente, Enschede, Netherlands.

Learning experiences are intertwined with emotions, which in turn have a significant effect on learning outcomes. Therefore, digital learning environments can benefit from taking the learner's emotional state into account. To do so, the first step is real-time emotion detection, which is made possible by sensors that continuously collect physiological and eye-tracking data.
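The abstract only mentions that sensors continuously collect physiological and eye-tracking streams. As an illustration of turning such a stream into an explicit gaze index, the hypothetical helper below counts blinks in a per-frame eye aspect ratio (EAR) trace; the frame rate and EAR threshold are assumptions for the sketch, not parameters reported by the study.

import numpy as np

def blink_rate_per_minute(ear: np.ndarray, fps: float = 30.0, threshold: float = 0.2) -> float:
    """Estimate blinks per minute from a per-frame EAR trace by counting open-to-closed transitions."""
    closed = ear < threshold
    # A blink onset is a frame where the eye switches from open to closed.
    onsets = int(np.count_nonzero(closed[1:] & ~closed[:-1]))
    if closed[0]:
        onsets += 1  # the trace starts mid-blink
    minutes = len(ear) / fps / 60.0
    return onsets / minutes if minutes > 0 else 0.0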
