Talking face generation is a promising technique for applications such as digital assistants, video editing, and virtual video conferencing. Previous work on audio-driven talking faces has focused primarily on synchronization between audio and video. However, existing methods still fall short of synthesizing photo-realistic video with strong identity preservation, tight audiovisual synchronization, and facial details such as blink movements. To address these problems, a novel talking face generation framework, termed the video portraits transformer (VPT), with controllable blink movements is proposed. It separates video generation into two stages: an audio-to-landmark stage and a landmark-to-face stage. In the audio-to-landmark stage, a transformer encoder serves as the generator, predicting the full set of facial landmarks from the given audio and a continuous eye aspect ratio (EAR) signal. In the landmark-to-face stage, a video-to-video (vid-to-vid) network transfers the landmarks into realistic talking face videos. Moreover, to imitate real blink behavior during inference, a transformer-based spontaneous blink generation module is devised to generate the EAR sequence. Extensive experiments demonstrate that VPT produces photo-realistic talking face videos with natural blink movements, and that the spontaneous blink generation module generates blinks whose duration distribution and frequency are close to those of real blinks.
DOI: http://dx.doi.org/10.1016/j.neunet.2025.107122
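The abstract conditions both the landmark generator and the spontaneous blink module on a continuous eye aspect ratio (EAR) signal but does not define it. The sketch below shows the standard six-landmark EAR computation (Soukupová and Čech, 2016) as an illustrative assumption about what that signal looks like, not the paper's own implementation; the function names eye_aspect_ratio and ear_sequence are hypothetical.

```python
# Minimal sketch of an EAR signal, assuming the common six-point eye contour:
# indices 0 and 3 are the eye corners, 1-2 lie on the upper lid, 5-4 on the
# lower lid. This is an illustration, not the VPT paper's code.
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """eye: (6, 2) array of 2D eye-contour landmarks."""
    vert1 = np.linalg.norm(eye[1] - eye[5])   # first vertical lid distance
    vert2 = np.linalg.norm(eye[2] - eye[4])   # second vertical lid distance
    horiz = np.linalg.norm(eye[0] - eye[3])   # horizontal eye width
    return (vert1 + vert2) / (2.0 * horiz)

def ear_sequence(landmark_frames: np.ndarray) -> np.ndarray:
    """landmark_frames: (T, 6, 2) per-frame eye landmarks -> (T,) EAR values."""
    return np.array([eye_aspect_ratio(frame) for frame in landmark_frames])

if __name__ == "__main__":
    # Toy example: an open eye vs. a nearly closed (blinking) eye.
    open_eye = np.array([[0, 0], [1, 1], [2, 1], [3, 0], [2, -1], [1, -1]], float)
    closed_eye = np.array([[0, 0], [1, 0.1], [2, 0.1], [3, 0], [2, -0.1], [1, -0.1]], float)
    print(eye_aspect_ratio(open_eye), eye_aspect_ratio(closed_eye))
```

A low EAR sustained over a few consecutive frames is the usual cue for a blink, which is consistent with the abstract's use of an EAR sequence to describe and control blink movements.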
Zh Nevrol Psikhiatr Im S S Korsakova
January 2025
Pirogov Russian National Research Medical University (Pirogov University), Moscow, Russia.
Locked-in syndrome is a rare neurological disorder. It is characterized by tetraparesis, paralysis of the facial and masticatory muscles, anarthria, and pseudobulbar syndrome, with possible preservation of vertical eye movements and blinking, as well as preserved consciousness. A serious problem in locked-in syndrome is the patient's inability to socialize, which causes no less suffering than the physical limitations themselves.
bioRxiv
January 2025
Laboratory of Brain and Cognition (LBC), National Institute of Mental Health (NIMH), National Institutes of Health (NIH), Bethesda, Maryland (MD), USA.
Damage to the primary visual pathway can cause vision loss. Some cerebrally blind people retain degraded vision or sensations and can perform visually guided behaviors. These cases motivate investigation of, and debate about, conscious awareness in the blind field and the residual neural processing linked to it.
Front Zool
January 2025
Department of General Zoology and Neurobiology, Institute of Biology and Biotechnology, Ruhr-University Bochum, 44801, Bochum, Germany.
Background: During their nighttime shoaling, the flashlight fish Anomalops katoptron produce fascinating, bioluminescent blink patterns, which have been related to the localization of food, determination of nearest neighbor distance, and initiation of the shoal's movement direction. Information transfer e.g.
Neural Netw
January 2025
School of Automation Science and Engineering, South China University of Technology, China.
Front Psychol
December 2024
Department of Learning, Data-Analytics and Technology, Faculty of Behavioural, Management and Social Sciences, University of Twente, Enschede, Netherlands.
Learning experiences are intertwined with emotions, which in turn have a significant effect on learning outcomes. Therefore, digital learning environments can benefit from taking the emotional state of the learner into account. To do so, the first step is real-time emotion detection, which is made possible by sensors that continuously collect physiological and eye-tracking data.