Talking face generation aims to generate photo-realistic video portraits of a target person driven by input audio. Because of the nature of the audio-to-lip-motion mapping, the same speech content may have different appearances, even for the same person on different occasions. This one-to-many mapping introduces ambiguity during training and thus leads to inferior visual results. Although a two-stage framework (i.e., an audio-to-expression model followed by a neural-rendering model) can partly alleviate the one-to-many mapping, it is still insufficient, since the prediction is produced without enough information (e.g., emotions and wrinkles). In this paper, we propose MemFace, which complements the missing information with an implicit memory and an explicit memory that correspond to the two stages, respectively. More specifically, the implicit memory is employed in the audio-to-expression model to capture high-level semantics in the audio-expression shared space, while the explicit memory is employed in the neural-rendering model to help synthesize pixel-level details. Our experimental results show that MemFace consistently and significantly surpasses state-of-the-art methods across multiple scenarios.
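
The abstract does not specify how either memory is queried. As a rough illustration only, the sketch below assumes a common design for learned memories: a bank of key/value slots read with scaled dot-product attention, where a query feature (for example, an audio feature in the implicit-memory case) is compared against every key and the memory returns a softmax-weighted mix of the values. The function names `softmax` and `memory_read`, the slot layout, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Normalize scores into weights that are positive and sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Read from a key/value memory with scaled dot-product attention.

    Hypothetical sketch: MemFace's actual memory mechanism is not
    described in the abstract; this is a generic attention-based read.
    """
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty and equal in length")
    d = len(query)
    # Similarity of the query to each memory slot, scaled by sqrt(d).
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The output is a convex combination of the stored value vectors.
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


if __name__ == "__main__":
    # Toy memory with two slots; the query aligns with the first key.
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0, 0.0], [0.0, 10.0]]
    print(memory_read([10.0, 0.0], keys, values))
```

In a trained model the keys and values would be learned parameters rather than fixed toy vectors, so this example only shows the retrieval arithmetic: information that the query alone lacks is supplied by whichever slots it matches most strongly.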

Source
http://dx.doi.org/10.1109/TPAMI.2024.3409380

Similar Publications

Plasmonic nanoantennas with suitable far-field characteristics are of great interest for use in optical wireless links, inter-/intrachip communications, LiDARs, and photonic integrated circuits due to their exceptional modal confinement. Despite its success in shaping robust antenna design theories in the radio-frequency and millimeter-wave regimes, conventional transmission-line theory loses much of its validity at optical frequencies, leaving a noticeable void: there is no generalized theory for antenna design in the optical domain. By training a neural network once, plasmonic nanoantenna design can be transformed into an automated, data-driven task.

Unified translation of medical images from one modality to many distinct modalities is desirable in healthcare settings. A ubiquitous approach for bilateral medical scan translation is one-to-one mapping with GANs. However, its efficacy in capturing the diversity of a pool of medical scans and performing one-to-many translation is questionable.

Incremental Inverse Design of Desired Soybean Phenotypes.

ACS Omega

October 2024

Department of Chemical and Biomolecular Engineering, University of Connecticut, Storrs, Connecticut 06269, United States.

We present an application of computational inverse design, which reverses the conventional trial-and-error forward design paradigm and optimizes biological phenotype by directly modifying genotype. The limitations of inverse design in genotype-to-bulk phenotype (G-BP) mapping can be addressed via an established design paradigm: "design, build, test, learn" (DBTL), where computational inverse design automates both the design and learn phases. In any context, inverse design is limited by the fundamental "one-to-many" nature of the inverse function.

Frogs have a highly conserved body plan, yet they employ a diverse array of locomotor modes, making them ideal organisms for investigating the relationships between morphology and locomotor function, in particular whether anatomical complexity is a prerequisite for functional complexity. We use diffusible iodine contrast-enhanced microCT (diceCT) imaging to digitally dissect the gross muscle anatomy of the pelvis and hindlimbs for 30 species of frogs representing five primary locomotor modes, including the first known detailed dissection for some of the world's smallest frogs, forming the largest digital comparative analysis of musculoskeletal structure in any vertebrate clade to date. By linking musculoskeletal dissections and phylogenetic comparative methods, we then quantify and compare relationships between anatomy and function across over 160 million years of anuran evolution.

Deductive solution strategies are required in prediction scenarios that are underdetermined, when contradictory information is available, or more generally wherever one-to-many non-functional mappings occur. In contrast, most contemporary machine learning (ML) in the chemical sciences is inductive learning from examples, with a fixed set of features. Chemical workflows are replete with situations requiring deduction, including many aspects of lab automation and spectral interpretation.
