Talking face generation aims at generating photo-realistic video portraits of a target person driven by input audio. Because of the nature of the audio-to-lip-motion mapping, the same speech content may produce different appearances even for the same person on different occasions. This one-to-many mapping problem introduces ambiguity during training and thus causes inferior visual results. Although the one-to-many mapping can be alleviated in part by a two-stage framework (i.e., an audio-to-expression model followed by a neural-rendering model), this is still insufficient because the prediction is produced without enough information (e.g., emotions, wrinkles, etc.). In this paper, we propose MemFace to complement the missing information with an implicit memory and an explicit memory that follow the sense of the two stages, respectively. More specifically, the implicit memory is employed in the audio-to-expression model to capture high-level semantics in the audio-expression shared space, while the explicit memory is employed in the neural-rendering model to help synthesize pixel-level details. Our experimental results show that the proposed MemFace consistently and significantly surpasses all state-of-the-art results across multiple scenarios.
DOI: http://dx.doi.org/10.1109/TPAMI.2024.3409380
Nanophotonics
September 2024
Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093, USA.
Plasmonic nanoantennas with suitable far-field characteristics are of huge interest for utilization in optical wireless links, inter-/intrachip communications, LiDARs, and photonic integrated circuits due to their exceptional modal confinement. Despite its success in shaping robust antenna design theories in radio frequency and millimeter-wave regimes, conventional transmission line theory finds its validity diminished in the optical frequencies, leading to a noticeable void in a generalized theory for antenna design in the optical domain. By utilizing neural networks, and through a one-time training of the network, one can transform the plasmonic nanoantennas design into an automated, data-driven task.
Comput Biol Med
December 2024
Department of Chemical Engineering, IIT Delhi, India; Yardi School of Artificial Intelligence, IIT Delhi, India.
Unified translation of medical images from one-to-many distinct modalities is desirable in healthcare settings. A ubiquitous approach for bilateral medical scan translation is one-to-one mapping with GANs. However, its efficacy in encapsulating diversity in a pool of medical scans and performing one-to-many translation is questionable.
ACS Omega
October 2024
Department of Chemical and Biomolecular Engineering, University of Connecticut, Storrs, Connecticut 06269, United States.
We present an application of computational inverse design, which reverses the conventional trial-and-error forward design paradigm and optimizes biological phenotype by directly modifying genotype. The limitations of inverse design in genotype-to-bulk phenotype (G-BP) mapping can be addressed via an established design paradigm: "design, build, test, learn" (DBTL), where computational inverse design automates both the design and learn phases. In any context, inverse design is limited by the fundamental "one-to-many" nature of the inverse function.
J Anat
November 2024
Centre for Integrative Anatomy, Cell and Developmental Biology, University College London, London, UK.
Frogs have a highly conserved body plan, yet they employ a diverse array of locomotor modes, making them ideal organisms for investigating the relationships between morphology and locomotor function, in particular whether anatomical complexity is a prerequisite for functional complexity. We use diffusible iodine contrast-enhanced microCT (diceCT) imaging to digitally dissect the gross muscle anatomy of the pelvis and hindlimbs for 30 species of frogs representing five primary locomotor modes, including the first known detailed dissection for some of the world's smallest frogs, forming the largest digital comparative analysis of musculoskeletal structure in any vertebrate clade to date. By linking musculoskeletal dissections and phylogenetic comparative methods, we then quantify and compare relationships between anatomy and function across over 160 million years of anuran evolution.
Chem Sci
July 2024
Department of Chemical Engineering, Purdue University, West Lafayette, USA.
Deductive solution strategies are required in prediction scenarios that are underdetermined, when contradictory information is available, or more generally wherever one-to-many non-functional mappings occur. In contrast, most contemporary machine learning (ML) in the chemical sciences is inductive learning from examples, with a fixed set of features. Chemical workflows are replete with situations requiring deduction, including many aspects of lab automation and spectral interpretation.