Adversarial text-to-image synthesis: A review.

Neural Netw

Technische Universität Kaiserslautern, Germany; Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany.

Published: December 2021

With the advent of generative adversarial networks, synthesizing images from text descriptions has become an active research area. It is a flexible and intuitive approach to conditional image generation, with significant progress in recent years regarding visual realism, diversity, and semantic alignment. However, the field still faces several challenges that require further research, such as enabling the generation of high-resolution images with multiple objects and developing suitable, reliable evaluation metrics that correlate with human judgement. In this review, we contextualize the state of the art of adversarial text-to-image synthesis models, trace their development since their inception five years ago, and propose a taxonomy based on the level of supervision. We critically examine current strategies for evaluating text-to-image synthesis models, highlight shortcomings, and identify new areas of research, ranging from the development of better datasets and evaluation metrics to possible improvements in architectural design and model training. This review complements previous surveys on generative adversarial networks with a focus on text-to-image synthesis, which we believe will help researchers to further advance the field.
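The models the review surveys share one core mechanism: both the generator and the discriminator are conditioned on an encoding of the caption. A minimal, hedged sketch of that conditioning scheme follows; all class names, layer sizes, and the random stand-in for a sentence embedding are illustrative assumptions, not taken from any surveyed model.

```python
# Minimal sketch of a text-conditional GAN. A precomputed sentence
# embedding (e.g., from a pretrained text encoder) is concatenated with
# the noise (generator) and with the image (discriminator).
import torch
import torch.nn as nn

class TextCondGenerator(nn.Module):
    def __init__(self, noise_dim=100, text_dim=256, img_pixels=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + text_dim, 512),
            nn.ReLU(),
            nn.Linear(512, img_pixels),
            nn.Tanh(),  # images scaled to [-1, 1]
        )

    def forward(self, z, text_emb):
        # Concatenate noise with the sentence embedding before decoding.
        return self.net(torch.cat([z, text_emb], dim=1))

class TextCondDiscriminator(nn.Module):
    def __init__(self, text_dim=256, img_pixels=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_pixels + text_dim, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 1),  # real/fake logit, conditioned on the text
        )

    def forward(self, img, text_emb):
        return self.net(torch.cat([img, text_emb], dim=1))

# One generator update with the non-saturating GAN loss:
G, D = TextCondGenerator(), TextCondDiscriminator()
z = torch.randn(8, 100)
text_emb = torch.randn(8, 256)           # stand-in for encoded captions
fake = G(z, text_emb)
g_loss = nn.functional.binary_cross_entropy_with_logits(
    D(fake, text_emb), torch.ones(8, 1)  # generator wants D to say "real"
)
```

Published models replace the MLPs with convolutional stacks and add refinements such as matching-aware losses, but concatenating noise with a text embedding is the common starting point.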

Source
http://dx.doi.org/10.1016/j.neunet.2021.07.019

Publication Analysis

Top Keywords

text-to-image synthesis (16); adversarial text-to-image (8); generative adversarial (8); adversarial networks (8); evaluation metrics (8); synthesis models (8); adversarial (4); synthesis (4); synthesis review (4); review advent (4)

Similar Publications

Text-driven 3D scene generation is widely applicable to video games, film production, and metaverse applications, all of which have a large demand for 3D scenes. However, existing text-to-3D generation methods are limited to producing 3D objects with simple geometries and dreamlike styles that lack realism. In this work, we present Text2NeRF, which can generate a wide range of 3D scenes with complicated geometric structures and high-fidelity textures purely from a text prompt.


Human faces contain rich semantic information that can hardly be described without a large vocabulary and complex sentence patterns. However, most existing text-to-image synthesis methods can only generate meaningful results from a limited set of sentence templates using words contained in the training set, which heavily impairs their generalization ability. In this paper, we define a novel 'free-style' text-to-face generation and manipulation problem and propose an effective solution, named AnyFace++, which is applicable to a much wider range of open-world scenarios.


Finetuning of GLIDE stable diffusion model for AI-based text-conditional image synthesis of dermoscopic images.

Front Med (Lausanne)

October 2023

Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States.

Background: The development of artificial intelligence (AI)-based algorithms and advances in medical domains rely on large datasets. A recent advance in text-to-image generative AI is GLIDE (Guided Language to Image Diffusion for Generation and Editing). A number of representations of the GLIDE model are available, but it has not been refined for medical applications.
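The finetuning described here relies on the standard denoising-diffusion training objective: noise a clean image to a random timestep and train the network to predict that noise, conditioned on the text. Below is a hedged sketch of that objective in plain PyTorch; the schedule constants and the `model` signature are generic assumptions, not GLIDE's actual implementation.

```python
# Generic DDPM noise-prediction objective used when finetuning a
# text-conditional diffusion model. `model` is a placeholder network
# that takes (noised image, timestep, text embedding).
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)             # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(model, x0, text_emb):
    """One training step: noise a clean image x0 (b, c, h, w) to a
    random timestep and predict the injected noise."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise  # forward process q(x_t | x_0)
    pred = model(x_t, t, text_emb)                # network predicts the noise
    return torch.nn.functional.mse_loss(pred, noise)
```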


The synthesis of high-resolution remote sensing images based on text descriptions has great potential in many practical application scenarios. Although deep neural networks have achieved great success in many important remote sensing tasks, generating realistic remote sensing images from text descriptions is still very difficult. To address this challenge, we propose a novel text-to-image modern Hopfield network (Txt2Img-MHN).
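A modern Hopfield layer retrieves stored patterns through a softmax over similarities, which is the mechanism that lets a model like Txt2Img-MHN compose learned prototypes into an image. The following is a minimal sketch of that generic retrieval update; the prototype matrix and inverse temperature `beta` are illustrative, and the paper's full architecture is more involved.

```python
# One modern Hopfield retrieval step: a query state is pulled toward
# stored patterns via a softmax over similarity scores.
import torch

def hopfield_retrieve(state, patterns, beta=8.0):
    # state: (batch, d) query; patterns: (n, d) stored prototypes.
    scores = beta * state @ patterns.T                # (batch, n) similarities
    return torch.softmax(scores, dim=-1) @ patterns   # weighted pattern recall

patterns = torch.randn(16, 64)                # e.g., learned prototype codes
query = patterns[0] + 0.3 * torch.randn(64)   # noisy version of pattern 0
retrieved = hopfield_retrieve(query.unsqueeze(0), patterns)
```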


Word self-update contrastive adversarial networks for text-to-image synthesis.

Neural Netw

October 2023

School of Information Engineering, Minzu University of China, Beijing 100081, China; Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China.

Synthesizing realistic fine-grained images from text descriptions is a significant computer vision task. Although many GAN-based methods have been proposed for this task, generating high-quality images that are consistent with the text remains difficult. Existing GAN-based methods ignore important words because they use fixed initial word features in the generator, and they neglect to learn the semantic consistency between images and text in the discriminator.
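Semantic consistency between images and texts is typically enforced with a symmetric contrastive loss in which matching image-text pairs form the diagonal of a similarity matrix. A generic, hedged sketch of such an InfoNCE-style objective follows; it is not the paper's exact word-level formulation.

```python
# Symmetric image-text contrastive (InfoNCE) loss: the i-th image and
# the i-th text are a positive pair; all other entries are negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(img_feat, txt_feat, temperature=0.07):
    img = F.normalize(img_feat, dim=-1)
    txt = F.normalize(txt_feat, dim=-1)
    logits = img @ txt.T / temperature        # (batch, batch) similarities
    targets = torch.arange(img.shape[0])      # matches sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))
```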

