Synthesizing realistic fine-grained images from text descriptions is a significant computer vision task. Although many GANs-based methods have been proposed to solve this task, generating high-quality images consistent with text information remains a difficult problem. These existing GANs-based methods ignore important words due to the use of fixed initial word features in generator, and neglect to learn semantic consistency between images and texts for discriminators. In this article, we propose a novel attentional generation and contrastive adversarial framework for fine-grained text-to-image synthesis, termed as Word Self-Update Contrastive Adversarial Networks (WSC-GAN). Specifically, we introduce a dual attention module for modeling color details and semantic information. With a new designed word self-update module, the generator can leverage visually important words to compute attention maps in the feature synthesis module. Furthermore, we contrive multi-branch contrastive discriminators to maintain better consistency between the generated image and text description. Two novel contrastive losses are proposed for our discriminators to impose image-sentence and image-word consistency constraints. Extensive experiments on CUB and MS-COCO datasets demonstrate that our method achieves better performance compared with state-of-the-art methods.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1016/j.neunet.2023.08.038 | DOI Listing |
Neural Netw
October 2023
School of Information Engineering, Minzu University of China, 100081, Beijing, China; Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing, 100081, China. Electronic address:
Synthesizing realistic fine-grained images from text descriptions is a significant computer vision task. Although many GANs-based methods have been proposed to solve this task, generating high-quality images consistent with text information remains a difficult problem. These existing GANs-based methods ignore important words due to the use of fixed initial word features in generator, and neglect to learn semantic consistency between images and texts for discriminators.
View Article and Find Full Text PDFEnter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!