SAM-GAN: Self-Attention supporting Multi-stage Generative Adversarial Networks for text-to-image synthesis.

Neural Netw

Shanghai Key Lab of Modern Optical System, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, 20093, Shanghai, China.

Published: June 2021

Synthesizing photo-realistic images from text descriptions is a challenging task in computer vision. Although generative adversarial networks have made significant breakthroughs in this task, they still struggle to generate high-quality, visually realistic images that are consistent with the semantics of the text. Existing text-to-image methods generally work in two steps: first generating an initial image with a rough outline and color, and then gradually refining it into a high-resolution image. One drawback of these methods is that, if the initial image is of low quality, it is hard to generate a satisfactory high-resolution image. In this paper, we propose SAM-GAN, Self-Attention supporting Multi-stage Generative Adversarial Networks, for text-to-image synthesis. With the self-attention mechanism, the model can establish multi-level dependencies within the image and fuse the sentence- and word-level visual-semantic vectors to improve the quality of the generated image. Furthermore, a multi-stage perceptual loss is introduced to enhance the semantic similarity between the synthesized image and the real image, thus enhancing the visual-semantic consistency between text and images. To promote diversity in the generated images, a mode seeking regularization term is integrated into the model. Extensive experiments and ablation studies on the Caltech-UCSD Birds and Microsoft Common Objects in Context datasets show that our model is superior to competitive models in text-to-image synthesis.
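The abstract does not spell out the form of the mode seeking regularization term. As a rough illustration only, the sketch below follows the commonly used formulation in which the generator is encouraged to maximize the ratio between the distance of two generated images and the distance of the two latent codes that produced them. The function names, the L1 distance, and the `eps` constant are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence


def mean_abs_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute (L1) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mode_seeking_loss(
    img1: Sequence[float],
    img2: Sequence[float],
    z1: Sequence[float],
    z2: Sequence[float],
    eps: float = 1e-5,  # assumed stabilizer, avoids division by zero
) -> float:
    """Loss to MINIMIZE for the generator.

    img1 and img2 are flattened images generated from the same text
    condition with latent codes z1 and z2. The ratio
    d(img1, img2) / d(z1, z2) is large when different latent codes yield
    visibly different images, so its inverse is small for a diverse
    generator and large for a collapsed one.
    """
    latent_distance = max(mean_abs_distance(z1, z2), eps)
    ratio = mean_abs_distance(img1, img2) / latent_distance
    return 1.0 / (ratio + eps)


# Two latent codes that map to the same image (mode collapse) are
# penalized far more than two codes that map to different images.
collapsed = mode_seeking_loss([0.5, 0.5], [0.5, 0.5], [0.0], [1.0])
diverse = mode_seeking_loss([0.0, 0.0], [1.0, 1.0], [0.0], [1.0])
```

In a training loop, a term of this kind would typically be added to the generator objective with a weighting coefficient; the weight used in SAM-GAN is not given in the abstract.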


Source
http://dx.doi.org/10.1016/j.neunet.2021.01.023


Similar Publications

Retinal OCT image classification based on MGR-GAN.

Med Biol Eng Comput

January 2025

School of Automation and Information Engineering, Sichuan University of Science & Engineering, Key Laboratory of Artificial Intelligence, Yibin, 644000, Sichuan, China.

Accurately classifying optical coherence tomography (OCT) images is essential for diagnosing and treating ophthalmic diseases. This paper introduces a novel generative adversarial network framework called MGR-GAN. The masked image modeling (MIM) method is integrated into the GAN model's generator, enhancing its ability to synthesize more realistic images by reconstructing them based on unmasked patches.


A generative adversarial network (GAN) makes it possible to map a data sample from one domain to another. It has been extensively employed in image-to-image and text-to-image translation. We propose an EEG-to-EEG translation model to map the scalp-mounted EEG (scEEG) sensor signals to intracranial EEG (iEEG) sensor signals recorded by foramen ovale sensors inserted into the brain.


Accurately identifying and discriminating between different brain states is a major emphasis of functional brain imaging research. Various machine learning techniques play an important role in this regard. However, when working with a small number of study participants, the lack of sufficient data and achieving meaningful classification results remain a challenge.


Improved Grain Boundary Reconstruction Method Based on Channel Attention Mechanism.

Materials (Basel)

January 2025

Hubei Key Laboratory of Plasma Chemistry and Advanced Materials, School of Materials Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China.

The grain size of metal materials has a significant impact on their macroscopic properties. However, original metallographic images often suffer from issues such as substantial noise, missing grain boundaries, low contrast, and blurred edges. These challenges hinder the accurate extraction of complete grain boundaries, limiting the precision of grain size measurement and material performance prediction.


Anomaly detection is crucial in areas such as financial fraud identification, cybersecurity defense, and health monitoring, as it directly affects the accuracy and security of decision-making. Existing generative adversarial nets (GANs)-based anomaly detection methods overlook the importance of local density, limiting their effectiveness in detecting anomaly objects in complex data distributions. To address this challenge, we introduce a generative adversarial local density-based anomaly detection (GALD) method, which combines the data distribution modeling capabilities of GANs with local synthetic density analysis.

