Microsoft’s Attentional Generative Adversarial Network (AttnGAN) can draw images from typed keywords (text-to-image generation).

AttnGAN is an algorithm that visualizes text input. It was developed by Microsoft’s Deep Learning Technology Center, and while the text-visualization project has a noble goal, the results are often just plain bizarre.

The AttnGAN TRANS architecture has three variants: AttnGAN BERT, AttnGAN XL, and AttnGAN GPT.

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks by Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He.

https://github.com/taoxugit/AttnGAN

PyTorch implementation for reproducing AttnGAN results in the paper AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks by Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He. (This work was performed when Tao was an intern with Microsoft Research.)

Dependencies (original AttnGAN implementation)

Python 2.7

PyTorch

PyTorch implementation of a modified (style-based) AttnGAN architecture that incorporates the strong latent space control provided by StyleGAN. This architecture enables one not only to synthesize an image from an input text description, but also to move that image in a desired disentangled dimension to alter its structure at different scales (from high-level coarse styles such as pose to fine-grained styles such as background lighting).
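The latent-space control described above can be illustrated with a simple linear interpolation between two latent codes. This is only a sketch: feeding each interpolated code to the style-based generator (not shown here) would produce a smooth morph between two images; the 512-dimensional latent size is StyleGAN's default, assumed for the example.

```python
import torch

def interpolate_latents(z_a, z_b, steps=8):
    """Linearly interpolate between two latent codes z_a and z_b.

    Returns a tensor of shape (steps, latent_dim); feeding each row to a
    style-based generator would yield a smooth image morph.
    """
    alphas = torch.linspace(0.0, 1.0, steps).unsqueeze(1)  # (steps, 1)
    return (1 - alphas) * z_a.unsqueeze(0) + alphas * z_b.unsqueeze(0)

# Example: two random 512-dim latents (512 = StyleGAN's default latent size)
z_a, z_b = torch.randn(512), torch.randn(512)
path = interpolate_latents(z_a, z_b, steps=8)
print(path.shape)  # torch.Size([8, 512])
```

The endpoints of the returned path are exactly `z_a` and `z_b`, so the interpolation GIFs mentioned below correspond to rendering each row in sequence.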

GIF OF LATENT SPACE INTERPOLATION EXAMPLE COMING SOON

Example captions:

- “this is a black bird with gray and white wings and a bright yellow belly and chest.”
- “tiny bird with long thighs, and a long pointed brown bill.”
- “a small bird has a royal blue crown, black eyes, and a baby blue colored bill.”

This implementation also provides the option to use state-of-the-art transformer-based architectures from Hugging Face’s transformers library as the text encoder for Style-AttnGAN (currently only GPT-2 is supported). Among other benefits, these transformer-based encoders significantly improve image synthesis when the input text sequence is long.
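A minimal sketch of using GPT-2 from the transformers library as a text encoder: the model's per-token hidden states serve as word features, and a pooled vector serves as a sentence feature. To keep the example self-contained it uses a small randomly initialized GPT-2 and random token ids; in practice one would load pretrained weights with `GPT2Model.from_pretrained("gpt2")` and a real tokenizer, and the exact wiring inside Style-AttnGAN may differ.

```python
import torch
from transformers import GPT2Config, GPT2Model

# Small, randomly initialized GPT-2 (no pretrained download) just to show
# the encoder interface assumed by this sketch.
config = GPT2Config(n_embd=768, n_layer=2, n_head=12)
encoder = GPT2Model(config)

token_ids = torch.randint(0, config.vocab_size, (1, 12))  # stand-in for a tokenized caption
with torch.no_grad():
    out = encoder(token_ids)

word_features = out.last_hidden_state         # (1, 12, 768): per-token "word" features
sentence_feature = word_features.mean(dim=1)  # (1, 768): simple pooled sentence feature
```

The word features play the role of the per-word embeddings consumed by the attentional generator, while the pooled vector stands in for the global sentence embedding.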

Original AttnGAN paper: AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks by Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He. (This work was performed when Tao was an intern with Microsoft Research). Thank you for your brilliant work.
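The core word-level attention in the AttnGAN paper can be sketched as follows. This is a simplification: the paper first projects word features into the image-feature space with a learned layer, which is omitted here; shapes and dimensions are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def word_attention(region_feats, word_feats):
    """Simplified word-level attention from AttnGAN's attentional
    generative network (learned projection layers omitted).

    region_feats: (batch, N, D)  image sub-region features
    word_feats:   (batch, T, D)  word embeddings from the text encoder
    Returns one word-context vector per image region: (batch, N, D).
    """
    # Similarity between each region and each word: (batch, N, T)
    scores = torch.bmm(region_feats, word_feats.transpose(1, 2))
    attn = F.softmax(scores, dim=-1)    # attend over words for each region
    return torch.bmm(attn, word_feats)  # weighted sum of word features

regions = torch.randn(2, 64, 256)  # e.g. an 8x8 grid = 64 regions, 256-dim
words = torch.randn(2, 15, 256)    # a 15-word caption
context = word_attention(regions, words)
print(context.shape)  # torch.Size([2, 64, 256])
```

Each region thus receives a context vector emphasizing the caption words most relevant to it, which is what lets the generator refine fine-grained details word by word.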

[Figure: AttnGAN architecture from arxiv.org/abs/1711.10485; StyleGAN generator architecture from arxiv.org/abs/1812.04948]

Copied from LICENSE file (MIT License) for visibility:

Copyright for portions of project Style-AttnGAN are held by Tao Xu, 2018 as part of project AttnGAN. All other copyright for project Style-AttnGAN are held by Sidhartha Parhi, 2020. All non-data files that have not been modified by Sidhartha Parhi include the copyright notice “Copyright (c) 2018 Tao Xu” at the top of the file.


Instructions:

Dependencies

python 3.7+

PyTorch 1.0+

Style-AttnGAN code: https://github.com/sidward14/Style-AttnGAN

AttnGAN paper: https://arxiv.org/abs/1711.10485