DALL-E is an artificial intelligence (AI) system trained to generate new images from a text prompt, such as “A bowl of soup that is a portal to another dimension”.
Like GPT-3, DALL·E is a transformer language model. It receives both the text and the image as a single stream of data containing up to 1280 tokens, and is trained using maximum likelihood to generate all of the tokens, one after another.
A token is any symbol from a discrete vocabulary; for humans, each English letter is a token from a 26-letter alphabet. DALL·E’s vocabulary has tokens for both text and image concepts. Specifically, each image caption is represented using a maximum of 256 BPE-encoded tokens with a vocabulary size of 16384, and the image is represented using 1024 tokens with a vocabulary size of 8192.
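The token budget described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (up to 256 text tokens plus 1024 image tokens gives at most 1280), not DALL·E's actual code. The function name `build_stream` and the choice to shift image token ids past the text vocabulary so the two id ranges do not collide are assumptions made for this example.

```python
# Sizes taken from the description above.
TEXT_VOCAB_SIZE = 16384
IMAGE_VOCAB_SIZE = 8192
MAX_TEXT_TOKENS = 256
IMAGE_TOKENS = 1024
MAX_STREAM_LENGTH = MAX_TEXT_TOKENS + IMAGE_TOKENS  # 1280


def build_stream(text_tokens, image_tokens):
    """Concatenate caption and image tokens into one stream (hypothetical helper).

    Assumption: image ids are offset by TEXT_VOCAB_SIZE so that both kinds of
    token can share a single id space. The real model may do this differently.
    """
    if len(text_tokens) > MAX_TEXT_TOKENS:
        raise ValueError("caption uses more than 256 tokens")
    if len(image_tokens) != IMAGE_TOKENS:
        raise ValueError("image must be exactly 1024 tokens")
    if any(not 0 <= t < TEXT_VOCAB_SIZE for t in text_tokens):
        raise ValueError("text token id out of range")
    if any(not 0 <= t < IMAGE_VOCAB_SIZE for t in image_tokens):
        raise ValueError("image token id out of range")

    shifted_image = [TEXT_VOCAB_SIZE + t for t in image_tokens]
    return list(text_tokens) + shifted_image


# A three-token caption followed by a dummy image of all-zero codes.
stream = build_stream([11, 42, 7], [0] * IMAGE_TOKENS)
```

With a full 256-token caption the stream reaches the 1280-token maximum; shorter captions give shorter streams.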
The images are preprocessed to 256×256 resolution during training. Similar to VQVAE, each image is compressed to a 32×32 grid of discrete latent codes using a discrete VAE that we pretrained using a continuous relaxation. We found that training using the relaxation obviates the need for an explicit codebook, EMA loss, or tricks like dead code revival, and can scale up to large vocabulary sizes.
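The numbers in this paragraph fit together as simple arithmetic, sketched below in Python. The sketch assumes that grid positions are read in row-major (raster) order and that each latent code lines up with one 8×8 pixel block. Both are simplifications for illustration: a convolutional encoder sees more than its own block, and the helper `token_index_to_block` is hypothetical.

```python
IMAGE_SIZE = 256          # pixels per side after preprocessing
GRID_SIZE = 32            # latent codes per side
IMAGE_VOCAB_SIZE = 8192   # distinct latent codes

BLOCK_SIZE = IMAGE_SIZE // GRID_SIZE                # 8 pixels per code, per side
TOKENS_PER_IMAGE = GRID_SIZE * GRID_SIZE            # 1024 image tokens
BITS_PER_TOKEN = (IMAGE_VOCAB_SIZE - 1).bit_length()  # 13 bits index 8192 codes

# Size of the image as raw 8-bit RGB versus as a grid of code indices.
RAW_BITS = IMAGE_SIZE * IMAGE_SIZE * 3 * 8
LATENT_BITS = TOKENS_PER_IMAGE * BITS_PER_TOKEN
COMPRESSION_RATIO = RAW_BITS / LATENT_BITS


def token_index_to_block(index):
    """Map a token index (assumed raster order) to its nominal pixel block.

    Returns (top, left, bottom, right) with bottom/right exclusive.
    """
    if not 0 <= index < TOKENS_PER_IMAGE:
        raise ValueError("token index out of range")
    row, col = divmod(index, GRID_SIZE)
    return (row * BLOCK_SIZE, col * BLOCK_SIZE,
            (row + 1) * BLOCK_SIZE, (col + 1) * BLOCK_SIZE)
```

Under these assumptions the grid of code indices is roughly two orders of magnitude smaller than the raw pixels.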
This training procedure allows DALL·E to not only generate an image from scratch, but also to regenerate any rectangular region of an existing image that extends to the bottom-right corner, in a way that is consistent with the text prompt.
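One way to picture this region regeneration is sketched below, assuming the 32×32 image tokens are generated in raster order. Tokens outside the chosen rectangle are kept as they are, and tokens inside it are resampled one at a time. The callback `sample_next` is a hypothetical stand-in for the trained model; in the real system its context would also include the text tokens. This illustrates the idea and is not the published implementation.

```python
GRID_SIZE = 32  # image tokens form a 32x32 grid


def region_positions(top, left):
    """Token indices of the rectangle from (top, left) to the bottom-right corner."""
    if not (0 <= top < GRID_SIZE and 0 <= left < GRID_SIZE):
        raise ValueError("corner must lie inside the grid")
    return [row * GRID_SIZE + col
            for row in range(top, GRID_SIZE)
            for col in range(left, GRID_SIZE)]


def regenerate(image_tokens, top, left, sample_next):
    """Resample the tokens inside the rectangle and keep all others.

    sample_next(prefix) -> int is a hypothetical model call that returns the
    next token given every token produced or kept so far.
    """
    if len(image_tokens) != GRID_SIZE * GRID_SIZE:
        raise ValueError("expected 1024 image tokens")
    to_resample = set(region_positions(top, left))
    result = []
    for index, original in enumerate(image_tokens):
        if index in to_resample:
            result.append(sample_next(result))
        else:
            result.append(original)
    return result


# Dummy "model" that always emits code 7, applied to the bottom-right quadrant.
edited = regenerate([0] * 1024, top=16, left=16, sample_next=lambda prefix: 7)
```

In this toy run only the 256 tokens of the bottom-right quadrant change; every other token is carried over untouched.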
NightCafe – DALL-E
Use NightCafe’s AI Art Generator to create an artwork, then buy a print to hang in your home or office. Use the NightCafe Creator to generate, share and print your own AI art. Use your own input images, choose a style, and be amazed with the result!