Stable Diffusion is a deep-learning text-to-image model released in 2022. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and text-guided image-to-image translation.
Stable Diffusion is a latent diffusion model, a type of deep generative neural network developed by the CompVis group at LMU Munich. The model was released through a collaboration of Stability AI, CompVis LMU, and Runway, with support from EleutherAI and LAION.
Stable Diffusion is an open-source machine learning model that can generate images from text, modify images based on text, or fill in details on low-resolution or low-detail images.
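As a rough illustration of the text-guided image modification use case, here is a minimal sketch using the Hugging Face diffusers library; the model id, file names, and parameter values are illustrative assumptions, not an official recipe:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Minimal sketch (not an official recipe): text-guided image-to-image
# translation with Hugging Face diffusers. The model id is an assumption.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("rough_sketch.png").convert("RGB").resize((768, 768))

result = pipe(
    prompt="a detailed fantasy landscape, matte painting",
    image=init_image,
    strength=0.75,       # how strongly the input image is transformed
    guidance_scale=7.5,  # weight of the text prompt during denoising
).images[0]
result.save("fantasy_landscape.png")
```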
The dynamic team of Robin Rombach (Stability AI) and Patrick Esser (Runway ML), from the CompVis Group at LMU Munich headed by Prof. Dr. Björn Ommer, led the original Stable Diffusion V1 release. They built on the lab's prior work with latent diffusion models and received critical support from LAION and EleutherAI. You can read more about the original Stable Diffusion V1 release in our earlier blog post. Robin is now leading the effort, together with Katherine Crowson at Stability AI, to create the next generation of media models with our broader team.
Stable Diffusion 2.0 delivers a number of big improvements and features versus the original V1 release, so let’s dive in and take a look at them.
The Stable Diffusion 2.0 release includes robust text-to-image models trained using a brand-new text encoder (OpenCLIP), developed by LAION with support from Stability AI, which greatly improves the quality of the generated images compared to earlier V1 releases. The text-to-image models in this release can generate images with default resolutions of both 512×512 pixels and 768×768 pixels.
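For readers who want to try this, here is a minimal text-to-image sketch using the Hugging Face diffusers library; the model id and prompt are assumptions for illustration:

```python
import torch
from diffusers import StableDiffusionPipeline

# Minimal sketch: text-to-image generation with a Stable Diffusion 2.0
# checkpoint via Hugging Face diffusers (the model id is an assumption).
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a professional photograph of an astronaut riding a horse",
    height=768,  # this checkpoint's default resolution
    width=768,
).images[0]
image.save("astronaut.png")
```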
These models are trained on an aesthetic subset of the LAION-5B dataset created by the DeepFloyd team at Stability AI, which is then further filtered to remove adult content using LAION's NSFW filter.
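Conceptually, this kind of filtering reduces to thresholding a per-sample score. The sketch below is purely illustrative; the punsafe field name and the 0.1 cutoff are assumptions, not LAION's actual schema or settings:

```python
# Purely illustrative sketch of score-threshold filtering; the "punsafe"
# metadata field and the 0.1 cutoff are assumptions, not LAION's actual schema.
def filter_nsfw(samples, threshold=0.1):
    """Keep only samples whose predicted NSFW probability is below threshold."""
    return [s for s in samples if s.get("punsafe", 1.0) < threshold]

dataset = [
    {"url": "https://example.com/a.jpg", "punsafe": 0.02},
    {"url": "https://example.com/b.jpg", "punsafe": 0.90},
]
print(filter_nsfw(dataset))  # keeps only the first sample
```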
Stable Diffusion 2.0 also includes an Upscaler Diffusion model that enhances the resolution of images by a factor of 4. Below is an example of our model upscaling a low-resolution generated image (128×128) into a higher-resolution image (512×512). Combined with our text-to-image models, Stable Diffusion 2.0 can now generate images with resolutions of 2048×2048, or even higher.
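A minimal upscaling sketch using the Hugging Face diffusers library follows; the model id and file names are assumptions for illustration:

```python
import torch
from diffusers import StableDiffusionUpscalePipeline
from PIL import Image

# Minimal sketch: 4x upscaling with Hugging Face diffusers
# (the model id is an assumption).
pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    torch_dtype=torch.float16,
).to("cuda")

low_res = Image.open("generated_128.png").convert("RGB")  # 128x128 input

upscaled = pipe(
    prompt="a white cat",  # the upscaler is also text-conditioned
    image=low_res,
).images[0]                # 512x512 output (4x in each dimension)
upscaled.save("upscaled_512.png")
```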
Our vibrant communities consist of experts, leaders, and partners across the globe. They are developing cutting-edge open AI models for Image, Language, Audio, Video, 3D, and Biology. AI by the people, for the people.
The underlying dataset for Stable Diffusion was the 2B English-language subset of LAION-5B (https://laion.ai/blog/laion-5b/), a general crawl of the internet created by the German charity LAION. The CompVis team at LMU Munich trained the model in compliance with German law. The underlying dataset was not filtered to exclude or include any specific group.
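To get a feel for what this subset looks like, here is a hedged sketch that streams the LAION-2B-en metadata (image URLs plus English captions) from the Hugging Face Hub; the dataset id and column names are assumptions:

```python
from datasets import load_dataset

# Hedged sketch: stream LAION-2B-en metadata without downloading the full set.
# The dataset id and column names ("URL", "TEXT") are assumptions.
ds = load_dataset("laion/laion2B-en", split="train", streaming=True)

for sample in ds.take(3):
    print(sample["URL"], "-", sample["TEXT"])
```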
Hardware requirements: most NVIDIA GPUs with 6 GB of VRAM or more can run the model at 512×512. On AMD, most cards that support ROCm 5.0 (i.e., the RX 6xxx line) can run the model. A quick way to check your GPU is sketched below.
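This illustrative snippet checks whether the local CUDA GPU meets the 6 GB VRAM guideline quoted above; it is a convenience check, not an official compatibility test:

```python
import torch

# Illustrative check of whether the local CUDA GPU meets the ~6 GB VRAM
# guideline for 512x512 generation (threshold taken from the figure above).
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"{props.name}: {vram_gb:.1f} GB VRAM")
    print("Should handle 512x512." if vram_gb >= 6 else "May need reduced settings.")
else:
    print("No CUDA device detected (AMD users: check ROCm support instead).")
```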
Stability AI is building open AI tools that will let us reach our potential.
If you find innovation and accessibility interesting, join our community.