Many of us have probably been blown away by the recent progress in text-to-image generation demonstrated by models such as GLIDE, DALL-E 2, and Imagen. Much of this incredible progress can be attributed to the meteoric rise of diffusion models, a class of generative models that has gained significant popularity recently and has been shown to outperform GANs on image synthesis. This presentation aims to give a small (and humble) introduction to these models and to share my understanding of how we went from the original denoising diffusion probabilistic model to DALL-E 2.

External References