A Very Short Introduction to Diffusion Models

Kailash Ahirwar
Sep 26, 2023 · 4 min read


Artificial Intelligence is constantly evolving to solve hard, complex problems, and image generation is one of them. GANs, VAEs, and flow models produce reasonable images but struggle to generate high-resolution images with high fidelity. Diffusion models, on the other hand, are very good at producing diverse, high-resolution images with high fidelity. Currently, they are at the forefront of the generative AI (GenAI) revolution we see everywhere. Models like GLIDE and DALL·E 3 by OpenAI, Imagen by Google, and Stable Diffusion are some well-known diffusion models. Let’s have a look at how they work.

What are Diffusion Models?

Diffusion models are a class of generative AI models that can generate diverse, high-resolution images. They work by gradually adding Gaussian noise to the original data in a forward diffusion process and then learning to remove that noise in a reverse diffusion process. They are latent variable models that operate on a hidden continuous feature space, look similar to VAEs (Variational Autoencoders), and are loosely based on non-equilibrium thermodynamics.

Diffusion models. Credits: https://theaisummer.com/diffusion-models/

Problems with the existing models

Existing deep learning models like GANs and VAEs are good at generating images, but each has its own problems. GANs suffer from training instability and limited sample diversity (mode collapse) due to their adversarial training. VAEs optimize a surrogate loss (the evidence lower bound), which tends to produce blurry, less faithful samples.

Let’s understand Diffusion Models in detail

Denoising diffusion modeling is a two-step process:

  1. Forward diffusion process — a Markov chain of diffusion steps in which we slowly and randomly add noise to the original data.
  2. Reverse diffusion process — a learned process that undoes the forward diffusion to recover the original data from the noise.

Forward diffusion process

In the forward diffusion process, we slowly and gradually add Gaussian noise to the input image x₀ over a series of T steps. We start by sampling a data point x₀ from the real data distribution q(x), i.e. x₀ ~ q(x), and then at each step add Gaussian noise with variance βₜ to xₜ₋₁, producing a new latent variable xₜ with distribution q(xₜ∣xₜ₋₁).

Forward diffusion process. Credits: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
Forward diffusion process. Credits: https://theaisummer.com/diffusion-models/

Here, q(xₜ∣xₜ₋₁) is a Gaussian defined by the mean μₜ = √(1 − βₜ) xₜ₋₁ and the covariance Σₜ = βₜI, i.e.

q(xₜ∣xₜ₋₁) = N(xₜ; √(1 − βₜ) xₜ₋₁, βₜI)

FYI, I is the identity matrix, so Σₜ will always be a diagonal matrix of variances. As T approaches ∞, x_T becomes an isotropic Gaussian distribution.
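
To make this concrete, here is a minimal NumPy sketch of a single forward step and of running the full chain with a linear β schedule (the schedule values follow the DDPM paper; array shapes and variable names are just for illustration):

```python
import numpy as np

# One forward diffusion step: x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)
def forward_step(x_prev, beta_t, rng):
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

rng = np.random.default_rng(0)
x = rng.random((64, 64, 3))            # stand-in for a real image x0 with values in [0, 1]
betas = np.linspace(1e-4, 0.02, 1000)  # linear beta schedule, as used in DDPM
for beta in betas:
    x = forward_step(x, beta, rng)
# After many steps, x is close to a sample from an isotropic Gaussian.
```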

The reparameterization trick

Applying q(xₜ∣xₜ₋₁) step by step to compute xₜ can get very costly for a large number of steps. The reparameterization trick solves this problem and allows us to sample xₜ at any arbitrary time step directly from x₀. With αₜ = 1 − βₜ and ᾱₜ = α₁α₂⋯αₜ, the distribution becomes:

q(xₜ∣x₀) = N(xₜ; √ᾱₜ x₀, (1 − ᾱₜ)I), i.e. xₜ = √ᾱₜ x₀ + √(1 − ᾱₜ) ε, where ε ~ N(0, I)

You can learn more about the reparameterization trick here.
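
A small NumPy sketch of this closed-form sampling, under the same linear β schedule assumed above, shows why it matters: jumping straight to step t costs one sample instead of t sequential steps.

```python
import numpy as np

# Sample x_t directly from x0 via q(x_t | x0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)
def sample_xt(x0, t, betas, rng):
    alphas = 1.0 - betas
    alpha_bar_t = np.prod(alphas[: t + 1])   # cumulative product of alphas up to step t
    eps = rng.standard_normal(x0.shape)      # epsilon ~ N(0, I)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

rng = np.random.default_rng(0)
x0 = rng.random((64, 64, 3))
betas = np.linspace(1e-4, 0.02, 1000)
x500 = sample_xt(x0, 500, betas, rng)        # one sample instead of 500 sequential steps
```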

Reverse diffusion process

The reverse diffusion process trains a neural network to recover the original data by reversing the noising applied in the forward pass. Estimating q(xₜ₋₁∣xₜ) directly is difficult, as it would require the whole data distribution, so a parameterized model p_θ (a neural network) is used to learn it. For small enough βₜ, q(xₜ₋₁∣xₜ) will also be Gaussian, so it can be modeled by just parameterizing the mean and variance.

Reverse diffusion process. Credits: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
Reverse diffusion process. Credits: https://theaisummer.com/diffusion-models/

We train the network to predict the mean μ_θ(xₜ, t) and the covariance matrix Σ_θ(xₜ, t) for each time step, giving the learned transition p_θ(xₜ₋₁∣xₜ) = N(xₜ₋₁; μ_θ(xₜ, t), Σ_θ(xₜ, t)).
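
In practice, the original DDPM formulation (Ho et al., 2020) fixes the covariance to βₜI and trains the network to predict the noise ε that was added, from which μ_θ is then derived. Below is a minimal PyTorch-style sketch of that simplified training objective; the model(x_t, t) signature and the assumption of a 4-D image batch are for illustration only.

```python
import torch
import torch.nn.functional as F

def ddpm_loss(model, x0, betas):
    """Simplified DDPM objective: predict the noise added at a randomly chosen time step."""
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, betas.shape[0], (x0.shape[0],), device=x0.device)  # one random step per sample
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)                       # broadcast over (B, C, H, W)
    eps = torch.randn_like(x0)                                    # the true noise
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * eps  # forward sample via reparameterization
    eps_pred = model(x_t, t)                                      # network predicts the noise
    return F.mse_loss(eps_pred, eps)
```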

Top Diffusion Models

  1. Diffusion probabilistic models (DPM; Sohl-Dickstein et al., 2015)
  2. Denoising diffusion probabilistic models (DDPM; Ho et al. 2020)
  3. Cascading Diffusion Models (CDM; https://cascaded-diffusion.github.io)
  4. Latent Diffusion Models (LDM; https://arxiv.org/abs/2112.10752)

This was a very short introduction to Diffusion Models. I wanted to give a basic understanding of Diffusion Models and how they work. If you want to understand Diffusion Models in detail and learn the mathematics behind them, read the following articles:

  1. What are Diffusion Models? — https://lilianweng.github.io/posts/2021-07-11-diffusion-models
  2. How diffusion models work: the math from scratch — https://theaisummer.com/diffusion-models
  3. Introduction to Diffusion Models for Machine Learning — https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction
  4. Diffusion Models Made Easy — https://towardsdatascience.com/diffusion-models-made-easy-8414298ce4da
  5. Diffusion Models: A Comprehensive Survey of Methods and Applications — https://arxiv.org/pdf/2209.00796.pdf
