DDPM: Denoising Diffusion Probabilistic Models

📅 First written: July 13, 2025

🔄 Last updated: July 13, 2025, 18:42 (KST)

✨ Recent changes: added the arbitrary-timestep sampling objective, expanded the key-concept explanations, fixed equation rendering

Jonathan Ho · Ajay Jain · Pieter Abbeel

Denoising Diffusion Probabilistic Models, NeurIPS 2020

arXiv 2006.11239 • GitHub Repository

📝 Abstract

DDPM (Denoising Diffusion Probabilistic Models) is a generative model inspired by principles of nonequilibrium thermodynamics. It presents a novel approach that produces high-quality images through a forward process that gradually adds noise to the data and a reverse process that removes that noise step by step.

The core contribution of this work is the discovery of a new theoretical connection between diffusion probabilistic models and denoising score matching with Langevin dynamics. Building on this connection, the authors set a weighted variational bound as the training objective, decomposing the complex generation problem into many simple denoising problems.

Langevin dynamics: much like pollen in water, which is randomly bombarded by water molecules (noise) while gravity and viscosity pull it in a particular direction (restoring force), these are dynamics that restore structure after noise. DDPM uses this idea by adding Gaussian noise and then following a learned direction to gradually restore the original image.

Experimentally, DDPM achieved an Inception Score of 9.46 and an FID of 3.17 on CIFAR-10, state-of-the-art at the time, and produced samples of quality comparable to ProgressiveGAN on 256×256 LSUN. Unlike GANs, it trains stably without adversarial training and is free from the mode collapse problem.

Background

DDPM์„ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋จผ์ € ํ™•์‚ฐ ํ™•๋ฅ  ๋ชจ๋ธ(Diffusion Probabilistic Models)์˜ ๊ธฐ๋ณธ ๊ฐœ๋…์„ ํŒŒ์•…ํ•ด์•ผ ํ•œ๋‹ค. ํ™•์‚ฐ ๋ชจ๋ธ์€ foward process, reverse process ๋‘ ๊ฐ€์ง€ ๊ณผ์ •์œผ๋กœ ๊ตฌ์„ฑ๋˜๋Š”๋ฐ, ๋จผ์ € ์ˆœ๋ฐฉํ–ฅ ๊ณผ์ •๊ณผ ์ฃผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๊ฐœ๋…์ธ markov chain ์— ๋Œ€ํ•ด ์•Œ์•„๋ณด์ž.

Markov Chain

$$p(x_t | x_{t-1}, x_{t-2}, \ldots, x_0) = p(x_t | x_{t-1})$$

The Markov chain is the core concept underlying DDPM's structure. The current state $x_t$ depends only on the immediately preceding state $x_{t-1}$; all earlier history is ignored (the memoryless property).

Order of a Markov chain: a $k$-th order chain conditions on the previous $k$ states; the equation above describes a first-order chain.

Markov chains in DDPM: both the forward process and the reverse process are first-order Markov chains over the sequence $x_0, x_1, \ldots, x_T$.

Forward Process - Diffusion Process

$$q(x_{1:T}|x_0) := \prod_{t=1}^T q(x_t|x_{t-1}), \quad q(x_t|x_{t-1}) := \mathcal{N}(x_t; \sqrt{1-\beta_t}x_{t-1}, \beta_t I) \tag{1}$$

The forward process is the starting point of a diffusion model. Also called the diffusion process, it is a fixed Markov chain that starts from the original data $x_0$ and gradually adds Gaussian noise until it reaches pure noise $x_T$.

📋 Variable definitions:

| Variable | Meaning | Notes |
|---|---|---|
| $q$ | forward distribution | fixed, not learned |
| $p$ | reverse distribution | learnable |
| $x_0$ | original data | the clean image |
| $x_t$ | data at timestep $t$ | image with noise added |
| $T$ | number of diffusion steps | typically 1000 |
| $\beta_t$ | variance schedule | controls how much noise is added |
| $\mathcal{N}(x; \mu, \sigma^2)$ | Gaussian distribution | mean $\mu$, variance $\sigma^2$ |

๐Ÿ—๏ธ ์—ญ๋ฐฉํ–ฅ ๊ณผ์ •(Reverse Process) - ์ƒ์„ฑ ๊ณผ์ •

$$p_\theta(x_{0:T}) := p(x_T) \prod_{t=1}^T p_\theta(x_{t-1}|x_t), \quad p_\theta(x_{t-1}|x_t) := \mathcal{N}(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)) \tag{2}$$

The reverse process is the heart of the diffusion model and its generative procedure. Also called the denoising process, it is a learnable Markov chain that starts from pure Gaussian noise and progressively removes noise to recover data.

The diffusion model is a latent variable model of the form $p_\theta(x_0) := \int p_\theta(x_{0:T}) dx_{1:T}$.
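Eq. (2) translates directly into ancestral sampling: draw $x_T \sim \mathcal{N}(0, I)$, then repeatedly sample $x_{t-1}$ from the learned Gaussian. A minimal numpy sketch, fixing $\Sigma_\theta = \beta_t I$ (one of the two fixed choices discussed in the paper) and assuming a trained mean predictor `mu_theta` is available; the lambda below is a placeholder stand-in, not a real model:

```python
import numpy as np

def sample(mu_theta, betas, shape, rng):
    """Ancestral sampling per eq. (2): start from pure Gaussian noise x_T
    and repeatedly sample x_{t-1} ~ N(mu_theta(x_t, t), beta_t * I)."""
    T = len(betas)
    x = rng.standard_normal(shape)  # x_T ~ N(0, I) = p(x_T)
    for t in range(T, 0, -1):
        # No noise is added on the final step (t = 1).
        z = rng.standard_normal(shape) if t > 1 else 0.0
        x = mu_theta(x, t) + np.sqrt(betas[t - 1]) * z
    return x  # the generated x_0

# Demo with a dummy "model" that just shrinks x_t slightly;
# a real mu_theta would be a trained neural network.
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
x0 = sample(lambda x, t: 0.99 * x, betas, (4, 4), rng)
print(x0.shape)
```

The chain structure is the whole algorithm; only the mean predictor changes when a real network is plugged in.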

🎯 The Loss: a Variational Bound

$$\mathbb{E}[-\log p_\theta(x_0)] \leq \mathbb{E}_q\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T}|x_0)}\right] = \mathbb{E}_q\left[-\log p(x_T) - \sum_{t \geq 1} \log \frac{p_\theta(x_{t-1}|x_t)}{q(x_t|x_{t-1})}\right] =: L \tag{3}$$

The variational bound is DDPM's training objective. The negative log-likelihood, which cannot be computed directly, is upper-bounded via variational inference, and that bound is optimized instead.

๋ฒ ์ด์ฆˆ ์ •๋ฆฌ ์ ์šฉ
$$= \mathbb{E}_{x_T \sim q(x_T|x_0)}\left[-\log \frac{p_\theta(x_{0:T})}{p_\theta(x_{1:T}|x_0)}\right] \quad \because p(a) = \frac{p(a,b)}{p(b|a)}$$
๋ณด์กฐ ๋ถ„ํฌ๋ฅผ ๋ถ„๋ชจ ๋ถ„์ž์— ๊ณฑํ•ด์คŒ
$$= \mathbb{E}_{x_T \sim q(x_T|x_0)}\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T}|x_0)} \ast \frac{q(x_{1:T}|x_0)}{p_\theta(x_{1:T}|x_0)}\right]$$
Jensen's Inequality ์ ์šฉ
$$\leq \mathbb{E}_{x_T \sim q(x_T|x_0)}\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T}|x_0)}\right] \quad \because D_{KL} \geq 0$$

🔬 Sampling the Forward Process at an Arbitrary Timestep

$$q(x_t|x_0) = \mathcal{N}(x_t; \sqrt{\bar{\alpha}_t}x_0, (1-\bar{\alpha}_t)I) \tag{4}$$

Equation (4) is one of DDPM's key properties. It allows jumping from the original image $x_0$ directly to an arbitrary timestep $t$ with no intermediate steps: instead of stepping through $x_0 \to x_1 \to \cdots \to x_t$ one transition at a time, a single computation reaches the desired noise level.

Lemma: parameter definitions and the resulting identity

With the parameters $\alpha_t := 1 - \beta_t$ and $\bar{\alpha}_t := \prod_{s=1}^t \alpha_s$, the following holds:

$$q(x_t|x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}x_{t-1}, \beta_t I) = \mathcal{N}(x_t; \sqrt{\alpha_t}x_{t-1}, (1-\alpha_t)I)$$

Reparameterization:

$$x_t = \sqrt{\alpha_t}x_{t-1} + \sqrt{1-\alpha_t}\epsilon_{t-1}, \quad \epsilon_{t-1} \sim \mathcal{N}(0, I)$$
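Iterating this update from $x_0$ and merging the independent Gaussians at each step yields eq. (4): $x_t = \sqrt{\bar{\alpha}_t}x_0 + \sqrt{1-\bar{\alpha}_t}\epsilon$. A quick numeric sanity check of that equivalence (a sketch, assuming the paper's linear $\beta$ schedule):

```python
import numpy as np

rng = np.random.default_rng(42)
betas = np.linspace(1e-4, 0.02, 1000)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

t = 500        # arbitrary timestep (1-indexed)
x0 = 1.0       # a scalar "pixel" for simplicity
n = 200_000    # Monte Carlo samples

# Sequential noising: x_s = sqrt(alpha_s) x_{s-1} + sqrt(beta_s) eps
x = np.full(n, x0)
for s in range(t):
    x = np.sqrt(alphas[s]) * x + np.sqrt(betas[s]) * rng.standard_normal(n)

# Direct jump via eq. (4): q(x_t | x_0) = N(sqrt(abar_t) x0, 1 - abar_t)
mean_direct = np.sqrt(alpha_bar[t - 1]) * x0
var_direct = 1.0 - alpha_bar[t - 1]

print(x.mean(), mean_direct)  # empirical vs closed-form mean
print(x.var(), var_direct)    # empirical vs closed-form variance
```

The empirical statistics of 500 sequential noising steps match the one-shot closed form, which is what makes minibatch training over random timesteps cheap.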

🔄 Decomposing the Variational Bound $L$

$$L = \underbrace{D_{KL}(q(x_T|x_0) \parallel p(x_T))}_{L_T} + \sum_{t>1} \underbrace{D_{KL}(q(x_{t-1}|x_t, x_0) \parallel p_\theta(x_{t-1}|x_t))}_{L_{t-1}} - \underbrace{\log p_\theta(x_0|x_1)}_{L_0} \tag{5}$$

Equation (5) rewrites the variational bound $L$ as a sum of KL divergences (Kullback-Leibler divergences). This decomposition splits one complex optimization problem into several smaller, tractable ones.

📋 Meaning of each term:

| Term | Expression | Meaning | Property |
|---|---|---|---|
| $L_T$ | $D_{KL}(q(x_T \mid x_0) \parallel p(x_T))$ | matching the final noise | not learnable (fixed) |
| $L_{t-1}$ | $D_{KL}(q(x_{t-1} \mid x_t, x_0) \parallel p_\theta(x_{t-1} \mid x_t))$ | intermediate denoising | learnable |
| $L_0$ | $-\log p_\theta(x_0 \mid x_1)$ | final reconstruction likelihood | learnable |

🔄 VAE vs DDPM: Comparing Variational Bounds

The VAE's ELBO (Evidence Lower BOund), written here as a loss, i.e. the negative ELBO:

$$L_{VAE} = \underbrace{D_{KL}(q_{\phi}(z|x) \parallel p(z))}_{\text{Regularization}} + \underbrace{-\mathbb{E}_{q_{\phi}(z|x)}[\log p_{\theta}(x|z)]}_{\text{Reconstruction}}$$

DDPM์˜ ๋ณ€๋ถ„ ๊ฒฝ๊ณ„:

$$L_{DDPM} = \underbrace{D_{KL}(q(x_T|x_0) \parallel p(x_T))}_{L_T} + \sum_{t>1} \underbrace{D_{KL}(q(x_{t-1}|x_t, x_0) \parallel p_\theta(x_{t-1}|x_t))}_{L_{t-1}} - \underbrace{\log p_\theta(x_0|x_1)}_{L_0}$$

🔍 Structural Correspondence with the VAE

| VAE term | DDPM term | Role | Detail |
|---|---|---|---|
| Regularization | $L_T$ | regularizes the latent space | VAE: $q_{\phi}(z \mid x) \parallel p(z)$; DDPM: $q(x_T \mid x_0) \parallel p(x_T)$ |
| Reconstruction | $L_0$ | reconstructs the data | VAE: $-\mathbb{E}[\log p_{\theta}(x \mid z)]$; DDPM: $-\log p_\theta(x_0 \mid x_1)$ |
| N/A | $\sum_{t>1} L_{t-1}$ | multi-step denoising | unique to DDPM; the VAE has no counterpart |

🔑 Key insight:

DDPM extends the VAE's single latent variable $z$ into a multi-step sequence of latent variables $x_1, x_2, ..., x_T$.

🎯 How the Variational-Bound Terms Are Handled in Practice

📊 Regularization ($L_T$): effectively unnecessary

In DDPM this regularization term requires no training at all. When $T$ is sufficiently large, $\bar{\alpha}_T \to 0$, so $q(x_T|x_0) \approx \mathcal{N}(0, I) = p(x_T)$ and therefore $L_T \approx 0$.

🎯 Denoising ($L_{t-1}$): the core training objective

If the variance of the reverse-process Gaussian is fixed to a predefined constant $\sigma_t^2$, the KL divergence between the two Gaussians reduces to an MSE between their means:

$$L_{t-1} = \mathbb{E}_q\left[\frac{1}{2\sigma_t^2}\left\|\tilde{\mu}_t(x_t, x_0) - \mu_\theta(x_t, t)\right\|^2\right] + C$$

where the posterior mean is:

$$\tilde{\mu}_t(x_t, x_0) = \frac{\sqrt{\bar{\alpha}_{t-1}}\beta_t}{1-\bar{\alpha}_t}x_0 + \frac{\sqrt{\alpha_t}(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}x_t$$

🔍 The Forward-Process Posterior

Computing the KL divergences requires the forward-process posterior $q(x_{t-1}|x_t, x_0)$. Remarkably, it is tractable in closed form and is itself a Gaussian:

$$q(x_{t-1}|x_t, x_0) = \mathcal{N}(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I) \tag{6}$$

with mean and variance:

$$\tilde{\mu}_t(x_t, x_0) := \frac{\sqrt{\bar{\alpha}_{t-1}}\beta_t}{1-\bar{\alpha}_t}x_0 + \frac{\sqrt{\alpha_t}(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}x_t$$
$$\tilde{\beta}_t := \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\beta_t$$
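Eqs. (6) and their mean/variance transcribe directly into numpy. A sketch (the linear schedule is an assumption; timesteps are 1-indexed as in the paper):

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def posterior_params(x_t, x0, t):
    """Mean and variance of q(x_{t-1} | x_t, x_0) for t >= 2 (1-indexed),
    following the closed-form expressions above."""
    abar_t, abar_prev = alpha_bar[t - 1], alpha_bar[t - 2]
    mean = ((np.sqrt(abar_prev) * betas[t - 1] / (1 - abar_t)) * x0
            + (np.sqrt(alphas[t - 1]) * (1 - abar_prev) / (1 - abar_t)) * x_t)
    var = (1 - abar_prev) / (1 - abar_t) * betas[t - 1]  # \tilde{beta}_t
    return mean, var

mean, var = posterior_params(x_t=0.5, x0=1.0, t=500)
print(mean, var)
```

Note that $\tilde{\beta}_t < \beta_t$ always: conditioning on $x_0$ shrinks the uncertainty of the reverse step.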

🎲 Rao-Blackwellization

Because every KL divergence in $L$ compares two Gaussian distributions, each can be computed with closed-form expressions instead of high-variance Monte Carlo estimation:

$$D_{KL}(\mathcal{N}(\mu_1, \Sigma_1) \parallel \mathcal{N}(\mu_2, \Sigma_2)) = \frac{1}{2}\left[\log\frac{|\Sigma_2|}{|\Sigma_1|} - d + \text{tr}(\Sigma_2^{-1}\Sigma_1) + (\mu_2-\mu_1)^T\Sigma_2^{-1}(\mu_2-\mu_1)\right]$$
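For the diagonal covariances used in DDPM, the formula above reduces to a per-dimension sum. A minimal transcription, sanity-checked on two cases where the answer is known (identical Gaussians give 0; a unit mean shift with unit variances gives $\frac{1}{2}$):

```python
import numpy as np

def gaussian_kl(mu1, var1, mu2, var2):
    """Closed-form D_KL(N(mu1, diag(var1)) || N(mu2, diag(var2))),
    the diagonal-covariance case of the formula above."""
    mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
    mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
    return 0.5 * np.sum(
        np.log(var2 / var1) - 1.0 + var1 / var2 + (mu2 - mu1) ** 2 / var2
    )

same = gaussian_kl([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
shifted = gaussian_kl([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0])
print(same, shifted)  # 0.0 and 0.5
```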

🔑 Key advantages:

  1. Variance reduction: high-variance Monte Carlo estimates → low-variance analytic computation
  2. Computational efficiency: complex integrals → simple matrix operations
  3. Stable training: consistent gradients → stable convergence

Conclusion

Through its novel treatment of the diffusion process, DDPM introduced a new paradigm to generative modeling. Stable training via a variational bound, efficient arbitrary-timestep sampling, and the structural connection to VAEs give it a solid theoretical footing, and it became the foundation on which today's state-of-the-art AI image generation tools are built.

This document is a systematic summary of the core concepts of the DDPM paper.