Inception score

The Inception Score (IS) is an algorithm used to assess the quality of images created by a generative model, such as a generative adversarial network (GAN).^[1] It has been somewhat superseded by the Fréchet inception distance.

Definition

Let there be two spaces, the space of images $\Omega _{X}$ and the space of labels $\Omega _{Y}$ . The space of labels is finite.

Let $p_{gen}$ be a probability distribution over $\Omega _{X}$ that we wish to judge.

Let a discriminator be a function of type $p_{dis}:\Omega _{X}\to M(\Omega _{Y})$ where $M(\Omega _{Y})$ is the set of all probability distributions on $\Omega _{Y}$ . For any image $x$ and any label $y$ , let $p_{dis}(y|x)$ be the probability that image $x$ has label $y$ , according to the discriminator. The discriminator is usually implemented as an Inception-v3 network trained on ImageNet.

The Inception Score of $p_{gen}$ relative to $p_{dis}$ is defined through its logarithm:

$\ln IS(p_{gen},p_{dis}):=\mathbb {E} _{x\sim p_{gen}}\left[D_{KL}\left(p_{dis}(\cdot |x)\|\int p_{dis}(\cdot |x)p_{gen}(x)dx\right)\right]$

Equivalent rewrites include

$\ln IS(p_{gen},p_{dis})=\mathbb {E} _{x\sim p_{gen}}\left[D_{KL}\left(p_{dis}(\cdot |x)\|\mathbb {E} _{x\sim p_{gen}}[p_{dis}(\cdot |x)]\right)\right]$

$\ln IS(p_{gen},p_{dis})=H_{y}[\mathbb {E} _{x\sim p_{gen}}[p_{dis}(y|x)]]-\mathbb {E} _{x\sim p_{gen}}[H_{y}[p_{dis}(y|x)]]$

Nonnegativity follows from Jensen's inequality: entropy is concave, so the entropy of the averaged label distribution is at least the average of the per-image entropies.
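The agreement between the KL form and the entropy form can be checked in a few lines. The following is a sketch of that derivation; the shorthand $\bar{p}$ for the averaged label distribution is introduced here for illustration and is not part of the notation above.

```latex
\begin{aligned}
\bar{p}(y) &:= \mathbb{E}_{x\sim p_{gen}}[p_{dis}(y|x)] \\
\mathbb{E}_{x\sim p_{gen}}\left[D_{KL}\left(p_{dis}(\cdot|x)\,\|\,\bar{p}\right)\right]
&= \mathbb{E}_{x\sim p_{gen}}\left[\sum_{y} p_{dis}(y|x)\ln p_{dis}(y|x)\right]
 - \mathbb{E}_{x\sim p_{gen}}\left[\sum_{y} p_{dis}(y|x)\ln \bar{p}(y)\right] \\
&= -\mathbb{E}_{x\sim p_{gen}}\left[H_{y}[p_{dis}(y|x)]\right] - \sum_{y}\bar{p}(y)\ln \bar{p}(y) \\
&= H_{y}[\bar{p}] - \mathbb{E}_{x\sim p_{gen}}\left[H_{y}[p_{dis}(y|x)]\right]
\end{aligned}
```

The second step uses linearity of expectation to pull $\ln \bar{p}(y)$ out of the average over $x$. The final expression is the mutual information between the image and its label under the joint distribution $p_{gen}(x)p_{dis}(y|x)$ , which is another way to see that it is nonnegative.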

Pseudocode:

INPUT discriminator $p_{dis}$ .
INPUT generator $g$ .
Sample images $x_{i}$ from the generator.
Compute $p_{dis}(\cdot |x_{i})$ , the probability distribution over labels conditional on image $x_{i}$ .
Average the results to obtain ${\hat {p}}$ , an empirical estimate of $\int p_{dis}(\cdot |x)p_{gen}(x)dx$ .
Sample more images $x_{i}$ from the generator, and for each, compute $D_{KL}\left(p_{dis}(\cdot |x_{i})\|{\hat {p}}\right)$ .
Average the results, and take the exponential of the average.
Return the result.
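The pseudocode above can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the function name `inception_score` and the parameters `probs` and `eps` are hypothetical, and the sketch assumes the discriminator outputs $p_{dis}(\cdot |x_{i})$ have already been computed and stacked into an array. It also reuses the same batch of images for estimating ${\hat {p}}$ and for the KL terms, which is a simplification of the two sampling passes in the pseudocode.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Estimate the Inception Score from precomputed classifier outputs.

    probs: array of shape (num_images, num_labels); row i holds p_dis(. | x_i).
    eps:   small constant (an assumption of this sketch) that guards log(0).
    """
    probs = np.asarray(probs, dtype=np.float64)
    # Empirical marginal label distribution p_hat: average over images.
    p_hat = probs.mean(axis=0, keepdims=True)
    # Per-image KL divergence D_KL(p_dis(.|x_i) || p_hat).
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_hat + eps)), axis=1)
    # Average the divergences and take the exponential.
    return float(np.exp(kl.mean()))
```

In practice the rows of `probs` would come from the softmax output of the classifier on generated images; any array of row-stochastic predictions works for experimenting with the formula.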

Interpretation

$\ln IS(p_{gen},p_{dis})\in [0,\ln N]$ , where $N$ is the total number of possible labels.

A higher score is better. It means that $p_{gen}$ is a "sharp and distinct" collection of pictures.

$\ln IS(p_{gen},p_{dis})=0$ iff for almost all $x\sim p_{gen}$

$p_{dis}(\cdot |x)=\int p_{dis}(\cdot |x)p_{gen}(x)dx$

That means $p_{gen}$ is completely "indistinct". That is, for any image $x$ sampled from $p_{gen}$ , the discriminator returns exactly the same label predictions $p_{dis}(\cdot |x)$ .

The optimal log-inception score $\ln N$ is achieved iff the two conditions are true:

For almost all $x\sim p_{gen}$ , the distribution $p_{dis}(y|x)$ is concentrated on one label. That is, $H_{y}[p_{dis}(y|x)]=0$ . That is, every image sampled from $p_{gen}$ is exactly classified by the discriminator.

For every label $y$ , the proportion of generated images labelled as $y$ is exactly $\mathbb {E} _{x\sim p_{gen}}[p_{dis}(y|x)]={\frac {1}{N}}$ . That is, the generated images are equally distributed over all labels.
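The two extremes can be illustrated numerically using the entropy form of the log-score. The sketch below is illustrative only: the helper names `entropy` and `log_inception_score` and the three toy prediction matrices are hypothetical, chosen to show the lower bound, the upper bound, and a case where only the first condition holds.

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats, treating 0 * log 0 as 0."""
    p = np.asarray(p, dtype=np.float64)
    safe = np.where(p > 0, p, 1.0)  # log(1) = 0 drops zero-probability terms
    return -np.sum(p * np.log(safe), axis=axis)

def log_inception_score(probs):
    """ln IS = H[average prediction] - average of per-image entropies."""
    probs = np.asarray(probs, dtype=np.float64)
    return float(entropy(probs.mean(axis=0)) - entropy(probs, axis=1).mean())

N = 5
# Every image receives the same (uniform) prediction: completely indistinct.
indistinct = np.full((10, N), 1.0 / N)
# One confidently classified image per label: both conditions hold.
ideal = np.eye(N)
# Confident predictions, but always the same label: only the first condition holds.
collapsed = np.tile(np.eye(N)[0], (10, 1))

print(log_inception_score(indistinct))  # approximately 0
print(log_inception_score(ideal))       # approximately ln(5)
print(log_inception_score(collapsed))   # approximately 0
```

The `collapsed` case shows why both conditions are needed: sharp per-image predictions alone do not raise the score if the generated images all fall under a single label.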

[1] Salimans, Tim; Goodfellow, Ian; Zaremba, Wojciech; Cheung, Vicki; Radford, Alec; Chen, Xi; Chen, Xi (2016). "Improved Techniques for Training GANs". Advances in Neural Information Processing Systems. 29. Curran Associates, Inc.
