Inception score: Difference between revisions

Revision as of 02:25, 16 July 2022

The Inception Score (IS) is an algorithm used to assess the quality of images created by a generative model, like a generative adversarial network (GAN).^[1] It has been somewhat superseded by the Fréchet inception distance.^[2]

Definition

Let there be two spaces, the space of images $\Omega _{X}$ and the space of labels $\Omega _{Y}$ . The space of labels is finite.

Let $p_{gen}$ be a probability distribution over $\Omega _{X}$ that we wish to judge.

Let a discriminator be a function of type $p_{dis}:\Omega _{X}\to M(\Omega _{Y})$ where $M(\Omega _{Y})$ is the set of all probability distributions on $\Omega _{Y}$ . For any image $x$ , and any label $y$ , let $p_{dis}(y|x)$ be the probability that image $x$ has label $y$ , according to the discriminator. In practice, the discriminator is usually implemented as an Inception-v3 network trained on ImageNet.

The Inception Score of $p_{gen}$ relative to $p_{dis}$ is

$IS(p_{gen},p_{dis}):=\exp \left(\mathbb {E} _{x\sim p_{gen}}\left[D_{KL}\left(p_{dis}(\cdot |x)\|\int p_{dis}(\cdot |x')p_{gen}(x')dx'\right)\right]\right)$

Equivalent rewrites include

$\ln IS(p_{gen},p_{dis})=\mathbb {E} _{x\sim p_{gen}}\left[D_{KL}\left(p_{dis}(\cdot |x)\|\mathbb {E} _{x'\sim p_{gen}}[p_{dis}(\cdot |x')]\right)\right]$

$\ln IS(p_{gen},p_{dis})=H_{y}[\mathbb {E} _{x\sim p_{gen}}[p_{dis}(y|x)]]-\mathbb {E} _{x\sim p_{gen}}[H_{y}[p_{dis}(y|x)]]$

That this quantity is nonnegative follows from Jensen's inequality, since entropy is a concave function of the distribution.
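The step from the KL form to the entropy form can be spelled out as follows. This is only a sketch: the symbol $\bar p$ is notation introduced here for the marginal label distribution $\mathbb {E} _{x\sim p_{gen}}[p_{dis}(\cdot |x)]$ and does not appear in the definition above.

```latex
\begin{aligned}
\mathbb{E}_{x\sim p_{gen}}\left[D_{KL}\left(p_{dis}(\cdot|x)\,\|\,\bar p\right)\right]
&= \mathbb{E}_{x\sim p_{gen}}\left[\sum_{y} p_{dis}(y|x)\ln p_{dis}(y|x)\right]
 - \sum_{y}\mathbb{E}_{x\sim p_{gen}}\left[p_{dis}(y|x)\right]\ln \bar p(y) \\
&= -\mathbb{E}_{x\sim p_{gen}}\left[H_{y}[p_{dis}(y|x)]\right]
 - \sum_{y}\bar p(y)\ln \bar p(y) \\
&= H_{y}[\bar p] - \mathbb{E}_{x\sim p_{gen}}\left[H_{y}[p_{dis}(y|x)]\right]
\end{aligned}
```

The second line uses $\mathbb {E} _{x\sim p_{gen}}[p_{dis}(y|x)]=\bar p(y)$ . Nonnegativity can also be read off the first expression directly, since each KL divergence inside the expectation is nonnegative.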

Pseudocode:

INPUT discriminator $p_{dis}$ .
INPUT generator $g$ .
Sample images $x_{i}$ from generator.
Compute $p_{dis}(\cdot |x_{i})$ , the probability distribution over labels conditional on image $x_{i}$ .
Average the results to obtain ${\hat {p}}$ , an empirical estimate of $\int p_{dis}(\cdot |x)p_{gen}(x)dx$ .
Sample more images $x_{i}$ from generator, and for each, compute $D_{KL}\left(p_{dis}(\cdot |x_{i})\|{\hat {p}}\right)$ .
Average the results, and take its exponential.
Return the result.
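The pseudocode can be sketched in Python with NumPy, under some assumptions that are not part of the original text: the discriminator outputs are taken as already computed and passed in as arrays of shape (number of images, number of labels), the function and parameter names are hypothetical, and a small constant `eps` is added inside the logarithms to avoid `log(0)`. If no second batch is supplied, the same batch is reused for both steps, which is a simplification of the two sampling steps above.

```python
import numpy as np


def inception_score(probs_marginal, probs_eval=None, eps=1e-12):
    """Estimate the Inception Score from discriminator outputs.

    probs_marginal: array of shape (n, N); row i is p_dis(. | x_i) for the
        images used to estimate the marginal label distribution p_hat.
    probs_eval: optional array of shape (m, N) for the second batch of
        images; if omitted, the first batch is reused (an assumption of
        this sketch, not a requirement of the pseudocode).
    eps: small constant guarding against log(0) (also an assumption).
    """
    probs_marginal = np.asarray(probs_marginal, dtype=np.float64)
    if probs_eval is None:
        probs_eval = probs_marginal
    else:
        probs_eval = np.asarray(probs_eval, dtype=np.float64)

    # Average the per-image label distributions to estimate the marginal.
    p_hat = probs_marginal.mean(axis=0)

    # KL(p_dis(.|x_i) || p_hat) for each image in the evaluation batch.
    log_ratio = np.log(probs_eval + eps) - np.log(p_hat + eps)
    kl = np.sum(probs_eval * log_ratio, axis=1)

    # Average the divergences and exponentiate.
    return float(np.exp(kl.mean()))
```

For example, passing the 4 × 4 identity matrix (four images, each assigned with certainty to a different label) gives a score close to 4, while passing rows that are all uniform gives a score of 1.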

Interpretation

A higher inception score is interpreted as "better", as it means that $p_{gen}$ is a "sharp and distinct" collection of pictures.

$\ln IS(p_{gen},p_{dis})\in [0,\ln N]$ , where $N$ is the total number of possible labels.

$\ln IS(p_{gen},p_{dis})=0$ if and only if, for almost all $x\sim p_{gen}$ , $p_{dis}(\cdot |x)=\int p_{dis}(\cdot |x')p_{gen}(x')dx'$ . That means $p_{gen}$ is completely "indistinct": for any image $x$ sampled from $p_{gen}$ , the discriminator returns exactly the same label predictions $p_{dis}(\cdot |x)$ .

The highest inception score, $N$ , is achieved if and only if both of the following conditions hold:

For almost all $x\sim p_{gen}$ , the distribution $p_{dis}(y|x)$ is concentrated on one label, that is, $H_{y}[p_{dis}(y|x)]=0$ . In other words, every image sampled from $p_{gen}$ is exactly classified by the discriminator.
For every label $y$ , the proportion of generated images labelled as $y$ is exactly $\mathbb {E} _{x\sim p_{gen}}[p_{dis}(y|x)]={\frac {1}{N}}$ . That is, the generated images are equally distributed over all labels.
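A small numerical check can illustrate why both conditions are needed. The sketch below uses the entropy form of $\ln IS$ on three hand-made sets of discriminator outputs with $N=4$ labels. The arrays and the helper names `entropy` and `log_inception_score` are hypothetical and chosen only for illustration.

```python
import numpy as np


def entropy(p, axis=-1):
    """Shannon entropy in nats, using the convention 0 * ln 0 = 0."""
    p = np.asarray(p, dtype=np.float64)
    safe = np.where(p > 0, p, 1.0)  # ln(1) = 0, so zero-probability terms vanish
    return -np.sum(p * np.log(safe), axis=axis)


def log_inception_score(probs):
    """ln IS = H[mean prediction] - mean of H[prediction]; probs has shape (n, N)."""
    probs = np.asarray(probs, dtype=np.float64)
    return float(entropy(probs.mean(axis=0)) - entropy(probs, axis=1).mean())


N = 4
indistinct = np.full((8, N), 1.0 / N)      # same uniform prediction for every image
collapsed = np.tile(np.eye(N)[0], (8, 1))  # sharp, but every image gets label 0
ideal = np.tile(np.eye(N), (2, 1))         # sharp and evenly spread over the labels

scores = {
    name: float(np.exp(log_inception_score(p)))
    for name, p in [("indistinct", indistinct),
                    ("collapsed", collapsed),
                    ("ideal", ideal)]
}
```

Here `scores["indistinct"]` and `scores["collapsed"]` both come out as 1, and only `scores["ideal"]` reaches $N=4$ . The collapsed case satisfies the first condition but not the second, so sharp predictions alone do not raise the score.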

References

^ Salimans, Tim; Goodfellow, Ian; Zaremba, Wojciech; Cheung, Vicki; Radford, Alec; Chen, Xi (2016). "Improved Techniques for Training GANs". Advances in Neural Information Processing Systems. 29. Curran Associates, Inc.
^ Borji, Ali (2022). "Pros and cons of GAN evaluation measures: New developments". Computer Vision and Image Understanding. 215: 103329. doi:10.1016/j.cviu.2021.103329.


@@ Line 2: / Line 2: @@
 == Definition ==
 Let there be two spaces, the space of images <math>\Omega_X</math> and the space of labels <math>\Omega_Y</math>. The space of labels is finite.
 Let <math>p_{gen}</math> be a probability distribution over <math>\Omega_X</math> that we wish to judge.
 Let a discriminator be a function of type <math display="block">p_{dis}:\Omega_X \to M(\Omega_Y)</math>where <math>M(\Omega_Y)</math> is the set of all probability distributions on <math>\Omega_Y</math>. For any image <math>x</math>, and any label <math>y</math>, let <math>p_{dis}(y|x)</math> be the probability that image <math>x</math> has label <math>y</math>, according to the discriminator. It is usually implemented as an Inception-v3 network trained on ImageNet.
 The '''Inception Score''' of <math>p_{gen}</math> relative to <math>p_{dis}</math> is<math display="block">IS(p_{gen}, p_{dis}) := \exp\left( \mathbb E_{x\sim p_{gen}}\left[
@@ Line 15: / Line 15: @@
 		  H_y[\mathbb E_{x\sim p_{gen}}[p_{dis}(y | x)]]
+		  \mathbb E_{x\sim p_{gen}}[ H_y[p_{dis}(y | x)]]</math>To show that this is nonnegative, use [[Jensen's inequality]].
+'''Pseudocode:'''{{blockquote|'''INPUT''' discriminator <math>p_{dis}</math>.
-		  \mathbb E_{x\sim p_{gen}}[ H_y[p_{dis}(y | x)]]</math>To show that this is nonnegative, use [[Jensen's inequality]].
-'''Pseudocode:'''<blockquote>'''INPUT''' discriminator <math>p_{dis}</math>.
 '''INPUT''' generator <math>g</math>.
@@ Line 31: / Line 29: @@
 Sample more images <math>x_i</math> from generator, and for each, compute <math>D_{KL} \left(p_{dis}(\cdot | x_i) \| \hat p\right)</math>.
 Average the results, and take its exponential.
-Return the result.</blockquote>
+Return the result.}}
 === Interpretation ===
 A higher inception score is interpreted as "better", as it means that <math>p_{gen}</math> is a "sharp and distinct" collection of pictures.
 <math>\ln IS(p_{gen}, p_{dis}) \in [0, \ln N]</math>, where <math>N</math> is the total number of possible labels.
 <math>\ln IS(p_{gen}, p_{dis}) = 0</math> iff for almost all <math>x\sim p_{gen}</math><math display="block">p_{dis}(\cdot | x) = \int p_{dis}(\cdot | x) p_{gen}(x)dx</math>That means <math>p_{gen}</math> is completely "indistinct". That is, for any image <math>x</math> sampled from <math>p_{gen}</math>, discriminator returns exactly the same label predictions <math>p_{dis}(\cdot | x)</math>.
 The highest inception score <math>N</math> is achieved if and only if the two conditions are both true:
 * For almost all <math>x\sim p_{gen}</math>, the distribution <math>p_{dis}(y|x)</math> is concentrated on one label. That is, <math>H_y[p_{dis}(y|x)] = 0</math>. That is, every image sampled from <math>p_{gen}</math> is exactly classified by the discriminator.
 * For every label <math>y</math>, the proportion of generated images labelled as <math>y</math> is exactly <math>\mathbb E_{x\sim p_{gen}}[p_{dis}(y | x)] = \frac 1 N</math>. That is, the generated images are equally distributed over all labels.
@@ Line 56: / Line 52: @@
 == References ==
 <references />
 [[Category:Machine learning]]
 [[Category:Computer graphics]]
