Inception score
Revision as of 02:25, 16 July 2022
The Inception Score (IS) is an algorithm used to assess the quality of images created by a generative model, like a generative adversarial network (GAN).[1] It has been somewhat superseded by the Fréchet inception distance.[2]
Definition
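Let there be two spaces, the space of images <math>\Omega_X</math> and the space of labels <math>\Omega_Y</math>. The space of labels is finite.
Let <math>p_{gen}</math> be a probability distribution over <math>\Omega_X</math> that we wish to judge.
Let a discriminator be a function of type <math display="block">p_{dis}:\Omega_X \to M(\Omega_Y)</math> where <math>M(\Omega_Y)</math> is the set of all probability distributions on <math>\Omega_Y</math>. For any image <math>x</math> and any label <math>y</math>, let <math>p_{dis}(y|x)</math> be the probability that image <math>x</math> has label <math>y</math>, according to the discriminator. The discriminator is usually implemented as an Inception-v3 network trained on ImageNet.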
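The Inception Score of <math>p_{gen}</math> relative to <math>p_{dis}</math> is
<math display="block">IS(p_{gen}, p_{dis}) := \exp\left( \mathbb E_{x\sim p_{gen}}\left[ D_{KL} \left(p_{dis}(\cdot | x) \| \int p_{dis}(\cdot | x') p_{gen}(x')\,dx'\right) \right]\right)</math>
Equivalent rewrites include
<math display="block">\ln IS(p_{gen}, p_{dis}) = \mathbb E_{x\sim p_{gen}}\left[ D_{KL} \left(p_{dis}(\cdot | x) \| \mathbb E_{x'\sim p_{gen}}[p_{dis}(\cdot | x')]\right) \right]</math>
<math display="block">\ln IS(p_{gen}, p_{dis}) = H_y[\mathbb E_{x\sim p_{gen}}[p_{dis}(y | x)]] - \mathbb E_{x\sim p_{gen}}[ H_y[p_{dis}(y | x)]]</math>
Jensen's inequality shows that <math>\ln IS(p_{gen}, p_{dis})</math> is nonnegative: since entropy is concave, the entropy of the averaged prediction <math>\mathbb E_{x\sim p_{gen}}[p_{dis}(\cdot | x)]</math> is at least the average entropy of the individual predictions <math>p_{dis}(\cdot | x)</math>.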
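As a minimal sketch (not a reference implementation), the definition and the entropy rewrite can be checked against each other on a finite sample. The list `probs` is a hypothetical input whose rows stand for <math>p_{dis}(\cdot | x_i)</math> for images <math>x_i</math> drawn from <math>p_{gen}</math>. The expectation over <math>p_{gen}</math> is replaced by an average over the rows, so the code computes an empirical estimate rather than the exact score.

```python
import math


def entropy(dist):
    """Shannon entropy (natural log) of a discrete distribution; 0 * log 0 is taken as 0."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def marginal(probs):
    """Average of the rows: empirical estimate of the marginal label distribution."""
    return [sum(column) / len(probs) for column in zip(*probs)]


def inception_score_kl(probs):
    """exp of the mean KL divergence from each row to the marginal (the definition)."""
    m = marginal(probs)
    mean_kl = sum(
        # q > 0 whenever p > 0, because m averages over all rows, including this one.
        sum(p * math.log(p / q) for p, q in zip(row, m) if p > 0)
        for row in probs
    ) / len(probs)
    return math.exp(mean_kl)


def log_inception_score_entropy(probs):
    """Entropy of the marginal minus the mean entropy of the rows (the rewrite)."""
    return entropy(marginal(probs)) - sum(entropy(row) for row in probs) / len(probs)


# Hypothetical predictions of a three-label discriminator for four images.
probs = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.3, 0.4],
]
print(math.log(inception_score_kl(probs)), log_inception_score_entropy(probs))
```

The two printed values should agree up to floating-point rounding, and both are nonnegative, as Jensen's inequality predicts.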
Pseudocode:
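INPUT discriminator <math>p_{dis}</math>.
INPUT generator <math>g</math>.
Sample images <math>x_i</math> from generator.
Compute <math>p_{dis}(\cdot | x_i)</math>, the probability distribution over labels conditional on image <math>x_i</math>.
Average the results to obtain <math>\hat p</math>, an empirical estimate of <math>\int p_{dis}(\cdot | x) p_{gen}(x)\,dx</math>.
Sample more images <math>x_i</math> from generator, and for each, compute <math>D_{KL} \left(p_{dis}(\cdot | x_i) \| \hat p\right)</math>.
Average the results and take the exponential.
Return the result.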
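The two-stage procedure can be sketched in Python as follows. This is an illustrative sketch, not the method of any particular library: `generator(rng)` and `discriminator(image)` are hypothetical interfaces standing in for a real generative model and a real classifier. The toy versions below are only there to make the sketch runnable. If a second-stage prediction puts mass on a label that never appeared in the first stage, the divergence from <math>\hat p</math> is infinite, and the sketch returns `math.inf`.

```python
import math
import random


def estimate_inception_score(generator, discriminator, n_marginal, n_score, rng):
    """Two-stage estimate of the Inception Score.

    generator(rng) -> one image (any object); discriminator(image) -> list of label
    probabilities. Both interfaces are hypothetical stand-ins for a real model.
    """
    # Stage 1: average the predictions of n_marginal samples to get p_hat.
    first = [discriminator(generator(rng)) for _ in range(n_marginal)]
    p_hat = [sum(column) / n_marginal for column in zip(*first)]

    # Stage 2: fresh samples; accumulate the KL divergence of each prediction from p_hat.
    total_kl = 0.0
    for _ in range(n_score):
        dist = discriminator(generator(rng))
        for p, q in zip(dist, p_hat):
            if p > 0:
                if q == 0:
                    return math.inf  # label unseen in stage 1: the divergence is infinite
                total_kl += p * math.log(p / q)

    # Average the divergences and exponentiate.
    return math.exp(total_kl / n_score)


# Toy example (hypothetical): each "image" is a label drawn uniformly from 3 labels,
# and the discriminator classifies it with certainty (a one-hot prediction).
def toy_generator(rng):
    return rng.randrange(3)


def toy_discriminator(label):
    return [1.0 if y == label else 0.0 for y in range(3)]


score = estimate_inception_score(toy_generator, toy_discriminator, 3000, 3000, random.Random(0))
print(score)  # close to 3, the number of labels
```

Using fresh samples in the second stage follows the pseudocode. Reusing the first-stage samples would also give an estimate, but the two stages would then be statistically dependent.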
Interpretation
A higher inception score is interpreted as "better", as it means that <math>p_{gen}</math> is a "sharp and distinct" collection of pictures.
<math>\ln IS(p_{gen}, p_{dis}) \in [0, \ln N]</math>, where <math>N</math> is the total number of possible labels.
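<math>\ln IS(p_{gen}, p_{dis}) = 0</math> if and only if for almost all <math>x\sim p_{gen}</math> <math display="block">p_{dis}(\cdot | x) = \int p_{dis}(\cdot | x') p_{gen}(x')\,dx'</math> That means <math>p_{gen}</math> is completely "indistinct": for any image <math>x</math> sampled from <math>p_{gen}</math>, the discriminator returns exactly the same label predictions <math>p_{dis}(\cdot | x)</math>.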
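The lower end of the range can be illustrated on small hypothetical samples. This sketch replaces the expectations by averages over a finite list of predictions, so it illustrates the bounds rather than proving them. When every row is the same prediction, <math>\ln IS</math> is zero. Arbitrary random predictions over <math>N = 4</math> labels fall inside <math>[0, \ln 4]</math>.

```python
import math
import random


def log_inception_score(probs):
    """ln IS on a finite sample: entropy of the averaged prediction minus mean entropy."""
    def entropy(dist):
        return -sum(p * math.log(p) for p in dist if p > 0)

    averaged = [sum(column) / len(probs) for column in zip(*probs)]
    return entropy(averaged) - sum(entropy(row) for row in probs) / len(probs)


# Completely "indistinct": every image receives the same (hypothetical) prediction.
indistinct = [[0.6, 0.3, 0.1]] * 5
print(log_inception_score(indistinct))  # 0 up to floating-point rounding

# Random predictions over N = 4 labels stay inside [0, ln 4].
rng = random.Random(1)
random_rows = []
for _ in range(50):
    weights = [rng.random() for _ in range(4)]
    total = sum(weights)
    random_rows.append([w / total for w in weights])
value = log_inception_score(random_rows)
print(0 <= value <= math.log(4))  # True
```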
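The highest inception score, <math>N</math>, is achieved if and only if both of the following conditions hold:
- For almost all <math>x\sim p_{gen}</math>, the distribution <math>p_{dis}(y|x)</math> is concentrated on a single label, that is, <math>H_y[p_{dis}(y|x)] = 0</math>. In other words, the discriminator classifies every image sampled from <math>p_{gen}</math> with certainty.
- For every label <math>y</math>, the proportion of generated images labelled as <math>y</math> is exactly <math>\mathbb E_{x\sim p_{gen}}[p_{dis}(y | x)] = \frac 1 N</math>. That is, the generated images are equally distributed over all labels.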
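A small numerical sketch shows why both conditions are needed. The predictions below are hypothetical, and the expectations are replaced by averages over four rows. One-hot predictions spread evenly over <math>N = 4</math> labels reach the maximum. Dropping either condition gives a lower score.

```python
import math


def inception_score(probs):
    """IS on a finite sample, via the entropy rewrite of ln IS."""
    def entropy(dist):
        return -sum(p * math.log(p) for p in dist if p > 0)

    averaged = [sum(column) / len(probs) for column in zip(*probs)]
    return math.exp(entropy(averaged) - sum(entropy(row) for row in probs) / len(probs))


N = 4
one_hot = [[1.0 if y == k else 0.0 for y in range(N)] for k in range(N)]

# Both conditions hold: confident predictions, spread evenly over all labels.
best = inception_score(one_hot)

# Only the first condition holds: confident, but label 0 never appears (labels 1, 2, 3, 1).
skewed = inception_score([one_hot[1], one_hot[2], one_hot[3], one_hot[1]])

# Only the second condition holds: labels are balanced on average, but each prediction hedges.
hedged = inception_score([[0.7 if y == k else 0.1 for y in range(N)] for k in range(N)])

print(best, skewed, hedged)
```

Here `best` equals <math>N = 4</math> up to rounding. The value of `skewed` is <math>2^{1.5} \approx 2.83</math>, because the averaged prediction is <math>(0, \tfrac12, \tfrac14, \tfrac14)</math>. The value of `hedged` is about 1.56.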
See also
References
- Salimans, Tim; Goodfellow, Ian; Zaremba, Wojciech; Cheung, Vicki; Radford, Alec; Chen, Xi (2016). "Improved Techniques for Training GANs". Advances in Neural Information Processing Systems. 29. Curran Associates, Inc.
- Borji, Ali (2022). "Pros and cons of GAN evaluation measures: New developments". Computer Vision and Image Understanding. 215: 103329. doi:10.1016/j.cviu.2021.103329.