Inception score
Latest revision as of 03:31, 27 December 2024
The Inception Score (IS) is an algorithm used to assess the quality of images created by a generative image model such as a generative adversarial network (GAN).[1] The score is calculated based on the output of a separate, pretrained Inception v3 image classification model applied to a sample of (typically around 30,000) images generated by the generative model. The Inception Score is maximized when the following conditions are true:
- For each generated image, the entropy of the label distribution predicted by the Inception v3 model is minimized. In other words, the classification model confidently predicts a single label for each image. Intuitively, this corresponds to the desideratum of generated images being "sharp" or "distinct".
- Across the whole set of generated images, the predicted labels are evenly distributed over all possible labels. This corresponds to the desideratum that the output of the generative model is "diverse".[2]
The Inception Score has been somewhat superseded by the related Fréchet inception distance (FID).[3] While the Inception Score only evaluates the distribution of generated images, the FID compares the distribution of generated images with the distribution of a set of real images ("ground truth").
Definition
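Let there be two spaces, the space of images $\Omega_X$ and the space of labels $\Omega_Y$. The space of labels is finite.
Let $p_{gen}$ be a probability distribution over $\Omega_X$ that we wish to judge.
Let a discriminator be a function of type $p_{dis} : \Omega_X \to M(\Omega_Y)$, where $M(\Omega_Y)$ is the set of all probability distributions on $\Omega_Y$. For any image $x$ and any label $y$, let $p_{dis}(y | x)$ be the probability that image $x$ has label $y$, according to the discriminator. It is usually implemented as an Inception-v3 network trained on ImageNet.
The Inception Score of $p_{gen}$ relative to $p_{dis}$ is
$$IS(p_{gen}, p_{dis}) := \exp\left( \mathbb E_{x\sim p_{gen}}\left[ D_{KL} \left(p_{dis}(\cdot | x) \| \int p_{dis}(\cdot | x) p_{gen}(x)dx \right) \right]\right)$$
Equivalent rewrites include
$$\ln IS(p_{gen}, p_{dis}) = \mathbb E_{x\sim p_{gen}}\left[ D_{KL} \left(p_{dis}(\cdot | x) \| \mathbb E_{x\sim p_{gen}}[p_{dis}(\cdot | x)]\right) \right]$$
$$\ln IS(p_{gen}, p_{dis}) = H[\mathbb E_{x\sim p_{gen}}[p_{dis}(\cdot | x)]] - \mathbb E_{x\sim p_{gen}}[H[p_{dis}(\cdot | x)]]$$
$\ln IS(p_{gen}, p_{dis})$ is nonnegative by Jensen's inequality.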
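The equivalence between the Kullback–Leibler form and the entropy form can be checked in a few lines. The following is a sketch of that derivation; the shorthand $\bar p$ for the marginal label distribution is introduced here for convenience and is not notation from the article.

```latex
% Shorthand (introduced for this sketch): the marginal label distribution
\bar p(y) := \mathbb E_{x\sim p_{gen}}[p_{dis}(y | x)]

\begin{aligned}
\ln IS(p_{gen}, p_{dis})
  &= \mathbb E_{x\sim p_{gen}}\left[ \sum_{y} p_{dis}(y | x) \ln \frac{p_{dis}(y | x)}{\bar p(y)} \right] \\
  &= \mathbb E_{x\sim p_{gen}}\left[ \sum_{y} p_{dis}(y | x) \ln p_{dis}(y | x) \right]
     - \sum_{y} \mathbb E_{x\sim p_{gen}}[p_{dis}(y | x)] \ln \bar p(y) \\
  &= -\mathbb E_{x\sim p_{gen}}\left[ H[p_{dis}(\cdot | x)] \right]
     - \sum_{y} \bar p(y) \ln \bar p(y) \\
  &= H[\bar p] - \mathbb E_{x\sim p_{gen}}\left[ H[p_{dis}(\cdot | x)] \right]
\end{aligned}
```

Since entropy is concave, Jensen's inequality gives $H[\bar p] \geq \mathbb E_{x\sim p_{gen}}[H[p_{dis}(\cdot | x)]]$, which is the nonnegativity statement.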
Pseudocode:
INPUT discriminator $p_{dis}$.
INPUT generator $g$.
Sample images $x_i$ from generator.
Compute $p_{dis}(\cdot | x_i)$, the probability distribution over labels conditional on image $x_i$.
Average the results to obtain $\hat p$, an empirical estimate of $\int p_{dis}(\cdot | x) p_{gen}(x)dx$.
Sample more images $x_i$ from generator, and for each, compute $D_{KL} \left(p_{dis}(\cdot | x_i) \| \hat p\right)$.
Average the results, and take its exponential.
RETURN the result.
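The pseudocode above can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the callables `sample_images` and `predict_proba`, the toy generator and the toy classifiers are all hypothetical stand-ins, and in practice the classifier would be a pretrained Inception v3 network applied to real generated images.

```python
import numpy as np

def inception_score(sample_images, predict_proba, n_samples=5000, eps=1e-12):
    """Two-pass estimate of the Inception Score.

    sample_images(n)      -> n generated images (hypothetical generator interface)
    predict_proba(images) -> array of shape (n, N) with label probabilities
                             (hypothetical classifier interface)
    """
    # Pass 1: average the predictions to estimate the marginal label distribution.
    probs = predict_proba(sample_images(n_samples))
    p_hat = probs.mean(axis=0)

    # Pass 2: fresh samples; KL(p_dis(.|x_i) || p_hat) for each image.
    # eps guards against log(0); terms with probability 0 contribute 0.
    probs = predict_proba(sample_images(n_samples))
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_hat + eps)), axis=1)

    # Average the divergences and exponentiate.
    return float(np.exp(kl.mean()))

# Toy stand-ins (assumptions for illustration only).
rng = np.random.default_rng(0)
N = 10  # number of labels

def toy_generator(n):
    # An "image" is just an integer class id, drawn uniformly.
    return rng.integers(0, N, size=n)

def confident_classifier(images):
    # One-hot prediction: every image is classified with certainty.
    return np.eye(N)[images]

def uniform_classifier(images):
    # The same uniform prediction for every image.
    return np.full((len(images), N), 1.0 / N)

score = inception_score(toy_generator, confident_classifier)
score_uniform = inception_score(toy_generator, uniform_classifier)
print(score)          # close to N = 10
print(score_uniform)  # close to 1
```

Using a second, independent batch for the divergence step mirrors the pseudocode. Many practical implementations reuse a single batch for both steps, which is a design choice rather than something the definition requires.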
Interpretation
A higher inception score is interpreted as "better", as it means that $p_{gen}$ is a "sharp and distinct" collection of pictures.
$\ln IS(p_{gen}, p_{dis}) \in [0, \ln N]$, where $N$ is the total number of possible labels.
$\ln IS(p_{gen}, p_{dis}) = 0$ iff for almost all $x \sim p_{gen}$, $p_{dis}(\cdot | x) = \int p_{dis}(\cdot | x) p_{gen}(x)dx$. That means $p_{gen}$ is completely "indistinct". That is, for any image $x$ sampled from $p_{gen}$, the discriminator returns exactly the same label predictions $p_{dis}(\cdot | x)$.
The highest inception score, $N$, is achieved if and only if the two conditions are both true:
- For almost all $x \sim p_{gen}$, the distribution $p_{dis}(\cdot | x)$ is concentrated on one label. That is, $H[p_{dis}(\cdot | x)] = 0$. That is, every image sampled from $p_{gen}$ is exactly classified by the discriminator.
- For every label $y$, the proportion of generated images labelled as $y$ is exactly $\mathbb E_{x\sim p_{gen}}[p_{dis}(y | x)] = \frac 1 N$. That is, the generated images are equally distributed over all labels.
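The two extremes described above can be checked numerically on hand-built prediction matrices. The sketch below uses the entropy form of $\ln IS$; the function names and the three toy matrices are illustrative assumptions, with each row standing for the classifier's prediction on one generated image.

```python
import numpy as np

def entropy(p, axis=-1):
    # Shannon entropy in nats, using the convention 0 * log 0 = 0.
    p = np.asarray(p, dtype=float)
    logp = np.log(np.where(p > 0, p, 1.0))
    return -np.sum(p * logp, axis=axis)

def log_inception_score(probs):
    # ln IS = H[mean prediction] - mean of the per-image entropies.
    probs = np.asarray(probs, dtype=float)
    return entropy(probs.mean(axis=0)) - entropy(probs, axis=1).mean()

N = 4  # number of labels

# Every image receives the same uniform prediction: completely "indistinct".
indistinct = np.full((8, N), 1.0 / N)
# Confident one-hot predictions, with the labels used equally often.
ideal = np.eye(N)[np.arange(8) % N]
# Confident one-hot predictions, but every image gets the same label.
collapsed = np.eye(N)[np.zeros(8, dtype=int)]

ln_is_indistinct = log_inception_score(indistinct)
is_ideal = np.exp(log_inception_score(ideal))
ln_is_collapsed = log_inception_score(collapsed)

print(ln_is_indistinct)  # approximately 0
print(is_ideal)          # approximately N = 4
print(ln_is_collapsed)   # approximately 0
```

The third case shows why both conditions are needed: sharp predictions alone do not raise the score if all generated images fall under a single label.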
References
1. Salimans, Tim; Goodfellow, Ian; Zaremba, Wojciech; Cheung, Vicki; Radford, Alec; Chen, Xi (2016). "Improved Techniques for Training GANs". Advances in Neural Information Processing Systems. 29. Curran Associates, Inc. arXiv:1606.03498.
2. Frolov, Stanislav; Hinz, Tobias; Raue, Federico; Hees, Jörn; Dengel, Andreas (December 2021). "Adversarial text-to-image synthesis: A review". Neural Networks. 144: 187–209. arXiv:2101.09983. doi:10.1016/j.neunet.2021.07.019. PMID 34500257. S2CID 231698782.
3. Borji, Ali (2022). "Pros and cons of GAN evaluation measures: New developments". Computer Vision and Image Understanding. 215: 103329. arXiv:2103.09396. doi:10.1016/j.cviu.2021.103329. S2CID 232257836.