Inception score: Difference between revisions

From Wikipedia, the free encyclopedia
General + punct fixes
Adding short description: "Image algorithm"
 
(15 intermediate revisions by 6 users not shown)
Line 1: Line 1:
{{Short description|Image algorithm}}
The '''Inception Score (IS)''' is an algorithm used to assess the quality of images created by a generative model, like a [[generative adversarial network]] (GAN).<ref>{{Cite journal |last=Salimans |first=Tim |last2=Goodfellow |first2=Ian |last3=Zaremba |first3=Wojciech |last4=Cheung |first4=Vicki |last5=Radford |first5=Alec |last6=Chen |first6=Xi |last7=Chen |first7=Xi |date=2016 |title=Improved Techniques for Training GANs |url=https://proceedings.neurips.cc/paper/2016/hash/8a3363abe792db2d8761d6403605aeb7-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=29}}</ref> It has been somewhat superseded by the [[Fréchet inception distance]].<ref>{{Cite journal |last=Borji |first=Ali |date=2022 |title=Pros and cons of GAN evaluation measures: New developments |url=https://linkinghub.elsevier.com/retrieve/pii/S1077314221001685 |journal=Computer Vision and Image Understanding |language=en |volume=215 |pages=103329 |doi=10.1016/j.cviu.2021.103329}}</ref>
The '''Inception Score (IS)''' is an algorithm used to assess the quality of images created by a [[generative model|generative]] image model such as a [[generative adversarial network]] (GAN).<ref name="Salimans"/> The score is calculated based on the output of a separate, pretrained [[Inception (deep learning architecture)#Inception v3|Inception v3]] image classification model applied to a sample of (typically around 30,000) images generated by the generative model. The Inception Score is maximized when the following conditions are true:

# The [[Entropy (information theory)|entropy]] of the label distribution that the Inception v3 model predicts for each individual generated image is minimized. In other words, the classification model confidently predicts a single label for each image. Intuitively, this corresponds to the desideratum of generated images being "sharp" or "distinct".
# Averaged over all generated images, the predictions of the classification model are evenly distributed across all possible labels. This corresponds to the desideratum that the output of the generative model is "diverse".<ref name="Frolov"/>

The Inception Score has been somewhat superseded by the related [[Fréchet inception distance]] (FID).<ref name="Borji"/> While the Inception Score only evaluates the distribution of generated images, the FID compares the distribution of generated images with the distribution of a set of real images ("ground truth").


== Definition ==
== Definition ==
Line 10: Line 16:
The '''Inception Score''' of <math>p_{gen}</math> relative to <math>p_{dis}</math> is<math display="block">IS(p_{gen}, p_{dis}) := \exp\left( \mathbb E_{x\sim p_{gen}}\left[
The '''Inception Score''' of <math>p_{gen}</math> relative to <math>p_{dis}</math> is<math display="block">IS(p_{gen}, p_{dis}) := \exp\left( \mathbb E_{x\sim p_{gen}}\left[
D_{KL} \left(p_{dis}(\cdot | x) \| \int p_{dis}(\cdot | x) p_{gen}(x)dx \right)
D_{KL} \left(p_{dis}(\cdot | x) \| \int p_{dis}(\cdot | x) p_{gen}(x)dx \right)
\right]\right)</math>Equivalent rewrites include<math display="block">\ln IS(p_{gen}, p_{inc}) := \mathbb E_{x\sim p_{gen}}\left[
\right]\right)</math>Equivalent rewrites include<math display="block">\ln IS(p_{gen}, p_{dis}) := \mathbb E_{x\sim p_{gen}}\left[
D_{KL} \left(p_{dis}(\cdot | x) \| \mathbb E_{x\sim p_{gen}}[p_{dis}(\cdot | x)]\right)
D_{KL} \left(p_{dis}(\cdot | x) \| \mathbb E_{x\sim p_{gen}}[p_{dis}(\cdot | x)]\right)
\right]</math><math display="block">\ln IS(p_{gen}, p_{dis}) :=
\right]</math><math display="block">\ln IS(p_{gen}, p_{dis}) :=
H_y[\mathbb E_{x\sim p_{gen}}[p_{dis}(y | x)]]
H[\mathbb E_{x\sim p_{gen}}[p_{dis}(\cdot | x)]]
-\mathbb E_{x\sim p_{gen}}[ H[p_{dis}(\cdot | x)]]</math><math>\ln IS</math> is nonnegative by [[Jensen's inequality]].

\mathbb E_{x\sim p_{gen}}[ H_y[p_{dis}(y | x)]]</math>To show that this is nonnegative, use [[Jensen's inequality]].


'''Pseudocode:'''{{blockquote|'''INPUT''' discriminator <math>p_{dis}</math>.
'''Pseudocode:'''{{blockquote|'''INPUT''' discriminator <math>p_{dis}</math>.
Line 31: Line 36:
Average the results, and take its exponential.
Average the results, and take its exponential.


Return the result.}}
'''RETURN''' the result.}}


=== Interpretation ===
=== Interpretation ===
Line 44: Line 49:
* For every label <math>y</math>, the proportion of generated images labelled as <math>y</math> is exactly <math>\mathbb E_{x\sim p_{gen}}[p_{dis}(y | x)] = \frac 1 N</math>. That is, the generated images are equally distributed over all labels.
* For every label <math>y</math>, the proportion of generated images labelled as <math>y</math> is exactly <math>\mathbb E_{x\sim p_{gen}}[p_{dis}(y | x)] = \frac 1 N</math>. That is, the generated images are equally distributed over all labels.


== See also ==
== References ==
{{reflist|refs=
<ref name="Salimans">{{Cite journal |last1=Salimans |first1=Tim |last2=Goodfellow |first2=Ian |last3=Zaremba |first3=Wojciech |last4=Cheung |first4=Vicki |last5=Radford |first5=Alec |last6=Chen |first6=Xi |last7=Chen |first7=Xi |date=2016 |title=Improved Techniques for Training GANs |url=https://proceedings.neurips.cc/paper/2016/hash/8a3363abe792db2d8761d6403605aeb7-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=29|arxiv=1606.03498 }}</ref>
<ref name="Frolov">{{cite journal
|title=Adversarial text-to-image synthesis: A review
|date=December 2021
|journal=Neural Networks|volume=144|pages=187–209
|last1=Frolov|first1=Stanislav|last2=Hinz|first2=Tobias|last3=Raue|first3=Federico|last4=Hees|first4=Jörn|last5=Dengel|first5=Andreas
|doi=10.1016/j.neunet.2021.07.019
|pmid=34500257
|s2cid=231698782
|doi-access=free|arxiv=2101.09983}}</ref>
<ref name="Borji">{{Cite journal |last=Borji |first=Ali |date=2022 |title=Pros and cons of GAN evaluation measures: New developments |url=https://linkinghub.elsevier.com/retrieve/pii/S1077314221001685 |journal=Computer Vision and Image Understanding |language=en |volume=215 |pages=103329 |doi=10.1016/j.cviu.2021.103329|arxiv=2103.09396 |s2cid=232257836 }}</ref>
}}


{{Machine learning evaluation metrics}}
* [[Fréchet distance]]
* [[Fréchet inception distance]]
* [[Generative adversarial network]]

== References ==
<references />


[[Category:Machine learning]]
[[Category:Machine learning]]

Latest revision as of 03:31, 27 December 2024

The Inception Score (IS) is an algorithm used to assess the quality of images created by a generative image model such as a generative adversarial network (GAN).[1] The score is calculated based on the output of a separate, pretrained Inception v3 image classification model applied to a sample of (typically around 30,000) images generated by the generative model. The Inception Score is maximized when the following conditions are true:

  1. The entropy of the label distribution that the Inception v3 model predicts for each individual generated image is minimized. In other words, the classification model confidently predicts a single label for each image. Intuitively, this corresponds to the desideratum of generated images being "sharp" or "distinct".
  2. Averaged over all generated images, the predictions of the classification model are evenly distributed across all possible labels. This corresponds to the desideratum that the output of the generative model is "diverse".[2]

The Inception Score has been somewhat superseded by the related Fréchet inception distance (FID).[3] While the Inception Score only evaluates the distribution of generated images, the FID compares the distribution of generated images with the distribution of a set of real images ("ground truth").

Definition

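Let there be two spaces, the space of images <math>\Omega_X</math> and the space of labels <math>\Omega_Y</math>. The space of labels is finite.

Let <math>p_{gen}</math> be a probability distribution over <math>\Omega_X</math> that we wish to judge.

Let a discriminator be a function of type <math>p_{dis}: \Omega_X \to M(\Omega_Y)</math>, where <math>M(\Omega_Y)</math> is the set of all probability distributions on <math>\Omega_Y</math>. For any image <math>x</math> and any label <math>y</math>, let <math>p_{dis}(y | x)</math> be the probability that image <math>x</math> has label <math>y</math>, according to the discriminator. It is usually implemented as an Inception-v3 network trained on ImageNet.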

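The Inception Score of <math>p_{gen}</math> relative to <math>p_{dis}</math> is
<math display="block">IS(p_{gen}, p_{dis}) := \exp\left( \mathbb E_{x\sim p_{gen}}\left[
D_{KL} \left(p_{dis}(\cdot | x) \| \int p_{dis}(\cdot | x) p_{gen}(x)dx \right)
\right]\right)</math>
Equivalent rewrites include
<math display="block">\ln IS(p_{gen}, p_{dis}) := \mathbb E_{x\sim p_{gen}}\left[
D_{KL} \left(p_{dis}(\cdot | x) \| \mathbb E_{x\sim p_{gen}}[p_{dis}(\cdot | x)]\right)
\right]</math>
<math display="block">\ln IS(p_{gen}, p_{dis}) :=
H[\mathbb E_{x\sim p_{gen}}[p_{dis}(\cdot | x)]]
-\mathbb E_{x\sim p_{gen}}[ H[p_{dis}(\cdot | x)]]</math>
<math>\ln IS</math> is nonnegative by Jensen's inequality: entropy is concave, so the entropy of the averaged prediction is at least the average entropy of the individual predictions.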
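
The equivalence between the Kullback–Leibler form and the entropy form can be checked in a few lines. The following is a sketch of that step, where <math>\bar p(y) := \mathbb E_{x\sim p_{gen}}[p_{dis}(y | x)]</math> is shorthand introduced here for the averaged prediction (it is not notation from the cited sources):
<math display="block">\begin{aligned}
\mathbb E_{x\sim p_{gen}}\left[ D_{KL}\left(p_{dis}(\cdot | x) \| \bar p\right) \right]
&= \mathbb E_{x\sim p_{gen}}\left[ \sum_y p_{dis}(y | x) \ln p_{dis}(y | x) \right]
 - \mathbb E_{x\sim p_{gen}}\left[ \sum_y p_{dis}(y | x) \ln \bar p(y) \right] \\
&= -\mathbb E_{x\sim p_{gen}}\left[ H[p_{dis}(\cdot | x)] \right]
 - \sum_y \bar p(y) \ln \bar p(y) \\
&= H[\bar p] - \mathbb E_{x\sim p_{gen}}\left[ H[p_{dis}(\cdot | x)] \right]
\end{aligned}</math>
The second line uses the fact that <math>\ln \bar p(y)</math> does not depend on <math>x</math>, so the expectation acts only on <math>p_{dis}(y | x)</math> and yields <math>\bar p(y)</math>.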

Pseudocode:

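INPUT discriminator <math>p_{dis}</math>.

INPUT generator <math>g</math>.

Sample images <math>x_i</math> from generator.

Compute <math>p_{dis}(\cdot | x_i)</math>, the probability distribution over labels conditional on image <math>x_i</math>.

Average the results to obtain <math>\hat p</math>, an empirical estimate of <math>\int p_{dis}(\cdot | x) p_{gen}(x)dx</math>.

Sample more images <math>x_i</math> from generator, and for each, compute <math>D_{KL} \left(p_{dis}(\cdot | x_i) \| \hat p\right)</math>.

Average the results, and take the exponential of that average.

RETURN the result.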
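
The pseudocode above can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the function name `inception_score`, the argument `probs` and the smoothing constant `eps` are assumptions made for this example. It takes the discriminator's outputs as given (running an Inception v3 classifier on generated images is not shown), and it reuses one sample for both the marginal estimate and the per-image divergences, whereas the pseudocode draws a second sample.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Estimate the Inception Score from predicted label probabilities.

    probs: array of shape (n_images, n_labels); row i is p_dis(. | x_i).
    eps:   small constant (an assumption of this sketch) that avoids log(0).
    """
    probs = np.asarray(probs, dtype=np.float64)
    # Empirical estimate of the marginal label distribution: average over images.
    marginal = probs.mean(axis=0, keepdims=True)
    # KL divergence between each image's prediction and the marginal.
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    # Average the divergences, then exponentiate.
    return float(np.exp(kl.mean()))

# Usage with made-up predictions: random logits passed through a softmax.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))
fake_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
score = inception_score(fake_probs)
```

For any valid input the returned score lies between 1 and the number of labels, up to the small error introduced by `eps`.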

Interpretation

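A higher inception score is interpreted as "better", as it means that <math>p_{gen}</math> is a "sharp and distinct" collection of pictures.

<math>\ln IS(p_{gen}, p_{dis}) \in [0, \ln N]</math>, where <math>N</math> is the total number of possible labels.

<math>\ln IS(p_{gen}, p_{dis}) = 0</math> iff for almost all <math>x \sim p_{gen}</math>
<math display="block">p_{dis}(\cdot | x) = \int p_{dis}(\cdot | x) p_{gen}(x)dx</math>
That means <math>p_{gen}</math> is completely "indistinct". That is, for any image <math>x</math> sampled from <math>p_{gen}</math>, the discriminator returns exactly the same label predictions <math>p_{dis}(\cdot | x)</math>.

The highest inception score, <math>N</math>, is achieved if and only if the two conditions are both true:

  • For almost all <math>x \sim p_{gen}</math>, the distribution <math>p_{dis}(\cdot | x)</math> is concentrated on one label. That is, <math>H[p_{dis}(\cdot | x)] = 0</math>. That is, every image sampled from <math>p_{gen}</math> is exactly classified by the discriminator.
  • For every label <math>y</math>, the proportion of generated images labelled as <math>y</math> is exactly <math>\mathbb E_{x\sim p_{gen}}[p_{dis}(y | x)] = \frac 1 N</math>. That is, the generated images are equally distributed over all labels.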
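
The two extremes described above can be checked numerically. The sketch below uses the entropy form of the logarithm of the score; the helper names `entropy` and `log_inception_score` and the toy prediction matrices are assumptions made for this illustration.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy in nats along the last axis (eps avoids log(0))."""
    p = np.asarray(p, dtype=np.float64)
    return -np.sum(p * np.log(p + eps), axis=-1)

def log_inception_score(probs):
    """ln IS = H[average prediction] - average of H[each prediction]."""
    probs = np.asarray(probs, dtype=np.float64)
    return float(entropy(probs.mean(axis=0)) - entropy(probs).mean())

N = 5  # number of labels in this toy example

# Every image receives the same prediction: completely "indistinct".
indistinct = np.tile(np.array([0.6, 0.1, 0.1, 0.1, 0.1]), (N, 1))
# Each image gets one confident label and the labels are covered evenly.
ideal = np.eye(N)
# Confident predictions, but every image gets the same label.
collapsed = np.tile(np.eye(N)[0], (N, 1))

lower = log_inception_score(indistinct)
upper = log_inception_score(ideal)
mode_collapse = log_inception_score(collapsed)
```

Here `lower` and `mode_collapse` come out at approximately 0, and `upper` at approximately ln 5. The collapsed case shows that confident predictions alone are not enough: without an even spread over labels, the score stays at its minimum.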

References

  1. ^ Salimans, Tim; Goodfellow, Ian; Zaremba, Wojciech; Cheung, Vicki; Radford, Alec; Chen, Xi; Chen, Xi (2016). "Improved Techniques for Training GANs". Advances in Neural Information Processing Systems. 29. Curran Associates, Inc. arXiv:1606.03498.
  2. ^ Frolov, Stanislav; Hinz, Tobias; Raue, Federico; Hees, Jörn; Dengel, Andreas (December 2021). "Adversarial text-to-image synthesis: A review". Neural Networks. 144: 187–209. arXiv:2101.09983. doi:10.1016/j.neunet.2021.07.019. PMID 34500257. S2CID 231698782.
  3. ^ Borji, Ali (2022). "Pros and cons of GAN evaluation measures: New developments". Computer Vision and Image Understanding. 215: 103329. arXiv:2103.09396. doi:10.1016/j.cviu.2021.103329. S2CID 232257836.