Checking whether a coin is fair

From Wikipedia, the free encyclopedia

In [[statistics]], the question of '''checking whether a coin is fair''' is one whose importance lies, firstly, in providing a simple problem on which to illustrate basic ideas of [[statistical inference]] and, secondly, in providing a simple problem that can be used to compare various competing methods of statistical inference, including [[decision theory]]. The practical problem of checking whether a coin is fair might be considered as easily solved by performing a sufficiently large number of trials, but statistics and [[probability theory]] can provide guidance on two types of question: specifically, those of how many trials to undertake and of the accuracy of an estimate of the probability of turning up heads, derived from a given sample of trials.

A [[fair coin]] is an idealized [[Statistical randomness|randomizing device]] with two states (usually named [[Coin flipping|"heads" and "tails"]]) which are equally likely to occur. It is based on the [[coin flip]] used widely in sports and other situations where it is required to give two parties the same chance of winning. Either a specially designed [[Casino token|chip]] or more usually a simple currency [[coin]] is used, although the latter might be slightly "unfair" due to an asymmetrical weight distribution, which might cause one state to occur more frequently than the other, giving one party an unfair advantage.<ref>However, if the coin is caught rather than allowed to bounce or spin, it is difficult to bias a coin flip's outcome. See {{cite journal| title=Teacher's Corner: You Can Load a Die, But You Can't Bias a Coin| first=Andrew | last= [[Andrew Gelman|Gelman]] | journal=American Statistician | year=2002 | volume=56 | issue=4 |pages=308–311 | doi=10.1198/000313002605 |author2=Deborah Nolan| s2cid=123597087 }}</ref> So it might be necessary to test experimentally whether the coin is in fact "fair" &ndash; that is, whether the probability of the coin's falling on either side when it is tossed is exactly 50%. It is of course impossible to rule out arbitrarily small deviations from fairness such as might be expected to affect only one flip in a lifetime of flipping; also it is always possible for an unfair (or "[[Systematic bias|biased]]") coin to happen to turn up exactly 10 heads in 20 flips. Therefore, any fairness test must only establish a certain degree of confidence in a certain degree of fairness (a certain maximum bias). In more rigorous terminology, the problem is that of determining the parameters of a [[Bernoulli process]], given only a limited sample of [[Bernoulli trial]]s.

== Preamble ==
This article describes experimental procedures for determining whether a coin is fair or unfair. There are many statistical methods for analyzing such an experimental procedure. This article illustrates two of them.

Both methods prescribe an experiment (or trial) in which the coin is tossed many times and the result of each toss is recorded. The results can then be analysed statistically to decide whether the coin is "fair" or "probably not fair".

* '''Posterior probability density function''', or PDF ([[Bayesian probability|Bayesian approach]]). Initially, the true probability of obtaining a particular side when a coin is tossed is unknown, but the uncertainty is represented by the "[[prior distribution]]". The theory of [[Bayesian inference]] is used to derive the [[posterior distribution]] by combining the prior distribution and the [[likelihood function]], which represents the information obtained from the experiment. The probability that this particular coin is a "fair coin" can then be obtained by integrating the PDF of the posterior distribution over the relevant interval that represents all the probabilities that can be counted as "fair" in a practical sense.
* '''Estimator of true probability''' ([[Frequency probability|Frequentist approach]]). This method assumes that the experimenter can decide to toss the coin any number of times. The experimenter first decides on the level of confidence required and the tolerable margin of error. These parameters determine the minimum number of tosses that must be performed to complete the experiment.

An important difference between these two approaches is that the first approach gives some weight to one's prior experience of tossing coins, while the second does not. The question of how much weight to give to prior experience, depending on the quality (credibility) of that experience, is discussed under [[credibility theory]].

== Posterior probability density function ==

One method is to calculate the posterior [[probability density function]] of [[Bayesian probability theory]].

A test is performed by tossing the coin ''N'' times and noting the observed numbers of heads, ''h'', and tails, ''t''. The symbols ''H'' and ''T'' represent more generalised variables expressing the numbers of heads and tails respectively that ''might'' have been observed in the experiment. Thus ''N'' = ''H'' + ''T'' = ''h'' + ''t''.

Next, let ''r'' be the actual probability of obtaining heads in a single toss of the coin. This is the property of the coin which is being investigated. Using [[Bayes' theorem]], the posterior probability density of ''r'' conditional on ''h'' and ''t'' is expressed as follows:

:<math> f(r \mid H = h, T = t) =
\frac{\Pr(H = h \mid r, N = h + t) \, g(r)}{\int_0^1 \Pr(H = h \mid p, N = h + t) \, g(p) \, dp},</math>

where ''g''(''r'') represents the prior probability density distribution of ''r'', which lies in the range 0 to 1.

The prior probability density distribution summarizes what is known about the distribution of ''r'' in the absence of any observation. We will assume that the [[prior distribution]] of ''r'' is [[Uniform distribution (continuous)|uniform]] over the interval [0,&nbsp;1]. That is, ''g''(''r'') = 1. (In practice, it would be more appropriate to assume a prior distribution which is much more heavily weighted in the region around 0.5, to reflect our experience with real coins.)

The probability of obtaining ''h'' heads in ''N'' tosses of a coin with a probability of heads equal to ''r'' is given by the [[binomial distribution]]:

:<math> \Pr(H = h \mid r, N = h + t) = {N \choose h} r^h (1 - r)^t.</math>
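
As an aside, this probability mass can be evaluated directly in code; the following is a minimal illustrative sketch, assuming Python with SciPy, and is not part of the original derivation:

<syntaxhighlight lang="python">
from scipy.stats import binom

N, h = 10, 7   # number of tosses and observed heads
r = 0.5        # hypothesised probability of heads

# Pr(H = h | r, N), the binomial probability mass at h.
print(binom.pmf(h, N, r))  # ~0.117 for a fair coin
</syntaxhighlight>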


Substituting this into the previous formula:

:<math>
f(r \mid H = h, T = t)
= \frac{{N \choose h} r^h (1-r)^t}
{\int_0^1 {N \choose h} p^h (1 - p)^t\,dp}
= \frac{r^h (1 - r)^t}{\int_0^1 p^h (1 - p)^t\,dp}.
</math>

This is in fact a [[beta distribution]] (the [[conjugate prior]] for the binomial distribution), whose denominator can be expressed in terms of the [[beta function]]:

:<math>f(r \mid H = h, T = t) = \frac{1}{\mathrm{B}(h + 1, t + 1)} r^h (1 - r)^t.</math>

As a uniform prior distribution has been assumed, and because ''h'' and ''t'' are integers, this can also be written in terms of [[factorial]]s:

:<math>f(r \mid H = h, T = t) = \frac{(h + t + 1)!}{h!\,t!} r^h (1 - r)^t.</math>
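
Because the posterior is the Beta(''h''&nbsp;+&nbsp;1, ''t''&nbsp;+&nbsp;1) distribution, it can be evaluated with standard statistical libraries rather than by hand. The following is a minimal illustrative sketch, assuming Python with SciPy; it is not part of the original derivation:

<syntaxhighlight lang="python">
from math import factorial

from scipy.stats import beta

h, t = 7, 3  # observed heads and tails; N = h + t

# Posterior for r under the uniform prior: Beta(h + 1, t + 1).
posterior = beta(h + 1, t + 1)

# The factorial form of the same density, f(r | h, t).
def posterior_pdf(r):
    return (factorial(h + t + 1) / (factorial(h) * factorial(t))
            * r**h * (1 - r)**t)

# The two expressions agree at any point in [0, 1].
r = 0.6
assert abs(posterior.pdf(r) - posterior_pdf(r)) < 1e-9
</syntaxhighlight>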


=== Example ===

For example, let ''N'' = 10, ''h'' = 7, i.e. the coin is tossed 10 times and 7 heads are obtained:


:<math> f(r \mid H = 7, T = 3) = \frac{(10 + 1)!}{7!\,3!} r^7 (1 - r)^3 = 1320 \, r^7 (1 - r)^3.</math>


The graph on the right shows the [[probability density function]] of ''r'' given that 7 heads were obtained in 10 tosses. (Note: ''r'' is the probability of obtaining heads when tossing the same coin once.)

[[File:Plot_of_1320p7q3at500by420.png|thumb|right|300px|Plot of the probability density ''f''(''r''&nbsp;<nowiki>|</nowiki>&nbsp;''H''&nbsp;=&nbsp;7,&nbsp;''T''&nbsp;=&nbsp;3) = 1320&nbsp;''r''<sup>7</sup>&nbsp;(1&nbsp;&minus;&nbsp;''r'')<sup>3</sup> with ''r'' ranging from 0 to 1]]

The probability for an unbiased coin (defined for this purpose as one whose probability of coming down heads is somewhere between 45% and 55%)
:<math>
\Pr(0.45 < r < 0.55)
= \int_{0.45}^{0.55} f(p \mid H = 7, T = 3) \,dp
\approx 13\%
\!</math>


is small when compared with the alternative hypothesis (a biased coin). However, it is not small enough to cause us to believe that the coin has a significant bias. This probability is slightly ''higher'' than our presupposition of the probability that the coin was fair corresponding to the uniform prior distribution, which was 10%. (Using a prior distribution that reflects our prior knowledge of what a coin is and how it acts, the posterior distribution would not favor the hypothesis of bias. However, the number of trials in this example (10 tosses) is very small, and with more trials the choice of prior distribution would be somewhat less relevant.)


With the uniform prior, the posterior probability distribution ''f''(''r''&nbsp;|&nbsp;''H''&nbsp;=&nbsp;7,&nbsp;''T''&nbsp;=&nbsp;3) achieves its peak at ''r''&nbsp;=&nbsp;''h''&nbsp;/&nbsp;(''h''&nbsp;+&nbsp;''t'')&nbsp;=&nbsp;0.7; this value is called the [[maximum a posteriori estimation|maximum ''a posteriori'' (MAP) estimate]] of ''r''. Also with the uniform prior, the [[expected value]] of ''r'' under the posterior distribution is
:<math>\operatorname{E}[r] = \int_0^1 r \cdot f(r \mid H=7, T=3) \, \mathrm{d}r = \frac{h+1}{h+t+2} = \frac{2}{3}.</math>
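
These quantities are straightforward to reproduce numerically. The following minimal sketch, again assuming Python with SciPy, checks the 13% figure and both point estimates (it is illustrative, not part of the original text):

<syntaxhighlight lang="python">
from scipy.stats import beta

h, t = 7, 3
posterior = beta(h + 1, t + 1)  # posterior under the uniform prior

# Probability that the coin is "fair" in the practical sense used above,
# Pr(0.45 < r < 0.55); compare with the 10% given by the uniform prior alone.
print(posterior.cdf(0.55) - posterior.cdf(0.45))  # ~0.13

# MAP estimate and posterior mean of r.
print(h / (h + t))            # 0.7
print((h + 1) / (h + t + 2))  # ~0.667, i.e. 2/3
</syntaxhighlight>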


<!-- If the prior distribution were [[beta distribution|Beta(α,&nbsp;β)]], instead of the uniform distribution Beta(α&nbsp;=&nbsp;1,&nbsp;β&nbsp;=&nbsp;1), the posterior distribution would be ''f''(''r''&nbsp;|&nbsp;''h'',''t'')&nbsp;=&nbsp;Beta(α&nbsp;+&nbsp;h,&nbsp;β&nbsp;+&nbsp;t), the MAP for ''r'' would be (α&nbsp;+&nbsp;h&nbsp;−&nbsp;1)&nbsp;/&nbsp;(α&nbsp;+&nbsp;h&nbsp;+&nbsp;β&nbsp;+&nbsp;t&nbsp;−&nbsp;2) when that is sensible, and the posterior expected value for ''r'' would be (α&nbsp;+&nbsp;h)&nbsp;/&nbsp;(α&nbsp;+&nbsp;h&nbsp;+&nbsp;β&nbsp;+&nbsp;t). Observe that the higher that ''h'' and ''t'' are, the less important the choice of α and β is. -->


== Estimator of true probability ==

{| border="1" cellpadding="5" cellspacing="0" align="center"
{| border="1" cellpadding="5" cellspacing="0" align="center"
|-
|-
|The best estimator for the actual value <math>r\,\!</math> is the estimator <math>p\,\! = \frac{h}{h+t}</math>.
|The best estimator for the actual value <math>r\,\!</math> is the estimator <math>p\,\! = \frac{h}{h+t}</math>.


This estimator has a margin of error (E) where <math>|p - r| < E </math> at a particular confidence level.
This estimator has a margin of error (E) where <math>|p - r| < E </math> at a particular confidence level.
|}

Using this approach, to decide the number of times the coin should be tossed, two parameters are required:

# The confidence level which is denoted by confidence interval (Z)
# The maximum (acceptable) error (E)

* The confidence level is denoted by Z and is given by the Z-value of a standard [[normal distribution]]. This value can be read off a [[standard score]] statistics table for the normal distribution. Some examples are:
{| class="wikitable"
{| class="wikitable"
|-
|-
! align="center" | Z value
! align="center" | Z value
! align="center" | Confidence Level
! align="center" | Confidence level
! align="center" | Comment
! align="center" | Comment
|-
|-
| 0.6745
| 0.6745
| gives '''50.000'''% level of confidence
| gives '''50.000'''% level of confidence
! align="center" | Half
! align="center" | Half
|-
|-
| 1.0000
| 1.0000
| gives '''68.269'''% level of confidence
| gives '''68.269'''% level of confidence
! align="center"| One std dev
! align="center"| One std dev
|-
| 1.6449
| gives '''90.000'''% level of confidence
! align="center"| "One nine"
|-
| 1.9599
| gives '''95.000'''% level of confidence
! align="center"| 95 percent
|-
| 2.0000
| gives '''95.450'''% level of confidence
! align="center"| Two std dev
|-
| 2.5759
| gives '''99.000'''% level of confidence
! align="center"| "Two nines"
|-
| 3.0000
| gives '''99.730'''% level of confidence
! align="center"| Three std dev
|-
| 3.2905
| gives '''99.900'''% level of confidence
! align="center"| "Three nines"
|-
| 3.8906
| gives '''99.990'''% level of confidence
! align="center"| "Four nines"
|-
| 4.0000
| gives '''99.993'''% level of confidence
! align="center"| Four std dev
|-
| 4.4172
| gives '''99.999'''% level of confidence
! align="center"| "Five nines"
|}
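
The Z values in this table come from the standard normal distribution: a two-sided confidence level C corresponds to the Z satisfying C = Φ(Z) &minus; Φ(&minus;Z). The following minimal sketch, assuming Python with SciPy, reproduces a few of the rows:

<syntaxhighlight lang="python">
from scipy.stats import norm

# Two-sided confidence level for a given Z value: C = Phi(Z) - Phi(-Z).
for z in (0.6745, 1.0, 1.9599, 2.5759, 3.2905):
    print(z, norm.cdf(z) - norm.cdf(-z))  # e.g. 1.0 -> ~0.68269

# Conversely, the Z value for a desired two-sided confidence level:
print(norm.ppf(1 - (1 - 0.95) / 2))  # ~1.96 for 95%
</syntaxhighlight>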


*The maximum error (E) is defined by <math>|p - r| < E </math>, where <math>p\,\!</math> is the '''estimated probability''' of obtaining heads and <math>r\,\!</math> is its actual probability, as in the previous section.
*In statistics, the estimate of a proportion of a sample (denoted by ''p'') has a [[standard error]] given by:


:<math>s_p = \sqrt{ \frac {p \, (1-p) } {n} }</math>
where ''n'' is the number of trials (which was denoted by ''N'' in the previous section).


This standard error <math>s_p</math>, as a function of ''p'', has its maximum at <math>p = (1-p) = 0.5</math>. Further, in the case of a coin being tossed, it is likely that ''p'' will not be far from 0.5, so it is reasonable to take ''p'' = 0.5 in the following:


{| border="0" cellpadding="0"
:{| border="0" cellpadding="0"
|-
|-
|<math>s_p\,\!</math>
|<math>s_p\,\!</math>
|<math>= \sqrt{ \frac {p \, (1-p) } {n} } = \sqrt{ \frac {0.5 \times 0.5 } {n} } = \frac {1}{2 \, \sqrt{n}}</math>
|<math>= \sqrt{ \frac {p \, (1-p) } {n} } \le \sqrt{ \frac {0.5 \times 0.5 } {n} } = \frac {1}{2 \, \sqrt{n}}</math>
|}
|}
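
The bound used here is easy to verify numerically; a minimal sketch, assuming NumPy (illustrative only):

<syntaxhighlight lang="python">
import numpy as np

# Standard error of a sample proportion: s_p = sqrt(p(1-p)/n).
def standard_error(p, n):
    return np.sqrt(p * (1 - p) / n)

n = 1000
p = np.linspace(0.0, 1.0, 101)
s = standard_error(p, n)

# The maximum is attained at p = 0.5, where s_p = 1/(2*sqrt(n)).
assert p[np.argmax(s)] == 0.5
print(s.max(), 1 / (2 * np.sqrt(n)))  # both ~0.0158
</syntaxhighlight>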


And hence the value of maximum error (E) is given by


{| border="0" cellpadding="0"
:{| border="0" cellpadding="0"
|-
|-
|<math>E\,\!</math>
|<math>E= Z \, s_p = \frac {Z}{2 \, \sqrt{n}} </math>
|<math>= Z \, s_p = \frac {Z}{2 \, \sqrt{n}} </math>
|}
|}


Solving for the required number of coin tosses, ''n'',

:<math>n = \left ( \frac{Z}{2E} \right )^2. \,\!</math>

=== Examples ===

1. If a maximum error of 0.01 is desired, how many times should the coin be tossed?

:<math> n = \left ( \frac{Z}{2E} \right )^2 = \left ( \frac{Z}{0.02} \right )^2 = 2500 \, Z^2</math>

:<math> n = 2500 \,\!</math> at 68.27% level of confidence (Z=1)
:<math> n = 10000 \,\!</math> at 95.45% level of confidence (Z=2)
:<math> n = 27225 \,\!</math> at 99.90% level of confidence (Z=3.3)

2. If the coin is tossed 10000 times, what is the maximum error of the estimator <math>p\,\!</math> on the value of <math>r\,\!</math> (the actual probability of obtaining heads in a coin toss)?

:<math> E = \frac{Z}{2 \, \sqrt{n}} = \frac{Z}{2 \, \sqrt{10000}} = \frac{Z}{200}</math>

:<math> E = 0.0050 \,\!</math> at 68.27% level of confidence (Z=1)
:<math> E = 0.0100 \,\!</math> at 95.45% level of confidence (Z=2)
:<math> E = 0.0165 \,\!</math> at 99.90% level of confidence (Z=3.3)

3. The coin is tossed 12000 times with a result of 5961 heads (and 6039 tails). What interval does the value of <math>r\,\!</math> (the true probability of obtaining heads) lie within if a confidence level of 99.999% is desired?

:<math> p = \frac{5961}{12000} = 0.4968 \,\!</math>


Now find the value of Z corresponding to 99.999% level of confidence.

:<math>Z = 4.4172 \,\! </math>


Now calculate E:

:<math> E = \frac{Z}{2 \, \sqrt{n}} = \frac{4.4172}{2 \, \sqrt{12000}} \approx 0.0202 \,\!</math>

The interval which contains <math>r\,\!</math> is thus:

:<math> p - E < r < p + E \,\!</math>


:<math> 0.4766 < r < 0.5170 \,\!</math>

Hence, 99.999% of the time, the interval above would contain <math>r\,\!</math>, the true probability of obtaining heads in a single toss.
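
All three worked examples can be checked in a few lines of code. The following is a minimal sketch, assuming Python with SciPy; the helper names <code>tosses_needed</code> and <code>max_error</code> are ours, introduced for illustration:

<syntaxhighlight lang="python">
from math import sqrt

from scipy.stats import norm

def tosses_needed(z, e):
    # n = (Z / 2E)^2, from E = Z / (2*sqrt(n))
    return (z / (2 * e)) ** 2

def max_error(z, n):
    # E = Z / (2*sqrt(n))
    return z / (2 * sqrt(n))

print(tosses_needed(2, 0.01))   # 10000 tosses for E = 0.01 at 95.45% (Z = 2)
print(max_error(2, 10000))      # E = 0.01 after 10000 tosses at 95.45%

# Example 3: 5961 heads in 12000 tosses, 99.999% confidence.
n, h = 12000, 5961
p = h / n                             # 0.49675
z = norm.ppf(1 - (1 - 0.99999) / 2)   # ~4.4172
e = max_error(z, n)                   # ~0.0202
print(p - e, p + e)                   # ~0.4766 and ~0.5170
</syntaxhighlight>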


==Other approaches==
Other approaches to the question of checking whether a coin is fair are available using [[decision theory]], whose application would require the formulation of a [[loss function]] or [[utility function]] which describes the consequences of making a given decision. An approach that avoids requiring either a loss function or a prior probability (as in the Bayesian approach) is that of "acceptance sampling".<ref>Cox, D.R., Hinkley, D.V. (1974) ''Theoretical Statistics'' (Example 11.7), Chapman & Hall. {{ISBN|0-412-12420-3}}</ref>


==Other applications==

The above mathematical analysis for determining if a coin is fair can also be applied to other uses. For example:


* Determining the proportion of defective items for a product subjected to a particular (but well defined) condition. Sometimes a product can be very difficult or expensive to produce. Furthermore, if testing such products will result in their destruction, a minimum number of items should be tested. Using a similar analysis, the probability density function of the product defect rate can be found.
* Two-party polling. If a small random sample poll is taken where there are only two mutually exclusive choices, then this is similar to tossing a single coin multiple times using a possibly biased coin. A similar analysis can therefore be applied to determine the confidence to be ascribed to the actual ratio of votes cast. (If people are allowed to [[Abstention|abstain]] then the analysis must take account of that, and the coin-flip analogy doesn't quite hold.)
* Determining the sex ratio in a large group of an animal species. Provided that a small random sample (i.e. small in comparison with the total population) is taken when performing the random sampling of the population, the analysis is similar to determining the probability of obtaining heads in a coin toss.


==See also==
*[[Estimation theory]]
*[[Inferential statistics]]
*[[Dice#Loaded dice|Loaded dice]]
*[[Margin of error]]
*[[Point estimation]]
*[[Statistical randomness]]

{{More footnotes|date=January 2010}}


==References==
<references/>
*Guttman, Wilks, and Hunter: ''Introductory Engineering Statistics'', John Wiley & Sons, Inc. (1971) {{ISBN|0-471-33770-6}}
*Devinder Sivia: ''Data Analysis, a Bayesian Tutorial'', Oxford University Press (1996) {{ISBN|0-19-851889-7}}


[[Category:Statistical hypothesis testing]]
[[Category:Bayesian inference]]
[[Category:Experiments]]
