Fisher's exact test
Fisher's exact test is a statistical significance test used in the analysis of contingency tables.[1][2][3] Although in practice it is employed when sample sizes are small, it is valid for all sample sizes. It is named after its inventor, Ronald Fisher, and is one of a class of exact tests, so called because the significance of the deviation from a null hypothesis (e.g., p-value) can be calculated exactly, rather than relying on an approximation that becomes exact in the limit as the sample size grows to infinity, as with many statistical tests.
Fisher is said to have devised the test following a comment from Muriel Bristol, who claimed to be able to detect whether the tea or the milk was added first to her cup. He tested her claim in the "lady tasting tea" experiment.[4]
Purpose and scope
The test is useful for categorical data that result from classifying objects in two different ways; it is used to examine the significance of the association (contingency) between the two kinds of classification. So in Fisher's original example, one criterion of classification could be whether milk or tea was put in the cup first; the other could be whether Bristol thinks that the milk or tea was put in first. We want to know whether these two classifications are associated—that is, whether Bristol really can tell whether milk or tea was poured in first. Most uses of the Fisher test involve, like this example, a 2 × 2 contingency table (discussed below). The p-value from the test is computed as if the margins of the table are fixed, i.e. as if, in the tea-tasting example, Bristol knows the number of cups with each treatment (milk or tea first) and will therefore provide guesses with the correct number in each category. As pointed out by Fisher, this leads under a null hypothesis of independence to a hypergeometric distribution of the numbers in the cells of the table.
With large samples, a chi-squared test (or better yet, a G-test) can be used in this situation. However, the significance value it provides is only an approximation, because the sampling distribution of the test statistic that is calculated is only approximately equal to the theoretical chi-squared distribution. The approximation is poor when sample sizes are small, or the data are very unequally distributed among the cells of the table, resulting in the cell counts predicted on the null hypothesis (the "expected values") being low. The usual rule for deciding whether the chi-squared approximation is good enough is that the chi-squared test is not suitable when the expected values in any of the cells of a contingency table are below 5, or below 10 when there is only one degree of freedom (this rule is now known to be overly conservative[5]). In fact, for small, sparse, or unbalanced data, the exact and asymptotic p-values can be quite different and may lead to opposite conclusions concerning the hypothesis of interest.[6][7] In contrast the Fisher exact test is, as its name states, exact as long as the experimental procedure keeps the row and column totals fixed, and it can therefore be used regardless of the sample characteristics. It becomes difficult to calculate with large samples or well-balanced tables, but fortunately these are exactly the conditions where the chi-squared test is appropriate.
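As a rough illustration, a minimal Python sketch using SciPy compares the exact and approximate p-values directly; the table reuses the small teenagers example discussed later in this article:

```python
# Compare Fisher's exact p-value with the chi-squared approximation
# on a small 2x2 table (the teenagers example used later in the article).
from scipy import stats

table = [[1, 9], [11, 3]]

odds_ratio, p_exact = stats.fisher_exact(table, alternative="two-sided")
chi2, p_approx, dof, expected = stats.chi2_contingency(table, correction=False)

print("exact p =", p_exact)
print("chi2  p =", p_approx)          # only an approximation
print("expected counts:", expected)   # as low as 5 here, so the approximation is doubtful
```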
For hand calculations, the test is feasible only in the case of a 2 × 2 contingency table. However the principle of the test can be extended to the general case of an m × n table,[8][9] and some statistical packages provide a calculation (sometimes using a Monte Carlo method to obtain an approximation) for the more general case.[10]
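The following Monte Carlo sketch in Python (not the network algorithm of Mehta and Patel, and using an arbitrary illustrative table) shows one way such an approximation can be obtained for an r × c table: shuffling the column labels keeps both sets of margins fixed, and the two-sided p-value is estimated as the fraction of shuffled tables whose conditional probability does not exceed that of the observed table.

```python
# Monte Carlo approximation to Fisher's exact test for an r x c table:
# repeatedly shuffle the column labels (which preserves all margins) and
# count how often a table at least as "extreme" (probability no larger
# than the observed one) occurs.
import numpy as np
from scipy.special import gammaln

def log_table_prob(t):
    """Log-probability of table t given its margins (multivariate hypergeometric)."""
    t = np.asarray(t)
    return (gammaln(t.sum(axis=1) + 1).sum()    # row-total factorials
            + gammaln(t.sum(axis=0) + 1).sum()  # column-total factorials
            - gammaln(t.sum() + 1)              # grand-total factorial
            - gammaln(t + 1).sum())             # cell factorials

def fisher_mc(table, n_sim=100_000, seed=0):
    rng = np.random.default_rng(seed)
    table = np.asarray(table)
    rows = np.repeat(np.arange(table.shape[0]), table.sum(axis=1))
    cols = np.repeat(np.arange(table.shape[1]), table.sum(axis=0))
    obs_logp = log_table_prob(table)
    hits = 0
    for _ in range(n_sim):
        sim = np.zeros_like(table)
        np.add.at(sim, (rows, rng.permutation(cols)), 1)  # same margins as the original
        if log_table_prob(sim) <= obs_logp + 1e-9:
            hits += 1
    return hits / n_sim

example = [[5, 0, 2], [1, 4, 3]]   # hypothetical 2 x 3 table
print(fisher_mc(example))
```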
The test can also be used to quantify the overlap between two sets. For example, in enrichment analyses in statistical genetics one set of genes may be annotated for a given phenotype and the user may be interested in testing the overlap of their own gene set with the annotated set. In this case a 2 × 2 contingency table may be generated and Fisher's exact test applied by identifying:
- Genes that are provided in both lists
- Genes that are provided in the first list and not the second
- Genes that are provided in the second list and not the first
- Genes that are not provided in either list
The test assumes genes in either list are taken from a broader set of genes (e.g. all remaining genes). A p-value may then be calculated, summarizing the significance of the overlap between the two lists.[11]
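A minimal Python sketch of this bookkeeping; the gene identifiers and set sizes are made up purely for illustration, and SciPy's fisher_exact is used with the one-sided "greater" alternative to test for over-representation:

```python
# Gene-set overlap ("enrichment") with Fisher's exact test.
# All gene identifiers below are invented for illustration.
from scipy import stats

background = {f"gene{i}" for i in range(1, 201)}   # broader set of genes
annotated  = {f"gene{i}" for i in range(1, 31)}    # genes annotated for the phenotype
my_list    = {f"gene{i}" for i in range(21, 61)}   # user-supplied gene list

both      = len(my_list & annotated)               # in both lists
only_mine = len(my_list - annotated)               # first list only
only_anno = len(annotated - my_list)               # second list only
neither   = len(background - annotated - my_list)  # in neither list

table = [[both, only_mine], [only_anno, neither]]
odds_ratio, p_value = stats.fisher_exact(table, alternative="greater")
print(table, p_value)   # a small p suggests more overlap than expected by chance
```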
Derivation
|              | Class I | Class II | Row Total |
|---|---|---|---|
| Blue         | a       | b        | a + b |
| Red          | c       | d        | c + d |
| Column Total | a + c   | b + d    | a + b + c + d (=n) |
We set up the following probability model underlying Fisher's exact test.[12]
Suppose we have a + b blue balls and c + d red balls. We throw them together into a black box, shake well, then remove them one by one until we have pulled out exactly a + c balls. We call these balls "class I" and the b + d remaining balls "class II".
The question is to calculate the probability that exactly a blue balls are in class I. Every other entry in the table is fixed once we fill in one entry of the table.
Suppose we pretend that every ball is labelled, and before we start pulling out the balls, we permute them uniformly at random, then pull out the first a + c balls. This gives us n! possibilities.
Of these possibilities, we condition on the case where the first a + c balls contain exactly a blue balls. To count these possibilities, we do the following: first select uniformly at random a subset of size a among the a + c class-I balls, with $\binom{a+c}{a}$ possibilities, then select uniformly at random a subset of size b among the b + d class-II balls, with $\binom{b+d}{b}$ possibilities.
The two selected sets would be filled with blue balls. The rest would be filled with red balls.
Once we have selected the sets, we can populate them with an arbitrary ordering of the a + b blue balls. This gives us (a + b)! possibilities. Same for the red balls, with (c + d)! possibilities.
In full, we have

$$\binom{a+c}{a}\binom{b+d}{b}\,(a+b)!\,(c+d)!$$

possibilities.
Thus the probability of this event is

$$\frac{\binom{a+c}{a}\binom{b+d}{b}\,(a+b)!\,(c+d)!}{n!} = \frac{\binom{a+c}{a}\binom{b+d}{b}}{\binom{n}{a+b}}$$
Another derivation:
Suppose each blue ball and red ball has an equal and independent probability p of being in class I, and 1 − p of being in class II. Then the number of class-I blue balls is binomially distributed. The probability that there are exactly a of them is $\binom{a+b}{a}p^a(1-p)^b$, and the probability that there are exactly c red class-I balls is $\binom{c+d}{c}p^c(1-p)^d$.
The probability that there are precisely a + c class-I balls, regardless of the number of red or blue balls among them, is $\binom{n}{a+c}p^{a+c}(1-p)^{b+d}$.
Thus, conditional on having a + c class-I balls, the conditional probability of having a table as shown is

$$\frac{\binom{a+c}{a}\binom{b+d}{b}}{\binom{n}{a+b}}$$
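Both derivations yield the probability mass function of a hypergeometric distribution; a quick numerical check in Python (with arbitrary cell counts) against SciPy's implementation:

```python
# Check that the derived conditional probability matches the hypergeometric
# pmf: P(a | margins) = C(a+c, a) * C(b+d, b) / C(n, a+b).
from math import comb
from scipy.stats import hypergeom

a, b, c, d = 3, 2, 1, 4          # arbitrary cell counts
n = a + b + c + d

derived = comb(a + c, a) * comb(b + d, b) / comb(n, a + b)
# hypergeom.pmf(k, M, K, N): population M, K blue balls, N class-I draws
pmf = hypergeom.pmf(a, n, a + b, a + c)

print(derived, pmf)              # the two values agree
```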
Example
A sample of teenagers might, for example, be divided into male and female on the one hand and those who are and are not currently studying for a statistics exam on the other. We hypothesize that the proportion of studying students is higher among the women than among the men, and we want to test whether any difference in proportions that we observe is significant.
The data might look like this:
|              | Men | Women | Row total |
|---|---|---|---|
| Studying     | 1   | 9     | 10 |
| Not-studying | 11  | 3     | 14 |
| Column total | 12  | 12    | 24 |
The question we ask about these data is: Knowing that 10 of these 24 teenagers are studying and that 12 of the 24 are female, and assuming the null hypothesis that men and women are equally likely to study, what is the probability that these 10 teenagers who are studying would be so unevenly distributed between the women and the men? If we were to choose 10 of the teenagers at random, what is the probability that 9 or more of them would be among the 12 women and only 1 or fewer from among the 12 men?
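That tail probability can be read directly from the hypergeometric distribution; a short Python check (using SciPy) of the value worked out step by step in the following subsections:

```python
# P(9 or more of the 10 studying teenagers are among the 12 women) when
# 10 "studiers" are drawn at random from 24 teenagers, 12 of whom are women.
from scipy.stats import hypergeom

rv = hypergeom(24, 12, 10)            # population 24, 12 women, 10 drawn
p_one_sided = rv.pmf(9) + rv.pmf(10)  # exactly 9 women, plus exactly 10 women
print(p_one_sided)                    # ~0.00138, as computed below
```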
First example
Before we proceed with the Fisher test, we first introduce some notation. We represent the cells by the letters a, b, c and d, call the totals across rows and columns marginal totals, and represent the grand total by n. So the table now looks like this:
|              | Men   | Women | Row Total |
|---|---|---|---|
| Studying     | a     | b     | a + b |
| Non-studying | c     | d     | c + d |
| Column Total | a + c | b + d | a + b + c + d (=n) |
Fisher showed that, conditional on the margins of the table, a follows a hypergeometric distribution with a + c draws from a population with a + b successes and c + d failures. The probability of obtaining such a set of values is given by:

$$p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} = \frac{\binom{a+b}{b}\binom{c+d}{d}}{\binom{n}{b+d}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$$

where $\binom{n}{k}$ is the binomial coefficient and the symbol ! indicates the factorial operator. This can be seen as follows. If the marginal totals (i.e. a + b, c + d, a + c and b + d) are known, only a single degree of freedom is left: the value of a, say, suffices to deduce the other values. Now, p is the probability that a elements are positive in a random selection (without replacement) of a + c elements from a larger set containing n elements in total, of which a + b are positive, which is precisely the definition of the hypergeometric distribution.
With the data above (using the first of the equivalent forms), this gives:

$$p = \frac{\binom{10}{1}\binom{14}{11}}{\binom{24}{12}} = \frac{10!\,14!\,12!\,12!}{1!\,9!\,11!\,3!\,24!} \approx 0.001346076$$
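The same arithmetic can be reproduced in Python with exact integer binomial coefficients, as a minimal check of the value above:

```python
# Hypergeometric probability of the observed table.
from math import comb

p = comb(10, 1) * comb(14, 11) / comb(24, 12)
print(p)   # approximately 0.001346076
```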
Second example
The formula above gives the exact hypergeometric probability of observing this particular arrangement of the data, assuming the given marginal totals, on the null hypothesis that men and women are equally likely to be studiers. To put it another way, if we assume that the probability that a man is a studier is p, the probability that a woman is a studier is also p, and we assume that both men and women enter our sample independently of whether or not they are studiers, then this hypergeometric formula gives the conditional probability of observing the values a, b, c, d in the four cells, conditionally on the observed marginals (i.e., assuming the row and column totals shown in the margins of the table are given). This remains true even if men enter our sample with different probabilities than women. The requirement is merely that the two classification characteristics—gender, and studier (or not)—are not associated.
For example, suppose we knew probabilities P, Q, p, q with P + Q = p + q = 1 such that (male studier, male non-studier, female studier, female non-studier) had respective probabilities (Pp, Pq, Qp, Qq) for each individual encountered under our sampling procedure. Then still, were we to calculate the distribution of cell entries conditional on the given marginals, we would obtain the above formula, in which neither P nor p occurs. Thus, we can calculate the exact probability of any arrangement of the 24 teenagers into the four cells of the table, but Fisher showed that to generate a significance level, we need consider only the cases where the marginal totals are the same as in the observed table, and among those, only the cases where the arrangement is as extreme as the observed arrangement, or more so. (Barnard's test relaxes this constraint on one set of the marginal totals.) In the example, there are 11 such cases. Of these only one is more extreme in the same direction as our data; it looks like this:
|              | Men | Women | Row Total |
|---|---|---|---|
| Studying     | 0   | 10    | 10 |
| Non-studying | 12  | 2     | 14 |
| Column Total | 12  | 12    | 24 |
For this table (with extremely unequal studying proportions) the probability is

$$p = \frac{\binom{10}{0}\binom{14}{12}}{\binom{24}{12}} \approx 0.000033652.$$
p-value tests
In order to calculate the significance of the observed data, i.e. the total probability of observing data as extreme or more extreme if the null hypothesis is true, we have to calculate the values of p for both these tables, and add them together. This gives a one-tailed test, with p approximately 0.001346076 + 0.000033652 = 0.001379728. For example, in the R statistical computing environment, this value can be obtained as fisher.test(rbind(c(1,9),c(11,3)), alternative="less")$p.value, or in Python as scipy.stats.fisher_exact(table=[[1,9],[11,3]], alternative="less"), which returns both the table's odds ratio and the p-value. The resulting p-value is the probability, under the null hypothesis (that there is no difference in the proportions of studiers between men and women), of obtaining the observed table or one even more extreme in the same direction. The smaller the value of p, the greater the evidence for rejecting the null hypothesis; so here the evidence is strong that men and women are not equally likely to be studiers.
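A minimal Python check (assuming SciPy is available) that summing the two table probabilities by hand reproduces the one-sided value returned by fisher_exact:

```python
# One-sided p-value: observed table plus the single more extreme table.
from math import comb
from scipy.stats import fisher_exact

p_observed     = comb(10, 1) * comb(14, 11) / comb(24, 12)   # table (1, 9, 11, 3)
p_more_extreme = comb(10, 0) * comb(14, 12) / comb(24, 12)   # table (0, 10, 12, 2)
p_one_sided = p_observed + p_more_extreme                    # ~0.001379728

odds_ratio, p_scipy = fisher_exact([[1, 9], [11, 3]], alternative="less")
print(p_one_sided, p_scipy)                                  # the two agree
```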
For a two-tailed test we must also consider tables that are equally extreme, but in the opposite direction. Unfortunately, classification of the tables according to whether or not they are 'as extreme' is problematic. An approach used by the fisher.test
function in R is to compute the p-value by summing the probabilities for all tables with probabilities less than or equal to that of the observed table. In the example here, the 2-sided p-value is twice the 1-sided value—but in general these can differ substantially for tables with small counts, unlike the case with test statistics that have a symmetric sampling distribution.
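A sketch of that rule in Python for the 2 × 2 example, using the hypergeometric pmf; it reproduces the two-sided value of scipy.stats.fisher_exact, which follows the same convention:

```python
# Two-sided p-value by summing the probabilities of all tables (with the
# same margins) whose probability does not exceed that of the observed table.
from scipy.stats import hypergeom, fisher_exact

a, b, c, d = 1, 9, 11, 3
n = a + b + c + d
rv = hypergeom(n, a + b, a + c)     # cell 'a' is hypergeometric given the margins

support = range(max(0, (a + c) - (c + d)), min(a + b, a + c) + 1)
probs = [rv.pmf(k) for k in support]
p_obs = rv.pmf(a)
p_two_sided = sum(p for p in probs if p <= p_obs * (1 + 1e-7))  # tolerance for ties

print(p_two_sided, fisher_exact([[a, b], [c, d]])[1])  # the two values agree closely
```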
Controversies
Fisher's test gives exact p-values, but some authors have argued that it is conservative, i.e. that its actual rejection rate is below the nominal significance level.[13][14][15] The apparent contradiction stems from the combination of a discrete statistic with fixed significance levels.[16][17] Consider the following proposal for a significance test at the 5%-level: reject the null hypothesis for each table to which Fisher's test assigns a p-value equal to or smaller than 5%. Because the set of all tables is discrete, there may not be a table for which equality is achieved. If $\alpha_e$ is the largest p-value smaller than 5% which can actually occur for some table, then the proposed test effectively tests at the $\alpha_e$-level. For small sample sizes, $\alpha_e$ might be significantly lower than 5%.[13][14][15] While this effect occurs for any discrete statistic (not just in contingency tables, or for Fisher's test), it has been argued that the problem is compounded by the fact that Fisher's test conditions on the marginals.[18] To avoid the problem, many authors discourage the use of fixed significance levels when dealing with discrete problems.[16][17]
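The discreteness is easy to see by listing the one-sided p-values that can actually occur for fixed margins; a small hypothetical illustration in Python, with all margins arbitrarily set to 5:

```python
# Attainable one-sided p-values when all margins of a 2x2 table equal 5
# (a hypothetical small design). Only a handful of p-values can occur,
# so a nominal 5% test is effectively carried out at a smaller level.
from scipy.stats import hypergeom

rv = hypergeom(10, 5, 5)                          # n = 10, row total 5, column total 5
attainable = sorted(rv.cdf(k) for k in range(6))  # lower-tail p-value for each possible table
print([round(p, 6) for p in attainable])          # [0.003968, 0.103175, 0.5, 0.896825, 0.996032, 1.0]

alpha_e = max(p for p in attainable if p <= 0.05)
print(alpha_e)                                    # about 0.004: far below the nominal 5%
```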
The decision to condition on the margins of the table is also controversial.[19][20] The p-values derived from Fisher's test come from the distribution that conditions on the margin totals. In this sense, the test is exact only for the conditional distribution and not the original table where the margin totals may change from experiment to experiment. It is possible to obtain an exact p-value for the 2×2 table when the margins are not held fixed. Barnard's test, for example, allows for random margins. However, some authors[16][17][20] (including, later, Barnard himself)[16] have criticized Barnard's test based on this property. They argue that the marginal success total is an (almost[17]) ancillary statistic, containing (almost) no information about the tested property.
The act of conditioning on the marginal success rate from a 2×2 table can be shown to ignore some information in the data about the unknown odds ratio.[21] The argument that the marginal totals are (almost) ancillary implies that the appropriate likelihood function for making inferences about this odds ratio should be conditioned on the marginal success rate.[21] Whether this lost information is important for inferential purposes is the essence of the controversy.[21]
Alternatives
An alternative exact test, Barnard's exact test, has been developed; its proponents[22] suggest that this method is more powerful, particularly in 2×2 tables.[23] Furthermore, Boschloo's test is an exact test that is uniformly more powerful than Fisher's exact test by construction.[24]
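For comparison, recent SciPy releases (1.7 and later) implement Barnard's and Boschloo's tests alongside Fisher's; a brief sketch on the teenagers table from the example above:

```python
# Compare Fisher's, Barnard's and Boschloo's exact tests on the same table
# (barnard_exact and boschloo_exact require SciPy >= 1.7).
from scipy.stats import fisher_exact, barnard_exact, boschloo_exact

table = [[1, 9], [11, 3]]

_, p_fisher = fisher_exact(table, alternative="two-sided")
p_barnard  = barnard_exact(table, alternative="two-sided").pvalue
p_boschloo = boschloo_exact(table, alternative="two-sided").pvalue

print(p_fisher, p_barnard, p_boschloo)
```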
Most modern statistical packages will calculate the significance of Fisher tests, in some cases even where the chi-squared approximation would also be acceptable. The actual computations as performed by statistical software packages will as a rule differ from those described above, because numerical difficulties may result from the large values taken by the factorials. A simple, somewhat better computational approach relies on a gamma function or log-gamma function, but methods for accurate computation of hypergeometric and binomial probabilities remain an active research area.
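A minimal sketch of the log-gamma approach for the 2 × 2 probability; the second call uses arbitrarily chosen large counts to show that nothing overflows:

```python
# Hypergeometric probability of a 2x2 table computed on the log scale with
# the log-gamma function, avoiding astronomically large factorials.
from math import lgamma, exp

def log_factorial(k):
    return lgamma(k + 1)

def table_prob(a, b, c, d):
    n = a + b + c + d
    logp = (log_factorial(a + b) + log_factorial(c + d)
            + log_factorial(a + c) + log_factorial(b + d)
            - log_factorial(a) - log_factorial(b)
            - log_factorial(c) - log_factorial(d)
            - log_factorial(n))
    return exp(logp)

print(table_prob(1, 9, 11, 3))             # ~0.001346, matching the example above
print(table_prob(1200, 1800, 2800, 4200))  # large counts pose no numerical problem
```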
For stratified categorical data the Cochran–Mantel–Haenszel test must be used instead of Fisher's test.
Choi et al.[21] propose a p-value derived from the likelihood ratio test based on the conditional distribution of the odds ratio given the marginal success rate. This p-value is inferentially consistent with classical tests of normally distributed data as well as with likelihood ratios and support intervals based on this conditional likelihood function. It is also readily computable.[25]
References
[edit]- ^ Fisher, R. A. (1922). "On the interpretation of χ2 from contingency tables, and the calculation of P". Journal of the Royal Statistical Society. 85 (1): 87–94. doi:10.2307/2340521. JSTOR 2340521.
- ^ Fisher, R.A. (1954). Statistical Methods for Research Workers. Oliver and Boyd. ISBN 0-05-002170-2.
- ^ Agresti, Alan (1992). "A Survey of Exact Inference for Contingency Tables". Statistical Science. 7 (1): 131–153. CiteSeerX 10.1.1.296.874. doi:10.1214/ss/1177011454. JSTOR 2246001.
- ^ Fisher, Sir Ronald A. (1956) [The Design of Experiments (1935)]. "Mathematics of a Lady Tasting Tea". In James Roy Newman (ed.). The World of Mathematics, volume 3. Courier Dover Publications. ISBN 978-0-486-41151-4.
- ^ Larntz, Kinley (1978). "Small-sample comparisons of exact levels for chi-squared goodness-of-fit statistics". Journal of the American Statistical Association. 73 (362): 253–263. doi:10.2307/2286650. JSTOR 2286650.
- ^ Mehta, Cyrus R; Patel, Nitin R; Tsiatis, Anastasios A (1984). "Exact significance testing to establish treatment equivalence with ordered categorical data". Biometrics. 40 (3): 819–825. doi:10.2307/2530927. JSTOR 2530927. PMID 6518249.
- ^ Mehta, C. R. 1995. SPSS 6.1 Exact test for Windows. Englewood Cliffs, NJ: Prentice Hall.
- ^ Mehta C.R.; Patel N.R. (1983). "A Network Algorithm for Performing Fisher's Exact Test in r Xc Contingency Tables". Journal of the American Statistical Association. 78 (382): 427–434. doi:10.2307/2288652. JSTOR 2288652.
- ^ mathworld.wolfram.com Page giving the formula for the general form of Fisher's exact test for m × n contingency tables
- ^ Cyrus R. Mehta; Nitin R. Patel (1986). "ALGORITHM 643: FEXACT: a FORTRAN subroutine for Fisher's exact test on unordered r×c contingency tables". ACM Trans. Math. Softw. 12 (2): 154–161. doi:10.1145/6497.214326. S2CID 207666979.
- ^ Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T.; Thomas, Paul D. (2013). "Large-scale gene function analysis with the PANTHER classification system". Nature Protocols. 8 (8): 1551–1566. doi:10.1038/nprot.2013.092. PMC 6519453. PMID 23868073.
- ^ STAT 226: Lecture 7, Section 2.6, Fisher’s Exact Tests. Yibi Huang, University of Chicago
- ^ a b Liddell, Douglas (1976). "Practical tests of 2×2 contingency tables". The Statistician. 25 (4): 295–304. doi:10.2307/2988087. JSTOR 2988087.
- ^ a b Berkson, Joseph (1978). "In dispraise of the exact test". Journal of Statistical Planning and Inference. 2: 27–42. doi:10.1016/0378-3758(78)90019-8.
- ^ a b D'Agostino, R. B.; Chase, W. & Belanger, A. (1988). "The appropriateness of some common procedures for testing equality of two independent binomial proportions". The American Statistician. 42 (3): 198–202. doi:10.2307/2685002. JSTOR 2685002.
- ^ a b c d Yates, F. (1984). "Tests of significance for 2 × 2 contingency tables (with discussion)". Journal of the Royal Statistical Society, Series A. 147 (3): 426–463. doi:10.2307/2981577. JSTOR 2981577. S2CID 15760519.
- ^ a b c d Little, Roderick J. A. (1989). "Testing the equality of two independent binomial proportions". The American Statistician. 43 (4): 283–288. doi:10.2307/2685390. JSTOR 2685390.
- ^ Mehta, Cyrus R.; Senchaudhuri, Pralay (4 September 2003). "Conditional versus unconditional exact tests for comparing two binomials" (PDF). Retrieved 20 November 2009.
- ^ Barnard, G.A. (1945). "A new test for 2×2 tables". Nature. 156 (3954): 177. Bibcode:1945Natur.156..177B. doi:10.1038/156177a0.
- ^ a b Fisher (1945). "A New Test for 2 × 2 Tables". Nature. 156 (3961): 388. Bibcode:1945Natur.156..388F. doi:10.1038/156388a0. S2CID 4113420.; Barnard, G.A. (1945). "A new test for 2×2 tables". Nature. 156 (3974): 783–784. Bibcode:1945Natur.156..783B. doi:10.1038/156783b0. S2CID 4099311.
- ^ a b c d Choi L, Blume JD, Dupont WD (2015). "Elucidating the foundations of statistical inference with 2×2 tables". PLOS ONE. 10 (4): e0121263. Bibcode:2015PLoSO..1021263C. doi:10.1371/journal.pone.0121263. PMC 4388855. PMID 25849515.
- ^ Lydersen, S., Fagerland, M. W., and Laake, P. (2009). "Recommended tests for association in 2× 2 tables". Statistics in Medicine. 28 (7): 1159–1175. doi:10.1002/sim.3531. PMID 19170020. S2CID 3900997.
- ^ Berger R.L. (1994). "Power comparison of exact unconditional tests for comparing two binomial proportions". Institute of Statistics Mimeo Series No. 2266: 1–19.
- ^ Boschloo R.D. (1970). "Raised Conditional Level of Significance for the 2x2-table when Testing the Equality of Two Probabilities". Statistica Neerlandica. 24: 1–35. doi:10.1111/j.1467-9574.1970.tb00104.x.
- ^ Choi, Leena (2011). "ProfileLikelihood: profile likelihood for a parameter in commonly used statistical models; 2011. R package version 1.1". See also: Likelihood Ratio Statistics for 2 x 2 Tables Archived 4 June 2016 at the Wayback Machine (Online calculator).