Boschloo's test

From Wikipedia, the free encyclopedia
{{short description|Statistical test for analysis of contingency tables}}
{{Userspace draft|source=ArticleWizard|date=January 2020}}
{{Overly detailed|date=May 2020}}

'''Boschloo's test''' is a [[Statistical_hypothesis_testing|statistical hypothesis test]] for analysing 2x2 [[Contingency_table|contingency tables]]. It examines the association of two [[Bernoulli_distribution|Bernoulli distributed]] [[Random_variable|random variables]] and is a uniformly more [[Power_(statistics)|powerful]] alternative to [[Fisher's exact test]]. It was proposed in 1970 by R. D. Boschloo.<ref name="Boschloo">{{cite journal | author = Boschloo R.D. | year = 1970 | title = Raised Conditional Level of Significance for the ''2''x''2''-table when Testing the Equality of Two Probabilities | journal = Statistica Neerlandica | volume = 24 | pages = 1–35 | doi = 10.1111/j.1467-9574.1970.tb00104.x}}</ref>


== Setting ==
A 2 × 2 contingency table visualizes <math>\ n\ </math> independent observations of two binary variables <math>\ A\ </math> and <math>\ B\ </math>:


:<math>
\begin{array}{c|cc|c}
 & B = 1 & B = 0 & \mbox{Total} \\
\hline
A = 1 & x_{11} & x_{10} & n_1 \\
A = 0 & x_{01} & x_{00} & n_0 \\
\hline
\mbox{Total} & s_1 & s_0 & n
\end{array}
</math>


The probability distribution of such tables can be classified into three distinct cases.<ref name=Lydersen>{{cite journal
| last1 = Lydersen | first1 = S.
| last2 = Fagerland | first2 = M.W.
| last3 = Laake | first3 = P.
| year = 2009
| title = Recommended tests for association in {{nobr|2 × 2 tables}}
| journal = Statist. Med.
| volume = 28 | issue = 7 | pages = 1159–1175
| doi = 10.1002/sim.3531 | pmid = 19170020 | s2cid = 3900997
}}</ref>
# '''The row sums <math>\ n_1\ , n_0\ </math> and column sums <math>\ s_1\ , s_0\ </math> are fixed in advance and not random.''' <br/> Then all <math>\ x_{ij}\ </math> are determined by <math>\ x_{11} ~.</math> If <math>\ A\ </math> and <math>\ B\ </math> are independent, <math>\ x_{11}\ </math> follows a [[hypergeometric distribution]] with parameters <math>\ n\ , n_1\ , s_1\ :</math> <br/> <math>\ x_{11}\ \sim\ \mbox{Hypergeometric}(\ n\ , n_1\ , s_1\ ) ~.</math>
# '''The row sums <math>\ n_1\ , n_0\ </math> are fixed in advance but the column sums <math>\ s_1\ , s_0\ </math> are not.'''<br/> Then all random parameters are determined by <math>\ x_{11}\ </math> and <math>x_{01}\ </math> and <math>\ x_{11}\ , x_{01}\ </math> follow a [[binomial distribution]] with probabilities <math>\ p_1\ , p_0\ :</math><br/> <math>\ x_{11}\ \sim\ B(\ n_1\ , p_1\ )\ </math><br/> <math>\ x_{01}\ \sim\ B(\ n_0\ , p_0\ )\ </math>
# '''Only the total number <math>\ n\ </math> is fixed but the row sums <math>\ n_1\ , n_0\ </math> and the column sums <math>\ s_1\ , s_0\ </math> are not.'''<br/> Then the random vector <math>\ (\ x_{11}, x_{10}\ , x_{01}\ , x_{00}\ )\ </math> follows a [[multinomial distribution]] with probability vector <math>\ (p_{11}\ , p_{10}\ , p_{01}\ , p_{00}\ ) ~.</math>
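The second case, which Boschloo's test targets, can be simulated directly: the row sums are fixed, while the column sums come out random. A minimal sketch (the group sizes and event rates are illustrative, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

# Case 2: row sums fixed in advance, column sums random (illustrative values)
n1, n0 = 10, 10          # fixed group sizes (row sums)
p1, p0 = 0.7, 0.2        # true event probabilities
x11 = rng.binomial(n1, p1)   # events in group 1
x01 = rng.binomial(n0, p0)   # events in group 0
table = np.array([[x11, n1 - x11],
                  [x01, n0 - x01]])
print(table)             # column sums vary from run to run; row sums do not
```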


=== Experiment type 1: Rare taste-test experiment, fully constrained ===
[[Fisher's exact test]] is designed for the first case and therefore an [[Exact test|exact]] conditional test (because it conditions on the column sums). The typical example of such a case is the [[Lady tasting tea]]: A lady tastes 8&nbsp;cups of tea with milk. In {{nobr|4 of those}} cups the milk is poured in before the tea. In the other 4&nbsp;cups the tea is poured in first.


The lady tries to assign the cups to the two categories. Following our notation, the random variable <math>\ A\ </math> represents the used method (1 = milk first, 0 = milk last) and <math>\ B\ </math> represents the lady's guesses (1 = milk first guessed, 0 = milk last guessed). Then the row sums are the fixed numbers of cups prepared with each method: <math>\ n_1 = 4\ , n_0 = 4 ~.</math> The lady knows that there are 4&nbsp;cups in each category, so will assign 4&nbsp;cups to each method. Thus, the column sums are also fixed in advance: <math>\ s_1 = 4\ , s_0 = 4 ~.</math> If she is not able to tell the difference, <math>\ A\ </math> and <math>\ B\ </math> are independent and the number <math>\ x_{11}\ </math> of correctly classified cups with milk first follows the hypergeometric distribution <math>\ \mbox{Hypergeometric}(8, 4, 4) ~.</math>
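Under this null model, the probability of any particular outcome follows directly from the hypergeometric distribution. A quick check in Python (note SciPy's parameter order `hypergeom(M, n, N)`: population size, success states, draws):

```python
from scipy.stats import hypergeom

# Hypergeometric(8, 4, 4): number of "milk first" cups identified correctly
# if the lady cannot actually tell the difference
rv = hypergeom(8, 4, 4)

p_all_correct = rv.pmf(4)   # she labels all four "milk first" cups correctly
print(p_all_correct)        # 1/70 ≈ 0.0143
```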


=== Experiment type 2: Normal laboratory controlled experiment, only one margin constrained ===
Boschloo's test is designed for the second case and therefore an exact unconditional test. Examples of such a case are often found in medical research, where a binary [[Clinical endpoint|endpoint]] is compared between two patient groups. Following our notation, <math>\ A = 1\ </math> represents the first group that receives some medication of interest. <math>\ A = 0\ </math> represents the second group that receives a [[placebo]]. <math>B</math> indicates the cure of a patient (1 = cure, 0 = no cure). Then the row sums equal the group sizes and are usually fixed in advance. The column sums are the total number of cures respectively disease continuations and not fixed in advance.

=== Experiment type 3: Field observation, no marginal constraints at all ===
[[Pearson's chi-squared test]] (without ''any'' "continuity correction") is the correct choice for the third case, where there are no constraints on either the row totals or the column totals. This third scenario describes most observational studies or "field observations", where data are collected as available in an uncontrolled environment. For example, suppose one goes out collecting two species of butterfly that share some predetermined identifiable color, which can be recognized before capture; however, it is ''not'' possible to distinguish whether a butterfly is species&nbsp;1 or species&nbsp;0 before it is captured and closely examined. One can merely tell by its color that a butterfly being pursued must be one of the two species of interest. For any one day's session of butterfly collecting, one cannot predetermine how many of each species will be collected, only perhaps the total number of captures, depending on the collector's criterion for stopping. If the species are tallied in separate rows of the table, then the row sums are unconstrained and independently binomially distributed. The second distinction between the captured butterflies is whether a butterfly is female (type&nbsp;1) or male (type&nbsp;0), tallied in the columns. If determining its sex also requires close examination, that count is likewise independently binomially random. Because of the [[experimental design]], the column sums are therefore unconstrained just like the row sums: Neither the count of each species nor the count of each sex is predetermined by the process of observation, and neither total constrains the other.

The only possible constraint is the grand total of butterflies captured, and even that may itself be unconstrained, depending on how the collector decides to stop. Since one cannot reliably know beforehand how successful any one day's pursuit in any one particular meadow will be during the time available, even the grand total might be random: It depends on whether the limit on data collection is the time available to catch butterflies, or some predetermined total to be collected, perhaps chosen in advance to ensure adequate statistical precision.

This type of 'experiment' (also called a "field observation") is almost entirely uncontrolled, hence some prefer to call it an 'observation' rather than an 'experiment'. All the numbers in the table are independently random. Each cell of the contingency table is a separate binomial probability, and neither Fisher's fully constrained 'exact' test nor Boschloo's partly constrained test is based on the statistics arising from this experimental design. [[Pearson's chi-squared test]] is the appropriate test for an unconstrained observational study, and Pearson's test, in turn, employs the wrong statistical model for the other two types of experiment. (Note in passing that Pearson's chi-squared statistic should ''never'' have ''any'' "continuity correction" applied, e.g. no "Yates' correction": The consequence of that "correction" is to distort its {{nobr|{{mvar|p}} values}} towards those of Fisher's test, i.e. to give the wrong answer.)
<!--
Another example for the third case can be constructed as follows: Simultaneously flip two distinguishable coins <math>\ A\ </math> and <math>\ B\ </math> and do this <math>\ n\ </math> times. If we count the number of results in our {{nobr|2×2 table}} (1 = head, 0 = tail), we neither know in advance how often coin <math>\ A\ </math> shows head or tail (row sums random), nor do we know how often coin <math>\ B\ </math> shows head or tail (column sums random).
-->


== Test hypothesis ==
The [[null hypothesis]] of Boschloo's [[One- and two-tailed tests|one-tailed test]] (high values of <math>x_1</math> favor the alternative hypothesis) is:
:<math>
H_0: p_1 \le p_0
</math>

The null hypothesis of the two-tailed test is:

:<math>
H_0: p_1 = p_0
</math>


There is no universal definition of the two-tailed version of Fisher's exact test.<ref name="MartinAndres">{{cite journal | author = Martín Andrés, A., and Herranz Tejedor, I. | year = 1995 | title = Is Fisher's exact test very conservative? | journal = Computational Statistics and Data Analysis | volume = 19 | issue = 5 | pages = 579–591 | doi = 10.1016/0167-9473(94)00013-9}}</ref> Since Boschloo's test is based on Fisher's exact test, a universal two-tailed version of Boschloo's test does not exist either. In the following we deal with the one-tailed test and <math>H_0: p_1 \le p_0</math>.


== Boschloo's idea ==
We denote the desired [[significance level]] by <math>\alpha</math>. Fisher's exact test is a conditional test and appropriate for the first of the above mentioned cases. But if we treat the observed column sum <math>s_1</math> as fixed in advance, Fisher's exact test can also be applied to the second case. The true [[Size (statistics)|size]] of the test then depends on the [[nuisance parameter|nuisance parameters]] <math>p_1</math> and <math>p_0</math>. It can be shown that the size maximum <math>\max\limits_{p_1 \le p_0}\big(\mbox{size}(p_1, p_0)\big)</math> is taken for equal proportions <math>p=p_1=p_0</math><ref name="Finner">{{cite journal | author = Finner, H, and Strassburger, K | year = 2002| title = Structural properties of UMPU-tests for 2x2 tables and some applications | journal = Journal of Statistical Planning and Inference | volume = 104 | pages = 103–120 | doi = 10.1016/S0378-3758(01)00122-7}}</ref> and is still controlled by <math>\alpha</math>.<ref name="Boschloo"/> However, Boschloo stated that for small sample sizes, the maximal size is often considerably smaller than <math>\alpha</math>. This leads to an undesirable loss of [[Power (statistics)|power]].


Boschloo proposed to use Fisher's exact test with a greater nominal level <math>\alpha^* > \alpha</math>. Here, <math>\alpha^*</math> should be chosen as large as possible such that the maximal size is still controlled by <math>\alpha</math>: <math>\max\limits_{p \in [0, 1]}\big(\mbox{size}(p)\big) \le \alpha</math>. This method was especially advantageous at the time of Boschloo's publication because <math>\alpha^*</math> could be looked up for common values of <math>\alpha, n_1</math> and <math>n_0</math>. This made performing Boschloo's test computationally easy.
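The search for <math>\alpha^*</math> can be reproduced numerically: enumerate all possible tables, compute their Fisher p-values, and for each candidate raised level check the maximal size over a grid of the nuisance parameter. A rough sketch in Python (the function name, grid resolution, and tolerance are illustrative, not from Boschloo's publication):

```python
import numpy as np
from scipy.stats import binom, hypergeom

def boschloo_nominal_level(n1, n0, alpha, grid=100):
    """Largest raised nominal level alpha* whose exact size stays <= alpha."""
    n = n1 + n0
    X1 = np.arange(n1 + 1)[:, None]          # successes in group 1 (rows)
    X0 = np.arange(n0 + 1)[None, :]          # successes in group 0 (columns)
    # one-tailed Fisher p-value for every possible table
    pF = hypergeom.sf(X1 - 1, n, n1, X1 + X0)
    best = 0.0
    for a_star in np.unique(pF):             # candidate raised levels, ascending
        reject = pF <= a_star + 1e-12
        size = max(
            (binom.pmf(np.arange(n1 + 1), n1, p)[:, None]
             * binom.pmf(np.arange(n0 + 1), n0, p)[None, :])[reject].sum()
            for p in np.linspace(0.0, 1.0, grid + 1)   # nuisance parameter p
        )
        if size <= alpha:
            best = a_star
        else:
            break                            # size grows with alpha*
    return best

print(boschloo_nominal_level(10, 10, 0.05))  # raised level alpha*
```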


== Test statistic ==
The [[decision rule]] of Boschloo's approach is based on Fisher's exact test. An equivalent way of formulating the test is to use the p-value of Fisher's exact test as [[test statistic]]. Fisher's p-value is calculated from the hypergeometric distribution (for ease of notation we write <math>x_1, x_0</math> instead of <math>x_{11}, x_{01}</math>):
:<math>
p_F = 1-F_{\mbox{Hypergeometric}(n, n_1, x_1+x_0)}(x_1-1)
</math>
The distribution of <math>p_F</math> is determined by the binomial distributions of <math>x_1</math> and <math>x_0</math> and depends on the unknown nuisance parameter <math>p</math>. For a specified significance level <math>\alpha,</math> the [[Critical value (statistics)|critical value]] of <math>p_F</math> is the maximal value <math>\alpha^*</math> that satisfies <math>\max\limits_{p \in [0, 1]}P(p_F \le \alpha^*) \le \alpha</math>. The critical value <math>\alpha^*</math> is equal to the nominal level of Boschloo's original approach.


== Modification ==
Boschloo's test deals with the unknown nuisance parameter <math>p</math> by taking the maximum over the whole parameter space <math>[0,1]</math>. The Berger & Boos procedure takes a different approach by maximizing <math>P(p_F \le \alpha^*)</math> over a <math>(1-\gamma)</math> [[confidence interval]] of <math>p = p_1 = p_0 </math> and adding <math>\gamma</math>.<ref name="BergerBoos">{{cite journal | author = Berger, R L, and Boos, D D | year = 1994| title = P Values Maximized Over a Confidence Set for the Nuisance Parameter | url = http://www.lib.ncsu.edu/resolver/1840.4/237| journal = Journal of the American Statistical Association | volume = 89| issue = 427| pages = 1012–1016 | doi = 10.2307/2290928 | jstor = 2290928}}</ref>
<math>\gamma</math> is usually a small value such as 0.001 or 0.0001. This results in a modified Boschloo's test which is also exact.<ref name="Mehrotra">{{cite journal | author = Mehrotra, D V, Chan, I S F, and Berger, R L | year = 2003| title = A cautionary note on exact unconditional inference for a difference between two independent binomial proportions | journal = Biometrics | volume = 59| issue = 2| pages = 441–450 | doi = 10.1111/1541-0420.00051 | pmid = 12926729| s2cid = 28556526| doi-access = free}}</ref>
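The modified p-value can be sketched as follows, assuming a Clopper–Pearson interval for the pooled proportion (the function name, interval choice, and grid are illustrative; Berger & Boos only require some <math>(1-\gamma)</math> confidence set for the nuisance parameter):

```python
import numpy as np
from scipy.stats import beta, binom, hypergeom

def berger_boos_pvalue(x1, x0, n1, n0, gamma=1e-3, grid=200):
    """Sketch of the Berger & Boos p-value: maximize P(p_F <= observed)
    over a (1 - gamma) Clopper-Pearson interval for p, then add gamma."""
    n = n1 + n0
    X1 = np.arange(n1 + 1)[:, None]
    X0 = np.arange(n0 + 1)[None, :]
    pF = hypergeom.sf(X1 - 1, n, n1, X1 + X0)   # Fisher p-value of every table
    reject = pF <= pF[x1, x0] + 1e-12           # tables at least as extreme
    s = x1 + x0                                 # pooled success count
    lo = beta.ppf(gamma / 2, s, n - s + 1) if s > 0 else 0.0
    hi = beta.ppf(1 - gamma / 2, s + 1, n - s) if s < n else 1.0
    best = 0.0
    for p in np.linspace(lo, hi, grid + 1):     # maximize over the interval
        joint = (binom.pmf(np.arange(n1 + 1), n1, p)[:, None]
                 * binom.pmf(np.arange(n0 + 1), n0, p)[None, :])
        best = max(best, joint[reject].sum())
    return min(1.0, best + gamma)

print(berger_boos_pvalue(7, 2, 10, 10))   # modified exact p-value
```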


== Comparison to other exact tests ==
All [[Exact test|exact tests]] hold the specified significance level but can have varying power in different situations. Mehrotra et al. compared the power of some exact tests in different situations.<ref name="Mehrotra"/> The results regarding Boschloo's test are summarized in the following.


=== Modified Boschloo's test ===
Boschloo's test and the modified Boschloo's test have similar power in all considered scenarios. Boschloo's test has slightly more power in some cases, and vice versa in some other cases.


=== Fisher's exact test ===
Boschloo's test is by construction uniformly more powerful than Fisher's exact test. For small sample sizes (e.g. 10 per group) the power difference is large, ranging from 16 to 20 percentage points in the cases considered. The power difference is smaller for larger sample sizes.


=== Exact Z-Pooled test ===
This test is based on the test statistic
:<math>
Z_P(x_1, x_0) = \frac{\hat p_1 - \hat p_0}{\sqrt{\tilde p(1-\tilde p)(\frac{1}{n_1} + \frac{1}{n_0})}},
</math>
where <math>\hat p_i = \frac{x_i}{n_i}</math> are the group event rates and <math>\tilde p = \frac{x_1+x_0}{n_1+n_0}</math> is the pooled event rate.
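For concreteness, the statistic can be computed as follows (a small illustrative helper with made-up counts, not part of any particular library):

```python
import math

def z_pooled(x1, n1, x0, n0):
    """Pooled Z statistic for comparing two binomial proportions."""
    p1_hat, p0_hat = x1 / n1, x0 / n0        # group event rates
    p_tilde = (x1 + x0) / (n1 + n0)          # pooled event rate
    se = math.sqrt(p_tilde * (1 - p_tilde) * (1 / n1 + 1 / n0))
    return (p1_hat - p0_hat) / se

print(z_pooled(7, 10, 2, 10))   # ≈ 2.2473 for 7/10 versus 2/10 events
```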


The power of this test is similar to that of Boschloo's test in most scenarios. In some cases, the <math>Z</math>-Pooled test has greater power, with differences mostly ranging from 1 to 5 percentage points. In very few cases, the difference goes up to 9 percentage points.
This test can also be modified by the Berger & Boos procedure. However, the resulting test has very similar power to the unmodified test in all scenarios.


=== Exact Z-Unpooled test ===
This test is based on the test statistic
:<math>
Z_U(x_1, x_0) = \frac{\hat p_1 - \hat p_0}{\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_0(1-\hat p_0)}{n_0}}},
</math>
where <math>\hat p_i = \frac{x_i}{n_i}</math> are the group event rates.
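The unpooled statistic differs only in its standard error. A matching illustrative helper (same made-up counts as above):

```python
import math

def z_unpooled(x1, n1, x0, n0):
    """Unpooled Z statistic for comparing two binomial proportions."""
    p1_hat, p0_hat = x1 / n1, x0 / n0        # group event rates
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p0_hat * (1 - p0_hat) / n0)
    return (p1_hat - p0_hat) / se

print(z_unpooled(7, 10, 2, 10))   # ≈ 2.5994 for 7/10 versus 2/10 events
```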


The power of this test is similar to that of Boschloo's test in many scenarios. In some cases, the <math>Z</math>-Unpooled test has greater power, with differences ranging from 1 to 5 percentage points. However, in some other cases, Boschloo's test has noticeably greater power, with differences up to 68 percentage points.


This test can also be modified by the Berger & Boos procedure. The resulting test has similar power to the unmodified test in most scenarios. In some cases, the power is considerably improved by the modification but the overall power comparison to Boschloo's test remains unchanged.

== Software ==
The calculation of Boschloo's test can be performed with the following software:
* The function ''scipy.stats.boschloo_exact'' from [[SciPy]]
* Packages ''Exact'' and ''exact2x2'' of the programming language [[R (programming language)|R]]
* [[StatXact]]
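For example, with SciPy (`scipy.stats.boschloo_exact`, available from SciPy 1.7; the table values here are made-up illustration data):

```python
from scipy.stats import boschloo_exact, fisher_exact

# hypothetical data: rows = treatment / placebo, columns = cure / no cure
table = [[7, 3],
         [2, 8]]

res = boschloo_exact(table, alternative="greater")
print(res.pvalue)                 # Boschloo's exact unconditional p-value

_, p_fisher = fisher_exact(table, alternative="greater")
print(p_fisher)                   # Fisher's conditional p-value (never smaller)
```

Because Boschloo's test is uniformly more powerful, its p-value never exceeds Fisher's for the same table and alternative.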


== See also ==


== References ==
{{Reflist}}



<!--- Categories --->

[[Category:Statistical tests for contingency tables]]
[[Category:Nonparametric statistics]]

Latest revision as of 05:02, 22 March 2024

Boschloo's test is a statistical hypothesis test for analysing 2x2 contingency tables. It examines the association of two Bernoulli distributed random variables and is a uniformly more powerful alternative to Fisher's exact test. It was proposed in 1970 by R. D. Boschloo.[1]

Setting

[edit]

A 2 × 2 contingency table visualizes independent observations of two binary variables and :

The probability distribution of such tables can be classified into three distinct cases.[2]

  1. The row sums and column sums are fixed in advance and not random.
    Then all are determined by If and are independent, follows a hypergeometric distribution with parameters
  2. The row sums are fixed in advance but the column sums are not.
    Then all random parameters are determined by and and follow a binomial distribution with probabilities

  3. Only the total number is fixed but the row sums and the column sums are not.
    Then the random vector follows a multinomial distribution with probability vector

Experiment type 1: Rare taste-test experiment, fully constrained

[edit]

Fisher's exact test is designed for the first case and therefore an exact conditional test (because it conditions on the column sums). The typical example of such a case is the Lady tasting tea: A lady tastes 8 cups of tea with milk. In 4 of those cups the milk is poured in before the tea. In the other 4 cups the tea is poured in first.

The lady tries to assign the cups to the two categories. Following our notation, the random variable represents the used method (1 = milk first, 0 = milk last) and represents the lady's guesses (1 = milk first guessed, 0 = milk last guessed). Then the row sums are the fixed numbers of cups prepared with each method: The lady knows that there are 4 cups in each category, so will assign 4 cups to each method. Thus, the column sums are also fixed in advance: If she is not able to tell the difference, and are independent and the number of correctly classified cups with milk first follows the hypergeometric distribution

Experiment type 2: Normal laboratory controlled experiment, only one margin constrained

[edit]

Boschloo's test is designed for the second case and therefore an exact unconditional test. Examples of such a case are often found in medical research, where a binary endpoint is compared between two patient groups. Following our notation, represents the first group that receives some medication of interest. represents the second group that receives a placebo. indicates the cure of a patient (1 = cure, 0 = no cure). Then the row sums equal the group sizes and are usually fixed in advance. The column sums are the total number of cures respectively disease continuations and not fixed in advance.

Experiment type 3: Field observation, no marginal constraints at all

Pearson's chi-squared test (without any "continuity correction") is the correct choice for the third case, where there are no constraints on either the row totals or the column totals. This third scenario describes most observational studies or "field observations", where data are collected as available in an uncontrolled environment. For example, suppose one goes out collecting butterflies of a particular predetermined, identifiable color that can be recognized before capture, but it is not possible to distinguish whether a butterfly belongs to species 1 or species 0 until it is captured and closely examined: one can merely tell by its color that a pursued butterfly must be one of the two species of interest. For any one day's collecting session, one cannot predetermine how many of each species will be collected, only perhaps the total number of captures, depending on the collector's criterion for stopping. If the species are tallied in separate rows of the table, then the row sums are unconstrained and independently binomially distributed. The second distinction between the captured butterflies, tallied in the columns, is whether a butterfly is female (type 1) or male (type 0); if determining the sex also requires close examination, that count too is independently binomially random. Thus, because of the experimental design, the column sums are just as unconstrained as the rows: neither the count of either species nor the count of each sex within a species is predetermined by the process of observation, and neither total constrains the other.

The only possible constraint is the grand total of all butterflies captured, and even that may itself be unconstrained, depending on how the collector decides to stop. Since one cannot reliably know beforehand how successful the pursuit will be in any one particular meadow on any one particular day, even the grand total may be unconstrained: it depends on whether the constraint on the data collected is the time available to catch butterflies or some predetermined total to be collected, perhaps chosen to ensure adequately significant statistics.

This type of 'experiment' (also called a "field observation") is almost entirely uncontrolled, hence some prefer to call it only an 'observation', not an 'experiment'. All the numbers in the table are independently random. Each cell of the contingency table is a separate binomial probability, and neither Fisher's fully constrained 'exact' test nor Boschloo's partly constrained test is based on the statistics arising from this experimental design. Pearson's chi-squared test is the appropriate test for an unconstrained observational study; Pearson's test, in turn, employs the wrong statistical model for the other two types of experiment. (Note in passing that Pearson's chi-squared statistic should never have any "continuity correction" applied whatsoever, e.g. no "Yates' correction": the consequence of that "correction" is to distort its p-values toward those of Fisher's test, i.e. to give the wrong answer.)
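For illustration, Pearson's chi-squared statistic for such an unconstrained 2x2 table (no continuity correction) can be computed directly; the counts below are made up:

```python
# Pearson's chi-squared statistic for a 2x2 table of independently
# collected counts, without any continuity correction.
def chi2_stat(table):
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    # Sum of (observed - expected)^2 / expected over all four cells,
    # with expected counts computed from the margins.
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2) for j in range(2)
    )

# Hypothetical counts: species in rows, sex in columns.
print(chi2_stat([[12, 5], [9, 14]]))  # ≈ 3.879
```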

Test hypothesis

The null hypothesis of Boschloo's one-tailed test (high values of x1 favor the alternative hypothesis) is:

  H0: p1 ≤ p2   (alternative H1: p1 > p2)

The null hypothesis of the one-tailed test can also be formulated in the other direction (small values of x1 favor the alternative hypothesis):

  H0: p1 ≥ p2   (alternative H1: p1 < p2)

The null hypothesis of the two-tailed test is:

  H0: p1 = p2   (alternative H1: p1 ≠ p2)

There is no universal definition of the two-tailed version of Fisher's exact test.[3] Since Boschloo's test is based on Fisher's exact test, a universal two-tailed version of Boschloo's test also doesn't exist. In the following we deal with the one-tailed test with H0: p1 ≤ p2.

Boschloo's idea

We denote the desired significance level by α. Fisher's exact test is a conditional test and appropriate for the first of the above-mentioned cases. But if we treat the observed column sum as fixed in advance, Fisher's exact test can also be applied to the second case. The true size of the test then depends on the nuisance parameters p1 and p2. It can be shown that the size maximum is taken for equal proportions p1 = p2[4] and is still controlled by α.[1] However, Boschloo stated that for small sample sizes, the maximal size is often considerably smaller than α. This leads to an undesirable loss of power.

Boschloo proposed to use Fisher's exact test with a greater nominal level α* > α. Here, α* should be chosen as large as possible such that the maximal size is still controlled by α. This method was especially advantageous at the time of Boschloo's publication because α* could be looked up for common values of α, n1 and n2. This made performing Boschloo's test computationally easy.
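Boschloo's observation can be reproduced numerically. The sketch below (pure Python, notation as above, hypothetical helper names) evaluates the actual size of the one-sided Fisher test at nominal level α = 0.05, maximized over a grid of common proportions p1 = p2:

```python
from math import comb

def fisher_p(x1, x2, n1, n2):
    # One-sided Fisher p-value: P(X >= x1) for X ~ Hypergeometric(n1+n2, x1+x2, n1).
    s = x1 + x2
    return sum(comb(n1, k) * comb(n2, s - k)
               for k in range(x1, min(n1, s) + 1)) / comb(n1 + n2, s)

def max_size(n1, n2, alpha, grid=99):
    # Rejection region of Fisher's exact test at nominal level alpha.
    reject = [(x1, x2) for x1 in range(n1 + 1) for x2 in range(n2 + 1)
              if fisher_p(x1, x2, n1, n2) <= alpha]
    def size(p):  # true size when p1 = p2 = p
        return sum(comb(n1, x1) * p**x1 * (1 - p)**(n1 - x1)
                   * comb(n2, x2) * p**x2 * (1 - p)**(n2 - x2)
                   for x1, x2 in reject)
    return max(size(i / (grid + 1)) for i in range(1, grid + 1))

print(max_size(10, 10, 0.05))  # considerably smaller than the nominal 0.05
```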

Test statistic

The decision rule of Boschloo's approach is based on Fisher's exact test. An equivalent way of formulating the test is to use the p-value of Fisher's exact test as the test statistic. Fisher's p-value is calculated from the hypergeometric distribution (for ease of notation we write x1, x2 instead of the observed values):

  p_F(x1, x2) = P(X ≥ x1),   X ~ Hypergeometric(n1 + n2, x1 + x2, n1)

The distribution of p_F is determined by the binomial distributions of x1 and x2 and depends on the unknown nuisance parameter p = p1 = p2. For a specified significance level α, the critical value of p_F is the maximal value c that satisfies max_p P(p_F ≤ c) ≤ α. The critical value c is equal to the nominal level α* of Boschloo's original approach.

Modification

Boschloo's test deals with the unknown nuisance parameter p by taking the maximum over the whole parameter space [0, 1]. The Berger & Boos procedure takes a different approach by maximizing over a (1 − γ) confidence interval of p and adding γ.[5] γ is usually a small value such as 0.001 or 0.0001. This results in a modified Boschloo's test which is also exact.[6]

Comparison to other exact tests

All exact tests hold the specified significance level but can have varying power in different situations. Mehrotra et al. compared the power of some exact tests in different situations.[6] The results regarding Boschloo's test are summarized in the following.

Modified Boschloo's test

Boschloo's test and the modified Boschloo's test have similar power in all considered scenarios: in some cases Boschloo's test has slightly more power, in others the modified version does.

Fisher's exact test

Boschloo's test is by construction uniformly more powerful than Fisher's exact test. For small sample sizes (e.g. 10 per group) the power difference is large, ranging from 16 to 20 percentage points in the cases considered. The power difference is smaller for larger sample sizes.

Exact Z-Pooled test

This test is based on the test statistic

  Z_P(x1, x2) = (p̂1 − p̂2) / sqrt( p̂ (1 − p̂) (1/n1 + 1/n2) ),

where p̂1 = x1/n1 and p̂2 = x2/n2 are the group event rates and p̂ = (x1 + x2)/(n1 + n2) is the pooled event rate.
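A direct transcription of this statistic (hypothetical helper name, notation as above):

```python
from math import sqrt

def z_pooled(x1, n1, x2, n2):
    p1_hat, p2_hat = x1 / n1, x2 / n2   # group event rates
    p_hat = (x1 + x2) / (n1 + n2)       # pooled event rate
    return (p1_hat - p2_hat) / sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))

# Made-up counts: 7/10 events in group 1 vs. 3/10 in group 2.
print(z_pooled(7, 10, 3, 10))  # ≈ 1.789
```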

The power of this test is similar to that of Boschloo's test in most scenarios. In some cases, the Z-pooled test has greater power, with differences mostly ranging from 1 to 5 percentage points. In very few cases, the difference goes up to 9 percentage points.

This test can also be modified by the Berger & Boos procedure. However, the resulting test has very similar power to the unmodified test in all scenarios.

Exact Z-Unpooled test

This test is based on the test statistic

  Z_U(x1, x2) = (p̂1 − p̂2) / sqrt( p̂1 (1 − p̂1)/n1 + p̂2 (1 − p̂2)/n2 ),

where p̂1 = x1/n1 and p̂2 = x2/n2 are the group event rates.
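As before, a direct transcription of this statistic (hypothetical helper name, notation as above):

```python
from math import sqrt

def z_unpooled(x1, n1, x2, n2):
    p1_hat, p2_hat = x1 / n1, x2 / n2   # group event rates
    # Unpooled variance estimate: each group contributes its own term.
    return (p1_hat - p2_hat) / sqrt(p1_hat * (1 - p1_hat) / n1
                                    + p2_hat * (1 - p2_hat) / n2)

# Same made-up counts as above: 7/10 events vs. 3/10 events.
print(z_unpooled(7, 10, 3, 10))  # ≈ 1.952
```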

The power of this test is similar to that of Boschloo's test in many scenarios. In some cases, the Z-unpooled test has greater power, with differences ranging from 1 to 5 percentage points. However, in some other cases, Boschloo's test has noticeably greater power, with differences up to 68 percentage points.

This test can also be modified by the Berger & Boos procedure. The resulting test has similar power to the unmodified test in most scenarios. In some cases, the power is considerably improved by the modification but the overall power comparison to Boschloo's test remains unchanged.

Software

The calculation of Boschloo's test can be performed in the following software:

  • The function scipy.stats.boschloo_exact from SciPy
  • Packages Exact and exact2x2 of the programming language R
  • StatXact
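For example, with SciPy (scipy.stats.boschloo_exact is available from SciPy 1.7 on; the counts below are made up):

```python
import numpy as np
from scipy.stats import boschloo_exact

# Rows are the two groups, columns the binary outcome (event / no event).
table = np.array([[7, 3],   # group 1: 7 events out of 10
                  [3, 7]])  # group 2: 3 events out of 10
result = boschloo_exact(table, alternative="greater")
print(result.statistic, result.pvalue)
```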

References

  1. ^ a b Boschloo R.D. (1970). "Raised Conditional Level of Significance for the 2x2-table when Testing the Equality of Two Probabilities". Statistica Neerlandica. 24: 1–35. doi:10.1111/j.1467-9574.1970.tb00104.x.
  2. ^ Lydersen, S.; Fagerland, M.W.; Laake, P. (2009). "Recommended tests for association in 2 × 2 tables". Statistics in Medicine. 28 (7): 1159–1175. doi:10.1002/sim.3531. PMID 19170020.
  3. ^ Martín Andrés, A.; Herranz Tejedor, I. (1995). "Is Fisher's exact test very conservative?". Computational Statistics and Data Analysis. 19 (5): 579–591. doi:10.1016/0167-9473(94)00013-9.
  4. ^ Finner, H.; Strassburger, K. (2002). "Structural properties of UMPU-tests for 2x2 tables and some applications". Journal of Statistical Planning and Inference. 104: 103–120. doi:10.1016/S0378-3758(01)00122-7.
  5. ^ Berger, R.L.; Boos, D.D. (1994). "P Values Maximized Over a Confidence Set for the Nuisance Parameter". Journal of the American Statistical Association. 89 (427): 1012–1016. doi:10.2307/2290928. JSTOR 2290928.
  6. ^ a b Mehrotra, D.V.; Chan, I.S.F.; Berger, R.L. (2003). "A cautionary note on exact unconditional inference for a difference between two independent binomial proportions". Biometrics. 59 (2): 441–450. doi:10.1111/1541-0420.00051. PMID 12926729.