
Wilcoxon signed-rank test: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
added names, alternate hyp
Kastchei (talk | contribs)
Fixed major problems with the test procedure and example. Especially the test itself which rejects if the statistic is GREATER than the critical value, not less. Removed Confidence Interval as the information is clearly incorrect.
Line 4: Line 4:


The test is named for [[Frank Wilcoxon]] (1892&ndash;1965) who, in a single paper, proposed both it and the [[Mann-Whitney-Wilcoxon test|rank-sum test]] for two independent samples (Wilcoxon, 1945).<ref>{{cite journal|last=Wilcoxon|first=Frank|title=Individual comparisons by ranking methods|journal=Biometrics Bulletin|year=1945|month=Dec|volume=1|issue=6|pages=80–83|url=http://sci2s.ugr.es/keel/pdf/algorithm/articulo/wilcoxon1945.pdf}}</ref> The test was popularized by [[Sidney Siegel|Siegel]] (1956)<ref>{{cite book|last=Siegel|first=Sidney|title=Non-parametric statistics for the behavioral sciences|year=1956|publisher=McGraw-Hill|location=New York|pages=75–83|url=http://books.google.com/books?ei=9cWLTfaTIcmEOs_NuM0L&ct=result&id=ebfRAAAAMAAJ&dq=Wilcoxon+statistics+for+the+behavioral+sciences+Non-parametric&q=Wilcoxon#search_anchor}}</ref> in his influential text book on non-parametric statistics. Siegel used the symbol ''T'' for the value defined below as ''S''. In consequence, the test is sometimes referred to as the '''Wilcoxon ''T'' test''', and the test statistic is reported as a value of ''T''. Other names may include the 't-test for matched pairs' or the 't-test for dependent samples'.

==Setup==
Suppose we collect 2''n'' observations, two observations of each of the ''n'' subjects. Let ''i'' denote the particular subject that is being referred to and the first observation measured on subject ''i'' be denoted by <math>x_i</math> and second observation be <math>y_i</math>. For each ''i'' in the observations, <math>x_i</math> and <math>y_i</math> should be paired together.


==Assumptions==
# Data is paired and comes from the same population.
Let ''Z''<sub>i</sub>&nbsp;=&nbsp;''X''<sub>i</sub>&nbsp;&ndash;&nbsp;''Y''<sub>i</sub> for ''i''&nbsp;=&nbsp;1,&nbsp;...&nbsp;,&nbsp;''n'' <ref>Nonparametric statistical methods (second edition), by Myles Hollander and Douglas A. Wolfe. A wiley interscience publication. 1999. page 36</ref>.
# The differences ''Z<sub>i</sub>'' are assumed to be independent.
# Each pair is chosen randomly and independent.
# The data is at least ordinal.
# Each ''Z<sub>i</sub>'' comes from the same continuous population.
# The values which ''X''<sub>i</sub> and ''Y''<sub>i</sub> represent are ordered (at least the ordinal [[level of measurement]]<ref name=lowry/>), so the comparisons "greater than", "less than", and "equal to" are useful. X and Y need to be on scales that make differences orderable, which can mean that X and Y are interval scaled.
# If we wish to make an inference about the mean (or about the median) difference, then we assume the distribution of the differences is symmetric. If we only want to test the hypothesis that the probability that the sum of a randomly chosen pair of differences exceeds zero is 0.5 then no distributional assumption is needed.


==Test procedure==
Let ''N'' be the sample size, the number of pairs. Thus, there are a total of ''2N'' data points. For ''i'' = 1, ..., ''N'', let <math>x_{1,i}</math> and <math>x_{2,i}</math> denote the measurements.<br><br>
The [[null hypothesis]] tested is ''H''<sub>0</sub><nowiki>:</nowiki> ''&theta;''&nbsp;=&nbsp;0 & the [[alternate hypothesis]] tested is ''H''<sub>1</sub><nowiki>:</nowiki> ''&theta;''&nbsp;does not equal to&nbsp;0.


<math>H_0: \text{median difference between the pairs is zero}</math>&nbsp;&nbsp;&nbsp;&nbsp;<math>H_1: \text{median difference is not zero}</math>.
# Exclude observations with ''Z<sub>i</sub> = 0''. Let ''m'' be the reduced sample size. (But see the note on [[#Excluding zero differences]] below.)
<br><br>
# Order the absolute values ''|Z<sub>1</sub>|'',&nbsp;...,&nbsp;''|Z<sub>n</sub>|'' in ascending sequence, and let the [[Ranking#Ranking_in_statistics|rank]] of each non-zero ''|Z<sub>i</sub>|'' be ''R<sub>i</sub>'' (the smallest positive ''|Z<sub>i</sub>|'' gets the rank of 1, and a mean rank is assigned to tied scores).
# For ''i'' = 1, ..., ''N'', calculate <math>|x_{2,i} - x_{1,i}|</math> and <math>\sgn(x_{2,i} - x_{1,i})</math>, where sgn is the [[Sign function|sign function]].<br><br>
# Denote the positive ''Z<sub>i</sub>'' values with ''φ<sub>i</sub> = I(Z<sub>i</sub> > 0)'', where ''I''(.) is an [[indicator function]]: ''φ<sub>i</sub> = 1'' for ''Z<sub>i</sub>'' > 0, otherwise ''φ<sub>i</sub> = 0''.
# Exclude pairs with <math>|x_{2,i} - x_{1,i}| = 0</math>. Let <math>N_r</math> be the reduced sample size.<br><br>
# The Wilcoxon signed ranked statistic ''W''<sub>+</sub> is defined as <br> <math>W_+ = \sum_{i=1}^n \phi_i R_i.\,\!</math>
# Order the remaining <math>N_r</math> pairs from smallest absolute difference to largest absolute difference, <math>|x_{2,i} - x_{1,i}|</math>.<br><br>
# Define ''W<sub>&minus;</sub>'' similarly by summing ranks of the negative differences ''Z<sub>i</sub>''.
# [[Ranking#Ranking_in_statistics|Rank]] the pairs, starting with the smallest as 1. Ties receive a rank equal to the average of the ranks they span. Let <math>R_i</math> denote the rank.<br><br>
# Calculate ''S'' as the smaller of these two rank sums: ''S = min(W<sub>+</sub>, W<sub>&minus;</sub>)''.
# Calculate the test statistic ''W''.<br><br><math>W = \left|\sum_{i=1}^{N_r} [\sgn(x_{2,i} - x_{1,i}) \cdot R_i]\right|</math>, the absolute value of the sum of the signed ranks.<br><br>
# Find the critical value for the given sample size n (or m?{{Citation needed|date=March 2011}}), and the wanted confidence level.
# As <math>N_r</math> increases, the sampling distribution of ''W'' converges to a normal distribution. Thus,<br><br>For <math>N_r \ge 10</math>, a z-score can be calculated as <math>z = \frac{W - 0.5}{\sigma_W}, \sigma_W = \sqrt{\frac{N_r(N_r + 1)(2N_r + 1)}{6}}</math>. If z > z<sub>critical</sub>, reject ''H<sub>0</sub>''.<br><br>For <math>N_r < 10</math>, <math>W</math> is compared to a critical value from a reference table<ref name=lowry></ref>. If <math>W \ge W_{critical, N_r}</math>, reject ''H<sub>0</sub>''. Alternatively, a p-value can be calculated from enumeration of all possible combinations of <math>W</math> given <math>N_r</math>.<br><br>
#* For samples of a small size the critical value is obtained from a table (which is calculated by considering all possible distributions of ranks to calculate ''p'', the statistical [[probability]] of attaining ''S'' from a population of scores that is symmetrically distributed around the central point)
#* As the number of scores used, ''n'', increases, the distribution of all possible ranks ''S'' tends towards the [[normal distribution]]. So although for ''n''&nbsp;≤&nbsp;20, exact probabilities would usually be calculated, for ''n'' > 20, the normal approximation is used. The recommended cutoff varies from textbook to textbook &mdash; here we use 20 although some put it lower (10) or higher (25).
# Compare ''S'' to the critical value, and reject H<sub>0</sub> if ''S'' is less than or is equal to the critical value.

==Confidence interval for the Wilcoxon signed-rank test==
A median [[confidence interval]] can be constructed based on Wilcoxon signed-rank test for matched pairs.<ref>Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach, New Jersey: Wiley,by Corder, G.W. & Foreman, D.I. (2009)</ref> To create the confidence interval, all possible pairs (X<sub>i</sub>,X<sub>j</sub>) are used to compute the differences D<sub>i</sub>=X<sub>i</sub>-X<sub>j</sub>; then, compute all of the averages, υ<sub>i</sub><sub>j</sub> use:
*υ<sub>i</sub><sub>j</sub>=(D<sub>i</sub>+D<sub>j</sub>)/2

There would be [n(n-1)/2]+n averages. Then arrange all the averages from smallest to largest, and the median of ordered averages gives a point estimate of the population.


==Example==
{| border="0" | style="text-align: right;"

|
{| class="wikitable"
{| class="wikitable"
|-
|-
| &nbsp;
! Subject (i)
| &nbsp;
! X<sub>i</sub>
| &nbsp;
! Y<sub>i</sub>
| colspan="2" | <math>x_{2,i} - x_{1,i}</math>
! Sign of X<sub>i</sub>''&nbsp;&ndash;&nbsp;''Y<sub>i</sub>
|-
! X<sub>i</sub>''&nbsp;&ndash;&nbsp;''Y<sub>i</sub>
| <math>i_{}</math>
! Absolute X<sub>i</sub>''&nbsp;&ndash;&nbsp;''Y<sub>i</sub>
| <math>x_{2,i}</math>
! Rank of Absolute
| <math>x_{1,i}</math>
! Signed Rank
| <math>\sgn</math>
| <math>\text{abs}</math>
|-
|-
| 1
| 1
| 125
| 125
| 110
| 110
| +
| 1
| 15
| 15
| 15
| 7
| 7
|-
|-
| 2
| 2
| 115
| 115
| 122
| 122
|
| –1
| –7
| 7
| 7
| 3
| –3
|-
|-
| 3
| 3
| 130
| 130
| 125
| 125
| +
| 1
| 5
| 5
| 5
| 1.5
| 1.5
|-
|-
| 4
| 4
| 140
| 140
| 120
| 120
| +
| 1
| 20
| 20
| 20
| 9
| 9
|-
|-
| 5
| 5
Line 89: Line 67:
| &nbsp;
| &nbsp;
| 0
| 0
| 0
| &nbsp;
| &nbsp;
|-
|-
| 6
| 6
| 115
| 115
| 124
| 124
|
| –1
| –9
| 9
| 9
| 4
| –4
|-
|-
| 7
| 7
| 140
| 140
| 123
| 123
| +
| 1
| 17
| 17
| 17
| 8
| 8
|-
|-
| 8
| 8
| 125
| 125
| 137
| 137
|
| –1
| –12
| 12
| 12
| 6
| –6
|-
|-
| 9
| 9
| 140
| 140
| 135
| 135
| +
| 1
| 5
| 5
|-
| 10
| 135
| 145
| –1
| 10
|}
| style="vertical-align:center;" | order by absolute difference
|
{| class="wikitable"
|-
| &nbsp;
| &nbsp;
| &nbsp;
| style="text-align: center;" colspan="4" | <math>x_{2,i} - x_{1,i}</math>
|-
| <math>i_{}</math>
| <math>x_{2,i}</math>
| <math>x_{1,i}</math>
| <math>\sgn</math>
| <math>\text{abs}</math>
| <math>R_i</math>
| <math>\sgn \cdot R_i</math>
|-
| 5
| 140
| 140
| &nbsp;
| 0
| &nbsp;
| &nbsp;
|-
| 3
| 130
| 125
| 1
| 5
| 5
| 1.5
| 1.5
| 1.5
| 1.5
|-
| 9
| 140
| 135
| 1
| 5
| 1.5
| 1.5
|-
| 2
| 115
| 122
| –1
| 7
| 3
| –3
|-
| 6
| 115
| 124
| –1
| 9
| 4
| –4
|-
|-
| 10
| 10
| 135
| 135
| 145
| 145
|
| –1
| –10
| 10
| 10
| 5
| 5
| –5
| –5
|-
| 8
| 125
| 137
| –1
| 12
| 6
| –6
|-
| 1
| 125
| 110
| 1
| 15
| 7
| 7
|-
| 7
| 140
| 123
| 1
| 17
| 8
| 8
|-
| 4
| 140
| 120
| 1
| 20
| 9
| 9
|}
|}
|}
# The sign of X<sub>i</sub>''&nbsp;&ndash;&nbsp;''Y<sub>i</sub> is denoted in the Sign column by either (+) or (–). If X<sub>i</sub> and Y<sub>i</sub> are equal, then the value is thrown out.
# The values of X<sub>i</sub>''&nbsp;&ndash;&nbsp;''Y<sub>i</sub> are given in the next two columns.
# The last two columns are the ranks. The absolute rank column has no signs, and the signed rank column gives the ranks along with their signs.
# The data is ranked from the smallest value to the largest value. In the case of a tie, ranks are added together and divided by the number of ties. For example, in this data, there were two instances of the value 5. The ranks corresponding to 5 are 1 and 2. The sum of these ranks is 3. After dividing by the number of ties, you get a mean rank of 1.5, and this value is assigned to both instances of 5.
# The test statistic, W<sub>+</sub>, is given by the sum of all of the positive values in the Signed Rank column. The test statistic, W<sub>–</sub>, is given by the sum of all of the negative values in the Signed Rank column. For this example, W<sub>+</sub> = 27 and W<sub>–</sub>=18. The minimum of these is 18.
# Lastly, this test statistic is analyzed using a table of critical values. If the test statistic is less than or equal to the critical value based on the number of observations n, then the null hypothesis is rejected for the alternative hypothesis. Otherwise, the null hypothesis is not rejected. [http://www.sussex.ac.uk/Users/grahamh/RM1web/WilcoxonTable2005.pdf See table here.]


<math>\sgn</math> is the sign function, <math>\text{abs}</math> is the absolute value, and <math>R_i</math> is the rank. Notice that pairs 3 and 9 are tied in absolute value. They would be ranked 1 and 2, so each gets the average of those ranks, 1.5.<br><br>
In this case the test statistic is W = 18 and the critical value is 8 for a two-tailed ''p''-value of 0.05. The test statistic must be less than this to be significant at this level, so in this case the null hypothesis is not rejected.
<math>N_r = 10 - 1 = 9, W = |1.5+1.5-3-4-5-6+7+8+9| = 9.</math><br><br>
<math>W < W_{\alpha = 0.05, 9} = 35 \therefore \text{fail to reject } H_0</math>


==See also==

Revision as of 02:00, 20 April 2012

The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used when comparing two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ (i.e. it is a paired difference test).

It can be used as an alternative to the paired Student's t-test when the population cannot be assumed to be normally distributed or the data is on the ordinal scale.[1]

The test is named for Frank Wilcoxon (1892–1965) who, in a single paper, proposed both it and the rank-sum test for two independent samples (Wilcoxon, 1945).[2] The test was popularized by Siegel (1956)[3] in his influential textbook on non-parametric statistics. Siegel used the symbol T for the smaller of the positive and negative rank sums, a value related to, but not the same as, the statistic W defined below. In consequence, the test is sometimes referred to as the Wilcoxon T test, and the test statistic is reported as a value of T. Other names may include the 't-test for matched pairs' or the 't-test for dependent samples'.

Assumptions

  1. Data is paired and comes from the same population.
  2. Each pair is chosen randomly and independently.
  3. The data is at least ordinal.

Test procedure

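Let N be the sample size, the number of pairs. Thus, there are a total of 2N data points. For i = 1, ..., N, let <math>x_{1,i}</math> and <math>x_{2,i}</math> denote the measurements.

    <math>H_0: \text{median difference between the pairs is zero}</math>
    <math>H_1: \text{median difference is not zero}</math>

  1. For i = 1, ..., N, calculate <math>|x_{2,i} - x_{1,i}|</math> and <math>\sgn(x_{2,i} - x_{1,i})</math>, where sgn is the sign function.

  2. Exclude pairs with <math>|x_{2,i} - x_{1,i}| = 0</math>. Let <math>N_r</math> be the reduced sample size.

  3. Order the remaining <math>N_r</math> pairs from smallest absolute difference to largest absolute difference, <math>|x_{2,i} - x_{1,i}|</math>.

  4. Rank the pairs, starting with the smallest as 1. Ties receive a rank equal to the average of the ranks they span. Let <math>R_i</math> denote the rank.

  5. Calculate the test statistic W.

    <math>W = \left|\sum_{i=1}^{N_r} [\sgn(x_{2,i} - x_{1,i}) \cdot R_i]\right|</math>, the absolute value of the sum of the signed ranks.

  6. As <math>N_r</math> increases, the sampling distribution of W converges to a normal distribution. Thus,

    For <math>N_r \ge 10</math>, a z-score can be calculated as <math>z = \frac{W - 0.5}{\sigma_W}, \sigma_W = \sqrt{\frac{N_r(N_r + 1)(2N_r + 1)}{6}}</math>. If <math>z > z_{critical}</math>, reject <math>H_0</math>.

    For <math>N_r < 10</math>, <math>W</math> is compared to a critical value from a reference table[1]. If <math>W \ge W_{critical, N_r}</math>, reject <math>H_0</math>. Alternatively, a p-value can be calculated from enumeration of all possible combinations of <math>W</math> given <math>N_r</math>.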
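The six steps above can be sketched in Python roughly as follows. This is an illustrative sketch rather than a reference implementation: the function names `signed_rank_statistic` and `z_score` are hypothetical, and the code assumes the two samples are given as equal-length lists of numbers.

```python
import math


def signed_rank_statistic(x1, x2):
    """Return (W, N_r, signed_ranks) for paired samples x1 and x2.

    Hypothetical helper following steps 1-5: W is the absolute value
    of the sum of the signed ranks of the non-zero differences.
    """
    # Steps 1-2: differences x2 - x1, dropping pairs with zero difference.
    diffs = [b - a for a, b in zip(x1, x2) if b != a]
    n_r = len(diffs)

    # Step 3: order by absolute difference, smallest first.
    ordered = sorted(diffs, key=abs)

    # Step 4: assign ranks 1..N_r, averaging the ranks spanned by ties.
    ranks = [0.0] * n_r
    i = 0
    while i < n_r:
        j = i
        while j + 1 < n_r and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    # Step 5: attach each difference's sign to its rank and sum.
    signed_ranks = [math.copysign(r, d) for r, d in zip(ranks, ordered)]
    w = abs(sum(signed_ranks))
    return w, n_r, signed_ranks


def z_score(w, n_r):
    """Step 6: normal approximation, with the 0.5 correction shown above."""
    sigma_w = math.sqrt(n_r * (n_r + 1) * (2 * n_r + 1) / 6)
    return (w - 0.5) / sigma_w
```

For instance, `signed_rank_statistic([1, 2, 3], [2, 2, 1])` drops the middle pair (zero difference) and ranks the remaining differences 1 and −2. Because W is an absolute value, swapping the two samples does not change it.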

Example

                          x_{2,i} − x_{1,i}
  i   x_{2,i}   x_{1,i}   sgn   abs
  1     125       110       1    15
  2     115       122      –1     7
  3     130       125       1     5
  4     140       120       1    20
  5     140       140             0
  6     115       124      –1     9
  7     140       123       1    17
  8     125       137      –1    12
  9     140       135       1     5
 10     135       145      –1    10

order by absolute difference

                          x_{2,i} − x_{1,i}
  i   x_{2,i}   x_{1,i}   sgn   abs   R_i   sgn·R_i
  5     140       140             0
  3     130       125       1     5   1.5     1.5
  9     140       135       1     5   1.5     1.5
  2     115       122      –1     7   3      –3
  6     115       124      –1     9   4      –4
 10     135       145      –1    10   5      –5
  8     125       137      –1    12   6      –6
  1     125       110       1    15   7       7
  7     140       123       1    17   8       8
  4     140       120       1    20   9       9

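<math>\sgn</math> is the sign function, <math>\text{abs}</math> is the absolute value, and <math>R_i</math> is the rank. Notice that pairs 3 and 9 are tied in absolute value. They would be ranked 1 and 2, so each gets the average of those ranks, 1.5.

    <math>N_r = 10 - 1 = 9, W = |1.5+1.5-3-4-5-6+7+8+9| = 9.</math>

    <math>W < W_{\alpha = 0.05, 9} = 35 \therefore \text{fail to reject } H_0</math>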
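The example can be checked with a short Python script. The sketch below recomputes the ranks and the statistic from the ten pairs in the table. It then estimates an exact two-sided p-value by enumerating every assignment of signs to the ranks, assuming that under the null hypothesis each sign is equally likely. The variable names are illustrative and not taken from any library.

```python
from itertools import product

# The ten pairs from the example table (columns x_{2,i} and x_{1,i}).
x2 = [125, 115, 130, 140, 140, 115, 140, 125, 140, 135]
x1 = [110, 122, 125, 120, 140, 124, 123, 137, 135, 145]

# Drop the pair with zero difference, then order by absolute difference.
diffs = [a - b for a, b in zip(x2, x1) if a != b]
ordered = sorted(diffs, key=abs)
abs_sorted = [abs(d) for d in ordered]

# Tied values share the average of the ranks they span.
ranks = []
for value in abs_sorted:
    first_position = abs_sorted.index(value) + 1
    tie_count = abs_sorted.count(value)
    ranks.append(first_position + (tie_count - 1) / 2)

signed_ranks = [r if d > 0 else -r for r, d in zip(ranks, ordered)]
n_r = len(ordered)
w = abs(sum(signed_ranks))

# Enumerate all 2**n_r sign patterns and count those at least as extreme.
as_extreme = sum(
    1
    for signs in product((1, -1), repeat=n_r)
    if abs(sum(s * r for s, r in zip(signs, ranks))) >= w
)
p_exact = as_extreme / 2 ** n_r

print("N_r =", n_r, "W =", w, "exact two-sided p =", p_exact)
```

The script reproduces N_r = 9 and W = 9. Enumeration is only practical for small N_r, since the number of sign patterns doubles with each additional pair.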



See also

  • Mann-Whitney-Wilcoxon test (the variant for two independent samples)
  • Sign test (Like Wilcoxon test, but without the assumption of symmetric distribution of the differences around the median, and without using the magnitude of the difference)

References

  1. ^ a b Lowry, Richard. "Concepts & Applications of Inferential Statistics". Retrieved 24 March 2011.
  2. ^ Wilcoxon, Frank (1945). "Individual comparisons by ranking methods" (PDF). Biometrics Bulletin. 1 (6): 80–83.
  3. ^ Siegel, Sidney (1956). Non-parametric statistics for the behavioral sciences. New York: McGraw-Hill. pp. 75–83.

Implementations

  • ALGLIB includes implementation of the Wilcoxon signed-rank test in C++, C#, Delphi, Visual Basic, etc.
  • The free statistical software R includes an implementation of the test as wilcox.test(x,y, paired=TRUE), where x and y are vectors of equal length.
  • GNU Octave implements various one-tailed and two-tailed versions of the test in the wilcoxon_test function.
  • SciPy includes an implementation of the Wilcoxon signed-rank test in Python, as scipy.stats.wilcoxon.