Fisher transformation
Revision as of 11:11, 27 March 2013
In statistics, hypotheses about the value of the population correlation coefficient ρ between variables X and Y can be tested using the Fisher transformation[1][2] applied to the sample correlation coefficient r.
Definition
The transformation is defined by:
$$z = \frac{1}{2} \ln\left(\frac{1+r}{1-r}\right) = \operatorname{artanh}(r),$$
where "ln" is the natural logarithm function and "artanh" is the inverse hyperbolic tangent function.
If (X, Y) has a bivariate normal distribution, and if the (Xi, Yi) pairs used to form r are independent for i = 1, ..., N, then z is approximately normally distributed with mean
$$\frac{1}{2} \ln\left(\frac{1+\rho}{1-\rho}\right)$$
and standard error
$$\frac{1}{\sqrt{N-3}},$$
where N is the sample size.
This transformation, and its inverse,
$$r = \frac{\exp(2z) - 1}{\exp(2z) + 1} = \tanh(z),$$
can be used to construct a confidence interval for ρ.
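As an illustration of how such a confidence interval might be built, the following Python sketch forms the interval on the z scale and maps its endpoints back with the inverse transformation. It assumes bivariate normal data, independent pairs, and a standard error of 1/sqrt(N − 3) for z; the function names `fisher_z` and `fisher_confidence_interval` and the example values r = 0.6, N = 50 are hypothetical, chosen only for this example.

```python
import math
from statistics import NormalDist


def fisher_z(r: float) -> float:
    """Fisher transformation z = artanh(r) of a correlation coefficient."""
    return math.atanh(r)


def fisher_confidence_interval(r: float, n: int, level: float = 0.95):
    """Approximate confidence interval for rho from a sample correlation r.

    Assumes bivariate normal data and independent pairs, so that z is
    roughly normal with standard error 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("sample size must exceed 3")
    z = fisher_z(r)
    se = 1.0 / math.sqrt(n - 3)
    # Two-sided critical value of the standard normal distribution.
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Build the interval on the z scale, then map back with tanh.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical example values: r = 0.6 observed in a sample of 50 pairs.
low, high = fisher_confidence_interval(r=0.6, n=50)
print(f"95% CI for rho: ({low:.3f}, {high:.3f})")
```

Because tanh is monotone, the back-transformed endpoints always lie inside (−1, 1), and the resulting interval is generally not symmetric about r.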
Discussion
The Fisher transformation is an approximate variance-stabilizing transformation for r when X and Y follow a bivariate normal distribution. This means that the variance of z is approximately constant for all values of the population correlation coefficient ρ. Without the Fisher transformation, the variance of r grows smaller as |ρ| gets closer to 1. Since the Fisher transformation is approximately the identity function when |r| < 1/2, it is sometimes useful to remember that the variance of r is well approximated by 1/N as long as |ρ| is not too large and N is not too small. This is related to the fact that, for bivariate normal data, the asymptotic variance of √N·r is (1 − ρ²)², which is close to 1 when |ρ| is small.
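The variance-stabilizing behaviour can be checked informally by simulation. The sketch below is only an illustration under stated assumptions: it draws bivariate normal samples with a chosen ρ, computes r and z = artanh(r) for each sample, and compares the empirical variances. The helper names, the sample size of 30, the 2000 replications and the seed are all arbitrary choices made for this example.

```python
import math
import random
import statistics


def sample_correlation(rho: float, n: int, rng: random.Random) -> float:
    """Pearson r from n bivariate normal pairs with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(u)
        # Mix u and v so that corr(x, y) equals rho.
        ys.append(rho * u + math.sqrt(1 - rho * rho) * v)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_variances(rho: float, n: int = 30, reps: int = 2000, seed: int = 0):
    """Return (variance of r, variance of z) over repeated samples."""
    rng = random.Random(seed)
    rs = [sample_correlation(rho, n, rng) for _ in range(reps)]
    zs = [math.atanh(r) for r in rs]
    return statistics.variance(rs), statistics.variance(zs)


for rho in (0.0, 0.5, 0.9):
    var_r, var_z = simulated_variances(rho)
    print(f"rho={rho:.1f}  var(r)={var_r:.4f}  var(z)={var_z:.4f}  "
          f"1/(N-3)={1 / (30 - 3):.4f}")
```

In runs of this kind, the variance of r should shrink noticeably as ρ moves toward 1, while the variance of z should stay near 1/(N − 3) for every ρ.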
The behavior of this transform has been extensively studied since Fisher introduced it in 1915. Fisher himself found the exact distribution of z for data from a bivariate normal distribution in 1921; Gayen in 1951 determined the exact distribution of z for data from a bivariate Type A Edgeworth distribution.[3] Hotelling in 1953 calculated the Taylor series expressions for the moments of z and several related statistics,[4] and Hawkins in 1989 derived the asymptotic distribution of z for data from a distribution with bounded fourth moments.[5]
Other uses
While the Fisher transformation is mainly associated with the Pearson product-moment correlation coefficient for bivariate normal observations, it can also be applied to Spearman's rank correlation coefficient in more general cases. A similar result for the asymptotic distribution applies, but with a minor adjustment factor: see the latter article for details.
References
- ^ Fisher, R.A. (1915). "Frequency distribution of the values of the correlation coefficient in samples of an indefinitely large population". Biometrika. 10 (4). Biometrika Trust: 507–521. JSTOR 2331838.
- ^ Fisher, R.A. (1921). "On the 'probable error' of a coefficient of correlation deduced from a small sample" (PDF). Metron. 1: 3–32.
- ^ Gayen, A.K. (1951). "The Frequency Distribution of the Product-Moment Correlation Coefficient in Random Samples of Any Size Drawn from Non-Normal Universes". Biometrika. 38 (1/2). Biometrika Trust: 219–247. JSTOR 2332329.
- ^ Hotelling, H. (1953). "New light on the correlation coefficient and its transforms". Journal of the Royal Statistical Society B. 15 (2). Blackwell Publishing: 193–225. JSTOR 2983768.
- ^ Hawkins, D.L. (1989). "Using U statistics to derive the asymptotic distribution of Fisher's Z statistic". The American Statistician. 43 (4). American Statistical Association: 235–237. doi:10.2307/2685369. JSTOR 2685369.