Correlation coefficient: Difference between revisions
Latest revision as of 20:58, 28 November 2024
A correlation coefficient is a numerical measure of some type of linear correlation, meaning a statistical relationship between two variables.[a] The variables may be two columns of a given data set of observations, often called a sample, or two components of a multivariate random variable with a known distribution.[citation needed]
Several types of correlation coefficient exist, each with its own definition, range of applicability, and characteristics. They all assume values in the range from −1 to +1, where ±1 indicates the strongest possible correlation and 0 indicates no correlation.[2] As tools of analysis, correlation coefficients present certain problems, including the propensity of some types to be distorted by outliers and the possibility of being incorrectly used to infer a causal relationship between the variables (for more, see Correlation does not imply causation).[3]
Types
There are several different measures for the degree of correlation in data, depending on the kind of data: principally whether the data is a measurement, ordinal, or categorical.
Pearson
The Pearson product-moment correlation coefficient, also known as r, R, or Pearson's r, is a measure of the strength and direction of the linear relationship between two variables that is defined as the covariance of the variables divided by the product of their standard deviations.[4] This is the best-known and most commonly used type of correlation coefficient. When the term "correlation coefficient" is used without further qualification, it usually refers to the Pearson product-moment correlation coefficient.
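The definition above (covariance divided by the product of the standard deviations) can be sketched in a few lines of Python. This is a minimal illustration for a sample of paired observations, not a reference implementation; the function name `pearson_r` and the example data are assumptions made for this sketch.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation: covariance / (std_x * std_y).

    `pearson_r` is an illustrative name, not taken from any library.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means. The 1/(n-1) factors
    # of the covariance and the standard deviations cancel in the ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Made-up example data: a perfectly linear pair and a noisier pair.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

For the first pair the result is 1 (up to floating-point rounding), because the points lie exactly on an increasing line; the second pair gives a value of about 0.8.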
Intra-class
Intraclass correlation (ICC) is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups; it describes how strongly units in the same group resemble each other.
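Several ICC variants exist, and the text above does not single one out. As a hedged sketch, the following computes one common form, the one-way random-effects ICC for groups of equal size, from the between-group and within-group mean squares. The function name `icc_one_way` and the equal-group-size restriction are assumptions of this example.

```python
def icc_one_way(groups):
    """One-way random-effects ICC for equally sized groups.

    Uses ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are
    the between-group and within-group mean squares and k is the group size.
    `icc_one_way` is an illustrative name, not a library function.
    """
    n = len(groups)
    if n < 2:
        raise ValueError("need at least two groups")
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("every group must have the same size (at least 2)")

    grand_mean = sum(sum(g) for g in groups) / (n * k)
    group_means = [sum(g) / k for g in groups]

    # Between-group mean square: spread of the group means.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    # Within-group mean square: spread of units around their own group mean.
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all values are identical")
    return (ms_between - ms_within) / denominator


# Made-up data: units within each group are identical, groups differ.
print(icc_one_way([[1, 1], [2, 2], [3, 3]]))
```

When units in the same group are identical and the groups differ, the within-group mean square is zero and the statistic equals 1, matching the description of units in the same group resembling each other as strongly as possible.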
Rank
Rank correlation is a measure of the relationship between the rankings of two variables, or two rankings of the same variable:
- Spearman's rank correlation coefficient is a measure of how well the relationship between two variables can be described by a monotonic function.
- The Kendall tau rank correlation coefficient is a measure of the agreement between two rankings, based on the numbers of concordant and discordant pairs of observations.
- Goodman and Kruskal's gamma is a measure of the strength of association in cross-tabulated data when both variables are measured at the ordinal level.
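The three rank-based measures listed above can be illustrated together. The sketch below is a simplified illustration under stated assumptions: Spearman's coefficient is computed as the Pearson correlation of average ranks, Kendall's tau is the "tau-a" variant (no correction for ties), and gamma is computed from paired ordinal observations rather than from a ready-made contingency table. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def _pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's coefficient: Pearson correlation applied to the ranks."""
    return _pearson(average_ranks(xs), average_ranks(ys))


def concordant_discordant(xs, ys):
    """Count pairs ordered the same way (concordant) or oppositely (discordant)."""
    concordant = discordant = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
            # sign == 0 means a tie in x or y; ties count for neither.
    return concordant, discordant


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (C - D) divided by the total number of pairs."""
    c, d = concordant_discordant(xs, ys)
    n = len(xs)
    return (c - d) / (n * (n - 1) / 2)


def goodman_kruskal_gamma(xs, ys):
    """Goodman and Kruskal's gamma: (C - D) / (C + D), ignoring tied pairs."""
    c, d = concordant_discordant(xs, ys)
    return (c - d) / (c + d)


# Made-up data: a monotonic but non-linear relationship.
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))
```

Because the second variable in the example is a strictly increasing function of the first, the ranks agree exactly and Spearman's coefficient is 1 (up to rounding), even though the relationship is not linear.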
Tetrachoric and polychoric
The polychoric correlation coefficient measures association between two ordered-categorical variables. It is technically defined as the estimate of the Pearson correlation coefficient one would obtain if:
- The two variables were measured on a continuous scale, instead of as ordered-category variables.
- The two continuous variables followed a bivariate normal distribution.
When both variables are dichotomous instead of ordered-categorical, the polychoric correlation coefficient is called the tetrachoric correlation coefficient.
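Estimating a polychoric or tetrachoric coefficient in practice usually involves fitting a bivariate normal model, which is beyond a short example. As a rough illustration only, the sketch below uses a commonly quoted closed-form "cosine-pi" approximation to the tetrachoric coefficient for a 2×2 table of counts. The approximation, the function name `tetrachoric_cos_pi`, and the cell labels `a`, `b`, `c`, `d` are assumptions of this example and are not taken from the text above.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Approximate tetrachoric correlation for the 2x2 table [[a, b], [c, d]].

    Uses the cosine-pi approximation r ~ cos(pi / (1 + sqrt(a*d / (b*c)))).
    This is an approximation, not the maximum-likelihood estimate.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    if b * c == 0:
        if a * d == 0:
            raise ValueError("approximation is undefined for this table")
        return 1.0  # odds ratio is infinite: limit of the formula
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1 + math.sqrt(odds_ratio)))


# Made-up tables: no association, then agreement concentrated on the diagonal.
print(tetrachoric_cos_pi(10, 10, 10, 10))
print(tetrachoric_cos_pi(40, 10, 10, 40))
```

For the first table the odds ratio is 1, so the approximation is essentially zero; for the second the odds ratio is 16 and the approximation gives cos(π/5), roughly 0.81.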
Interpreting correlation coefficient values
The strength of the correlation between two variables is expressed by the value of a coefficient such as r or R. Correlation values range from −1 to +1, where ±1 indicates the strongest possible correlation and 0 indicates no correlation between variables.[5]
r or R (positive) | r or R (negative) | Strength of association between variables[6] |
---|---|---|
+1.0 to +0.8 | −1.0 to −0.8 | Perfect or very strong association |
+0.8 to +0.6 | −0.8 to −0.6 | Strong association |
+0.6 to +0.4 | −0.6 to −0.4 | Moderate association |
+0.4 to +0.2 | −0.4 to −0.2 | Weak association |
+0.2 to 0.0 | −0.2 to 0.0 | Very weak or no association |
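The table can be read as a lookup on the magnitude of the coefficient. The following sketch encodes it as a small function. Because adjacent bands in the table share their endpoints, this example assumes that a boundary value such as 0.6 belongs to the stronger band; that tie-breaking rule and the function name `association_strength` are assumptions of the sketch, not part of the table.

```python
def association_strength(r):
    """Map a correlation coefficient to the verbal label in the table above.

    Assumption: shared endpoints (0.8, 0.6, 0.4, 0.2) are assigned to the
    stronger of the two bands they touch.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)  # the sign gives the direction, not the strength
    if magnitude >= 0.8:
        return "Perfect or very strong association"
    if magnitude >= 0.6:
        return "Strong association"
    if magnitude >= 0.4:
        return "Moderate association"
    if magnitude >= 0.2:
        return "Weak association"
    return "Very weak or no association"


print(association_strength(-0.72))  # Strong association
print(association_strength(0.05))   # Very weak or no association
```

Such labels are conventions rather than properties of the data, and other sources draw the boundaries differently.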
See also
- Correlation disattenuation
- Coefficient of determination
- Correlation and dependence
- Correlation ratio
- Distance correlation
- Goodness of fit, any of several measures of how well a statistical model fits observations, summarizing the discrepancy between observed values and the values expected under the model
- Multiple correlation
- Partial correlation
Notes
- ^ Correlation coefficient: A statistic used to show how the scores from one measure relate to scores on a second measure for the same group of individuals. A high value (approaching +1.00) is a strong direct relationship, values near 0.50 are considered moderate and values below 0.30 are considered to show weak relationship. A low negative value (approaching -1.00) is similarly a strong inverse relationship, and values near 0.00 indicate little, if any, relationship.[1]
References
- ^ "correlation coefficient". NCME.org. National Council on Measurement in Education. Archived from the original on July 22, 2017. Retrieved April 17, 2014.
- ^ Taylor, John R. (1997). An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements (PDF) (2nd ed.). Sausalito, CA: University Science Books. p. 217. ISBN 0-935702-75-X. Archived from the original (PDF) on 15 February 2019. Retrieved 14 February 2019.
- ^ Boddy, Richard; Smith, Gordon (2009). Statistical Methods in Practice: For scientists and technologists. Chichester, U.K.: Wiley. pp. 95–96. ISBN 978-0-470-74664-6.
- ^ Weisstein, Eric W. "Statistical Correlation". mathworld.wolfram.com. Retrieved 2020-08-22.
- ^ Taylor, John R. (1997). An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements (PDF) (2nd ed.). Sausalito, CA: University Science Books. p. 217. ISBN 0-935702-75-X. Archived from the original (PDF) on 15 February 2019. Retrieved 14 February 2019.
- ^ "The Correlation Coefficient (r)". Boston University.