
Correlation coefficient: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
m Reverted edits by 196.97.19.132 (talk) (HG) (3.4.12)
 
(12 intermediate revisions by 4 users not shown)
Line 1: Line 1:
{{short description|Numerical measure of a statistical relationship between variables}}
{{short description|Numerical measure of a statistical relationship between variables}}
A '''correlation coefficient''' is a [[numerical measure]] of some type of [[correlation and dependence|correlation]], meaning a statistical relationship between two [[variable (mathematics)|variables]].{{efn|Correlation coefficient: A statistic used to show how the scores from one measure relate to scores on a second measure for the same group of individuals. A high value (approaching +1.00) is a strong direct relationship, values near 0.50 are considered moderate and values below 0.30 are considered to show weak relationship. A low negative value (approaching -1.00) is similarly a strong inverse relationship, and values near 0.00 indicate little, if any, relationship.<ref>{{cite web |url=http://www.ncme.org/ncme/NCME/Resource_Center/Glossary/NCME/Resource_Center/Glossary1.aspx?hkey=4bb87415-44dc-4088-9ed9-e8515326a061#anchorC |title=correlation coefficient |author=<!--Not stated--> |website=NCME.org |publisher=[[National Council on Measurement in Education]] |access-date=April 17, 2014 |archive-url=https://web.archive.org/web/20170722194028/http://www.ncme.org/ncme/NCME/Resource_Center/Glossary/NCME/Resource_Center/Glossary1.aspx?hkey=4bb87415-44dc-4088-9ed9-e8515326a061#anchorC |archive-date=July 22, 2017 |url-status=dead}}</ref>}} The variables may be two [[column (database)|column]]s of a given [[data set]] of observations, often called a [[sample (statistics)|sample]], or two components of a [[multivariate random variable]] with a known [[distribution (statistics)|distribution]].{{citation needed|date=July 2019}}
A '''correlation coefficient''' is a [[numerical measure]] of some type of '''linear''' [[correlation and dependence|correlation]], meaning a statistical relationship between two [[variable (mathematics)|variables]].{{efn|Correlation coefficient: A [[statistic]] used to show how the scores from one measure relate to scores on a second measure for the same group of individuals. A high value (approaching +1.00) is a strong direct relationship, values near 0.50 are considered moderate and values below 0.30 are considered to show weak relationship. A low negative value (approaching -1.00) is similarly a strong inverse relationship, and values near 0.00 indicate little, if any, relationship.<ref>{{cite web |url=http://www.ncme.org/ncme/NCME/Resource_Center/Glossary/NCME/Resource_Center/Glossary1.aspx?hkey=4bb87415-44dc-4088-9ed9-e8515326a061#anchorC |title=correlation coefficient |author=<!--Not stated--> |website=NCME.org |publisher=[[National Council on Measurement in Education]] |access-date=April 17, 2014 |archive-url=https://web.archive.org/web/20170722194028/http://www.ncme.org/ncme/NCME/Resource_Center/Glossary/NCME/Resource_Center/Glossary1.aspx?hkey=4bb87415-44dc-4088-9ed9-e8515326a061#anchorC |archive-date=July 22, 2017 |url-status=dead}}</ref>}} The variables may be two [[column (database)|column]]s of a given [[data set]] of observations, often called a [[sample (statistics)|sample]], or two components of a [[multivariate random variable]] with a known [[distribution (statistics)|distribution]].{{citation needed|date=July 2019}}


Several types of correlation coefficient exist, each with their own definition and own range of usability and characteristics. They all assume values in the range from −1 to +1, where ±1 indicates the strongest possible agreement and 0 the strongest possible disagreement.<ref>{{cite book |last1=Taylor |first1=John R. |title=An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements |date=1997 |publisher=University Science Books |location=Sausalito, CA |isbn=0-935702-75-X |page=217 |edition=2nd |url=http://faculty.kfupm.edu.sa/phys/aanaqvi/Taylor-An%20Introduction%20to%20Error%20Analysis.pdf |access-date=14 February 2019 |archive-url=https://web.archive.org/web/20190215050550/http://faculty.kfupm.edu.sa/phys/aanaqvi/Taylor-An%20Introduction%20to%20Error%20Analysis.pdf |archive-date=15 February 2019 |url-status=dead }}</ref> As tools of analysis, correlation coefficients present certain problems, including the propensity of some types to be distorted by [[outliers]] and the possibility of incorrectly being used to infer a [[causal relationship]] between the variables (for more, see [[Correlation does not imply causation]]).<ref name="Boddy">{{cite book |last1=Boddy |first1=Richard |last2=Smith |first2=Gordon |title=Statistical Methods in Practice: For scientists and technologists |date=2009 |publisher=Wiley |location=Chichester, U.K. |isbn=978-0-470-74664-6 |pages=95–96}}</ref>
Several types of correlation coefficient exist, each with their own definition and own range of usability and characteristics. They all assume values in the range from −1 to +1, where ±1 indicates the strongest possible correlation and 0 indicates no correlation.<ref>{{cite book |last1=Taylor |first1=John R. |title=An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements |date=1997 |publisher=University Science Books |location=Sausalito, CA |isbn=0-935702-75-X |page=217 |edition=2nd |url=http://faculty.kfupm.edu.sa/phys/aanaqvi/Taylor-An%20Introduction%20to%20Error%20Analysis.pdf |access-date=14 February 2019 |archive-url=https://web.archive.org/web/20190215050550/http://faculty.kfupm.edu.sa/phys/aanaqvi/Taylor-An%20Introduction%20to%20Error%20Analysis.pdf |archive-date=15 February 2019 |url-status=dead }}</ref> As tools of analysis, correlation coefficients present certain problems, including the propensity of some types to be distorted by [[outliers]] and the possibility of incorrectly being used to infer a [[causal relationship]] between the variables (for more, see [[Correlation does not imply causation]]).<ref name="Boddy">{{cite book |last1=Boddy |first1=Richard |last2=Smith |first2=Gordon |title=Statistical Methods in Practice: For scientists and technologists |date=2009 |publisher=Wiley |location=Chichester, U.K. |isbn=978-0-470-74664-6 |pages=95–96}}</ref>


==Types==
==Types==
There are several different measures for the degree of correlation in data, depending on the kind of data: principally whether the data is a measurement, ordinal, or categorical.
There are several different measures for the degree of correlation in data, depending on the kind of data: principally whether the data is a measurement, [[Ordinal data|ordinal]], or [[Categorical data|categorical]].


=== Pearson ===
=== Pearson ===
Line 26: Line 26:
# The two continuous variables followed a [[multivariate normal distribution|bivariate normal distribution]].
# The two continuous variables followed a [[multivariate normal distribution|bivariate normal distribution]].


When both variables are dichotomous instead of ordered-categorical, the [[polychoric correlation]] coefficient is called the tetrachoric correlation coefficient.
When both variables are [[Dichotomous variable|dichotomous]] instead of ordered-categorical, the [[polychoric correlation]] coefficient is called the tetrachoric correlation coefficient.

===Interpreting correlation coefficient values===

The strength of the association between two variables is expressed by a correlation value, commonly denoted {{mvar|r}} or {{mvar|R}}. Correlation values range from −1 to +1, where ±1 indicates the strongest possible correlation and 0 indicates no correlation between variables.<ref>{{cite book |last1=Taylor |first1=John R. |title=An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements |date=1997 |publisher=University Science Books |location=Sausalito, CA |isbn=0-935702-75-X |page=217 |edition=2nd |url=http://faculty.kfupm.edu.sa/phys/aanaqvi/Taylor-An%20Introduction%20to%20Error%20Analysis.pdf |access-date=14 February 2019 |archive-url=https://web.archive.org/web/20190215050550/http://faculty.kfupm.edu.sa/phys/aanaqvi/Taylor-An%20Introduction%20to%20Error%20Analysis.pdf |archive-date=15 February 2019 |url-status=dead }}</ref>

{| class="wikitable"
|-
! Positive {{mvar|r}} or {{mvar|R}} !! Negative {{mvar|r}} or {{mvar|R}} !! Strength or weakness of association between variables<ref>{{cite web |title=The Correlation Coefficient (r) |url=https://sphweb.bumc.bu.edu/otlt/MPH-Modules/PH717-QuantCore/PH717-Module9-Correlation-Regression/PH717-Module9-Correlation-Regression4.html |website=Boston University}}</ref>
|-
| +1.0 to +0.8 || -1.0 to -0.8 || Perfect or very strong association
|-
| +0.8 to +0.6 || -0.8 to -0.6 || Strong association
|-
| +0.6 to +0.4 || -0.6 to -0.4 || Moderate association
|-
| +0.4 to +0.2 || -0.4 to -0.2 || Weak association
|-
| +0.2 to 0.0 || -0.2 to 0.0 || Very weak or no association
|}


==See also==
==See also==

Latest revision as of 20:58, 28 November 2024

A correlation coefficient is a numerical measure of some type of linear correlation, meaning a statistical relationship between two variables.[a] The variables may be two columns of a given data set of observations, often called a sample, or two components of a multivariate random variable with a known distribution.[citation needed]

Several types of correlation coefficient exist, each with its own definition, range of applicability, and characteristics. They all assume values in the range from −1 to +1, where ±1 indicates the strongest possible correlation and 0 indicates no correlation.[2] As tools of analysis, correlation coefficients present certain problems, including the propensity of some types to be distorted by outliers and the risk of being used incorrectly to infer a causal relationship between the variables (for more, see Correlation does not imply causation).[3]

Types


There are several different measures for the degree of correlation in data, depending on the kind of data: principally whether the data is a measurement, ordinal, or categorical.

Pearson


The Pearson product-moment correlation coefficient, also known as r, R, or Pearson's r, is a measure of the strength and direction of the linear relationship between two variables that is defined as the covariance of the variables divided by the product of their standard deviations.[4] This is the best-known and most commonly used type of correlation coefficient. When the term "correlation coefficient" is used without further qualification, it usually refers to the Pearson product-moment correlation coefficient.
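The definition above (covariance divided by the product of the standard deviations) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `pearson_r` and the sample data `hours` and `scores` are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations.
    # The (n - 1) divisors cancel in the final ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (sd_x * sd_y)


# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 4))
```

In Python 3.10 and later, the standard library's `statistics.correlation` should give the same value for the same inputs; the explicit version is shown only to mirror the definition term by term.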

Intra-class


Intraclass correlation (ICC) is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups; it describes how strongly units in the same group resemble each other.
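As a rough illustration of "how strongly units in the same group resemble each other", the sketch below computes one common form of the ICC, the one-way random-effects estimate built from between-group and within-group mean squares. The choice of this particular estimator, the function name `icc_oneway`, the equal-group-size restriction, and the sample data are all assumptions made for the example; the text above does not specify a formula.

```python
def icc_oneway(groups):
    """One-way ICC estimate for equally sized groups of measurements.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between-group and within-group mean squares and k is the group size.
    """
    n = len(groups)
    if n < 2:
        raise ValueError("need at least two groups")
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("all groups must have the same size (at least 2)")

    grand_mean = sum(sum(g) for g in groups) / (n * k)
    group_means = [sum(g) / k for g in groups]

    # Between-group mean square: spread of the group means around the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    # Within-group mean square: spread of units around their own group mean.
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all measurements are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical data: three groups of two measurements each.
similar_within_groups = [[10, 11], [20, 19], [30, 31]]
icc = icc_oneway(similar_within_groups)
print(round(icc, 3))
```

Units within each group are nearly identical while the groups differ widely, so the estimate comes out close to 1.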

Rank


Rank correlation is a measure of the relationship between the rankings of two variables, or two rankings of the same variable.
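One widely used rank correlation is Spearman's, which can be computed as Pearson's r applied to the ranks of the data rather than to the raw values. The sketch below takes that route; Spearman's coefficient is chosen here only as an illustration, and the helper names `average_ranks`, `pearson_r`, and `spearman_rho` and the sample data are assumptions for the example.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r from sums of squares and cross-products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y grows monotonically with x, but not linearly.
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]
rho = spearman_rho(xs, ys)
linear_r = pearson_r(xs, ys)
print(rho, round(linear_r, 3))
```

Because the relationship is perfectly monotonic, the rank-based value is 1 even though Pearson's r on the raw values is below 1.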

Tetrachoric and polychoric


The polychoric correlation coefficient measures association between two ordered-categorical variables. It is technically defined as the estimate of the Pearson correlation coefficient one would obtain if:

  1. The two variables were measured on a continuous scale, instead of as ordered-category variables.
  2. The two continuous variables followed a bivariate normal distribution.

When both variables are dichotomous instead of ordered-categorical, the polychoric correlation coefficient is called the tetrachoric correlation coefficient.

Interpreting correlation coefficient values


The strength of the association between two variables is expressed by a correlation value, commonly denoted r or R. Correlation values range from −1 to +1, where ±1 indicates the strongest possible correlation and 0 indicates no correlation between variables.[5]

Positive r or R    Negative r or R    Strength or weakness of association between variables[6]
+1.0 to +0.8       −1.0 to −0.8       Perfect or very strong association
+0.8 to +0.6       −0.8 to −0.6       Strong association
+0.6 to +0.4       −0.6 to −0.4       Moderate association
+0.4 to +0.2       −0.4 to −0.2       Weak association
+0.2 to 0.0        −0.2 to 0.0        Very weak or no association
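
The bands in the table can be turned into a small lookup, sketched below. The table's ranges share their endpoints (0.8 appears in two rows, for example), so the sketch assigns each boundary value to the stronger category; that tie-breaking rule and the function name `association_strength` are assumptions, not part of the cited source.

```python
def association_strength(r):
    """Map a correlation value in [-1, 1] to the descriptive label from the table.

    Only the magnitude matters: the sign gives the direction of the association.
    Boundary values (0.2, 0.4, 0.6, 0.8) go to the stronger category (assumption).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    magnitude = abs(r)
    if magnitude >= 0.8:
        return "Perfect or very strong association"
    if magnitude >= 0.6:
        return "Strong association"
    if magnitude >= 0.4:
        return "Moderate association"
    if magnitude >= 0.2:
        return "Weak association"
    return "Very weak or no association"


label = association_strength(-0.65)
print(label)
```

Such labels are conventions rather than fixed rules; the note attached to the opening definition, for instance, uses different cut-offs (about 0.50 for moderate and below 0.30 for weak).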

See also


Notes

  1. ^ Correlation coefficient: A statistic used to show how the scores from one measure relate to scores on a second measure for the same group of individuals. A high value (approaching +1.00) is a strong direct relationship, values near 0.50 are considered moderate and values below 0.30 are considered to show weak relationship. A low negative value (approaching -1.00) is similarly a strong inverse relationship, and values near 0.00 indicate little, if any, relationship.[1]

References

  1. ^ "correlation coefficient". NCME.org. National Council on Measurement in Education. Archived from the original on July 22, 2017. Retrieved April 17, 2014.
  2. ^ Taylor, John R. (1997). An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements (PDF) (2nd ed.). Sausalito, CA: University Science Books. p. 217. ISBN 0-935702-75-X. Archived from the original (PDF) on 15 February 2019. Retrieved 14 February 2019.
  3. ^ Boddy, Richard; Smith, Gordon (2009). Statistical Methods in Practice: For scientists and technologists. Chichester, U.K.: Wiley. pp. 95–96. ISBN 978-0-470-74664-6.
  4. ^ Weisstein, Eric W. "Statistical Correlation". mathworld.wolfram.com. Retrieved 2020-08-22.
  5. ^ Taylor, John R. (1997). An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements (PDF) (2nd ed.). Sausalito, CA: University Science Books. p. 217. ISBN 0-935702-75-X. Archived from the original (PDF) on 15 February 2019. Retrieved 14 February 2019.
  6. ^ "The Correlation Coefficient (r)". Boston University.