Discriminant function analysis: Difference between revisions

Revision as of 20:49, 24 April 2012

Discriminant function analysis is a statistical analysis to predict a categorical dependent variable (called a grouping variable) by one or more continuous or binary independent variables (called predictor variables). The original dichotomous discriminant analysis was developed by Sir Ronald Fisher in 1936 ^[1]. It is different from an ANOVA or MANOVA, which are used to predict one (ANOVA) or multiple (MANOVA) continuous dependent variables by one or more independent categorical variables. Discriminant function analysis is useful in determining whether a set of variables is effective in predicting category membership ^[2].

Discriminant analysis is used when groups are known a priori (unlike in cluster analysis). Each case must have a score on one or more quantitative predictor measures, and a score on a group measure ^[3]. In simple terms, discriminant function analysis is classification: the act of distributing things into groups, classes, or categories of the same type.

Moreover, it is a useful follow-up procedure to a MANOVA, instead of doing a series of one-way ANOVAs, for ascertaining how the groups differ on the composite of dependent variables. In this case, a significant F test allows classification based on a linear combination of predictor variables. Terminology can get confusing here, because the dependent variables of the MANOVA are the predictor variables of the discriminant analysis, and the independent variables of the MANOVA are the grouping variables ^[2].

Assumptions

The assumptions of discriminant analysis are the same as those for MANOVA. The analysis is quite sensitive to outliers, and the sample size of the smallest group must be larger than the number of predictor variables ^[3].

Multivariate Normality: Independent variables are normal for each level of the grouping variable ^[3]^[2].

Homogeneity of Variance/Covariance (Homoscedasticity): The variances and covariances of the predictors are the same across levels of the grouping variable. This can be tested with Box's M statistic ^[2]. It has been suggested, however, that linear discriminant analysis be used when covariances are equal, and that quadratic discriminant analysis may be used when covariances are not equal ^[3].

Multicollinearity: Predictive power can decrease with an increased correlation between predictor variables, since variance is being accounted for twice ^[3].

Independence: Participants are assumed to be randomly sampled, and a participant’s score on one variable is assumed to be independent of scores on that variable for all other participants ^[3]^[2].

It has been suggested that discriminant analysis is relatively robust to slight violations of these assumptions ^[4], and it has also been shown that discriminant analysis may still be reliable when using dichotomous variables (where multivariate normality is often violated) ^[5].

Discriminant Functions

Discriminant analysis works by creating one or more linear combinations of predictors, creating a new latent variable for each function. These functions are called discriminant functions. The number of possible functions is the smaller of $N_g - 1$, where $N_g$ is the number of groups, and $p$, the number of predictors. The first function created maximizes the differences between groups on that function. The second function maximizes differences on that function, but also must not be correlated with the previous function. This continues with subsequent functions, with the requirement that each new function not be correlated with any of the previous functions ^[2].
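The construction described above can be sketched numerically. The following is a minimal illustration, not the procedure of any particular statistics package: it assumes the common formulation in which the function weights are eigenvectors of W⁻¹B, where W and B are the within-groups and between-groups sums-of-squares-and-cross-products matrices. The function name `discriminant_functions` and the simulated data are hypothetical.

```python
import numpy as np

def discriminant_functions(X, y):
    """Return eigenvalues and weight vectors of the discriminant functions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))  # within-groups SSCP matrix
    B = np.zeros((p, p))  # between-groups SSCP matrix
    for g in groups:
        Xg = X[y == g]
        centered = Xg - Xg.mean(axis=0)
        W += centered.T @ centered
        diff = (Xg.mean(axis=0) - grand_mean).reshape(-1, 1)
        B += len(Xg) * (diff @ diff.T)
    # Eigenvectors of W^-1 B hold the weights of each linear combination.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    # Keep the min(N_g - 1, p) functions with the largest eigenvalues.
    n_funcs = min(len(groups) - 1, p)
    keep = np.argsort(eigvals)[::-1][:n_funcs]
    return eigvals[keep], eigvecs[:, keep]

# Simulated data (an assumption for illustration): 3 groups, 3 predictors.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 4)
centers = np.array([[0.0, 0.0, 0.0], [2.0, 1.0, 0.0], [0.0, 3.0, 1.0]])
X = rng.normal(size=(12, 3)) + centers[y]

vals, vecs = discriminant_functions(X, y)
scores = X @ vecs  # one column of discriminant scores per function
print(vals.shape)  # two functions, since min(3 - 1, 3) = 2
print(np.corrcoef(scores.T)[0, 1])  # near zero: the functions are uncorrelated
```

With three groups and three predictors, only two functions are retained, and the scores on the two functions are uncorrelated up to rounding error.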

Given group $j$, with $\mathbb{R}_j$ sets of sample space, there is a discriminant rule such that if $x \in \mathbb{R}_j$, then $x \in j$. Discriminant analysis, then, finds “good” regions $\mathbb{R}_j$ that minimize classification error, therefore leading to a high percentage of cases correctly classified in the classification table ^[6].

Each function is given a discriminant score to determine how well it predicts group placement.

Structure Correlation Coefficients: The correlation between each predictor and the discriminant score of each function. This is a whole correlation ^[3]^[7].
Standardized Coefficients: Each predictor’s unique contribution to each function, therefore this is a partial correlation. Indicates the relative importance of each predictor in predicting group assignment from each function ^[7]^[3].
Functions at Group Centroids: Mean discriminant scores for each group are given for each function. The farther apart the means are, the less error there will be in classification ^[7]^[3].

Discrimination rules

Maximum Likelihood: Assigns $x$ to the group that maximizes population (group) density ^[6].
Bayes Discriminant Rule: Assigns $x$ to the group that maximizes $\pi_i f_i(x)$, where $\pi_i$ represents the prior probability of that classification, and $f_i(x)$ represents the population density ^[6].
Fisher’s Linear Discriminant Rule: Maximizes the ratio between SSbetween and SSwithin, and finds a linear combination of the predictors to predict group ^[6].
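The difference between the maximum likelihood rule and the Bayes rule can be seen in a small sketch. It assumes univariate normal group densities; the group names, means, standard deviations, and priors are hypothetical values chosen for illustration.

```python
import math

def normal_density(x, mean, sd):
    """Density f_i(x) of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def ml_rule(x, groups):
    """Assign x to the group with the largest density f_i(x)."""
    return max(groups, key=lambda g: normal_density(
        x, groups[g]["mean"], groups[g]["sd"]))

def bayes_rule(x, groups):
    """Assign x to the group with the largest pi_i * f_i(x)."""
    return max(groups, key=lambda g: groups[g]["prior"] * normal_density(
        x, groups[g]["mean"], groups[g]["sd"]))

# Hypothetical groups: A is far more common than B.
groups = {
    "A": {"mean": 0.0, "sd": 1.0, "prior": 0.9},
    "B": {"mean": 2.0, "sd": 1.0, "prior": 0.1},
}

x = 1.2  # closer to the mean of B than to the mean of A
print(ml_rule(x, groups))     # B
print(bayes_rule(x, groups))  # A
```

The observation lies closer to the mean of group B, so the maximum likelihood rule picks B, while the large prior of group A tips the Bayes rule toward A. With equal priors the two rules give the same assignment.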

Eigenvalues

An eigenvalue in discriminant analysis is the characteristic root of each function. It is an indication of how well that function differentiates the groups, where the larger the eigenvalue, the better the function differentiates ^[3]. This, however, should be interpreted with caution, as eigenvalues have no upper limit ^[3]^[2]. The eigenvalue can be viewed as the ratio of SSbetween to SSwithin, as in ANOVA, when the dependent variable is the discriminant function and the groups are the levels of the IV ^[2]. This means that the largest eigenvalue is associated with the first function, the second largest with the second, etc.
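The ratio interpretation above can be checked by hand. The sketch below takes hypothetical discriminant scores on a single function, grouped by level of the grouping variable, and computes SSbetween / SSwithin as in a one-way ANOVA. The function name and the scores are assumptions for illustration.

```python
def eigenvalue_from_scores(scores_by_group):
    """Ratio of SSbetween to SSwithin for scores on one discriminant function."""
    all_scores = [s for scores in scores_by_group.values() for s in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = 0.0
    ss_within = 0.0
    for scores in scores_by_group.values():
        group_mean = sum(scores) / len(scores)
        ss_between += len(scores) * (group_mean - grand_mean) ** 2
        ss_within += sum((s - group_mean) ** 2 for s in scores)
    return ss_between / ss_within

# Hypothetical discriminant scores for two groups.
scores = {"low": [1.0, 2.0, 3.0], "high": [4.0, 5.0, 6.0]}
print(eigenvalue_from_scores(scores))  # 3.375

# Rescaling the scores leaves the ratio unchanged.
rescaled = {g: [10 * s for s in v] for g, v in scores.items()}
print(eigenvalue_from_scores(rescaled))  # 3.375
```

Here SSbetween is 13.5 and SSwithin is 4, giving 3.375. Because the ratio has no upper limit and grows as the group means separate, its size is hard to interpret on its own.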

Effect Size

Some suggest the use of eigenvalues as effect size measures; however, this is generally not supported ^[2]. Instead, the canonical correlation is the preferred measure of effect size. It is similar to the eigenvalue, but is the square root of the ratio of SSbetween to SStotal. It is the correlation between groups and the function ^[2]. Another popular measure of effect size is the percent of variance for each function. This is calculated by $(\lambda_x / \sum \lambda_i) \times 100$, where $\lambda_x$ is the eigenvalue for the function and $\sum \lambda_i$ is the sum of all eigenvalues. This tells us how strong the prediction is for that particular function compared to the others ^[2]. Percent correctly classified can also be analyzed as an effect size. The kappa value can describe this while correcting for chance agreement ^[2].
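The three effect size measures above can be sketched in a few lines. The eigenvalues and the classification table are hypothetical, and the kappa function assumes that Cohen's kappa is the statistic meant. The canonical correlation uses the identity SSbetween / SStotal = λ / (1 + λ), which follows from λ = SSbetween / SSwithin and SStotal = SSbetween + SSwithin.

```python
import math

def percent_of_variance(eigenvalues):
    """Share of the summed eigenvalues attributable to each function."""
    total = sum(eigenvalues)
    return [100 * lam / total for lam in eigenvalues]

def canonical_correlation(eigenvalue):
    """Square root of SSbetween / SStotal, written in terms of the eigenvalue."""
    return math.sqrt(eigenvalue / (1 + eigenvalue))

def cohen_kappa(table):
    """Chance-corrected agreement for a square classification table.

    Rows are actual groups, columns are predicted groups.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    expected = sum(
        sum(table[i]) * sum(row[i] for row in table) for i in range(k)
    ) / n ** 2
    return (observed - expected) / (1 - expected)

eigenvalues = [3.0, 1.0]            # hypothetical eigenvalues of two functions
table = [[40, 10], [5, 45]]         # hypothetical classification table

print(percent_of_variance(eigenvalues))                  # [75.0, 25.0]
print([canonical_correlation(l) for l in eigenvalues])   # about 0.866 and 0.707
print(cohen_kappa(table))                                # about 0.7
```

In this example 85% of cases are classified correctly, but half of that agreement would be expected by chance, so kappa is about 0.7.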

Variations

Multiple Discriminant Analysis (MDA): related to MANOVA. Has more than two groups, and uses multiple dummy variables ^[7].
Sequential Discriminant Analysis: assesses the importance of a set of IVs over and above a set of controls. In this case, the controls are entered first, and then the IVs ^[7].
Stepwise Discriminant Analysis: Selects the most correlated predictor first, removes that variance in the grouping variable then adds the next most correlated and continues until the change in canonical correlation is not significant. Of course, both forward and backward stepwise procedures may be performed ^[7].

Comparison to Logistic Regression

Discriminant function analysis is very similar to logistic regression, and both can be used to answer the same research questions ^[2]. Logistic regression does not have as many assumptions and restrictions as discriminant analysis. However, when the assumptions of discriminant analysis are met, it is more powerful than logistic regression. Unlike logistic regression, discriminant analysis can be used with small sample sizes. It has been shown that when sample sizes are equal, and homogeneity of variance/covariance holds, discriminant analysis is more accurate ^[3]. With all this being considered, logistic regression is the common choice nowadays, since the assumptions of discriminant analysis are rarely met ^[3]^[1].

References

[1] Cohen et al. (2003). Applied Multiple Regression/Correlation Analysis for the Behavioural Sciences (3rd ed.). Taylor & Francis Group.
[2] Green, S. B., Salkind, N. J., & Akey, T. M. (2008). Using SPSS for Windows and Macintosh: Analyzing and understanding data. New Jersey: Prentice Hall.
[3] Bökeoğlu Çokluk, Ö., & Büyüköztürk, Ş. (2008). Discriminant function analysis: Concept and application. Eğitim araştırmaları dergisi, (33), 73-92.
[4] Lachenbruch, P. A. (1975). Discriminant analysis. NY: Hafner.
[5] Klecka, W. R. (1980). Discriminant analysis. Quantitative Applications in the Social Sciences Series, No. 19. Thousand Oaks, CA: Sage Publications.
[6] Hardle, W., & Simar, L. (2007). Applied Multivariate Statistical Analysis. Springer Berlin Heidelberg. pp. 289-303.
[7] Garson, G. D. (2008). Discriminant function analysis. http://www2.chass.ncsu.edu/garson/pa765/discrim.htm.

External links


@@ Line 1: / Line 1: @@
-'''Discriminant function analysis''' is a statistical analysis to predict a [[categorical variable|categorical]] [[dependent variable|dependent]] [[Variable (mathematics)#Applied statistics|variable]] by one or more [[continuous variable|continuous]] or [[binary]] [[independent variable|independent]] variables. It is different from an [[ANOVA]] or [[MANOVA]], which is used to predict one (ANOVA) or multiple (MANOVA) continuous dependent variables by one or more independent categorical variables instead. Discriminant function analysis is useful in determining whether a set of variables is effective in predicting category membership.
+'''Discriminant function analysis''' is a statistical analysis to predict a [[categorical variable|categorical]] [[dependent variable|dependent]] [[Variable (mathematics)#Applied statistics|variable]] (called a grouping variable) by one or more [[continuous variable|continuous]] or [[binary]] [[independent variable|independent]] variables (called predictor variables). The original dichotomous discriminant analysis was developed by Sir Ronald Fisher in 1936 <ref name="cohen">Cohen et al. Applied Multiple Regression/Correlation Analysis for the Behavioural Sciences 3rd ed. (2003). Taylor & Francis Group.</ref> It is different from an [[ANOVA]] or [[MANOVA]], which is used to predict one (ANOVA) or multiple (MANOVA) continuous dependent variables by one or more independent categorical variables. Discriminant function analysis is useful in determining whether a set of variables is effective in predicting category membership<ref name="green">Green, S.B. Salkind, N. J. & Akey, T. M. (2008). Using SPSS for Windows and Macintosh: Analyzing and understanding data. New Jersey: Prentice Hall.</ref>
+Discriminant analysis is used when groups are known a priori (unlike in [[cluster analysis]]). Each case must have a score on one or more quantitative predictor measures, and a score on a group measure <ref name="buy">BÖKEOĞLU ÇOKLUK, Ö, & BÜYÜKÖZTÜRK, Ş. (2008). Discriminant function analysis: Concept and application. Eğitim araştırmaları dergisi, (33), 73-92.</ref>. In simple terms, discriminant function analysis is classification - the act of distributing things into groups, classes or categories of the same type.
-Moreover, it is a useful follow-up procedure to a MANOVA instead of doing a series of one-way ANOVAs, for ascertaining how the groups differ on the composite of dependent variables.
+Moreover, it is a useful follow-up procedure to a MANOVA instead of doing a series of one-way ANOVAs, for ascertaining how the groups differ on the composite of dependent variables. In this case, a significant F test allows classification based on a linear combination of predictor variables. Terminology can get confusing here, as in MANOVA, the dependent variables are the predictor variables, and the independent variables are the grouping variables <ref name="green"/>.
-In simple terms, discriminant function analysis is classification - the act of distributing things into classes or categories of the same type.
+==Assumptions==
+The assumptions of discriminant analysis are the same as those for MANOVA. Note, the analysis is quite sensitive to outliers and the n in the smallest group must be larger than the number of predictor variables <ref name="buy"/>.
+*[[Normality|Multivariate Normality]]: Independent variables are normal for each level of the grouping variable <ref name="buy"/><ref name="green"/>.
+*Homogeneity of Variance/Covariance ([[Homoscedasticity]]): The variances and covariances of the predictors are the same across levels of the grouping variable. This can be tested with Box's M statistic <ref name="green"/>. It has been suggested, however, that [[linear discriminant analysis]] be used when covariances are equal, and that [[quadratic classifier#quadratic discriminant analysis|quadratic discriminant analysis]] may be used when covariances are not equal <ref name="buy"/>.
+*[[Multicollinearity]]: Predictive power can decrease with an increased correlation between predictor variables, since variance is being accounted for twice <ref name="buy"/>.
+*[[statistical independence|Independence]]: Participants are assumed to be randomly sampled, and a participant’s score on one variable is assumed to be independent of scores on that variable for all other participants <ref name="buy"/><ref name="green"/>.
+	It has been suggested that discriminant analysis is relatively robust to slight violations of these assumptions <ref>Lachenbruch, P. A. (1975). Discriminant analysis. NY: Hafner</ref>, and it has also been shown that discriminant analysis may still be reliable when using dichotomous variables (where multivariate normality is often violated) <ref>Klecka, William R. (1980). Discriminant analysis. Quantitative Applications in the Social Sciences Series, No. 19. Thousand Oaks, CA: Sage Publications.</ref>.
+==Discriminant Functions==
+Discriminant analysis works by creating one or more linear combinations of predictors, creating a new [[latent variable]] for each function. These functions are called discriminant functions. The number of functions possible is either N<sub>g</sub>-1 where N<sub>g</sub> = number of groups, or p (the number of predictors), whichever is smaller. The first function created maximizes the differences between groups on that function. The second function maximizes differences on that function, but also must not be correlated with the previous function. This continues with subsequent functions with the requirement that the new function not be correlated with any of the previous functions <ref name="green"/>.
+Given group <math>j</math>, with <math> \mathbb{R}</math><sub>j</sub>  sets of sample space, there is a discriminant rule such that if <math>x</math><big>∈</big><math>\mathbb{R}</math><sub>j</sub> , then <math>x</math><big>∈</big> <math>j</math>. Discriminant analysis then, finds “good” regions of <math> \mathbb{R}</math><sub>j</sub> to minimize classification error, therefore leading to a high percent correct classified in the classification table <ref name="har">Hardle, W., Simar, L. (2007). Applied Multivariate Statistical Analysis. Springer Berlin Heidelberg. pp. 289-303.</ref>.
+Each function is given a discriminant score to determine how well it predicts group placement.
+*Structure Correlation Coefficients: The correlation between each predictor and the discriminant score of each function. This is a whole correlation <ref name="buy"/><ref name="garson">Garson, G. D. (2008). Discriminant function analysis. http://www2.chass.ncsu.edu/garson/pa765/discrim.htm.</ref>.
+*Standardized Coefficients: Each predictor’s unique contribution to each function, therefore this is a [[partial correlation]]. Indicates the relative importance of each predictor in predicting group assignment from each function <ref name="garson"/><ref name="buy"/>.
+*Functions at Group Centroids: Mean discriminant scores for each grouping variable are given for each function. The farther apart the means are, the less error there will be in classification <ref name="garson"/><ref name="buy"/>.
+==Discrimination rules==
+*[[Maximum Likelihood]]: Assigns x to the group that maximizes population (group) density. <ref name="har"/>
+*Bayes Discriminant Rule: Assigns x to the group that maximizes π<sub>i</sub><math>f</math><sub>i</sub><math>(x)</math>, where π<sub>i</sub> represents the prior probability of that classification, and <math>f</math><sub>i</sub><math>(x)</math> represents the population density. <ref name="har"/>
+*[[Linear Discriminant Analysis|Fisher’s Linear Discriminant Rule]]: Maximizes the ratio between SSbetween and SSwithin, and finds a linear combination of the predictors to predict group <ref name="har"/>.
+==Eigenvalues==
+	An [[eigenvalues and eigenvectors|eigenvalue]] in discriminant analysis is the characteristic root of each function. It is an indication of how well that function differentiates the groups, where the larger the eigenvalue, the better the function differentiates <ref name="buy"/>. This however, should be interpreted with caution, as eigenvalues have no upper limit <ref name="buy"/><ref name="green"/>.
+	The eigenvalue can be viewed as a ratio of SSbetween and SSwithin as in ANOVA when the dependent variable is the discriminant function, and the groups are the levels of the IV <ref name="green"/>. This means that the largest eigenvalue is associated with the first function, the second largest with the second, etc.
+==Effect Size==
+	Some suggest the use of eigenvalues as effect size measures, however, this is generally not supported<ref name="green"/>. Instead, the [[canonical correlation]] is the preferred measure of effect size. It is similar to the eigenvalue, but is the square root of the ratio of SSbetween to SStotal. It is the correlation between groups and the function<ref name="green"/>.
+	Another popular measure of effect size is the percent of variance for each function.  This is calculated by: (λ<sub>x</sub>/Σλ<sub>i</sub>) X 100 where λ<sub>x</sub> is the eigenvalue for the function and Σλ<sub>i</sub> is the sum of all eigenvalues. This tells us how strong the prediction is for that particular function compared to the others <ref name="green"/>.
+	Percent correctly classified can also be analyzed as an effect size. The kappa value can describe this while correcting for chance agreement<ref name="green"/>.
+==Variations==
+*[[Linear Discriminant Analysis#Multiclass LDA|Multiple Discriminant Analysis (MDA)]]: related to MANOVA. Has more than two groups, and uses multiple dummy variables <ref name="garson"/>.
+*Sequential Discriminant Analysis: assesses the importance of a set of IVs over and above a set of controls. In this case, the controls are entered first, and then the IVs <ref name="garson"/>.
+*Stepwise Discriminant Analysis: Selects the most correlated predictor first, removes that variance in the grouping variable then adds the next most correlated and continues until the change in canonical correlation is not significant. Of course, both forward and backward stepwise procedures may be performed <ref name="garson"/>.
+==Comparison to Logistic Regression==
+	Discriminant function analysis is very similar to [[logistic regression]], and both can be used to answer the same research questions <ref name="green"/>. Logistic regression does not have as many assumptions and restrictions as discriminant analysis, however, when discriminant analysis’ assumptions are met, it is more powerful than logistic regression. Unlike logistic regression, discriminant analysis can be used with small sample sizes. It has been shown that when sample sizes are equal, and homogeneity of variance/covariance holds, discriminant analysis is more accurate <ref name="buy"/>. With all this being considered, logistic regression is the common choice nowadays, since the assumptions of discriminant analysis are rarely met <ref name="buy"/><ref name="cohen"/>.
 ==See also==
@@ Line 10: / Line 62: @@
 *[[Linear discriminant analysis]]
 *[[Multiple discriminant analysis]]
+==References==
+{{Reflist}}
 ==External links==