Scheffé's method
Latest revision as of 23:11, 21 February 2024
In statistics, Scheffé's method, named after American statistician Henry Scheffé, is a method for adjusting significance levels in a linear regression analysis to account for multiple comparisons. It is particularly useful in analysis of variance (a special case of regression analysis), and in constructing simultaneous confidence bands for regressions involving basis functions.
Scheffé's method is a single-step multiple comparison procedure that applies to the set of estimates of all possible contrasts among the factor level means, not just the pairwise differences considered by the Tukey–Kramer method. It works on principles similar to those of the Working–Hotelling procedure for estimating mean responses in regression, which applies to the set of all possible factor levels.
The method
Let $\mu_1, \ldots, \mu_r$ be the means of some variable in $r$ disjoint populations.
An arbitrary contrast is defined by

$$C = \sum_{i=1}^r c_i \mu_i$$

where

$$\sum_{i=1}^r c_i = 0.$$
If $\mu_1, \ldots, \mu_r$ are all equal to each other, then all contrasts among them are 0. Otherwise, some contrasts differ from 0.
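A minimal Python sketch of the definition above: a contrast is a coefficient vector that sums to zero, and its value is the weighted sum of the population means. The helper names `is_contrast` and `contrast_value` and the means used are hypothetical, chosen only to show that every contrast evaluates to 0 when the means are equal.

```python
def is_contrast(c, tol=1e-12):
    """True if the coefficients sum to zero, the defining property of a contrast."""
    return abs(sum(c)) <= tol


def contrast_value(c, mu):
    """Return C = sum_i c_i * mu_i for contrast coefficients c and means mu."""
    if len(c) != len(mu):
        raise ValueError("c and mu must have the same length")
    if not is_contrast(c):
        raise ValueError("contrast coefficients must sum to zero")
    return sum(ci * mi for ci, mi in zip(c, mu))


# Two illustrative contrasts for r = 3 populations.
pairwise = [1, -1, 0]           # mu_1 - mu_2
one_vs_rest = [1, -0.5, -0.5]   # mu_1 - (mu_2 + mu_3) / 2

equal_means = [5.0, 5.0, 5.0]
print(contrast_value(pairwise, equal_means))     # 0.0
print(contrast_value(one_vs_rest, equal_means))  # 0.0

unequal_means = [5.0, 7.0, 9.0]
print(contrast_value(pairwise, unequal_means))     # -2.0
print(contrast_value(one_vs_rest, unequal_means))  # -3.0
```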
There are infinitely many contrasts, and the simultaneous confidence coefficient over all of them is exactly $1-\alpha$, whether the factor level sample sizes are equal or unequal. (Usually only a finite number of comparisons is of interest. In that case Scheffé's method is typically quite conservative, and the family-wise error rate (experimental error rate) will generally be much smaller than $\alpha$.)[1][2]
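Why the coefficient is exact over all contrasts can be sketched with the standard Cauchy–Schwarz argument, assuming independent, normally distributed errors with a common variance (the usual one-way ANOVA model). Write $\bar{Y}_i$ and $n_i$ for the sample mean and size of the $i$th sample, $D_i = \bar{Y}_i - \mu_i$, $N = \sum_i n_i$, $\bar{D} = \frac{1}{N}\sum_i n_i D_i$, $\hat{C} = \sum_i c_i \bar{Y}_i$, and $s_{\hat{C}}^2 = \hat{\sigma}_e^2 \sum_i c_i^2/n_i$ with $\hat{\sigma}_e^2$ the pooled error variance. Because $\sum_i c_i = 0$, we have $\hat{C} - C = \sum_i c_i (D_i - \bar{D})$, so

$$\frac{(\hat{C} - C)^2}{s_{\hat{C}}^2} = \frac{\left(\sum_{i=1}^r \frac{c_i}{\sqrt{n_i}} \cdot \sqrt{n_i}\,(D_i - \bar{D})\right)^2}{\hat{\sigma}_e^2 \sum_{i=1}^r c_i^2/n_i} \le \frac{\sum_{i=1}^r n_i (D_i - \bar{D})^2}{\hat{\sigma}_e^2},$$

with equality for $c_i \propto n_i (D_i - \bar{D})$, which is itself a contrast because $\sum_i n_i (D_i - \bar{D}) = 0$. Under the model the right-hand side is $(r-1)$ times an $F_{r-1,\,N-r}$ random variable, so the event that every contrast satisfies $|\hat{C} - C| \le s_{\hat{C}}\sqrt{(r-1)F_{\alpha;r-1;N-r}}$ has probability exactly $1-\alpha$, for any sample sizes $n_i$.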
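We estimate $C$ by

$$\hat{C} = \sum_{i=1}^r c_i \bar{Y}_i$$

for which the estimated variance is

$$s_{\hat{C}}^2 = \hat{\sigma}_e^2 \sum_{i=1}^r \frac{c_i^2}{n_i}$$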
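where
- $\bar{Y}_i$ is the mean of the sample taken from the $i$th population,
- $n_i$ is the size of the sample taken from the $i$th population (the one whose mean is $\mu_i$), and
- $\hat{\sigma}_e^2$ is the estimated variance of the errors.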
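The estimate and its variance can be computed directly from raw samples. This Python sketch assumes a one-way layout in which $\hat{\sigma}_e^2$ is the pooled within-group variance (the ANOVA mean squared error, on $N-r$ degrees of freedom); the function name `contrast_estimate` and the example data are hypothetical.

```python
from statistics import fmean


def contrast_estimate(c, groups):
    """Return (C_hat, s_C_hat squared) for contrast coefficients c.

    groups[i] holds the observations sampled from population i. The error
    variance sigma_e^2 is estimated by the pooled within-group variance
    SSE / (N - r), as in a one-way ANOVA (an assumption of this sketch).
    """
    r = len(groups)
    if len(c) != r:
        raise ValueError("need one coefficient per group")
    if abs(sum(c)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    n = [len(g) for g in groups]             # sample sizes n_i
    N = sum(n)                               # total number of observations
    means = [fmean(g) for g in groups]       # sample means Y-bar_i
    # Within-group (error) sum of squares, pooled over all groups.
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    sigma2_e = sse / (N - r)
    c_hat = sum(ci * m for ci, m in zip(c, means))
    var_c_hat = sigma2_e * sum(ci ** 2 / ni for ci, ni in zip(c, n))
    return c_hat, var_c_hat


c_hat, var_c_hat = contrast_estimate([1, -1, 0], [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(c_hat)      # -3.0
print(var_c_hat)  # about 0.6667, i.e. 2/3
```

With unequal sample sizes only the $c_i^2/n_i$ weights change; nothing else in the computation depends on balance.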
It can be shown that the probability is $1-\alpha$ that all confidence limits of the type

$$\hat{C} \pm s_{\hat{C}} \sqrt{(r-1)\,F_{\alpha;\,r-1;\,N-r}}$$

are simultaneously correct, where $N = \sum_{i=1}^r n_i$ is the total number of observations.

Norman R. Draper and Harry Smith, in their Applied Regression Analysis (see references), indicate that $r$ should appear in the equation in place of $r-1$. The slip with $r-1$ results from failing to allow for the additional effect of the constant term in many regressions. That the result based on $r-1$ is wrong in that setting is readily seen by considering $r=2$, as in a standard simple linear regression: because $\sqrt{F_{\alpha;1;N-2}} = t_{\alpha/2;N-2}$, the formula would then reduce to one with the usual $t$-distribution, which is appropriate for estimating or predicting at a single value of the independent variable, not for constructing a confidence band over a range of values of the independent variable. Also note that the formula deals with mean values over a range of independent values, not with individual values such as individual observed data values.[3]
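Putting the pieces together, here is a minimal stdlib-only Python sketch of the simultaneous limits $\hat{C} \pm s_{\hat{C}}\sqrt{(r-1)F_{\alpha;r-1;N-r}}$ for contrasts among factor level means. To stay dependency-free it assumes the caller supplies the critical value $F_{\alpha;r-1;N-r}$ (for example from printed F tables, or from `scipy.stats.f.ppf(1 - alpha, r - 1, N - r)` if SciPy is available). The function name, the data, and the value 5.14 (approximately $F_{0.05;2;6}$) are illustrative.

```python
import math
from statistics import fmean


def scheffe_interval(c, groups, f_crit):
    """Simultaneous Scheffé limits C_hat -/+ s_C_hat * sqrt((r - 1) * f_crit).

    f_crit must be the upper-alpha critical value F_{alpha; r-1; N-r}; it is
    supplied by the caller because the standard library has no F quantile.
    """
    r = len(groups)
    if len(c) != r or abs(sum(c)) > 1e-12:
        raise ValueError("c must be a contrast with one coefficient per group")
    n = [len(g) for g in groups]
    N = sum(n)
    means = [fmean(g) for g in groups]
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    sigma2_e = sse / (N - r)                         # pooled error variance
    c_hat = sum(ci * m for ci, m in zip(c, means))
    s_c_hat = math.sqrt(sigma2_e * sum(ci ** 2 / ni for ci, ni in zip(c, n)))
    half_width = s_c_hat * math.sqrt((r - 1) * f_crit)
    return c_hat - half_width, c_hat + half_width


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]        # r = 3, N = 9, df = (2, 6)
lo, hi = scheffe_interval([1, -1, 0], groups, f_crit=5.14)  # F_{0.05;2;6} is about 5.14
print(round(lo, 3), round(hi, 3))  # -5.618 -0.382
```

Because the same multiplier $\sqrt{(r-1)F}$ is used for every contrast, any number of contrasts, including ones chosen after looking at the data, can be checked with this function at the same simultaneous level. Here the interval excludes 0, so $\mu_1 - \mu_2$ would be declared nonzero at the assumed level.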
Denoting Scheffé significance in a table
Frequently, subscript letters are used to indicate which values are significantly different using the Scheffé method. For example, when the mean values of variables analyzed with an ANOVA are presented in a table, each is assigned a letter subscript based on Scheffé contrasts. Values that are not significantly different in the post-hoc Scheffé comparison share a subscript letter, and values that are significantly different have different subscripts (e.g. $15_a$, $17_a$, $34_b$ would mean that the first and second values do not differ significantly from each other, because both are assigned the subscript "a", but both differ significantly from the third).[citation needed]
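One way to produce such labels is sketched below in Python, under stated assumptions. It takes the group means and a caller-supplied predicate `significant(i, j)` (hypothetical; in practice it might check whether the Scheffé interval for $\mu_i - \mu_j$ excludes 0), walks through the sorted means, and gives one letter to each maximal run of means whose extremes do not differ significantly. The sketch assumes non-significance is contiguous in sorted order, which need not hold in every data set; general compact-letter-display algorithms handle the remaining cases.

```python
from string import ascii_lowercase


def letter_groups(means, significant):
    """Return one subscript string per mean; means sharing a letter do not differ.

    significant(i, j) is a caller-supplied (hypothetical) predicate that is True
    when groups i and j differ significantly, e.g. by a pairwise Scheffé contrast.
    Assumes non-significance is contiguous once the means are sorted.
    At most 26 letters are available in this sketch.
    """
    order = sorted(range(len(means)), key=lambda k: means[k])
    labels = {k: "" for k in order}
    letters = iter(ascii_lowercase)
    last_end = -1
    for start in range(len(order)):
        end = start
        # Grow the run while its smallest mean does not differ from the next one.
        while end + 1 < len(order) and not significant(order[start], order[end + 1]):
            end += 1
        # A run ending no later than the previous one is contained in it; skip it.
        if end > last_end:
            letter = next(letters)
            for pos in range(start, end + 1):
                labels[order[pos]] += letter
            last_end = end
    return [labels[k] for k in range(len(means))]


means = [15, 17, 34]
# Hypothetical rule for illustration only: a difference larger than 10 counts as significant.
print(letter_groups(means, lambda i, j: abs(means[i] - means[j]) > 10))  # ['a', 'a', 'b']
```

A mean can receive more than one letter when it overlaps two groups, for example means 10, 14, 18 with a threshold of 5 are labelled a, ab, b.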
Comparison with the Tukey–Kramer method
If only a fixed number of pairwise comparisons is to be made, the Tukey–Kramer method gives more precise (narrower) confidence intervals. When many or all contrasts might be of interest, the Scheffé method is more appropriate and gives narrower confidence intervals once the number of comparisons is large.
References
- [1] Maxwell, Scott E.; Delaney, Harold D. (2004). Designing Experiments and Analyzing Data: A Model Comparison. Lawrence Erlbaum Associates. pp. 217–218. ISBN 0-8058-3718-3.
- [2] Milliken, George A.; Johnson, Dallas E. (1993). Analysis of Messy Data. CRC Press. pp. 35–36. ISBN 0-412-99081-4.
- [3] Draper, Norman R.; Smith, Harry (1998). Applied Regression Analysis (2nd ed.). John Wiley and Sons, Inc. p. 93. ISBN 9780471170822.
- Bohrer, Robert (1967). "On Sharpening Scheffé Bounds". Journal of the Royal Statistical Society. Series B. 29 (1): 110–114. JSTOR 2984571.
- Scheffé, H. (1999) [1959]. The Analysis of Variance. New York: Wiley. ISBN 0-471-34505-9.
External links
This article incorporates public domain material from the National Institute of Standards and Technology.