Talk:Student's t-test

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by StevenLinde (talk | contribs) at 18:32, 22 May 2022 (Worked_examples values are confusing: new section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


Assumptions

Maybe I'm missing something, but it seems like the assumptions section is extremely wrong. The underlying distributions do *not* need to be normal. The statistics' (i.e., sample average) distributions need to be normally distributed, and they will be, according to the Central Limit Theorem. 70.35.57.149 (talk) 19:13, 7 March 2017 (UTC)[reply]

My understanding is that you are right, mostly. Only for small samples do we need the sample(s) to follow a normal distribution, when the mean (numerator) and standard error (denominator) won't automatically be normally distributed according to the CLT. And this is the situation where t-tests are most important, because when the samples are large enough for the CLT to apply, they're also large enough for the t-distribution to converge to the Z-distribution. I think this ought to be mentioned (although my authority for this is a statistician friend - I'm still looking for a published statement about it). Then the bit that describes how to test a sample for normality brings a special irony, because a test (like the Shapiro-Wilk or Kolmogorov-Smirnov) for normality is more likely to reject the null hypothesis of normality as the sample size becomes larger, and this is exactly when you don't need to worry so much about normality! RMGunton (talk) 15:45, 13 February 2019 (UTC)[reply]
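The convergence point above is easy to check numerically. This is just an illustrative sketch (not part of the original comment): it compares the two-sided 5% critical value of the t-distribution, at increasing degrees of freedom, with the standard normal critical value.

```python
from scipy.stats import t, norm

# Two-sided 5% critical values: the t quantile approaches the
# standard normal quantile as degrees of freedom grow.
for df in (5, 30, 100, 1000):
    print(df, round(t.ppf(0.975, df), 4))

print("normal", round(norm.ppf(0.975), 4))  # ≈ 1.96
```

For small df the t critical value is noticeably larger than 1.96, which is exactly the regime where the normality assumption on the observations matters; by df = 1000 the two are nearly indistinguishable.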
The sample mean need not be normally distributed either. Sketch of proof: Efron (1969) (Student's t-Test Under Symmetry Conditions) shows in Section 1 that a proof by Fisher (1925) (Applications of "Student's" Distribution) for the normal case actually only uses the 'sphericity / rotational invariance / orthogonal invariance' of the normal distribution of individual observations for the t-test to control size (Type I error). So, orthogonal invariance of the distribution of X := (X_1, X_2, ..., X_n) is sufficient. This absolutely does not imply that the sample mean is normally distributed, so normality of the sample mean is not necessary. For (counter)example, if n = 3 then it follows from Archimedes' Hat-Box Theorem that a random variable distributed uniformly over the unit sphere (which is clearly orthogonal invariant) has a sample mean that follows a uniform distribution. NWK2 (talk) 14:31, 3 June 2021 (UTC)[reply]

Test statistic for one-sample t-test

The section “One-sample t-test” says

In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the statistic

t = (x̄ − μ0) / (s / √n)

where x̄ is the sample mean, s is the sample standard deviation and n is the sample size. The degrees of freedom used in this test are n − 1.

Should the formula actually use the square root of n–1 rather than the square root of n? Recently an IP changed it to n–1, but another IP changed it back. Loraof (talk) 19:45, 7 July 2018 (UTC)[reply]

No, it should definitely be n there; the standard error of x̄ is σ/√n, which we estimate by replacing σ with s, giving s/√n. (There is an n − 1 'hidden' within the definition of s, though.) Unfortunately, while it used to be mostly correct, there are many serious errors on this page now because too many people think they have enough expertise on this subject to edit this page when they really, really don't. (There are so many people who write basic stats books for various application areas who don't know what they're doing, and then their students run down here and wreck up the place. It's like trying to push back the tide with a colander.) Glenbarnett (talk) 03:09, 25 October 2020 (UTC)[reply]
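A quick numerical check of the point above (the data here are made up for illustration): computing the statistic by hand with s/√n matches scipy's one-sample t-test, which it would not if √(n − 1) were used in the denominator.

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.9, 6.0, 5.4, 5.8, 5.2])  # hypothetical sample
mu0 = 5.0                                      # hypothesized population mean

xbar = x.mean()
s = x.std(ddof=1)          # sample standard deviation (n - 1 divisor)
n = len(x)

# t = (x̄ − μ0) / (s / √n): note √n, not √(n − 1)
t_by_hand = (xbar - mu0) / (s / np.sqrt(n))

t_scipy, p = stats.ttest_1samp(x, mu0)
print(t_by_hand, t_scipy)  # the two agree
```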

Move discussion in progress

There is a move discussion in progress on Talk:Student's t-distribution which affects this page. Please participate on that page and not in this talk page section. Thank you. —RMCD bot 04:01, 24 August 2021 (UTC)[reply]

Worked_examples values are confusing

Hey there, just wanted to point out that the values in the Worked_examples section present some speed bumps for folks following along with certain tools. In Excel/Google Sheets terms, this is the difference between STDEV() and STDEVP(). Some tools, like numpy.std, default to the latter (the population standard deviation, with divisor n), so the values end up differing from the examples. I will suggest an edit with values that avoid this for the rest of the example, but wanted to flag it here first.

Along these lines, it is somewhat confusing that the difference in means just happens to be `0.095`, which resembles values commonly used as significance thresholds. I think any fix for the first point will take care of this too, but it would be a nice way to avoid confusion for stats newbies like me who follow this page.
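To make the first point concrete, here is a small sketch (the data are made up, not the article's worked example) showing the two conventions side by side. numpy's default `ddof=0` matches Excel's STDEVP(), while `ddof=1` matches STDEV() and Python's `statistics.stdev`, and it is the `ddof=1` version that belongs in the t statistic.

```python
import numpy as np
import statistics

x = [19.8, 20.4, 19.6, 17.8, 18.5, 18.9, 18.3, 18.9]  # hypothetical data

# Population SD (divisor n) -- numpy's default, Excel's STDEVP()
sd_pop = np.std(x)              # same as np.std(x, ddof=0)

# Sample SD (divisor n - 1) -- Excel's STDEV(), statistics.stdev()
sd_samp = np.std(x, ddof=1)

print(sd_pop, sd_samp)          # sd_samp is slightly larger
print(statistics.stdev(x))      # agrees with sd_samp
```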