Welch's t-test

In statistics, Welch's t-test (or unequal variances t-test) is a two-sample location test, and is used to test the hypothesis that two populations have equal means. Welch's t-test is an adaptation of Student's t-test,^[1] and is more reliable when the two samples have unequal variances and unequal sample sizes.^[2] These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping. Given that Welch's t-test has been less popular than Student's t-test^[2] and may be less familiar to readers, a more informative name is "Welch's unequal variances t-test" or "unequal variances t-test" for brevity.

Assumptions

Student's t-test assumes that both populations are normally distributed and have equal variances. Welch's t-test is designed for unequal variances but retains the assumption of normality.^[1] Welch's t-test is an approximate solution to the Behrens–Fisher problem.

Calculations

Welch's t-test defines the statistic t by the following formula:

$$t = \frac{\overline{X}_{1} - \overline{X}_{2}}{\sqrt{\frac{s_{1}^{2}}{N_{1}} + \frac{s_{2}^{2}}{N_{2}}}}$$

where $\overline{X}_{j}$, $s_{j}^{2}$ and $N_{j}$ are the sample mean, sample variance and sample size of the $j$-th sample ($j = 1, 2$), respectively. Unlike in Student's t-test, the denominator is not based on a pooled variance estimate.
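The formula above can be sketched in a few lines of Python using only the standard library. The function name `welch_t` is an illustrative assumption, not a library API; the data are the Example 3 samples from the Examples section of this article.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Welch's t: difference in means over the unpooled standard error."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.fmean(sample1), statistics.fmean(sample2)
    # statistics.variance uses the N - 1 denominator (sample variance).
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


# Example 3 samples from this article.
a1 = [19.8, 20.4, 19.6, 17.8, 18.5, 18.9, 18.3, 18.9, 19.5, 22.0]
a2 = [28.2, 26.6, 20.1, 23.3, 25.2, 22.1, 17.7, 27.6, 20.6, 13.7,
      23.2, 17.5, 20.6, 18.0, 23.9, 21.6, 24.3, 20.4, 24.0, 13.2]

t = welch_t(a1, a2)
print(round(t, 2))
```

The result should agree with the Welch t value reported for Example 3 in the results table.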

The degrees of freedom $\nu$ associated with this variance estimate are approximated using the Welch–Satterthwaite equation:

$$\nu \approx \frac{\left(\frac{s_{1}^{2}}{N_{1}} + \frac{s_{2}^{2}}{N_{2}}\right)^{2}}{\frac{s_{1}^{4}}{N_{1}^{2}\nu_{1}} + \frac{s_{2}^{4}}{N_{2}^{2}\nu_{2}}}$$

Here $\nu_{1} = N_{1}-1$ is the degrees of freedom associated with the first variance estimate, and $\nu_{2} = N_{2}-1$ is the degrees of freedom associated with the second variance estimate.
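A minimal sketch of the Welch–Satterthwaite approximation, working from summary statistics. The function name `welch_satterthwaite_df` is an assumption made for illustration; the second call uses the rounded Example 1 variances from the results table, so it only approximately matches the tabulated value.

```python
def welch_satterthwaite_df(var1, n1, var2, n2):
    """Approximate degrees of freedom for Welch's t statistic."""
    q1, q2 = var1 / n1, var2 / n2  # per-sample contributions s_j^2 / N_j
    # q_j^2 / (N_j - 1) is the same as s_j^4 / (N_j^2 * nu_j).
    return (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))


# Equal variances and equal sizes: nu reduces to N1 + N2 - 2.
nu_equal = welch_satterthwaite_df(4.0, 15, 4.0, 15)

# Rounded summary statistics of Example 1 (s1^2 = 7.9, s2^2 = 3.8, N = 15).
nu_ex1 = welch_satterthwaite_df(7.9, 15, 3.8, 15)

print(nu_equal, round(nu_ex1, 1))
```

The approximation always lies between the smaller of $\nu_{1}$ and $\nu_{2}$ and their sum $\nu_{1} + \nu_{2}$, so it never exceeds the degrees of freedom used by Student's t-test.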

Welch's t-test can also be calculated for ranked data and might then be named Welch's U-test.^[3]
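One possible reading of the ranked-data variant is sketched below, under the assumption that the two samples are pooled, each observation is replaced by its mid-rank (tied values share the average rank), and the ordinary Welch statistic is then computed on the ranks. The helper names `midranks` and `welch_t_on_ranks` are hypothetical, and the procedure in the cited reference may differ in detail.

```python
import math
import statistics


def midranks(values):
    """1-based ranks of values; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def welch_t_on_ranks(sample1, sample2):
    """Welch's t statistic computed on the pooled mid-ranks (assumed variant)."""
    ranks = midranks(list(sample1) + list(sample2))
    r1, r2 = ranks[:len(sample1)], ranks[len(sample1):]
    se2 = statistics.variance(r1) / len(r1) + statistics.variance(r2) / len(r2)
    return (statistics.fmean(r1) - statistics.fmean(r2)) / math.sqrt(se2)


print(midranks([10, 20, 20, 30]))
print(welch_t_on_ranks([1, 2, 3], [4, 5, 6, 7]))
```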

Statistical test

Once t and $\nu$ have been computed, these statistics can be used with the t-distribution to test the null hypothesis that the two population means are equal, either against the alternative that the means differ (a two-tailed test) or against the alternative that one specified mean is greater than the other (a one-tailed test). When a lookup table or tool accepts only integer degrees of freedom, the approximate $\nu$ is rounded down to the nearest integer; statistical software generally uses the non-integer value directly, as in the examples below.
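To illustrate how t and $\nu$ lead to a P-value, the sketch below integrates the density of the t-distribution numerically with Simpson's rule, using only the Python standard library. The function names and the step count are assumptions chosen for illustration; statistical packages compute the same quantity with dedicated routines. Non-integer degrees of freedom are handled directly.

```python
import math


def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def two_tailed_p(t, nu, steps=2000):
    """Two-tailed P-value: 1 - P(-|t| < T < |t|), via Simpson's rule."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(upper, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    central_half = total * h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * central_half)


# Welch t and nu as reported for Examples 2 and 3 in the results table.
p_ex2 = two_tailed_p(-1.57, 9.9)
p_ex3 = two_tailed_p(-2.22, 24.5)
print(round(p_ex2, 3), round(p_ex3, 3))
```

A one-tailed P-value in the direction of the observed difference is half of the two-tailed value, because the t-distribution is symmetric about zero.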

Advantages and limitations

Welch's t-test is more robust than Student's t-test and maintains type I error rates close to nominal for unequal variances and for unequal sample sizes. Furthermore, the power of Welch's t-test comes close to that of Student’s t-test, even when the population variances are equal and sample sizes are balanced.^[2]
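The claim about type I error rates can be checked with a small Monte Carlo sketch. Both samples are drawn from normal populations with the same mean, so every rejection is a false positive. The scenario (the smaller sample has the larger variance), the seed, the number of replicates and all function names are assumptions chosen for illustration, and the P-values come from a simple numerical integration rather than a statistics library.

```python
import math
import random
import statistics


def t_pdf(x, nu):
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def two_tailed_p(t, nu, steps=200):
    """Two-tailed P-value by Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, nu) + t_pdf(upper, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def rejection_rates(n1, sd1, n2, sd2, reps=2000, alpha=0.05, seed=1):
    """Fraction of replicates rejected by Student's and by Welch's test."""
    rng = random.Random(seed)
    student_hits = welch_hits = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, sd1) for _ in range(n1)]
        b = [rng.gauss(0.0, sd2) for _ in range(n2)]
        diff = statistics.fmean(a) - statistics.fmean(b)
        v1, v2 = statistics.variance(a), statistics.variance(b)
        # Student: pooled variance, N1 + N2 - 2 degrees of freedom.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        t_student = diff / math.sqrt(pooled * (1 / n1 + 1 / n2))
        if two_tailed_p(t_student, n1 + n2 - 2) < alpha:
            student_hits += 1
        # Welch: unpooled variance, Welch-Satterthwaite degrees of freedom.
        q1, q2 = v1 / n1, v2 / n2
        nu = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
        if two_tailed_p(diff / math.sqrt(q1 + q2), nu) < alpha:
            welch_hits += 1
    return student_hits / reps, welch_hits / reps


student_rate, welch_rate = rejection_rates(n1=10, sd1=4.0, n2=20, sd2=1.0)
print(f"Student: {student_rate:.3f}  Welch: {welch_rate:.3f}  nominal: 0.050")
```

In this scenario the pooled variance underestimates the variance of the mean difference, so Student's test is expected to reject well above the nominal 5% level, while Welch's test should stay close to it.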

It is not recommended to pre-test for equal variances and then choose between Student's t-test or Welch's t-test.^[4] Rather, Welch's t-test can be applied directly and without any substantial disadvantages to Student's t-test as noted above. Welch's t-test remains robust for skewed distributions and large sample sizes.^[5] Reliability decreases for skewed distributions and smaller samples, where one could possibly perform Welch’s t-test on ranked data.^[3]

Examples

The following three examples compare Welch's t-test and Student's t-test. The samples were drawn at random from normal distributions using the R programming language.

For all three examples, the population means were $\mu _{1}$ = 20 and $\mu _{2}$ = 22.

The first example is for equal variances ( $\sigma _{1}^{2}$ = $\sigma _{2}^{2}$ = 4) and equal sample sizes ( $N_{1}$ = $N_{2}$ = 15). Let A1 and A2 denote two random samples:

$A1=\{27.5, 21.0, 19.0, 23.6, 17.0, 17.9, 16.9, 20.1, 21.9, 22.6, 23.1, 19.6, 19.0, 21.7, 21.4\}$

$A2=\{27.1, 22.0, 20.8, 23.4, 23.4, 23.5, 25.8, 22.0, 24.8, 20.2, 21.9, 22.1, 22.9, 20.5, 24.4\}$

The second example is for unequal variances ( $\sigma _{1}^{2}$ = 16, $\sigma _{2}^{2}$ = 1) and unequal sample sizes ( $N_{1}$ = 10, $N_{2}$ = 20). The smaller sample has the larger variance:

$A1=\{17.2, 20.9, 22.6, 18.1, 21.7, 21.4, 23.5, 24.2, 14.7, 21.8\}$

$A2=\{21.5, 22.8, 21.0, 23.0, 21.6, 23.6, 22.5, 20.7, 23.4, 21.8, 20.7, 21.7, 21.5, 22.5, 23.6, 21.5, 22.5, 23.5, 21.5, 21.8\}$

The third example is for unequal variances ( $\sigma _{1}^{2}$ = 1, $\sigma _{2}^{2}$ = 16) and unequal sample sizes ( $N_{1}$ = 10, $N_{2}$ = 20). The larger sample has the larger variance:

$A1=\{19.8, 20.4, 19.6, 17.8, 18.5, 18.9, 18.3, 18.9, 19.5, 22.0\}$

$A2=\{28.2, 26.6, 20.1, 23.3, 25.2, 22.1, 17.7, 27.6, 20.6, 13.7, 23.2, 17.5, 20.6, 18.0, 23.9, 21.6, 24.3, 20.4, 24.0, 13.2\}$

Reference P-values were obtained by simulating the distributions of the t statistics for the null hypothesis of equal population means ( $\mu _{1}-\mu _{2}$ = 0). Results are summarised in the table below, with two-tailed P-values:

	Sample A1			Sample A2			Student's t-test				Welch's t-test
Example	$N_{1}$	${\overline {X}}_{1}$	$s_{1}^{2}$	$N_{2}$	${\overline {X}}_{2}$	$s_{2}^{2}$	$t$	$\nu$	$P$	$P_{sim}$	$t$	$\nu$	$P$	$P_{sim}$
1	15	20.8	7.9	15	23.0	3.8	-2.46	28	0.021	0.021	-2.46	25.0	0.021	0.017
2	10	20.6	9.0	20	22.1	0.9	-2.10	28	0.045	0.150	-1.57	9.9	0.149	0.144
3	10	19.4	1.4	20	21.6	17.1	-1.64	28	0.110	0.036	-2.22	24.5	0.036	0.042

Welch's t-test and Student's t-test gave practically identical results for the two samples with equal variances and equal sample sizes (Example 1). For unequal variances, Student's t-test gave a low P-value when the smaller sample had a larger variance (Example 2) and a high P-value when the larger sample had a larger variance (Example 3). For unequal variances, Welch's t-test gave P-values close to simulated P-values.
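The difference between the two tests in Example 2 comes entirely from the standard error and the degrees of freedom, which the following sketch makes explicit. It recomputes both statistics from the Example 2 samples with the Python standard library; the function names `student_t_test` and `welch_t_test` are hypothetical and the code is an illustration, not the procedure used to produce the table.

```python
import math
import statistics

a1 = [17.2, 20.9, 22.6, 18.1, 21.7, 21.4, 23.5, 24.2, 14.7, 21.8]
a2 = [21.5, 22.8, 21.0, 23.0, 21.6, 23.6, 22.5, 20.7, 23.4, 21.8,
      20.7, 21.7, 21.5, 22.5, 23.6, 21.5, 22.5, 23.5, 21.5, 21.8]


def student_t_test(x, y):
    """Return (t, degrees of freedom) using the pooled variance."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.fmean(x) - statistics.fmean(y)) / se, n1 + n2 - 2


def welch_t_test(x, y):
    """Return (t, approximate degrees of freedom) without pooling."""
    n1, n2 = len(x), len(y)
    q1 = statistics.variance(x) / n1
    q2 = statistics.variance(y) / n2
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(q1 + q2)
    nu = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, nu


t_s, nu_s = student_t_test(a1, a2)
t_w, nu_w = welch_t_test(a1, a2)
print(f"Student: t = {t_s:.2f}, nu = {nu_s}")
print(f"Welch:   t = {t_w:.2f}, nu = {nu_w:.1f}")
```

Because the larger sample has the much smaller variance here, pooling shrinks the standard error and keeps all 28 degrees of freedom, whereas Welch's statistic is dominated by the ten noisy observations in A1 and its degrees of freedom fall to roughly those of that sample.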

Software implementations

Language/Program	Function	Notes
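LibreOffice	`TTEST(Data1; Data2; Mode; Type)`	`Type = 3` selects the unequal-variance test
MATLAB	`ttest2(data1, data2, 'Vartype', 'unequal')`	`'Vartype', 'unequal'` selects Welch's test
Microsoft Excel pre 2010	`TTEST(array1, array2, tails, type)`	`type = 3` selects the unequal-variance test
Microsoft Excel 2010 and later	`T.TEST(array1, array2, tails, type)`	`type = 3` selects the unequal-variance test
Python	`scipy.stats.ttest_ind(a, b, axis=0, equal_var=False)`	`equal_var=False` selects Welch's test
R	`t.test(data1, data2, alternative="two.sided", var.equal=FALSE)`	`var.equal=FALSE` is the default
Julia	`UnequalVarianceTTest(data1, data2)`	Provided by the HypothesisTests.jl package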

References

[1] Welch, B. L. (1947). "The generalization of "Student's" problem when several different population variances are involved". Biometrika. 34 (1–2): 28–35. doi:10.1093/biomet/34.1-2.28. MR 0019277.
[2] Ruxton, G. D. (2006). "The unequal variance t-test is an underused alternative to Student's t-test and the Mann–Whitney U test". Behavioral Ecology. 17: 688–690. doi:10.1093/beheco/ark016.
[3] Fagerland, M. W.; Sandvik, L. (2009). "Performance of five two-sample location tests for skewed distributions with unequal variances". Contemporary Clinical Trials. 30: 490–496. doi:10.1016/j.cct.2009.06.007.
[4] Zimmerman, D. W. (2004). "A note on preliminary tests of equality of variances". British Journal of Mathematical and Statistical Psychology. 57: 173–181. doi:10.1348/000711004849222.
[5] Fagerland, M. W. (2012). "t-tests, non-parametric tests, and large studies—a paradox of statistical practice?". BioMed Central Medical Research Methodology. 12: 78. doi:10.1186/1471-2288-12-78.