Talk:Pearson's chi-squared test

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Medico80 (talk | contribs) at 13:16, 16 September 2012 (Definition). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-class on Wikipedia's content assessment scale.
This article has been rated as High-importance on the project's importance scale.

I've tried to improve the 2 cells and Many cells sections but I'm not a great expert -- could someone please take a look and check them? In particular, we need to explain why (O-E)^2/E ends up being the same as the (O-E)^2/sigma^2 that is used by the chi-squared distribution.

Simply chi-squared test?

What's wrong with simply chi-squared test? Are there more than one that are of encyclopedic interest? --mav

You've got to be kidding!!! There are zillions of them (zillions = at least a dozen or so) that are so different from each other except in sharing a common null distribution that <POV> it is astonishing that anyone could wonder about this </POV>. Well, maybe not astonishing to the layman, but still.... Michael Hardy 00:38 May 14, 2003 (UTC)
[I removed my earlier hasty and ill-considered -- and incorrect -- comment because Michael makes the point better.] Jfitzg

In that case, then perhaps there should be an entry at chi-squared test saying that there are lots of them, that the general principle was developed by A and B in century C, that these three tests are the most commonly used although those 7 are sometimes used for purpose X and purpose Y, and that all have in common the idea Z. As a generality, the maths entries on the 'pedia are dense and forbidding to the non-mathematician. This sort of thing helps a lot to make stuff accessible to the general reader - which is what we are here for, isn't it? Tannin 00:47 May 14, 2003 (UTC)

Intro

Could we have a brief intro in English, please?

The current intro is comprehensible. I didn't write it -- it replaces a simpler one I wrote which probably appeared more English but which was not specific enough. Jfitzg

I can comprehend it, sure. But I spent a couple of years studying stats, and even so I don't find it exactly easy reading. If I had happened to take a different minor, there is no way I could read and understand that intro, nor would I expect anyone else without at least some specialist training to be able to do so. I appreciate that the maths people want to get the maths entries as precise and strictly correct as possible, and applaud that urge, but we need to make sure that the casual reader is able to look at an entry and, even if he is unable to understand it in detail (or unwilling to put in the half-hour or so of concentrated effort it might take to grasp the detail), at least he should be able to walk away with a rough idea of what it is all about.

I suggest changing the first para to something like this:

  • Pearson's chi-squared test (χ²)—one of a variety of different chi-squared tests—is a statistical procedure used with category data to decide if experimental results are statistically significant, or else can reasonably be explained by mere chance. Like most statistical tests, it compares observed frequencies (under some kind of test condition) with expected frequencies: in general, the greater the difference between the two, the less likely it is that the experimental results are simply the result of luck.
  • In more detail, Pearson's chi-squared is for testing a null hypothesis that states that relative frequencies of occurrence of several specified mutually exclusive events, at least one of which must occur each time a specified experiment is performed, follow a specified frequency distribution. One of the simplest examples ....... etc.

Tannin 13:03 14 May 2003 (UTC)
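The observed-versus-expected comparison in the proposed intro can be sketched with a short example. The counts below are hypothetical, invented purely for illustration (a six-sided die rolled 120 times), not taken from the article:

```python
# Hypothetical counts from 120 rolls of a die we suspect is loaded.
observed = [16, 15, 25, 21, 14, 29]
n = sum(observed)                  # 120 rolls in total
expected = [n / 6] * 6             # fair die: 20 expected per face

# Pearson's statistic: sum of (O - E)^2 / E over the six categories.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # about 9.2, compared against chi-squared with 6 - 1 = 5 degrees of freedom

# The 5%-level critical value for 5 degrees of freedom is about 11.07,
# so these particular counts would not justify rejecting the fair-die hypothesis.
```

The bigger the mismatch between the observed and expected rows, the bigger the statistic, which is exactly the "greater difference, less likely to be luck" idea in the proposed wording.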

Thing is, I am a maths person. (I just happen to be allergic to stats). Maybe it's the Chi-squared test article which should give an overview of what they are and why they're useful / interesting -- Tarquin 13:10 May 14, 2003 (UTC)

Hmmm ... OK, but Pearson's chi-squared is the chi-squared in a very real sense. Sure, there are others, but this one is the only one that most people are ever going to use. I think it is a special case. The really obscure ones need less contextualising at the start of the entry. Tannin
After further thought I concluded that an introduction like the one suggested by Tannin would be desirable. I'd suggest starting with a more general definition -- no reference to category data, for example. I don't think it's accurate to say that most statistical tests compare observed and expected frequencies, but perhaps I'm missing something or that wasn't what was intended. Anyway, I had thought of combining the current definition with the less detailed one originally posted. I suppose at some point someone's just going to have to grasp the nettle and change it. Jfitzg
Ahh, I knew it would be better to put it here than straight into the entry - my stats is very rusty, and I'm not surprised to have been caught in an error. I meant that most statistical tests compare observed and expected scores of some kind - not always frequencies, obviously. And I should have said "most statistical testing" (i.e., most frequently performed) as opposed to "most tests" (i.e., largest number of different tests). I mentioned the category data because, for the only-knows-a-little-bit statistician (your average social scientist, let's say), that's the key thing you have to remember: chi-squared for category data, f-test or t-test or ANOVA for everything else. And if you can't use one of those, ask a real statistician. :) Tannin
Slip is a better word than error, I think. That's what I call them when I remove mine from contributions I've made, anyway. I'll log back on later (I think it's about time I made some money) and if no one has taken a stab at modifying the beginning I'll have a go and await comments. Jfitzg

I agree that it's too dense; I wrote it hastily. It has the advantage over the earlier version of being correct. The earlier version spoke of differences between observed and theoretical frequencies, but that is trivial: the observed frequencies are nearly always obviously different from the ones specified by the null hypothesis, and that is not what is of interest. What is of interest is whether the unobservable population frequencies differ from the theoretical ones.

I don't think it's a good idea to say that the purpose of the test is to decide whether the data are statistically significant. Statistical significance is of interest only because it indicates that the null hypothesis is false. Whether the null hypothesis is false is what is of interest; interest in statistical significance is secondary and merely a means to an end. The null and alternative hypotheses should be made clear in any statement of the purpose of this test. I'll probably get to this within a few days. Michael Hardy 01:47 15 May 2003 (UTC)

In the meantime I simply shortened the sentences in what you wrote. As I said above, it is thoroughly comprehensible, and if people found it dense it was probably because of the one long sentence. It's only a suggestion -- I didn't make it here first because I wasn't altering the content all that much. Jfitzg

Correct name

What is the correct name of this test? I see that you have written "chi-squared test", but in many references I have seen it written as "chi-square test". Which version does Pearson use? And while we are here, should one write p-value or P-value?

Which version did Pearson use? Simple: he just used the symbol, as far as I have seen. There's a link on his biography page to a site that has a PDF facsimile of his paper from 1900. Maybe it would be useful to link there from here? As for correct name: "chi-square" and "chi-squared" are used about equally as far as search engine results go. Grotendeels Onschadelijk 03:07, 4 August 2007 (UTC)[reply]

The correct version is chi-squared. Any time you raise an entity to the second power, it is that entity squared, not that entity square, just like to the third power is cubed, not cube. — Preceding unsigned comment added by 129.85.4.39 (talk) 21:30, 29 November 2011 (UTC)[reply]

Historical perspective

I'd like to see some historical perspective. ie, when was it invented? Was this the original chi-squared or is it a development of an earlier one? How heavily is it used (my guess is very heavily) and when did it become a standard statistical method in widespread use? 203.164.221.61 02:55, 30 April 2006 (UTC)[reply]

Intro

I am generally a staunch advocate for practical examples in articles, but shouldn't the first equation introduced be for the general case, not a specific example? I was looking at the limits of the sum (n=1-6) and thinking, "Huh?" until I read that it was an example for a six-sided die. Since the formula is almost completely general, I don't think it's a big deal to change it.

I also think this article could use a bit of general reworking, because it didn't answer the question I had at all (which was how a sample's variance is taken into account).

Restructuring/Different Emphasis?

I'd understand this topic better coming at it first from the perspective of Fisher's Exact Test, whose justification as a permutation test is intuitively obvious. I would prefer to then be introduced afterwards to the chi-squared distribution as an approximation which becomes useful when the permutation test becomes intractable. In other words, I (statistically still fuzzy, like many potential readers) get derailed at the sentence

If the null hypothesis is true... the test statistic will be drawn from a chi-squared distribution with one degree of freedom.

because I feel I've got to immediately follow that link before I can understand anything else. If this were postponed until after a discussion of the exact test, I could get the main idea of the chi-squared test from that and only then move on to worry about the approximation that the chi-squared distribution supplies.

I know that the article presents it as it is conventionally taught; but I've been reading this Julian Simon book on resampling [1] and becoming increasingly convinced that the conventional presentation is not the best order in which to learn things---since only now am I (generally non-dumbass) finally starting to get this stuff.

Thoughts?

71.127.0.211 16:22, 2 December 2006 (UTC)[reply]

(A note on the note: I've reinserted the word "dumbass" which had been inappropriately deleted from my comment and replaced with "expletive deleted" by User:Chris53516. Any speaker of English will explain to you that this is nowhere near an "expletive". And it can hardly be offensive when it's being used in self-deprecation. It would be stylistically inappropriate for most articles, but no one has any business deleting it from a discussion page post. Furthermore, as I understand the policy discussion at Wikipedia:Profanity, there's not even a clear policy requiring "expletive deletion" from articles, let alone talk pages). 72.79.228.10 21:31, 29 March 2007 (UTC)[reply]



I rather agree, actually. That seems like a more intuitive approach. --Gak 20:41, 8 February 2007 (UTC)[reply]

Reduced Chi-squared?

I'm posting on this page because it seems to be the most active of the chi square test talk pages and I'm not sure where it belongs. I added a reduced chi squared statistic to the goodness of fit page, but I don't know if it should have its own page or not. Is it a subset of the Pearson's chi-squared, or independent? I'm afraid I don't even have a statistics textbook handy. --Keflavich 17:17, 9 May 2007 (UTC)[reply]

Interpretation of results

This page should include some more detailed information about interpreting the numeric result of the test. In the intro the article mentions:

"A chi-squared probability of 0.05 or less is commonly interpreted by applied workers as justification for rejecting the null hypothesis that the row variable is unrelated (that is, only randomly related) to the column variable."

In the two cells section, the idea that the number of degrees of freedom comes into play when interpreting the result is raised, but this is very vague IMHO. Statistics is a weak point in my mathematics knowledge and I am re-learning a lot of it (I studied this about 15 years ago), so maybe my ignorance makes me think this is more vague than it is... not sure.

Straha 206th 22:56, 17 May 2007 (UTC)[reply]

It is vague 216.99.15.253
I agree, this is precisely when I visited the entry. --Belg4mit 14:52, 23 October 2007 (UTC)[reply]
Also, I don't understand what applied workers means (?). row variable and column variable seem to indicate some specific kind of test. Ggenellina (talk) 03:08, 30 January 2009 (UTC)[reply]

Introduction: Incomprehensible sentence

For a while the introduction of this article has contained the sentence "The events are assumed to be independent and have the same distribution, and the outcomes of each event must be mutually exclusive." I've read this sentence about 100 times, and each time my confidence level has dropped about 1%. At least there is a confusion of terms, because

  • by the standard terminology events do not have a distribution, although they have a probability, but it can not be intended that they must have the same probability;
  • outcomes are necessarily mutually exclusive, although events may be mutually exclusive if they don't contain the same outcomes.

More seriously, it makes very little sense for events to be independent and mutually exclusive at the same time. For events A and B to be independent, Pr(A and B) = Pr(A)Pr(B) would have to hold. For them to be mutually exclusive, we would have to have Pr(A and B) = 0. So either Pr(A) or Pr(B) would have to be zero.

As far as I can see, this is a big mistake, and really, I can see no reason why this sentence couldn't be replaced by "The events must be mutually exclusive". (as it used to be) Grotendeels Onschadelijk 03:40, 4 August 2007 (UTC)[reply]

On further reflection, it occurred to me that what is actually meant, if not said, by that sentence, is that it is assumed that the data under consideration is in fact a sample. I think that can be more easily achieved by saying that, and linking to the appropriate page.

I propose to change

to

Note that I have also changed the use of frequency distribution to match the referred page. I am slightly uncertain about the "total probability 1" requirement, but that seems to be assumed everywhere I looked. Grotendeels Onschadelijk 09:34, 9 August 2007 (UTC)[reply]


ANY HOPE OF ADDING THE CHI-SQUARED TEST FOR HOMOGENEITY? In my work (a social scientist who uses some statistics) I've only ever encountered chi-squared test for goodness-of-fit, chi-squared test for independence and chi-squared test for homogeneity. Having this last would certainly round things out for my students, who I suspect visit this page. —Preceding unsigned comment added by 139.57.144.30 (talk) 22:49, 23 July 2008 (UTC)[reply]


Normal distribution of deviations ???

Hi there, I found on the web this assumption for the Chi Square test, is it true? any one got info on the matter:

  • Normal distribution of deviations (observed minus expected values) is assumed. Note chi-squared is a nonparametric test in the sense that it does not assume a normal distribution for the data -- only for the deviations.

Talgalili (talk) 21:46, 29 July 2009 (UTC)[reply]


Assumptions on cell size: This article states that cells should not be less than 5. There is one textbook that gives a similar rule, but the standard teaching refers only to expected cell numbers, not observed. Typically this is phrased as: no expected frequency should be less than 1 and no more than 20% of the expected frequencies should be less than 5.

December 2009

Is the formula of the test correct? I see everywhere (even in Wikipedia's definition of the chi-squared distribution) that the denominator E_i should also be squared. Am I wrong? --- 130.237.166.86 (talk) at 16:15, 19 December 2009.

Yes, you are wrong, and both articles are presently correct. Note that the contexts of the two articles are different, but do essentially correspond. In chi-squared distribution, the observations have a normal distribution and the test statistic is composed of the sum of squared deviations from the mean, divided by the (known) variance. Here the observations are counts of occurrences within categories and the test statistic is composed of the sum of squared deviations from the mean, divided by an approximation for the variance, where the approximation is that the (true) variance is close to the (true) mean, because the counts have an approximate Poisson distribution, and where the sample value is used in substitution for the true mean. Melcombe (talk) 12:34, 21 December 2009 (UTC)[reply]
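Melcombe's point that the counts are approximately Poisson, so that the true variance is close to the true mean, can be checked empirically. The sketch below uses a pure-Python Knuth-style sampler with an arbitrary illustrative rate; it is not article content:

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

rng = random.Random(42)
lam = 10.0  # arbitrary rate chosen for the demonstration
draws = [poisson_sample(lam, rng) for _ in range(20000)]

mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)

# For a Poisson count, mean and variance are both lam. This is why dividing
# (O - E)^2 by E stands in for dividing by the variance sigma^2 in the statistic.
print(mean, var)  # both close to 10
```

With the mean standing in for the variance, each term (O-E)^2/E is an approximation to the (O-E)^2/sigma^2 terms summed in the chi-squared distribution's setting.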

Cell

I think there is need to explain what specific sense of "cell" is intended in this article. Is "cell" the same as "category", that is, one of the possible outcomes of an observation? --Ettrig (talk) 13:03, 27 December 2009 (UTC)[reply]

Yates' correction

"A common rule is 5 or more in all cells of a 2-by-2 table, and 5 or more in 80% of cells in larger tables, but no cells with zero count. When this assumption is not met, Yates' correction is applied." This is misleading, as Yates' correction can only be applied to 2x2 tables. Also, it is not clear what "this assumption" refers to: Is it the rule of "5 or more", or the "no cells with zero count" rule?

2 Cell Example

The example given, testing whether a population has equal numbers of males and females, has only two cells, so the test used employs a chi-squared random variable with only 1 degree of freedom. This is technically valid, but if there are only two cells it's simpler to use a simple proportion test, since the number of males in the sample follows a binomial distribution. The chi-squared test only becomes necessary when there are 3 or more cells, giving a multinomial rather than binomial distribution. I suggest it would be more informative to replace the 2 cell example by a 3 cell example. (Or perhaps to add a new 3 cell example, and add further material to the 2 cell example showing how it is really only doing a proportion test but in a more complex structure.) Jim 14159 (talk) 01:14, 26 April 2010 (UTC)[reply]
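Jim's observation that the 2-cell test is really a proportion test can be made concrete: the 1-degree-of-freedom Pearson statistic equals the square of the z statistic from the normal-approximation proportion test. The counts below are hypothetical, chosen only for the sketch:

```python
import math

males, females = 60, 40          # hypothetical sample of n = 100
n = males + females
p0 = 0.5                         # null hypothesis: equal proportions

# Pearson's chi-squared with two cells (expected 50/50 under the null).
expected = n * p0
chi2 = (males - expected) ** 2 / expected + (females - expected) ** 2 / expected

# One-sample proportion z statistic for the same null hypothesis.
z = (males / n - p0) / math.sqrt(p0 * (1 - p0) / n)

print(chi2, z ** 2)  # equal up to floating-point rounding
```

So with two cells the chi-squared machinery adds nothing over the binomial proportion test, which is exactly why a 3-cell example would better motivate the multinomial case.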

For future reference - I support the idea to add another example here (for something other than a goodness of fit test). Talgalili (talk) 19:59, 22 January 2011 (UTC)[reply]

Is the diagram the right one?

Chi-squared distribution, showing χ² on the x-axis and P-value on the y-axis.
New one - Chi-squared distribution, showing χ² on the x-axis and P-value on the y-axis.

I used the diagram from chi-squared distribution in Calculating the test-statistic here. However, does anyone have doubt that it is the one to be used here, or finds any errors on what the axis tell? Mikael Häggström (talk) 09:05, 14 June 2010 (UTC)[reply]

The p-value is not the value on the y-axis of the pdf shown here, it's the tail area, i.e. the area under the curve to the right of the observed value. Equivalently, it's found by subtracting the cdf from 1. Qwfp (talk) 10:17, 14 June 2010 (UTC)[reply]
Thanks. I've changed it to a new one, showing the inverse of the cumulative distribution function. I hope it fits better. Mikael Häggström (talk) 10:41, 14 June 2010 (UTC)[reply]
Thanks, that was quick work! Qwfp (talk) 11:37, 14 June 2010 (UTC)[reply]
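Qwfp's point that the p-value is the tail area, not the height of the density curve, can be illustrated numerically for 1 degree of freedom, where both quantities have closed forms via the error function. This is a sketch for the talk-page discussion, not article content:

```python
import math

def chi2_pdf_1df(x):
    """Density of the chi-squared distribution with 1 degree of freedom."""
    return math.exp(-x / 2) / math.sqrt(2 * math.pi * x)

def chi2_sf_1df(x):
    """Tail area P(X > x) for 1 degree of freedom: this is the p-value."""
    return math.erfc(math.sqrt(x / 2))

x = 3.841  # the usual 5% critical value for 1 degree of freedom
print(chi2_sf_1df(x))   # about 0.05 -- the p-value is this area under the curve
print(chi2_pdf_1df(x))  # about 0.03 -- the height of the curve, a different number
```

Reading the y-axis of the pdf at the observed statistic gives the second number, which is why the original diagram was misleading and the inverse-cdf one fits better.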


Unclear nomogram.

Insufficient explanation for chi-squared test nomogram

In the nomogram at the top, what does the ABCDE stand for? I'd appreciate an explanation in the subtexts. Mikael Häggström (talk) 11:20, 14 June 2010 (UTC)[reply]

There's some explanation at Nomogram#Chi-squared test computation nomogram. Since pocket calculators came out I can't think of any circumstance when this nomogram would be useful and it's quite hard to explain, so I don't think it's a good lead image. I'd suggest we remove it from this article. Qwfp (talk) 11:42, 14 June 2010 (UTC)[reply]
I agree and will now remove it. Mikael Häggström (talk) 18:38, 5 August 2011 (UTC)[reply]

Fact that example X^2 = 1

Another comment on the example. From experience teaching, I think the fact that the value of X^2 in this case is 1 could be very confusing--for example, people might think it's a probability, since that's on a scale of 1, or that the values of X^2 are constrained to integers, or any of a number of other problems. I suggest changing the numbers to 44 men and 56 women, or something else similar. (Eliminating confusion is much better than having simple arithmetic, I think.) I will change it myself in a few days if nobody thinks this is foolish. Motorneuron (talk) 23:54, 20 July 2010 (UTC)[reply]

Good point. I agree. Qwfp (talk) 07:19, 21 July 2010 (UTC)[reply]
I changed it to 44 and 56, which gives X^2 = 1.44 and p = .23, so very little needed changing.Motorneuron (talk) 16:37, 21 July 2010 (UTC)[reply]
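The revised numbers are easy to verify directly, using the 1-degree-of-freedom tail area expressed through the complementary error function:

```python
import math

observed = [44, 56]                  # men, women in a sample of 100
expected = [50.0, 50.0]              # null hypothesis of equal numbers

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p = math.erfc(math.sqrt(chi2 / 2))   # tail area of chi-squared with 1 df

print(chi2, p)  # about 1.44 and about 0.23, matching the edited example
```

The non-integer statistic avoids the confusions Motorneuron describes while the arithmetic stays trivial.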

How to mention Cramér's V?

Hi all,

I just added a link to "Cramér's V" but I think this deserves its own section in this article, or an article of its own. What do you think? Talgalili (talk) 22:24, 21 December 2010 (UTC)[reply]

I modified your link to a direct one. But note that there is more on Cramér's V in contingency table, and possibly elsewhere. Melcombe (talk) 16:48, 22 December 2010 (UTC)[reply]

Estimation of the Parameters

Is it allowed to use the Method of Moments instead of the Maximum Likelihood Method if one uses Pearson's chi-squared test as a goodness-of-fit test? — Preceding unsigned comment added by 84.83.33.64 (talk) 16:02, 6 July 2011 (UTC)[reply]

Allowed? Yes, but the consequences, in terms of how good the approximation being used is, would be different. That is, if some form of adjustment were being made to take account of the fact that parameters have been fitted, that adjustment would need to be re-investigated. However I don't think such adjustments are much used in the context of Pearson's chi-squared test as a goodness-of-fit test since, if one were worried about it, then one would be using a better test of fit. Melcombe (talk) 08:52, 7 July 2011 (UTC)[reply]


Assumptions - Isn't rather Type I error the problem?

In the paragraph "Assumptions" it is stated that "The researcher, by using chi square test on small samples, might end up committing a Type II error." To my understanding, this is not the main point. I think the bigger problem is that the Type I error is inflated, as can be verified by the following R script: in a 2x2 table with margin distributions (2,2) and (2,2), the probability of committing a Type I error is around 20% for a specified alpha of 5%, which is not acceptable in many cases.


colSums = c(2,2)
rowSums = c(2,2)
prod = (colSums) %*% t(rowSums)
expected = prod / sum(colSums)
probs = expected / sum(colSums)
sim = rmultinom(1000, sum(colSums), probs)
myP = numeric(ncol(sim))
falsePositive = 0
for(i in 1:ncol(sim))
{ 
	thisP = chisq.test(matrix(sim[,i], nrow = 2))$p.value
	myP[i] = thisP
        if(thisP < 0.05)
		falsePositive = falsePositive + 1
}
falsePositive / ncol(sim)  — Preceding unsigned comment added by 78.52.192.206 (talk) 09:12, 17 July 2011 (UTC)[reply] 

Definition

The last two paragraphs of the definition are too cumbersome.

  1. It starts with 'The first step', but only at the very end comes 'A second important part...', as the second step. These steps are also fairly standard and not a characteristic of this particular test. Furthermore, if one is so explicit about the testing steps, the third step of actually comparing the test statistic with the critical value to obtain significance is missing.
  2. Right after the statistic link there is a long discussion of ambiguous notations and the distinction between estimated and theoretical values. There is no reason to assume that the reader is confused about this, so all of this is simply distracting. Talk about estimation of statistics should go to the statistic page. Note, moreover, that X2 is used in the figure, contrary to what is demanded before.
  3. Fisher's exact test is mentioned at the end of step 1 where it definitely does not belong.
  4. One should not verbally describe formulas from the subsequent text. Here one can simply mention the formula itself and also normalization.
  5. The description of the degrees of freedom is incorrect and in conflict with what is described later in the text.

— Preceding unsigned comment added by Muhali (talkcontribs) 18:11, 23 February 2012 (UTC)[reply]

Your comment on 'degrees of freedom' is very true and supplements the post about the use of a mere 2-cell (male or female) table as an example: (c-1)*(r-1) computes to 0 degrees of freedom. It should be 1. Medico80 (talk) 13:15, 16 September 2012 (UTC)[reply]
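The distinction Medico80 draws is between the two degrees-of-freedom formulas: a one-dimensional table of k cells is a goodness-of-fit problem with k - 1 degrees of freedom, while (r-1)*(c-1) applies only to a genuine two-way contingency table. A minimal sketch (helper names are made up for illustration):

```python
def gof_df(k):
    """Goodness-of-fit over k categories: df = k - 1."""
    return k - 1

def independence_df(r, c):
    """Test of independence in an r-by-c table: df = (r - 1) * (c - 1)."""
    return (r - 1) * (c - 1)

# A 2-cell male/female count is a goodness-of-fit problem, so df = 1.
print(gof_df(2))              # 1
# Misapplying the contingency-table formula to it as a 2x1 "table" gives 0,
# which is the mistake described in the comment above.
print(independence_df(2, 1))  # 0
print(independence_df(2, 2))  # 1, for a genuine 2x2 table
```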

Does The Test Statistic Use "Frequencies" Or Whole Numbers?

The current article defines the test statistic in terms of "frequencies" but the current Goodness Of Fit example is phrased in terms of whole numbers of things. I think the definition of the test statistic is incorrect. Shouldn't it refer to expected and observed "numbers" of things instead of expected and observed "frequencies" of things? Many people would interpret "frequency" as implying a ratio.

Tashiro (talk) 15:54, 19 August 2012 (UTC)[reply]