Talk:P-value
Source cited for definition of P-value

For an article about p-values, it is odd that the definition is taken from a journal paper that disparages the p-value: I would not expect a fair/balanced definition from such a source. If the definition was not taken from the cited paper, then the citation is wrong and is probably due to somebody's personal agenda.

What am I missing?

The third point in the Misunderstandings of p-value section mentions that "The p-value is not the probability of falsely rejecting the null hypothesis." In symbols, the Type I error can be written as Pr(Reject H | H). If we have a rule of rejecting the null hypothesis when p <= alpha, then this becomes Pr(Reject H | H) = Pr(p <= alpha | H). However, since the p-value is uniform over [0,1] under a simple null hypothesis, this would mean that the Type I error is indeed given by Pr(Reject H | H) = Pr(p <= alpha | H) = alpha. So it is indeed correct to interpret alpha, the cut-off p-value, as the Type I error. What exactly am I missing here? (Manoguru (talk) 16:40, 3 December 2013 (UTC))[reply]
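The identity Pr(p <= alpha | H) = alpha in the comment above can be checked by simulation; a minimal sketch (the two-sided z-test with known variance and all names are my own choices, not from the article):

```python
import math
import random

random.seed(0)

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean = mu0, with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) * math.sqrt(n) / sigma
    # 2 * (1 - Phi(|z|)) for the standard normal, via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

alpha = 0.05
trials = 20000
# Draw every sample under the null (true mean 0), reject when p <= alpha
rejections = sum(
    z_test_p_value([random.gauss(0, 1) for _ in range(30)]) <= alpha
    for _ in range(trials)
)
print(rejections / trials)  # empirical Type I error rate, close to alpha
```

The printed rejection rate hovers around 0.05, which is exactly the point being made: with the decision rule "reject when p <= alpha", the long-run false rejection rate equals alpha.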

After reading earlier talk pages, I just realized that this point has been extensively discussed. As for myself the issue is resolved. Who in their right mind will ever confuse a p-value for alpha level? (Manoguru (talk) 17:13, 3 December 2013 (UTC))[reply]
Not sure how to read earlier talk pages - could you post a link? More importantly, although your understanding, and the article's understanding in the sixth paragraph ("The p-value should not be confused with the Type I error rate [false positive rate] α in the Neyman–Pearson approach."), is that the p-value and alpha are different things, the article in paragraphs 4 and 5 under the section "Definition" reads as if the p-value and alpha are the exact same thing, and completely interchangeable. This needs to be re-written, and the relationship, or lack thereof, between the two needs to be made part of the main article, not just some old talk page.

(182.185.144.212 (talk) 18:59, 4 April 2014 (UTC))[reply]

You can click on the Archive no. 1 in the box above. Here is the link if you need it: https://en.wikipedia.org/wiki/Talk:P-value/Archive_1 It would have been better if you had pinpointed the specific part that you found confusing. I am not sure how you got the notion that the two concepts are the same from reading the text. I have made some modifications, which I hope you will find useful. The p-value tends to change with every repetition of a test that deals with the same null hypothesis. However, alpha is always held fixed by the investigator for every repetition and does not change. The value of alpha is determined by the consensus of the research community that the investigator works in, and is not derived from the actual observational data; thus the setting of alpha is ad hoc. This arbitrary setting of alpha has often been criticized by detractors of the p-value. Indeed, alpha needs to be fixed a priori, before any data manipulation can even take place, lest the investigator adjust the alpha level a posteriori, based on the calculated p-value, to push his/her own agenda. For more about alphas, you should look into the article statistical significance. (Manoguru (talk) 11:32, 16 April 2014 (UTC))[reply]

There are some minor issues here. Today there are different notions of p-value. Strictly speaking, a p-value is any statistic taking values between 0 and 1 (extremes included). The definition given here is the standard p-value. This is indeed a statistic (a random variable that can be calculated once the value of the sample is given); it is not a probability at all. It is indeed a random variable, and in fact the standard p-value has a uniform distribution on [0,1] under a (simple) null hypothesis whenever the sample is absolutely continuous. A desirable property of a p-value (as is usually, but not always, the case for the standard one) is that it be concentrated near zero under the alternative hypothesis, while remaining uniform under the null. BrennoBarbosa (talk) 16:22, 17 June 2014 (UTC)[reply]
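The point that the standard p-value is itself a random variable, uniform under a simple null for continuous data but concentrated near zero under an alternative, can be illustrated by simulation; a rough sketch (the z-test setup, effect size, and names are my own assumptions):

```python
import math
import random

random.seed(1)

def p_value(sample):
    """Two-sided z-test p-value for H0: mean = 0, known sigma = 1."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# 5000 p-values computed under H0 (true mean 0) and under an
# alternative (true mean 0.8), each from samples of size 25
null_ps = [p_value([random.gauss(0.0, 1) for _ in range(25)]) for _ in range(5000)]
alt_ps = [p_value([random.gauss(0.8, 1) for _ in range(25)]) for _ in range(5000)]

print(sum(null_ps) / len(null_ps))  # ~0.5: uniform on [0,1] under the null
print(sum(alt_ps) / len(alt_ps))    # near 0: piled up at small values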

That would be a nice addition. But since I have not heard of this idea before, I invite you to make the necessary amendments. (Manoguru (talk) 08:54, 6 July 2014 (UTC))[reply]

Is it fair to assume that the p value is based on the fact that the outcome of any normal random variable will tend to fall in the 95% space of its distribution? -Alok 06:39, 13 February 2014 (UTC) — Preceding unsigned comment added by Alokdube (talkcontribs)

Not really, since the outcome of any random variable, whether it is normal or not, will tend to fall in a 95%-probability region of its distribution. (Manoguru (talk) 12:54, 18 February 2014 (UTC))[reply]

You are a bit off in that statement: the 5% space is "defined" as the region with low probability in the case of a normal distribution function. If the RV's p.d.f. were a rectangle, what would the p-value be? The context is specific to the tail ends of the pdf, or the low-probability outcomes of the random variable. That "chunk of 5%" can lie anywhere, but it is assumed to be the cumulative area in the region with low probability, from my understanding. -Alok 17:06, 3 July 2014 (UTC) — Preceding unsigned comment added by Alokdube (talkcontribs)

You are contradicting yourself in your last statement. You see, when you say 5% or 95% of a distribution, you are specifying the probability of some event. You have not prescribed what the event is, only the probability associated with that event. So you are correct to say that the 5% region can lie anywhere, be it at the tail end or at the most likely outcome of a normal distribution. But you cannot just say "a region with low probability", since that region, as you said, can come from anywhere. Thus the contradiction in that statement. It is important to specify the event to be the tail event when talking about p-values. Also, the p-value need not be restricted to normal random variables. For instance, for a uniform distribution defined over an interval [a,b] and a left-tailed event {X <= x}, the p-value is simply the area under the rectangle from a to x, Pr(X <= x | U[a,b]) = (x-a)/(b-a), even though a uniform distribution does not have a 'tail'. (Manoguru (talk) 10:20, 5 July 2014 (UTC))[reply]
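The uniform-distribution example above can be written out directly; a small sketch (the function name is mine), computing the area under the rectangular pdf from a to x:

```python
def left_tail_p_value_uniform(x, a, b):
    """Pr(X <= x) for X ~ Uniform[a, b]: the area under the
    flat pdf of height 1/(b-a) between a and x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# Observed x = 3 for X ~ Uniform[2, 6]: area of a 1-wide, 1/4-tall strip
print(left_tail_p_value_uniform(3.0, 2.0, 6.0))  # 0.25
```

This makes the point concrete: the p-value is defined by the probability of a tail event under the assumed distribution, with no normality anywhere in sight.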

P-values need not be associated with normal RVs, but they are relevant only for low-probability zones. -Alok 17:22, 5 July 2014 (UTC) — Preceding unsigned comment added by Alokdube (talkcontribs)

Technically, for a normal distribution the interval (mu-epsilon, mu+epsilon), where mu is the mean and epsilon is a very small number, still counts as your "low probability zone", even though the most likely outcome of the normal is the mean. By "low probability zone", do you mean places with low pdf values? (Manoguru (talk) 09:09, 6 July 2014 (UTC))[reply]

Low-probability zones would amount to low pdf. I am pretty sure of my knowledge of statistics and probability; I am an engineer and an economist, by the way, and have used random-variable models for several years. I am simply trying to point out that the article takes a 5% p-value cutoff simply because that is the low-probability zone of a normal distribution, which may not be the case for all random variables; not all RVs are normal. To answer your question above, the 5% cutoff for p implies the tail 5% zone of the normal distribution function, whereas the article should perhaps say it should be the lowest 5% zone of outcomes in the pdf of the RV. Hope that clarifies. 223.227.28.241 (talk) 11:26, 6 July 2014 (UTC)[reply]

Hi, thanks for the clarification. I think we are both on the same page. However I don't think there is any ambiguity in the article, since it is clearly mentioned in the definition section that the cutoff value is independent of the statistical hypothesis under consideration. (Manoguru (talk) 17:00, 6 July 2014 (UTC))[reply]

Yes, but it should be the lowest 5% zone of the cdf, else it is normal-specific. — Preceding unsigned comment added by 223.227.98.64 (talk) 06:18, 19 July 2014 (UTC)[reply]

I am not quite sure what you are talking about anymore. It feels like we are not discussing p-values, but rather what cutoff values to take. It is clearly mentioned in the definition section that the value of the cutoff is entirely up to the researcher to decide, and does not depend on what type of distribution is assumed, normal or non-normal. If I may paraphrase you, just so I understand you right: by "it should be the lowest 5% zone of the cdf else it is normal specific", do you mean to say that had the cutoff been any other percentage, say 1% or 10%, then that cutoff would be related only to a normal distribution? But that's certainly not true. Perhaps it would be helpful if you could point out the particular passage in the article that you find confusing. (Manoguru (talk) 11:15, 21 July 2014 (UTC))[reply]


The word "extreme" needs to be clarified in "In statistical significance testing, the p-value is the probability of obtaining a test statistic result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true". "Extreme" to me sounds like a fixation that the curve is somewhat like the normal distribution function, and that values away from the mean (in either direction) are "extreme" or have low probability.
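One way to make "at least as extreme" concrete without any appeal to the normal curve is a discrete example; a sketch (the function name and numbers are mine, not from the article), using a fair-coin binomial test where "extreme" means at least as far from the expected count as the observed count:

```python
from math import comb

def two_sided_binomial_p(k, n, p0=0.5):
    """Probability, under H0: success probability = p0, of a count
    at least as far from the expected value n*p0 as the observed k."""
    expected = n * p0
    dist = abs(k - expected)
    # Sum the null probabilities of every outcome at least that extreme
    return sum(
        comb(n, i) * p0**i * (1 - p0)**(n - i)
        for i in range(n + 1)
        if abs(i - expected) >= dist
    )

# 15 heads in 20 flips of a supposedly fair coin: count outcomes
# 0-5 and 15-20, i.e. at least 5 away from the expected 10
print(round(two_sided_binomial_p(15, 20), 4))  # 0.0414
```

Here "extreme" is defined purely by distance from the expected count under the null, so the same recipe works for a distribution of any shape; nothing about it is normal-specific.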