Talk:Statistics

This article has been mentioned by a media organization:

Kathy Lange (December 1, 2006). "Differences Between Statistics and Data Mining". DM Review. http://www.dmreview.com/

Mathematics B‑class Top‑priority

	This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
B	This article has been rated as B-class on Wikipedia's content assessment scale.
Top	This article has been rated as Top-priority on the project's priority scale.

Statistics Unassessed

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
???	This article has not yet received a rating on Wikipedia's content assessment scale.
???	This article has not yet received a rating on the importance scale.

Statistics was a good article, but it was removed from the list as it no longer met the good article criteria at the time. There are suggestions below for improving the article. If you can improve it, please do; it may then be renominated.
Review: June 11, 2006.


This page is for discussion of the article about statistics. Comments and questions about the special page about Wikipedia site statistics (number of pages, edits, etc.) should be directed to Wikipedia talk:Special pages.

Please add new comments at the bottom of this page.

Number of data points

I was wondering if there is a name for the statistical principle that the more data points you have, the more reliable your dataset will be. Thanks. Jefferson61345 02:30, 8 August 2007 (UTC)

Yes, it's the central limit theorem. —Preceding unsigned comment added by 82.32.9.240 (talk) 20:03, 30 September 2007 (UTC)

Are there any theorems or definitions related to a small number of data points? In particular, I'm wondering if there is a definition of the term "poor statistics" (or "weak statistics"), which is sometimes used by scientists when describing the statistical analysis of experimental data sets. Usually, this term is accompanied by the statement that "more data" are needed to improve the statistics. What is the number of data points below which statistics become "poor"? Are there other factors to be taken into account? Is "weak statistics" the same as "poor statistics"? --Uxh (talk) 17:27, 2 May 2008 (UTC)
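The effect asked about above is usually associated with the law of large numbers (the sample mean settles toward the population mean as n grows), with the central limit theorem describing the roughly normal shape of the remaining fluctuations. A minimal Python sketch, assuming independent draws from a standard normal distribution, shows the spread of the sample mean shrinking roughly like σ/√n. It also hints at why there is no universal cutoff for "poor statistics": the spread shrinks smoothly, so how many points are enough depends on how precise the answer needs to be. The function name and the number of simulated trials are illustrative choices, not standard definitions.

```python
import random
import statistics


def spread_of_sample_means(n, trials=1000, seed=0):
    """Estimate, by simulation, the standard deviation of the mean of n draws from N(0, 1)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


for n in (10, 100, 1000):
    # Theory: the standard error of the mean is sigma / sqrt(n), i.e. 1 / sqrt(n) here.
    print(n, round(spread_of_sample_means(n), 4), round(1 / n ** 0.5, 4))
```

Each tenfold increase in n should cut the printed spread by a factor of about √10 ≈ 3.2.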

Questions

Question: What is the procedure for finding the number of standard n × n Latin square designs? Question: What is the definition of a non-trivial sufficient statistic? Please answer these questions if possible. Thanks a lot. —Preceding unsigned comment added by 164.100.6.9 (talk) 05:40, 5 April 2008 (UTC)
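For the first question, taking a "standard" (also called reduced) Latin square to be one whose first row and first column are in natural order, no simple closed-form count is known, so small cases are found by enumeration. The following Python sketch is a plain backtracking count, practical only for small n; the function name is hypothetical. Multiplying the reduced count by n!(n−1)! gives the total number of Latin squares of order n.

```python
def count_reduced_latin_squares(n):
    """Count n x n Latin squares on symbols 0..n-1 whose first row and first column are 0, 1, ..., n-1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Symbols already used in each row and column; the fixed first row/column are pre-filled.
    row_used = [{i} for i in range(n)]
    col_used = [{j} for j in range(n)]
    row_used[0] = set(range(n))
    col_used[0] = set(range(n))
    # The remaining free cells, filled in row-major order.
    cells = [(i, j) for i in range(1, n) for j in range(1, n)]

    def fill(k):
        if k == len(cells):
            return 1  # every free cell filled consistently: one complete square
        i, j = cells[k]
        total = 0
        for s in range(n):
            if s not in row_used[i] and s not in col_used[j]:
                row_used[i].add(s)
                col_used[j].add(s)
                total += fill(k + 1)
                row_used[i].remove(s)  # undo the choice before trying the next symbol
                col_used[j].remove(s)
        return total

    return fill(0)


for n in range(1, 6):
    print(n, count_reduced_latin_squares(n))
```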

Fallacy?

Statistics can easily be deemed a fallacy. If statistics say that kids whose parents don't talk to them about not smoking are more likely to smoke (you know the common argument), that is a fallacy. Yes, it may be a true statement, but it does not rule out that some kids whose parents tell them not to smoke will still find smoking cool, or that some kids whose parents never tell them not to smoke may feel it is disgusting. Statistics as a field tends to treat all people as equal in all regards when that is clearly not true. Not everybody can throw 49 touchdown passes in an NFL season like Peyton Manning did in 2004 or be the leading goal scorer at the Soccer World Cup. I just figured this might be an idea to consider discussing in the article, even though it may be difficult to find a decent source. 205.166.61.142 00:31, 31 August 2006 (UTC)

You make some sweeping generalizations. One of the purposes of statistics is to attempt to explain an outcome using the variables with the most explanatory power. If a certain type of person is more likely to have a certain kind of outcome (for example, black men tend to have more cardiovascular problems), it is in the best interest of such research to treat people differently, not the same. Statistics such as the t-test and ANOVA often differentiate between people more than they treat them the same. I think your football analogy may be one of the fallacies you are talking about. Football statistics are descriptive statistics--they only describe the people to whom they apply (in your case, professional football players and nobody else). Inferential statistics, such as the t-test, often group people into like kinds based on particular variables, like the incidence rate of cardiovascular health problems. Chris53516 13:43, 31 August 2006 (UTC)

Let me add to that answer in case the poser of the question returns. Statistical methods are not (correctly) used to prove cause and effect or to make claims that something is always true. Statistics is more of an art of educated guessing, where mathematical methods are used to make the best decisions about what is most likely or what tends to be related. In fact, built into the methods of statistics are ways of determining how likely you are to make an error in your "educated guessing". Typically, someone using statistical methods correctly will say, "I am 99% sure that these two factors (such as not smoking and parents telling the child not to smoke) are related to each other." Then qualifiers will be added. Even in that case, a good statistician wouldn't claim that one factor causes the other. It could be that both are caused by some third, unidentified factor. But, of course, those types of misinterpretations of statistical results are made all the time. That doesn't mean, however, that cause and effect cannot be the most logical interpretation of the situation. Suppose, for example, that a large number of people get sick, and nearly all of them ate spinach. We might make a best guess that spinach caused the illness. But really it might be something else, like a common salad dressing used by spinach lovers, or the fact that spinach stuck in their teeth chased away potential romantic relationships, leaving the spinach-eaters in a heart-sick condition which eventually led to real illness. Of course, those alternatives are ridiculous. I guess they COULD be true, but most people would go with the theory that the spinach was tainted. And even if the spinach was the problem, it could be that, for some, there was another unidentified cause. So we are left with concluding, "Probably this is the cause most of the time." --Newideas07 21:48, 3 November 2006 (UTC)
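The "third, unidentified factor" point can be illustrated with a small simulation. In this hypothetical Python sketch, a hidden variable z drives both x and y, and y never depends on x; the two still come out clearly correlated (about 0.5 by construction), and the correlation largely disappears once z is regressed out of both. The variable names, sample size and coefficients are assumptions chosen only for illustration.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(vs, zs):
    """Residuals of a least-squares regression of vs on zs (with intercept)."""
    n = len(vs)
    mv, mz = sum(vs) / n, sum(zs) / n
    slope = sum((v - mv) * (z - mz) for v, z in zip(vs, zs)) / sum((z - mz) ** 2 for z in zs)
    return [v - mv - slope * (z - mz) for v, z in zip(vs, zs)]


rng = random.Random(42)
z = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # hypothetical hidden common cause
x = [zi + rng.gauss(0.0, 1.0) for zi in z]      # hypothetical "exposure" score: depends only on z
y = [zi + rng.gauss(0.0, 1.0) for zi in z]      # hypothetical "illness" score: depends only on z, never on x

print("corr(x, y)             =", round(pearson(x, y), 3))
print("corr(x, y | z removed) =", round(pearson(residuals(x, z), residuals(y, z)), 3))
```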

Need Link to Reliability (statistics) page

This page needs links to the pages on Reliability (statistics) and Factor Analysis. I'm not sure if these should be put under Statistical Techniques or See Also. I'm also wondering if there should be a link to Cronbach's Alpha (which is one type of reliability estimate).

It seems to me that there are probably quite a few statistical techniques that are not linked from this page. Perhaps it would be helpful to create a hierarchical index of statistical techniques. I see that something like this can be done in the Table of contents. Kbarchard 22:24, 16 September 2006 (UTC)

This page is not a list of statistical topics (which we link to in the "See also" section), and not every statistical technique or estimator needs to be listed here. The ones you mention seem a bit too specialised for a general article on statistics, but could be usefully added to articles like multivariate analysis and social statistics. -- Avenue 01:34, 18 September 2006 (UTC)

Standardized coefficient for DYK

I wrote an article on Standardized coefficient, but I am no expert in statistics. If this could be quickly vetted by an editor more experienced in this field, we could have a statistical WP:DYK. --Piotr Konieczny aka Prokonsul Piotrus | talk 20:25, 7 October 2006 (UTC)

What is the difference between F(x) and f(x)?

Can somebody please explain to me, with an example, the difference between F(x) and f(x) for a continuous random variable? As far as I understand, f(x) is the derivative of F(x) (please correct me if I am wrong), but that is not enough for understanding the whole process. Many thanks. -Chetan. —Preceding unsigned comment added by Chetanpatel13 (talk • contribs)

Those two should be interchangeable, as far as I know. By the way, use four ~ to sign with your user ID. Chris53516 17:07, 18 October 2006 (UTC)

Chris, thanks for the response. BTW, they are very different. Thanks for the tip, and hopefully I am doing it right this time. -- Chetan M Patel 18:24, 18 October 2006 (UTC)

How are they different? Please use 4 ~ to sign your name. It's easier than what you did. Chris53516 18:31, 18 October 2006 (UTC)

f(x) is the probability density function (PDF), whereas F(x) is the cumulative distribution function (CDF). Chetan M Patel 18:58, 18 October 2006 (UTC)

The names of the functions are a convention widely used in statistics. Perhaps a better question is: what's the difference between a PDF and a CDF? It's probably easiest to understand if you know about integration, with

F(u)=\int_{x=-\infty}^{x} f(x)\,dx.

As we are working over a continuous domain, the chance of a random variable taking a particular real value, 0.123456789 say, is zero, so it only makes sense to talk of probabilities calculated over a range of values, and it's a convention to use the range

(-\infty, x]

giving the CDF. So yes,

f(x)=\frac{dF}{dx}.

What is the meaning of the PDF? Well, if you consider a discrete probability distribution like the binomial distribution, then the PDF is just the probability of a particular number; here the probability of each particular number 0, 1, 2, 3, ... occurring is non-zero. Furthermore, the PDF is useful for visualising the shape of a distribution: for the normal distribution it gives the familiar bell-shaped curve, while the CDF would be S-shaped and it's harder to see what's happening. --Salix alba (talk) 20:45, 18 October 2006 (UTC)

Correction: that should be

F(u)=\int_{x=-\infty}^{u} f(x)\,dx.

The upper bound of integration must be u if F(u) is what you're evaluating. Michael Hardy 22:47, 18 October 2006 (UTC)

In case anyone wants a "Statistics for Dummies" explanation of all that: f(x) is the equation of a curve that defines a certain probability density function (pattern). For example, a bell-shaped curve has an equation, f(x), and represents a situation in which falling in the middle of some range is most likely, with tapering probabilities as you go to the left or right. Most measurements of objects fall into this category. But probabilities of having x in some range are found by calculating the area under the curve. To find the area under the curve, you have to integrate f(x) to get F(x). Sometimes that is impossible or just really hard, so approximation techniques are used instead, which is one reason why you usually get probabilities from tables instead of using equations. There are other theoretical uses for the two functions. I'm not sure if that clarified things for anyone. --Newideas07 21:23, 3 November 2006 (UTC)

In case that didn't clarify things for some people, the 'statistics for dummies for dummies' version is that the pdf is the height of the density at a given point, whereas the cdf is the area under the curve over a range of points. For example, if we want to know the probability of a person being 5'9" tall, that's a question for a pdf (f(x)); if we want to know the probability of being 5'9" or less, that's a cdf (F(x)). Plf515 02:09, 24 November 2006 (UTC)plf515
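Using a hypothetical height distribution (normal with mean 69 inches, i.e. 5'9", and standard deviation 3 inches; both numbers are assumptions for illustration), the following Python sketch contrasts the two. Under a continuous model the probability of being exactly 69 inches is zero, so "being 5'9" tall" is read here as rounding to 5'9", i.e. a height between 68.5 and 69.5 inches. That probability comes out close to f(69) times the 1-inch width, while F(69) gives the probability of being 5'9" or less.

```python
from statistics import NormalDist

# Hypothetical model: heights in inches, normal with mean 69 (5'9") and SD 3.
heights = NormalDist(mu=69.0, sigma=3.0)

density_at_69 = heights.pdf(69.0)                          # f(69): a density, in "per inch"
p_at_most_69 = heights.cdf(69.0)                           # F(69) = P(height <= 5'9")
p_rounds_to_69 = heights.cdf(69.5) - heights.cdf(68.5)     # P(68.5 <= height <= 69.5)

print("f(69)           =", round(density_at_69, 4))
print("F(69)           =", round(p_at_most_69, 4))
print("P(68.5 to 69.5) =", round(p_rounds_to_69, 4))
```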

Note: the y-axis in the Gaussian distribution chart is labeled as "probability". That is NOT correct, since the probability of any single point is zero. The y-axis actually shows probability density. —Preceding unsigned comment added by 146.107.217.52 (talk) 11:47, 30 June 2008 (UTC)
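A quick way to see why the y-axis must be density rather than probability: a density can exceed 1. In this Python sketch (a normal distribution with standard deviation 0.1, chosen only for illustration, using the standard library's statistics.NormalDist), the peak of the curve is about 3.99, yet the total area under the curve, which is what actually carries probability, is still 1.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0.0, sigma=0.1)  # illustrative choice of a narrow normal curve

peak_height = narrow.pdf(0.0)  # height at the centre: 1 / (0.1 * sqrt(2 * pi)), well above 1
area_within_10_sd = narrow.cdf(1.0) - narrow.cdf(-1.0)  # probability of landing within +/-10 SD

print(round(peak_height, 3), round(area_within_10_sd, 6))
```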

Name of Etymology subsection

Etymology here is the study of the history of the word statistics, not the history of statistics itself. The first paragraph or so of the current Etymology subsection is etymology, but the later paragraphs go beyond etymology to actual history of statistics. That's why I think there are many better, broader titles for this subsection. Or maybe I am interpreting etymology too narrowly? Joshua Davis 15:11, 21 October 2006 (UTC)

I think Etymology works, even if it does go beyond simple etymology. It's still related to the word's history. -- Chris53516 16:04, 22 October 2006 (UTC)

I agree that Etymology was not an accurate description here. I've tried to remedy the situation somewhat by moving some of this material to the Statistics Today section. I also removed a reference to Michel Foucault, which does not seem to me to belong here at all. Thefellswooper 22:06, 31 March 2007 (UTC)

Criticism

I would like to propose that we change the name of this section to "The Misuse and Limitations of Statistics" or something similar, as Joshua suggested. I also would like to make big revisions to it if no one is working on it or attached to it as it is. I'm a statistician (M.S.) and educator. If anyone objects or has a better idea or is already working away hard on this, speak soon or I'll do it. --Newideas07 22:04, 3 November 2006 (UTC)

I think that is a good topic, but for a separate article. There are certainly lots of abuses of statistics, but this page seems fine to me, needing only minor edits. Plf515 02:34, 24 November 2006 (UTC)plf515

I agree with the opening comment. Statistics is one of the three primary branches of mathematics (Pure, Applied and Statistics), and at the moment Pure and Applied seem to get more attention. Go for it, Newideas07. David —Preceding unsigned comment added by 82.32.9.240 (talk) 20:01, 30 September 2007 (UTC)

Note about archives

I used a method that others may not like. If someone else wants to change the archive, find and copy any new comments, and begin at this page to do so: Start of archiving. Thanks for being patient while I made these archives. -- Chris53516 (Talk) 23:01, 3 November 2006 (UTC)

Merge from applied statistics

There was a suggestion at Talk:Applied statistics to merge into this article - it's only a stub but it may have some potential. I'll leave it for the statisticians here to decide. Richard001 19:53, 6 February 2007 (UTC)

Merge. In my opinion, "applied statistics" is a redundant phrase. To me it appears that statistics are often applied somehow. So, the article can be merged as a new section or integrated into this article. — Chris53516 (Talk) 20:27, 6 February 2007 (UTC)

I am not a statistician and cannot really comment on the material, so I won't "formally" vote. But the long-standing stubbiness and infrequent editing suggest a merge to me. I'd add that Mathematical statistics is similarly meager, covering nothing that isn't already covered here. Joshua R. Davis 13:54, 8 February 2007 (UTC)

Merge. I don't quite agree with Chris53516 when he asserts that "applied statistics" is pleonastic, but this article already covers the distinction between "applied statistics" and "theoretical statistics" adequately, in the introduction. I looked through the applied statistics article carefully, and in my opinion a merger is overkill. Applied statistics should simply be deleted. DavidCBryant 15:48, 8 February 2007 (UTC)

(Note: If someone deletes the page, be sure to redirect it to this article. — Chris53516 (Talk) 16:00, 8 February 2007 (UTC))

Having heard no objections, I have gone ahead and changed Applied statistics into a redirect page. Don't give up on Mathematical statistics quite yet, though. I'm trying to get hold of Dcljr, who had quite a few ideas on that score. I'm sure the theoretical article can be turned into something better pretty soon. DavidCBryant 01:50, 14 February 2007 (UTC)

misconceptions

not a statistician here but maybe the article ought to have a section addressing those. statistical mechanics has nothing to do with mathematical statistics. many areas are related to rigorous formulation of statistical mechanics: probability and analysis, topology, number theory, etc., but not statistics. i also removed the reference to "sports statistics". to call computing, say, slugging percentages or ERAs or free throw percentages doing statistics seems rather abhorrent, IMHO. Mct mht 07:09, 10 February 2007 (UTC)

Thanks for taking those (See also) links out. I concur with your decisions. Do you mean to tell me that Maxwell and Boltzmann aren't just two guys who played for the Yankees? ;^> DavidCBryant 12:36, 10 February 2007 (UTC)

who's on first base, Dave? :-) Mct mht 07:26, 11 February 2007 (UTC)

I don't mind losing "statistical mechanics", but in my view removing "sports statistics" is going too far. Sure, the routine collection of free throw percentages etc is not exactly groundbreaking statistical work, but it is a (small) part of statistics. I've seen several articles on aspects of sports statistics in reputable statistical journals. They're admittedly more common in lighter fare (e.g. the ASA's Chance magazine has a regular column titled A Statistician Reads the Sports Pages), but they demonstrate that professional statisticians view sports statistics as within their ambit. -- Avenue 03:21, 11 February 2007 (UTC)

i am certainly in no position to object if that's the consensus of professional statisticians. Mct mht 07:26, 11 February 2007 (UTC)

Statistical mechanics is indeed probabilistic mechanics, but I'd be inclined to leave the link here. Sports statistics, as pointed out, is deeper than people may realize. (There was a great article on this in the WSJ around August or Sept. of last year.) There is legitimate inferential statistics going on there, e.g. attempts to correct for the effects of luck on a player's stats. JJL 03:47, 11 February 2007 (UTC)

I don't much care if sports statistics are listed in this article. At least they're comparable (in quantity) to the other kinds of data regular statisticians deal with. But let's keep the references to physics out of the "see also" list ... the meaning of "statistics" in the context of physics and thermodynamics is substantially different from the meaning this article deals with. I guess I could say I use a result from statistical mechanics (a measurement of the ambient temperature) to "make an informed decision" (whether to wear a flannel shirt, or not). But that really seems like stretching the point, to me. Oh – what's on second, and who's on third. ;^> DavidCBryant 17:25, 11 February 2007 (UTC)

Statistics and Accuracy

Can an expert out there please discuss the topic of statistics and accuracy. For example, do statistics HAVE to be accurate? Or can statistics be a general indication of a trend, reality, etc.

In general, the data from which statistics are derived are as accurate as the observers/experimenters/statisticians can make them. I suppose that observational errors are possible (I might think the lights are off when they're really on ... maybe I just went blind, and haven't realized that yet), but in practice observational errors are fairly rare, and easily controlled.

Even though the observations are accurate, the statistics themselves may be imprecise. In general, the larger the number of observations that can be made, the more precise the statistical estimates that emerge. This tendency of the collected data in a small sample to diverge somewhat from the true characteristics of a sampled population is analyzed, in the first instance, by the statistical variance of the data collected.

Notice that certain kinds of data (mostly relating to people's opinions, and similar subjective measurements) are inherently less reliable than the measurements that can be made in fields like chemistry and physics. Such data can easily be manipulated to reach misleading conclusions, no matter how carefully statistical procedures are carried out (for example, by asking biased questions, or by limiting the allowed responses on a questionnaire, etc.). DavidCBryant 04:33, 10 August 2007 (UTC)

Actually, to qualify as a measurement, a set of observations only has to result in a reduction in uncertainty, not necessarily the elimination of uncertainty. In other words, if the new estimate is more accurate than your previous uncertain estimate, then it told you something you didn't know. I just wrote a book about it called "How to Measure Anything". Hubbardaie 22:35, 10 August 2007 (UTC)

Three types of lies

Lies, damned lies, and statistics —Preceding unsigned comment added by 70.80.220.247 (talk) 14:46, 28 October 2007 (UTC)

Misuse of statistics

Currently the Misuse of statistics section contains a quote from Dennis Lindley that is not referred to in the text and has nothing to do with misuse, as far as I can tell. I think that this section is also disproportionately large (roughly 20% of the text), in danger of giving the casual reader the impression that statistics as a discipline is inherently untrustworthy or controversial. It's also loaded with weasel words.

I propose that we shorten this section dramatically and leave the details to the Misuse of statistics article (so that it's similar to the short History of statistics section, with its accompanying History of statistics article).

In fact, I think that the misuse/misinterpretation paragraph of the Overview section is itself sufficient, without a Misuse section at all, but probably I'm in the minority there? Joshua R. Davis (talk) 16:42, 20 January 2008 (UTC)

I do think that a misuse of stats. section is valuable here, and a longer article on it elsewhere is also useful. While there may be a case for some rebalancing, I am fine with the section as is. Certainly, many people coming to this page will be familiar with "lying with statistics" and with the perception that stats. is, as you say, inherently controversial, and this section both addresses that and puts it in a more formal context. I agree that the Lindley quote is misplaced here and should be (re)moved. But the current section nicely transitions from the general perception of lying with stats. to the more scientific concerns over hypothesis testing, p-values, etc. JJL (talk) 17:28, 20 January 2008 (UTC)

I agree that the section is worth having, but there's also a lot of room for improvement. I've made a few changes to the second paragraph. I think the part about hypothesis testing could be reduced to a simple statement that CIs are preferable to p-values. The Bayesian bit should either be expanded or removed; just saying it's another option, but has its own critics, gives the reader very little information. Mentioning publication bias might be useful. The paragraph on the Abelson perspective is interesting, but does it really deserve this much prominence? -- Avenue (talk) 23:40, 20 January 2008 (UTC)

I have tried to make the section more concise in a manner compatible with these opinions. It still has a lot of weasel words, since I haven't verified any of the information. Joshua R. Davis (talk) 00:12, 31 January 2008 (UTC)

Statistics As Principled Argument, by Robert P. Abelson

I think that the following is interesting and deserves to be in Wikipedia. But I do not think that it should be in this article. Maybe in a more specialized (new) article on the foundations/philosophy of statistics.

In his book Statistics As Principled Argument, Robert P. Abelson articulates the position that statistics serves as a standardized means of settling disputes between scientists who could otherwise each argue the merits of their own positions ad infinitum. From this point of view, statistics is a form of rhetoric; as with any means of settling disputes, statistical methods can succeed only as long as all parties agree on the approach used.

So I have put the paragraph here, and deleted it from the article. —Preceding unsigned comment added by 86.156.222.165 (talk) 10:22, 31 January 2008 (UTC)

I have created a new article Foundations of statistics, which incorporates the above quoted paragraph. The article is currently a stub. TheSeven (talk) 11:19, 31 January 2008 (UTC)

I think the point of view that statistics is rhetoric is valid and merits inclusion in the main statistics article. The "Misuse" section may not have been the optimal place for it but I'd like to see a statement to that effect somewhere here. Abelson is an obvious reference for that viewpoint but not the only one. JJL (talk) 12:48, 31 January 2008 (UTC)

What I would ideally like is the article Foundations of statistics expanded from a stub into a real article. Then the Statistics article could include a paragraph that summarized the foundations, and linked to that as the main article on the topic. The former should be done anyway, I think; it is an important topic. TheSeven (talk) 14:34, 31 January 2008 (UTC)

Is "Foundations of statistics" a term that statisticians use to talk about this stuff, or did we just make it up? When I hear it (as a non-statistician) I think probability theory. The Abelson stuff seems better described as "Philosophy of statistics". Is there a lot to say about the philosophy of statistics? (I'm honestly asking.) Joshua R. Davis (talk) 14:38, 31 January 2008 (UTC)

"Foundations of statistics", or "foundations of mathematical statistics", are common terms. I just tried googling and got 110,000 results. There are also books with that title. There is a substantial philosophical component to this though. Googling for "philosophy of statistics" gave me 88,000 results. So perhaps there should be an article with that title, which redirects to the foundations article—? TheSeven (talk) 15:01, 31 January 2008 (UTC)

I think "Mathematical Statistics" and "Philosophy of Probability" are more common. You don't see many (separate) phil. of stat. courses; for example, try searching for it at Amazon. The viewpoint Abelson discusses at length isn't his own theory; in my experience it's reasonably common among statisticians--like a formal mathematical proof, an hypothesis test is a form of argumentation (a practical form, à la Peirce, say). I do think that the article must address the fact that an hypothesis test (etc.) is a way of settling disputes as well as a way of finding things out. When the FDA asks for statistical arguments, that's what it wants--an argument that the drug is effective and safe. JJL (talk) 15:14, 31 January 2008 (UTC)

I have not heard the phrase "Philosophy of Probability" before, as far as I can recall. I just tried googling for it, and got 766 results. Compared with 110,000 for "foundations of statistics". TheSeven (talk) 15:27, 31 January 2008 (UTC)

Wait, let's go apples-to-apples! For "Foundations of stats." I think one more commonly sees "Math. stats." as the foundations are in analysis and probability. For "Philosophy of stats." one more commonly sees it as part of a "Philosophy of Probability" course/book than on its own. Here are a few Phil. of Prob. books: [1], [2], [3], [4]. The word chance commonly appears in its place (again, Peirce is an example), and of course it also can be studied in a modern physics context. I can't find a book entitled "Philosophy of Statistics" there; Foundations of stats. does make an appearance [5]. JJL (talk) 15:41, 31 January 2008 (UTC)

This is interesting, especially because several people are claiming that the topic is too insignificant to merit a Wikipedia article of its own. See here—you can vote if you wish. TheSeven (talk) 17:38, 31 January 2008 (UTC)

Would a better name be centred around "statistical inference", rather than just "statistics"? Melcombe (talk) 15:09, 12 February 2008 (UTC)

The discussion in this section is effectively closed. The right place would now be the discussion in Foundations of statistics. (According to that article though, this is the standard name for the topic. Moreover, it has far more Google hits.) TheSeven (talk) 21:53, 12 February 2008 (UTC)

Considered picture add to history section

I am considering adding the following picture to the history section. Any objections or comments should be made now, before the picture is added. —Preceding unsigned comment added by TeH nOmInAtOr (talk • contribs) 18:42, 12 June 2008 (UTC)

Bayesian and Frequentist / Modern history

The role of, and controversy between, Bayesian probability and Frequency probability is very important and warrants fronting. Also, this article and the History of statistics make virtually no mention of 20th century developments.

Nils (talk) 22:50, 17 August 2008 (UTC)

External links—recent changes

Reasons for most of the changes were given in the edit summaries. The changes should not be reverted without addressing those reasons. I also do not see that a section "Resources at educational institutions" is more helpful for readers than a section "Online courses and textbooks": better to have a section that tells people what is at the link instead of where the link is located. TheSeven (talk) 19:49, 6 September 2008 (UTC)

Where are the "reasons" in your edit summary??? In your edit you reverted "3 E digest links, links to products and link to personal website by non-academic" and deleted the invisible editing comment to prevent future spamming. Please explain why the links you push should be exempted from the restrictions established at WP:ELNO? The external link policy page states: "Except for a link to an official page of the article subject one should avoid: [...] 4. Links mainly intended to promote a website. 5. Links to sites that primarily exist to sell products or services, or to sites with objectionable amounts of advertising [...] 9. Links to the results pages of search engines, search aggregators, or RSS feeds [...] 11. Links to blogs and personal web pages, except those written by a recognized authority. This is meant to be a very limited exception. As a minimum standard, recognized authorities always meet Wikipedia's notability criteria for biographies." [my emphasis]

- About your links: The link to the commercial enterprise StatSoft,Inc. is inappropriate as per 4 and 5 above. The self-published, personal webpage informath is inappropriate as per 4 and 11 above, and it is by a non-authoritative source as per 11 above. The free download site www.freestatistics.info/en/about.php is iffy for several reasons, including 11 above, but also related to this statement by the author: "All programs listed on the Free Statistics Web Site at freestatistics.info are the sole property of their respective authors. [...] I don't accept responsibility about all the sofware listed in the Free Statistics Web Site. My goal is however to list only software that in no way could damage the Pc of users." The site www.ericdigest.org is a great search engine resource where students can conduct searches for scholarly publications in every single subject we cover on Wikipedia - however, it would appear inappropriate as per 9 above. I'll still leave it among the links here now, for further review.

- About the subheads: "Online courses and textbooks" is an invitation to commercial enterprises that sell these products to spam us. Limiting this section to university related websites ensures that we will not involuntarily be issuing spam-invitations to textbook & software companies and companies offering online "diploma mill" courses. The subhead "Other resources" opens the door for sites by non-authoritative sources with self-promotional interests, who are here primarily for the joy of seeing their own personal webpage featured on Wikipedia (please read WP:SPAMMER). As the now deleted entries showed, it was also seen as a spam invitation by companies offering free software sections on their sites as a small part of the main commercial section.

- About the links to the international organizations established to promote the study of statistics: I find them appropriate here, and since that section is not violating any policy, I see no reason to delete it. However, I have not reverted your deletion of that section. 71.106.254.126 (talk) 23:50, 7 September 2008 (UTC)

Taking your points in turn....

Regarding edit summaries, I made many edits, and included a reason with most of them.

I deleted your "invisible editing comment" in error--oops.

After reading point 5 at WP:ELNO#Links_normally_to_be_avoided, I now agree with deleting the StatSoft link.

The informath site is non-commercial, and is written by a mathematician who used to work on Wall Street, now studies independently, and has several peer-reviewed publications [6][7]. And the link is useful/informative. So I do not see how 4 or 11 applies.

I do not understand your objection to freestatistics.

Glad you agree on the ERIC link.

Regarding subheadings, they seem to have attracted only one link that is considered spam (StatSoft--and I think that the link is actually quite useful, and would greatly prefer to include it if it did not violate policy). The subheadings are also helpful for readers. So we do not agree on this.

Regarding links to the international organizations, if you had read my edit summaries you would know that I deleted them because List of academic statistical associations is linked to under "See also".

Two other points....

What do you think of [8]? The content seems good, but there already is a link to something by the same author.

I put a new discussion topic at List of basic statistics topics, which you might like to comment on.

TheSeven (talk) 04:12, 8 September 2008 (UTC)

The external links policy states that personal web pages, except those written by "a recognized authority", are to be avoided. The authors of the sites you have now reintroduced in the article are not "recognized authorities" that fit into the definition on the policy page, which states that "recognized authorities always meet Wikipedia's notability criteria for biographies". Being a "mathematician who used to work on Wall Street" does not automatically elevate you to the status of "authoritative source in statistics" (the same seems to be true for the software reviewer who writes the articles on the self-published site freestatistics). A quick search of scholarly publications in statistics and math for the past 10 years reveals no peer-reviewed articles by the creator of the site informath in those subjects (although he appears to have published a couple of articles in other fields), and there seem to be no academic institutions teaching statistics that could vouch for his competence as an authority in the subject. All that aside, the page you push has very little substance and is not about the important methodological disputes that have taken place in this subject, as its title seems to imply. It instead makes the following rather self-evident and simple observation: "The assumption-making phase of a statistical analysis can be disputed, unlike the calculation phase." Having that link in the external links section sets a bad example, which makes it harder to explain to other users who want to add their own personal webpages that it is inappropriate to do so. May I ask why the mentioned sites are so important for you to include here that you seem ready to engage in an edit war over them? Also: why are you removing the names of the universities in the other external links? Regarding your question about Prof. David Lane's Rice Virtual Lab in Statistics: that page links to his "HyperStat Online Statistics Textbook". There is no reason to link to both individually. Will update the external links section accordingly. 71.106.254.126 (talk) 08:57, 8 September 2008 (UTC)

On informath, I did not know about "recognized authorities always meet Wikipedia's notability criteria for biographies"; so I agree with removing this.

You did not address my objection to removing a subheading for "Online courses and textbooks".

You removed freestatistics, after I restored it (with explanation), without explanation.

I do not see why the two "web" links benefit from information like "by Gordon K Smyth, The Walter and Eliza Hall Institute of Medical Research"; how does this help the reader?--for me, it just distracts.

TheSeven (talk) 10:50, 8 September 2008 (UTC)

No, the subhead "Online courses and textbooks" has continuously attracted spam links (as evidenced by the long-running clean-up efforts in the article's history; for example, a few of many: [9], [10], [11], [12]). The reason is that it is worded in a way that seems to invite people who sell online courses and textbooks to add their products, which makes it a poorly formulated heading. To say that it attracted only one spam link is a bit of an understatement.

Freestatistics was commented on in both posts: please reread. In short: The same rules apply to that site as those used to explain why the other personal, self-published website was inappropriate.

About giving the source in the external links, please see WP:EL#External links section. It states: "a concise description of the contents and a clear indication of its source is more important than the actual title of the page". I'll shorten the description of Gordon Smyth's link to make the source of the link more concise, but sources are needed for all of the external links in that section. If the sources distract you, I'd suggest you take that issue up in a more general discussion of links, for example on the talk page of the policy or style guide pages. 71.106.254.126 (talk) 00:34, 9 September 2008 (UTC)

Regarding "long-running clean-up efforts", you might note that I am the editor for one of the edits that you cite. In fact, I have been cleaning the External links for some time now: most recent four-- [13], [14], [15], [16] (and there are priors). So I am well familiar with the issue.

I still believe that readers benefit from having the subheading, and I have been watching the External links section to keep improper things out. My familiarity with WP policies is obviously not perfect though: you have rightly cited policies with which I was unfamiliar--including here about "a clear indication of [a link's] source". Perhaps you can think of another wording for the subheading that does not, in your view, invite spam. In the meantime, I would prefer to keep the subheading and monitor it--even if my monitoring is not perfect.

What is the reason for mentioning "David Lane", "Gordon K Smyth", and "Statistics Community"? The last is also obviously inaccurate.

The OnlineStat book that I cited above [17] is not the same as the HyperStat book at Rice. For example, compare the discussion of the t distribution in the former [18] to that in the latter [19]. Based on an admittedly cursory look, the former is the best (free) online intro stats book there is. Yet the Statistics article does not currently link to it.

TheSeven (talk) 10:59, 9 September 2008 (UTC)

The definition of Personal web page, which is linked to from WP:EL, is a web page "created by an individual to contain content of a personal nature". So your criticism of freestatistics and informath is not valid. I have also added "OnLine Statistics", mentioned above. TheSeven (talk) 18:38, 10 September 2008 (UTC)

Self-published ruminations online by an individual who is a non-authoritative source are personal in nature, regardless of subject matter (or, as per the article you refer to: "The content of personal web pages varies and can, depending on the hosting server, contain anything that any other websites do."). The external links policy refers to "personal web pages" in the context of self-published sites by non-authoritative sources, the opposite of a site "published by a reliable source" and of a personal site "published by a recognized authority". As for the argument that the sites are "non-commercial", see common spammer strawmen. Please also read the reply at Wikipedia_talk:External_links#Blogs. Afv2006 (talk) 20:30, 10 September 2008 (UTC)

Please cite a reference for your claim about "self-published site by a non-authoritative source". Note that if your claim were valid, it would also require removal of other links, such as that for Dallal's Statistical Practice. And your link to Wikipedia_talk:External_links#Blogs makes no sense. I do not believe that you have read what you are editing. What is your purpose? TheSeven (talk) 20:57, 10 September 2008 (UTC)

The source for the statement about self-published sites by non-authorities is the policy page on verifiability: "Anyone can create a website or pay to have a book published, then claim to be an expert in a certain field. For that reason, self-published books, newsletters, personal websites, open wikis, blogs, knols, forum postings, and similar sources are largely not acceptable." See also WP:reliable sources, and point 2 on the List of sites normally to be avoided. Afv2006 (talk) 00:06, 11 September 2008 (UTC)

Using the personal web pages as sources instead

Following some discussion at Wikipedia_talk:External_links#Blogs, I have some additional thoughts on what should and should not be included in External links here. My opinion is that all the current links are appropriate except one, informath. The informath link should probably be removed. On the other hand, I think that the link provides useful information about statistics for non-experts (which will include most readers of the WP article). WP:EL says that if the "page to which you want to link includes information that is not yet a part of the article, consider using it as a source for the article". My suggestion is that that is what should be done: remove that one link and incorporate its content either in this article or perhaps in Misuse of statistics.

Your thoughts? TheSeven (talk) 23:06, 10 September 2008 (UTC)

Drive-by comment. I was shocked to see this article had only one note/reference. In such a case there should usually be zero external links. If this largish group of external links can be used to reference the article, great. If none of these are reliable sources, then get rid of them all. Presumably that is not the case, so the mission here should be to source this article properly, and then look at whatever external links are left. For a broad topic like this I would expect a lot of references, and then only a single Dmoz external link. 2005 (talk) 23:21, 10 September 2008 (UTC)

Agree, 2005: it needs to be properly sourced. As per established policy, personal web pages cannot be used as sources in Wikipedia articles. The ideas presented above, i.e. using a self-published, non-peer-reviewed article from one of these sites as a source in this article instead, or using that article as the basis for an entire, separate Wikipedia article, are a bit absurd. That is likely to set off even more alarm bells for those who are concerned about Wikipedia:Conflict of interest#How to avoid COI edits. Afv2006 (talk) 00:06, 11 September 2008 (UTC)
