Empirical statistical laws
{{short description|Statistical behavior found in a wide variety of datasets}}
An '''empirical statistical law''' or (in popular terminology) a '''law of statistics''' represents a type of behaviour that has been found across a wide range of [[dataset]]s, and indeed across many different types of data set.<ref>Kitcher & Salmon (2009) p.51</ref> Many of these observations have been formulated and proved as [[statistical]] or [[probabilistic]] theorems, and the term "law" has been carried over to those theorems. Other statistical and probabilistic theorems also have "law" as part of their names even though they were not obviously derived from [[Empirical research|empirical observations]]. However, both types of "law" may be considered instances of a [[scientific law]] in the field of statistics. What distinguishes an empirical statistical law from a formal statistical theorem is that such patterns simply appear in [[Normal distribution|natural distributions]], without prior theoretical reasoning about the data.

== Examples ==
There are several such popular "laws of statistics".

The [[Pareto principle]] is a popular example of such a "law". It states that roughly 80% of the effects come from 20% of the causes, and is thus also known as the 80/20 rule.<ref>{{Cite news|url=https://www.nytimes.com/2008/03/03/business/03juran.html|title=Joseph Juran, 103, Pioneer in Quality Control, Dies|last=Bunkley|first=Nick|date=2008-03-03|work=The New York Times|access-date=2017-05-05|issn=0362-4331}}</ref> In business, the 80/20 rule says that 80% of a company's business comes from just 20% of its customers.<ref>{{Cite news|url=http://www.investopedia.com/terms/1/80-20-rule.asp|title=80-20 Rule|last=Staff|first=Investopedia|date=2010-11-04|work=Investopedia|access-date=2017-05-05|language=en-US}}</ref> In software engineering, it is often said that 80% of the errors are caused by just 20% of the bugs.<ref>{{Cite news|url=http://www.crn.com/news/security/18821726/microsofts-ceo-80-20-rule-applies-to-bugs-not-just-features.htm?itc=refresh|title=Microsoft's CEO: 80-20 Rule Applies To Bugs, Not Just Features|last=Rooney|first=Paula|date=2002-10-03|work=CRN|access-date=2017-05-05|language=en}}</ref> Roughly 20% of the world's population produces about 80% of worldwide GDP,<ref>{{Cite book|title=1992 Human Development Report|publisher=Oxford University Press|others=United Nations Development Program|year=1992|location=New York}}</ref> and 20% of the US population accounts for about 80% of US healthcare expenses.<ref>{{Cite web|date=June 2006|title=Chart 1: Percent of Total Health Care Expenses Incurred by Different Percentiles of U.S. Population: 2002|url=https://archive.ahrq.gov/research/findings/factsheets/costs/expriach/expriach1.html|work=Research in Action, Issue 19|publisher=Agency for Healthcare Research and Quality|location= Rockville, MD}}</ref>
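
The 80/20 split is not exact in real data; it is a convenient summary of strong concentration. As a rough, hypothetical illustration (not drawn from the cited sources), the following Python sketch simulates heavy-tailed "revenue per customer" data and measures what share of the total comes from the top 20% of customers; a shape parameter of about 1.16 is the value at which an idealized Pareto distribution gives an exact 80/20 split.

<syntaxhighlight lang="python">
import random

# Synthetic, illustrative data only: draw "revenue per customer" from a
# Pareto distribution and sort from largest to smallest.
random.seed(0)
revenues = sorted((random.paretovariate(1.16) for _ in range(10_000)), reverse=True)

# Share of total revenue contributed by the top 20% of customers.
top_fifth = revenues[: len(revenues) // 5]
share = sum(top_fifth) / sum(revenues)
print(f"Top 20% of customers account for {share:.0%} of revenue")
# The printed share is typically roughly 80%, though heavy-tailed samples fluctuate.
</syntaxhighlight>
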
[[Zipf's law]], described as an "empirical statistical law" of [[linguistics]],<ref>Gelbukh & Sidorov (2008)</ref> is another example. According to the "law", given a dataset of text, the frequency of a word is inversely proportional to its frequency rank. In other words, the second most common word should appear about half as often as the most common word, and the fifth most common word about one-fifth as often. However, what makes Zipf's law an "empirical statistical law" rather than just a theorem of linguistics is that it also applies to phenomena outside its original field. For example, a ranked list of US metropolitan populations also follows Zipf's law,<ref>{{Cite journal|last=Gabaix|first=Xavier|date=2011|title=The Area and Population of Cities: New Insights from a Different Perspective on Cities|url=http://pages.stern.nyu.edu/~xgabaix/papers/zipf.pdf|journal=American Economic Review|volume=101|issue=5|pages=2205–2225|doi=10.1257/aer.101.5.2205|arxiv=1001.5289|s2cid=4998367}}</ref> and even [[forgetting]] follows Zipf's law.<ref>{{Cite journal|last1=Anderson|first1=John R.|last2=Schooler|first2=Lael J.|date=November 1991|title=Reflections of the Environment in Memory|url=http://act-r.psy.cmu.edu/wordpress/wp-content/uploads/2012/12/37JRA_LS_PS_1991.pdf|journal=Psychological Science|volume= 2| issue = 6|pages=396–408|doi=10.1111/j.1467-9280.1991.tb00174.x|s2cid=8511110}}</ref> This summarizing of several natural data patterns with a simple rule is a defining characteristic of these "empirical statistical laws".
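
As a minimal sketch of how such a check can be carried out (the corpus file name and helper function below are illustrative, not taken from the cited sources), one can count word frequencies in a text and compare each count with the 1/rank prediction anchored at the most common word:

<syntaxhighlight lang="python">
from collections import Counter

def compare_with_zipf(text: str, top_n: int = 10) -> None:
    """Print observed word counts next to the count predicted by Zipf's law."""
    words = text.lower().split()
    ranked = Counter(words).most_common(top_n)
    top_count = ranked[0][1]
    for rank, (word, observed) in enumerate(ranked, start=1):
        predicted = top_count / rank  # Zipf's law: frequency proportional to 1/rank
        print(f"{rank:>2}  {word:<15} observed={observed:<6} predicted={predicted:.1f}")

# "corpus.txt" is a placeholder for any reasonably large plain-text corpus.
with open("corpus.txt", encoding="utf-8") as f:
    compare_with_zipf(f.read())
</syntaxhighlight>
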
Examples of empirically inspired statistical laws that have a firm theoretical basis include:
:*[[Law of truly large numbers]]
:*[[Central limit theorem]]
:*[[Regression toward the mean]]

Examples of "laws" with a weaker foundation include: |
Examples of "laws" with a weaker foundation include: |
||
:*[[Safety in numbers]] |
:*[[Safety in numbers]] |
||
:*[[Benford's law]] |
|||
Examples of "laws" which are more general observations than having a theoretical background: |
Examples of "laws" which are more general observations than having a theoretical background: |
||
:*[[ |
:*[[Rank–size distribution]] |
||
Examples of supposed "laws" which are incorrect include: |
Examples of supposed "laws" which are incorrect include: |
||
:*[[Law of averages]] |
:*[[Law of averages]] |
||
==See also==
:*[[probability axioms|Laws of chance]]
:*[[:Category:Statistical laws]]
:*[[Law (mathematics)]]

==Notes==
<references/>

==References==
*Kitcher, P., Salmon, W.C. (Editors) (2009) ''Scientific Explanation''. University of Minnesota Press. {{ISBN|978-0-8166-5765-0}}
*Gelbukh, A., Sidorov, G. (2008). Zipf and Heaps Laws’ Coefficients Depend on Language. In: ''Computational Linguistics and Intelligent Text Processing'' (pp. 332–335), Springer. {{ISBN|978-3-540-41687-6}}. [https://doi.org/10.1007%2F3-540-44686-9_33 link to abstract]

[[Category:Statistical laws]]