Birthday problem: Difference between revisions
{{Short description|Probability of shared birthdays}} |
|||
In [[probability theory]], the '''birthday problem''' or '''birthday [[paradox]]'''<ref>This is not a [[paradox]] in the sense of leading to a [[logic]]al contradiction, but is called a paradox because the mathematical truth contradicts naïve [[intuition (knowledge)|intuition]]: most people estimate that the chance of two individuals sharing the same birthday in a group of 23 is much lower than 50%.</ref> pertains to the [[probability]] that, in a set of ''n'' [[random]]ly chosen people, some pair of them will have the same [[birthday]]. By the [[pigeonhole principle]], the probability reaches 100% when the number of people reaches 366. However, 99% probability is reached with just 57 people, and 50% probability with 23 people. These conclusions are based on the assumption that each day of the year (except February 29) is equally probable for a birthday. |
|||
{{for multi|yearly variation in mortality rates|Birthday effect|the mathematical brain teaser that was asked in the Math Olympiad|Cheryl's Birthday}} |
|||
[[Image:Birthday Paradox.svg|thumb|upright=1.3|The computed probability of at least two people sharing the same birthday versus the number of people]] |
|||
The mathematics behind this problem led to a well-known cryptographic attack called the [[birthday attack]], which uses this probabilistic model to reduce the complexity of cracking a [[hash function]]. |
|||
In [[probability theory]], the '''birthday problem''' asks for the probability that, in a set of {{mvar|n}} [[random]]ly chosen people, at least two will share the same [[birthday]]. The '''birthday paradox''' refers to the counterintuitive fact that only 23 people are needed for that probability to exceed 50%. |
|||
[[Image:Birthday Paradox.svg|thumb|right|450px|A graph showing the approximate probability of at least two people sharing a birthday amongst a certain number of people.]] |
|||
The birthday paradox is a [[veridical paradox]]: it seems wrong at first glance but is, in fact, true. While it may seem surprising that only 23 individuals are required to reach a 50% probability of a shared birthday, this result is made more intuitive by considering that the birthday comparisons will be made between every possible pair of individuals. With 23 individuals, there are {{sfrac|23 × 22|2}} = 253 pairs to consider, more than half the 365 / 366 days in a calendar year. |
|||
== Understanding the problem == |
|||
Real-world applications for the birthday problem include a cryptographic attack called the [[birthday attack]], which uses this probabilistic model to reduce the complexity of finding a [[Collision attack|collision]] for a [[hash function]], as well as calculating the approximate risk of a hash collision existing within the hashes of a given size of population. |
|||
The birthday problem asks whether ''any'' of the people in a given group has a birthday matching ''any'' of the others — not one in particular. (See "[[#Same birthday as you|Same birthday as you]]" below for an analysis of this much less surprising alternative problem.) |
|||
The problem is generally attributed to [[Harold Davenport]] in about 1927, though he did not publish it at the time. Davenport did not claim to be its discoverer "because he could not believe that it had not been stated earlier".<ref>[[David Singmaster]], ''Sources in Recreational Mathematics: An Annotated Bibliography'', Eighth Preliminary Edition, 2004, [https://www.puzzlemuseum.com/singma/singma6/SOURCES/singma-sources-edn8-2004-03-19.htm#_Toc69534221 section 8.B]</ref><ref>[[H.S.M. Coxeter]], "Mathematical Recreations and Essays, 11th edition", 1940, p 45, as reported in [[I. J. Good]], ''Probability and the weighing of evidence'', 1950, [https://archive.org/details/probabilityweigh0000good/page/38/mode/2up?q=same%20birthday p. 38]</ref> The first publication of a version of the birthday problem was by [[Richard von Mises]] in 1939.<ref>Richard Von Mises, "Über Aufteilungs- und Besetzungswahrscheinlichkeiten", ''Revue de la faculté des sciences de l'Université d'Istanbul'' '''4''':145-163, 1939, reprinted in {{cite book |editor-first1 = P. |editor-last1 = Frank |editor-first2 = S. |editor-last2 = Goldstein |editor-first3 = M. |editor-last3 = Kac |editor-first4 = W. |editor-last4 = Prager |editor-first5 = G. |editor-last5 = Szegö |editor-first6 = G. |editor-last6 = Birkhoff |title = Selected Papers of Richard von Mises |volume = 2 | pages = 313–334 |date=1964 |publisher = Amer. Math. Soc. |location=Providence, Rhode Island}}</ref> |
|||
In the example given earlier, a list of 23 people, comparing the birthday of the first person on the list to the others allows 22 chances for a matching birthday, comparing the second person to the remaining others allows 21 further chances, the third person 20, and so on. Hence comparing every person to all of the others allows 22 + 21 + 20 + ⋯ + 1 = 253 distinct chances ([[combination]]s): in a group of 23 people there are <math>{23 \choose 2} = \frac{23 \cdot 22}{2} = 253</math> pairs.
|||
==Calculating the probability== |
|||
Presuming all birthdays are equally probable,<ref>In reality, birthdays are not evenly distributed throughout the year; there are more births per day in some seasons than in others, but for the purposes of this problem the distribution is treated as uniform.{{fact|date=November 2011}}</ref> the probability of a given birthday for a person chosen from the entire population at random is 1/365 (ignoring [[February 29|Leap Day]], February 29). Although the pairings in a group of 23 people are not statistically equivalent to 253 pairs chosen independently, the birthday paradox becomes less surprising if a group is thought of in terms of the number of possible pairs, rather than as the number of individuals. |
|||
From a [[permutation|permutations]] perspective, let {{math|''A''}} be the event that a group of 23 people has no repeated birthdays, and let {{math|''B''}} be the event that at least two of the 23 people share a birthday, so that {{math|''P''(''B'') {{=}} 1 − ''P''(''A'')}}. {{math|''P''(''A'')}} is the ratio of <math>V_{nr}</math>, the number of ordered assignments of birthdays without repetition (e.g. for a group of 2 people, in mm/dd birthday format, the outcomes include <math>\left \{ \left \{01/02,05/20\right \},\left \{05/20,01/02\right \},\left \{10/02,08/04\right\},...\right \}</math>), to <math>V_{t}</math>, the number of ordered assignments of birthdays with repetition allowed, which is the total space of outcomes of the experiment (e.g. for 2 people, the outcomes include <math>\left \{ \left \{01/02,01/02\right \},\left \{10/02,08/04\right \},...\right \}</math>). Therefore <math>V_{nr}</math> counts [[permutation|permutations]] without repetition and <math>V_{t}</math> counts permutations with repetition. In the formulas below, {{math|''n'' {{=}} 365}} is the number of days and {{math|''k'' {{=}} 23}} is the number of people.
|||
:<math>\begin{align} V_{nr} &= \frac{n!}{(n-k)!} = \frac{365!}{(365-23)!} \\[8pt] V_t &= n^k = 365^{23} \\[8pt] P(A) &= \frac{V_{nr}}{V_t} \approx 0.492703 \\[8pt] P(B) &= 1 - P(A) \approx 1 - 0.492703 \approx 0.507297 \quad (50.7297\%)\end{align}</math> |
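The figures above can be checked with exact integer arithmetic. The following is a minimal sketch, assuming Python 3.8 or later (for <code>math.perm</code> and <code>math.comb</code>); the function name <code>prob_all_distinct</code> is hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import comb, perm


def prob_all_distinct(k: int, days: int = 365) -> Fraction:
    """Return P(A) = V_nr / V_t for k people and `days` equally likely birthdays.

    V_nr = days! / (days - k)!  (ordered, no repetition)
    V_t  = days ** k            (ordered, repetition allowed)
    """
    return Fraction(perm(days, k), days ** k)


pairs = comb(23, 2)              # number of distinct pairs among 23 people
p_a = prob_all_distinct(23)      # exact rational value of P(A)
p_b = 1 - p_a                    # P(B) = 1 - P(A)

print(pairs)
print(round(float(p_a), 6), round(float(p_b), 6))
```

Because <code>Fraction</code> keeps the ratio exact, rounding happens only when the result is converted to a float for display.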
|||
== Calculating the probability == |
|||
Another way the birthday problem can be solved is by asking for an approximate probability that in a group of {{mvar|n}} people at least two have the same birthday. For simplicity, [[leap year]]s, [[twin]]s, [[selection bias]], and seasonal and weekly variations in birth rates{{refn|see [[Birthday#Distribution through the year]]}} are generally disregarded, and instead it is assumed that there are 365 possible birthdays, and that each person's birthday is equally likely to be any of these days, independent of the other people in the group. |
|||
The problem is to compute the approximate probability that in a room of ''n'' people, at least two have the same birthday. For simplicity, disregard variations in the distribution, such as [[leap year]]s, [[twin]]s, seasonal or weekday variations, and assume that the 365 possible birthdays are equally likely. Real-life birthday distributions are not uniform since not all dates are equally likely.<ref>In particular, many children are born in the summer, especially the months of August and September (for the northern hemisphere) [http://scienceworld.wolfram.com/astronomy/LeapDay.html], and in the U.S. it has been noted that many children are conceived around the holidays of [[Christmas]] and [[New Year's Day]]{{Citation needed|date=August 2009}}. Also, because hospitals rarely schedule C-sections and induced labor on the weekend, more Americans are born on Mondays and Tuesdays than on weekends{{Citation needed|date=August 2009}}; where many of the people share a birth year (e.g. a class in a school), this creates a tendency toward particular dates. In Sweden 9.3% of the population is born in March and 7.3% in November when a uniform distribution would give 8.3% [http://www.scb.se/statistik/BE/BE0101/2006A01a/BE0101_2006A01a_SM_BE12SM0701.pdf Swedish statistics board] Both of these factors tend to increase the chance of identical birth dates, since a denser subset has more possible pairs (in the extreme case when everyone was born on three days, there would obviously be many identical birthdays). The birthday problem for such non-constant birthday probabilities was first understood by [[Murray Klamkin]] in 1967. A formal proof that the probability of two matching birthdays is least for a uniform distribution of birthdays was given by D. Bloom (1973)</ref> |
|||
For independent birthdays, a [[Discrete_uniform_distribution | uniform distribution]] of birthdays minimizes the probability of two people in a group having the same birthday. Any unevenness increases the likelihood of two people sharing a birthday.<ref>{{Harv|Bloom|1973}}</ref><ref>{{cite book |author-first = J. Michael |author-last = Steele |title = The Cauchy‑Schwarz Master Class |url = https://archive.org/details/cauchyschwarzmas00stee_431 |url-access = limited | pages = [https://archive.org/details/cauchyschwarzmas00stee_431/page/n217 206], 277 |date=2004 |publisher = Cambridge University Press |location=Cambridge | isbn = 9780521546775 }}</ref> However real-world birthdays are not sufficiently uneven to make much change: the real-world group size necessary to have a greater than 50% chance of a shared birthday is 23, as in the theoretical uniform distribution.<ref name="Borja">{{cite journal |author1 = Mario Cortina Borja |author2 = John Haigh |title = The Birthday Problem |journal = Significance |date = September 2007 |volume = 4 |issue = 3 |pages = 124–127 |publisher = Royal Statistical Society |doi = 10.1111/j.1740-9713.2007.00246.x|doi-access = free }}</ref> |
|||
If ''P''(''A'') is the probability of at least two people in the room having the same birthday, it may be simpler to calculate ''P''(''A''<nowiki>'</nowiki>), the probability of there not being any two people having the same birthday. Then, because the events ''A'' and ''A''<nowiki>'</nowiki> are the only two possibilities and are also [[mutually exclusive events|mutually exclusive]], ''P''(''A''<nowiki>'</nowiki>) = 1 − ''P''(''A'').
|||
The goal is to compute {{math|''P''(''B'')}}, the probability that at least two people in the room have the same birthday. However, it is simpler to calculate {{math|''P''(''A''′)}}, the probability that no two people in the room have the same birthday. Then, because {{math|''B''}} and {{math|''A''′}} are the only two possibilities and are also [[mutually exclusive events|mutually exclusive]], {{math|''P''(''B'') {{=}} 1 − ''P''(''A''′).}} |
|||
In deference to widely published solutions concluding that 23 is the number of people necessary to have a ''P''(''A'') that is greater than 50%, the following calculation of ''P''(''A'') will use 23 people as an example.
|||
Here is the calculation of {{math|''P''(''B'')}} for 23 people. Let the 23 people be numbered 1 to 23. The [[event (probability theory)|event]] that all 23 people have different birthdays is the same as the event that person 2 does not have the same birthday as person 1, and that person 3 does not have the same birthday as either person 1 or person 2, and so on, and finally that person 23 does not have the same birthday as any of persons 1 through 22. Let these events be called Event 2, Event 3, and so on. Event 1 is the event of person 1 having a birthday, which occurs with probability 1. This conjunction of events may be computed using [[conditional probability]]: the probability of Event 2 is {{sfrac|364|365}}, as person 2 may have any birthday other than the birthday of person 1. Similarly, the probability of Event 3 given that Event 2 occurred is {{sfrac|363|365}}, as person 3 may have any of the birthdays not already taken by persons 1 and 2. This continues until finally the probability of Event 23 given that all preceding events occurred is {{sfrac|343|365}}. Finally, the principle of conditional probability implies that {{math|''P''(''A''′)}} is equal to the product of these individual probabilities: |
|||
When events are [[independence (probability theory)|independent]] of each other, the probability of all of the events occurring is equal to the product of the probabilities of each of the events occurring. Therefore, if ''P''(''A''<nowiki>'</nowiki>) can be described as 23 independent events, ''P''(''A''<nowiki>'</nowiki>) could be calculated as ''P''(1) × ''P''(2) × ''P''(3) × ... × ''P''(23).
|||
{{NumBlk|:|<math>P(A')=\frac{365}{365}\times\frac{364}{365}\times\frac{363}{365}\times\frac{362}{365}\times\cdots\times\frac{343}{365}</math>|{{EquationRef|1}}}} |
|||
The terms of equation ({{EquationNote|1}}) can be collected to arrive at: |
|||
The 23 independent events correspond to the 23 people, and can be defined in order. Each event can be defined as the corresponding person not sharing their birthday with any of the previously analyzed people. For Event 1, there are no previously analyzed people. Therefore, the probability, ''P''(1), that person number 1 does not share his/her birthday with previously analyzed people is 1, or 100%. Ignoring leap years for this analysis, the probability of 1 can also be written as 365/365, for reasons that will become clear below. |
|||
{{NumBlk|:|<math>P(A')=\left(\frac{1}{365}\right)^{23}\times(365\times364\times363\times\cdots\times343)</math>|{{EquationRef|2}}}} |
|||
Evaluating equation ({{EquationNote|2}}) gives {{math|''P''(''A''′) ≈ 0.492703}} |
|||
For Event 2, the only previously analyzed person is Person 1. Assuming that birthdays are equally likely to happen on each of the 365 days of the year, the probability, ''P''(2), that Person 2 has a different birthday than Person 1 is 364/365. This is because, if Person 2 was born on any of the other 364 days of the year, Persons 1 and 2 will not share the same birthday.
|||
Therefore, {{math|''P''(''B'') ≈ 1 − 0.492703 {{=}} 0.507297}} (50.7297%). |
|||
Similarly, if Person 3 is born on any of the 363 days of the year other than the birthdays of Persons 1 and 2, Person 3 will not share their birthday. This makes the probability ''P''(3) = 363/365. |
|||
This process can be generalized to a group of {{mvar|n}} people, where {{math|''p''(''n'')}} is the probability of at least two of the {{mvar|n}} people sharing a birthday. It is easier to first calculate the probability {{math|''{{overline|p}}''(''n'')}} that all {{mvar|n}} birthdays are ''different''. According to the [[pigeonhole principle]], {{math|''{{overline|p}}''(''n'')}} is zero when {{math|''n'' > 365}}. When {{math|''n'' ≤ 365}}: |
|||
This analysis continues until Person 23 is reached, whose probability of not sharing their birthday with people analyzed before, ''P''(23), is 343/365.
|||
:<math> \begin{align} \bar p(n) &= 1 \times \left(1-\frac{1}{365}\right) \times \left(1-\frac{2}{365}\right) \times \cdots \times \left(1-\frac{n-1}{365}\right) \\[6pt] &= \frac{ 365 \times 364 \times \cdots \times (365-n+1) }{ 365^n } \\[6pt] &= \frac{ 365! }{ 365^n (365-n)!} = \frac{n!\cdot\binom{365}{n}}{365^n} = \frac{_{365}P_n}{365^n}\end{align} </math> |
|||
''P''(''A''<nowiki>'</nowiki>) is equal to the product of these individual probabilities: |
|||
where {{math|!}} is the [[factorial]] operator, {{math|{{pars|s=160%|{{su|p=365|b=''n''|a=c}}}}}} is the [[binomial coefficient]] and {{math|''<sub>k</sub>P<sub>r</sub>''}} denotes [[permutation]]. |
|||
: (1) ''P''(''A''<nowiki>'</nowiki>) = 365/365 × 364/365 × 363/365 × 362/365 × ... × 343/365 |
|||
The equation expresses the fact that the first person has no one to share a birthday with, the second person cannot have the same birthday as the first {{math|{{pars|s=160%|{{sfrac|364|365}}}}}}, the third cannot have the same birthday as either of the first two {{math|{{pars|s=160%|{{sfrac|363|365}}}}}}, and in general the {{mvar|n}}th birthday cannot be the same as any of the {{math|''n'' − 1}} preceding birthdays.
|||
The terms of equation (1) can be collected to arrive at: |
|||
The [[event (probability theory)|event]] of at least two of the {{mvar|n}} persons having the same birthday is [[complementary event|complementary]] to all {{mvar|n}} birthdays being different. Therefore, its probability {{math|''p''(''n'')}} is |
|||
: (2) ''P''(''A''<nowiki>'</nowiki>) = (1/365)<sup>23</sup> × (365 × 364 × 363 × ... × 343) |
|||
:<math> p(n) = 1 - \bar p(n). </math> |
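The general formula translates directly into a short loop. The sketch below is illustrative only: it assumes 365 equally likely birthdays, as in the text, and the names <code>p_shared</code> and <code>smallest_group</code> are hypothetical helpers rather than part of any library.

```python
def p_shared(n: int, days: int = 365) -> float:
    """Probability p(n) that at least two of n people share a birthday."""
    if n > days:
        return 1.0  # pigeonhole principle: a repeat is certain
    p_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_distinct *= (days - i) / days
    return 1.0 - p_distinct


def smallest_group(target: float, days: int = 365) -> int:
    """Smallest n for which p(n) reaches the target probability."""
    n = 1
    while p_shared(n, days) < target:
        n += 1
    return n


for n in (10, 23, 50):
    print(n, f"{100 * p_shared(n):.1f}%")
print(smallest_group(0.5), smallest_group(0.99))
```

Multiplying the factors one at a time avoids forming the very large factorials in the closed form, so ordinary floating-point numbers suffice for group sizes of this order.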
|||
Evaluating equation (2) gives ''P''(''A''<nowiki>'</nowiki>) = 0.492703 |
|||
The following table shows the probability for some other values of {{mvar|n}} (for this table, the existence of leap years is ignored, and each birthday is assumed to be equally likely): |
|||
Therefore, ''P''(''A'') = 1 − 0.492703 = 0.507297 (50.7297%) |
|||
[[Image:Birthdaymatch.svg|thumb|right|upright=1.4|The probability that no two people share a birthday in a group of {{mvar|n}} people. Note that the vertical scale is logarithmic (each step down is 10<sup>20</sup> times less likely).]] |
|||
:{| class="wikitable"
!{{mvar|n}}!!{{math|''p''(''n'')}}
|-
|align=right|1 || {{0}}0.0%
|-
|align=right|5 || {{0}}2.7%
|-
|align=right|10 || 11.7%
|-
|align=right|20 || 41.1%
|-
|align=right|23 || 50.7%
|-
|align=right|30 || 70.6%
|-
|align=right|40 || 89.1%
|-
|align=right|50 || 97.0%
|-
|align=right|60 || 99.4%
|-
|align=right|70 || 99.9%
|-
|align=right|75 || 99.97%
|-
|align=right|100 || {{val|99.99997}}%
|-
|align=right|200 || {{val|99.9999999999999999999999999998}}%
|-
|align=right|300 || (100 − {{val|6|e=-80}})%
|-
|align=right|350 || (100 − {{val|3|e=-129}})%
|-
|align=right|365 || (100 − {{val|1.45|e=-155}})%
|-
|align=right|<!-- per assumptions in text -->≥ 366<!-- leap years are ignored --> || 100%
|}
||
==Approximations==
||
[[Image:Birthday paradox probability.svg|thumb|right|upright=1.4|Graphs showing the approximate probabilities of at least two people sharing a birthday ({{color|red|red}}) and its complementary event ({{color|blue|blue}})]] |
|||
[[Image:Birthday paradox approximation.svg|thumb|right|upright=1.4|A graph showing the accuracy of the approximation 1 − ''e''<sup>−''n''<sup>2</sup>/730</sup> ({{color|red|red}})]]

The [[Taylor series]] expansion of the [[exponential function]] (the constant {{math|''e'' ≈ {{val|2.718281828}}}})

:<math> e^x = 1 + x + \frac{x^2}{2!}+\cdots </math>

provides a first-order approximation for {{math|''e''<sup>''x''</sup>}} for <math>|x| \ll 1</math>:

:<math> e^x \approx 1 + x.</math>

To apply this approximation to the first expression derived for {{math|''{{overline|p}}''(''n'')}}, set {{math|''x'' {{=}} −{{sfrac|''a''|365}}}}. Thus,

:<math> e^{-a/365} \approx 1 - \frac{a}{365}. </math>

Then, replace {{mvar|a}} with non-negative integers for each term in the formula of {{math|''{{overline|p}}''(''n'')}} until {{math|''a'' {{=}} ''n'' − 1}}, for example, when {{math|''a'' {{=}} 1}},

: <math> e^{-1/365} \approx 1 - \frac{1}{365}. </math>

The first expression derived for {{math|''{{overline|p}}''(''n'')}} can be approximated as
|||
:<math>
\begin{align}
\bar p(n) & \approx 1 \cdot e^{-1/365} \cdot e^{-2/365} \cdots e^{-(n-1)/365} \\[6pt]
& = e^{-\big(1+2+ \,\cdots\, +(n-1)\big)/365} \\[6pt]
& = e^{-\frac{n(n-1)/2}{365}} = e^{-\frac{n(n-1)}{730}}.
\end{align}
</math>
||
Therefore,

:<math> p(n) = 1-\bar p(n) \approx 1 - e^{-\frac{n(n-1)}{730}}.</math>

An even coarser approximation is given by

:<math>p(n)\approx 1-e^{-\frac{n^2}{730}},</math>

which, as the graph illustrates, is still fairly accurate.
||
The same approach can be applied to any number of "people" and "days". If rather than 365 days there are {{mvar|d}}, if there are {{mvar|n}} persons, and if {{math|''n'' ≪ ''d''}}, then the same derivation shows that {{math|''p''(''n'', ''d'')}}, the probability that at least two out of {{mvar|n}} people share the same birthday from a set of {{mvar|d}} available days, satisfies:

:<math>\begin{align}
p(n, d) & \approx 1-e^{-\frac{n(n-1)}{2d}} \\[6pt]
& \approx 1-e^{-\frac{n^2}{2d}}.
\end{align}</math>
|||
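The two exponential approximations can be compared with the exact product for any number of days {{mvar|d}}. The following is a rough sketch under the uniform-birthday assumption; the function names <code>p_exact</code>, <code>p_approx</code> and <code>p_coarse</code> are hypothetical. The last lines apply the same formula to a 32-bit hash space, which is an illustrative choice of {{mvar|d}} rather than a value taken from the derivation.

```python
from math import exp


def p_exact(n: int, d: int) -> float:
    """Exact probability of at least one shared value among n draws from d."""
    if n > d:
        return 1.0
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (d - i) / d
    return 1.0 - p_distinct


def p_approx(n: int, d: int) -> float:
    """First-order approximation 1 - exp(-n(n-1) / (2d))."""
    return 1.0 - exp(-n * (n - 1) / (2 * d))


def p_coarse(n: int, d: int) -> float:
    """Coarser approximation 1 - exp(-n^2 / (2d))."""
    return 1.0 - exp(-n * n / (2 * d))


for n in (10, 23, 40):
    print(n, round(p_exact(n, 365), 4), round(p_approx(n, 365), 4),
          round(p_coarse(n, 365), 4))

# Illustrative use with d = 2**32 (a 32-bit hash space, an assumed example).
print(round(p_approx(77_000, 2**32), 3))
```

For {{math|''n'' ≪ ''d''}} the three values stay close, and the approximations avoid the long product when {{mvar|d}} is very large.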
===Simple exponentiation===

The probability of any two people not having the same birthday is {{sfrac|364|365}}. In a room containing ''n'' people, there are {{math|{{pars|s=150%|{{su|p=''n''|b=2|a=c}}}} {{=}} {{sfrac|''n''(''n'' − 1)|2}}}} pairs of people, i.e. {{math|{{pars|s=150%|{{su|p=''n''|b=2|a=c}}}}}} events. The probability of no two people sharing the same birthday can be approximated by assuming that these events are independent and hence by multiplying their probabilities together. Independence would be equivalent to picking each pair [[Sampling (statistics)#Replacement of selected units|with replacement]] from all the people in the world, not just from those in the room. In short, {{sfrac|364|365}} can be multiplied by itself {{math|{{pars|s=150%|{{su|p=''n''|b=2|a=c}}}}}} times, which gives us

:<math>\bar p(n) \approx \left(\frac{364}{365}\right)^\binom{n}{2}.</math>

Since this is the probability of no one having the same birthday, the probability of someone sharing a birthday is

:<math>p(n) \approx 1 - \left(\frac{364}{365}\right)^\binom{n}{2}.</math>

And for the group of 23 people, the probability of sharing is

:<math>p(23) \approx 1 - \left(\frac{364}{365}\right)^\binom{23}{2} = 1 - \left(\frac{364}{365}\right)^{253} \approx 0.500477 .</math>
|||
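Both the simple-exponentiation estimate and the Poisson estimate for 23 people reduce to one-line computations. The sketch below assumes 365 equally likely birthdays; the variable names are illustrative only.

```python
from math import comb, exp

days = 365
people = 23

pairs = comb(people, 2)                       # number of pairs among 23 people

# Treat each pair as an independent trial that "misses" with probability 364/365.
p_simple = 1 - ((days - 1) / days) ** pairs

# Poisson estimate with mean pairs/days: Pr(X > 0) = 1 - exp(-mean).
p_poisson = 1 - exp(-pairs / days)

print(pairs, round(p_simple, 6), round(p_poisson, 6))
```

Neither estimate is exact, since the pairs are not truly independent, but both land just above one half for 23 people.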
===Poisson approximation===

Applying the [[Poisson distribution|Poisson]] approximation for the binomial on the group of 23 people,

:<math>\operatorname{Poi}\left(\frac{\binom{23}{2}}{365}\right) =\operatorname{Poi}\left(\frac{253}{365}\right) \approx \operatorname{Poi}(0.6932)</math>

so

:<math>\Pr(X>0)=1-\Pr(X=0) \approx 1-e^{-0.6932} \approx 1-0.499998=0.500002.</math>

As in the previous calculations, the result is over 50%. This approximation is the same as the one above based on the Taylor expansion that uses {{math|''e<sup>x</sup>'' ≈ 1 + ''x''}}.

===Square approximation===

A good [[rule of thumb]] which can be used for [[mental calculation]] is the relation

:<math>p(n,d) \approx \frac{n^2}{2d}</math>

which can also be written as

:<math>n \approx \sqrt { 2d \times p(n)}</math>

which works well for probabilities less than or equal to {{sfrac|1|2}}. In these equations, {{mvar|d}} is the number of days in a year.

For instance, to estimate the number of people required for a {{sfrac|1|2}} chance of a shared birthday, we get

:<math>n \approx \sqrt{ 2 \times 365 \times \tfrac12} = \sqrt{365} \approx 19</math>

which is not too far from the correct answer of 23.

===Approximation of number of people===

This can also be approximated using the following formula for the ''number'' of people necessary to have at least a {{sfrac|1|2}} chance of matching:

:<math>n \geq \tfrac{1}{2} + \sqrt{\tfrac{1}{4} + 2 \times \ln(2) \times 365} \approx 22.999943.</math>

This is a result of the good approximation that an event with {{math|{{sfrac|1|''k''}}}} probability will have a {{sfrac|1|2}} chance of occurring at least once if it is repeated {{math|''k'' [[natural logarithm of 2|ln 2]]}} times.<ref>{{cite journal
| last = Mathis
| first = Frank H.
| date = June 1991
| title = A Generalized Birthday Problem
| journal = SIAM Review
| issue = 2
| pages = 265–270
| issn = 0036-1445
| doi = 10.1137/1033051
| oclc = 37699182
| jstor = 2031144
}}</ref>
||
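The two estimates of the required group size can be evaluated side by side. This is a minimal sketch: <code>people_needed</code> and <code>rule_of_thumb</code> are hypothetical names, and the parameter <code>p</code> in <code>people_needed</code> is an assumed generalization obtained by solving {{math|1 − ''e''<sup>−''n''(''n'' − 1)/(2''d'')</sup> {{=}} ''p''}} for {{mvar|n}}; with {{math|''p'' {{=}} 1/2}} it reduces to the formula given above.

```python
from math import ceil, log, sqrt


def people_needed(days: int = 365, p: float = 0.5) -> float:
    """Solve 1 - exp(-n(n-1) / (2*days)) = p for n (positive root)."""
    return 0.5 + sqrt(0.25 + 2 * days * log(1 / (1 - p)))


def rule_of_thumb(days: int = 365, p: float = 0.5) -> float:
    """Square approximation n ~ sqrt(2 * days * p), intended for p <= 1/2."""
    return sqrt(2 * days * p)


n_half = people_needed(365)
print(round(n_half, 6), ceil(n_half))
print(round(rule_of_thumb(365)))
```

Rounding the first estimate up to a whole person gives the familiar group size, while the square approximation trades some accuracy for a formula simple enough for mental calculation.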
===Probability table===
{{Main|Birthday attack}}
||
:{| class="wikitable" style="white-space:nowrap;" |
:{| class="wikitable" style="white-space:nowrap;" |
||
|-
! rowspan="2" | length of <br />hex string
! rowspan="2" | no. of<br />bits<br />({{mvar|b}})
! rowspan="2" | hash space<br />size<br />({{math|2<sup>''b''</sup>}})
! colspan="10" | Number of hashed elements such that probability of at least one hash collision ≥ {{mvar|p}}
|-
! {{mvar|p}} = {{val||e=-18}}
! {{mvar|p}} = {{val||e=-15}}
! {{mvar|p}} = {{val||e=-12}}
! {{mvar|p}} = {{val||e=-9}}
! {{mvar|p}} = {{val||e=-6}}
! {{mvar|p}} = 0.001
! {{mvar|p}} = 0.01
! {{mvar|p}} = 0.25
! {{mvar|p}} = 0.50
! {{mvar|p}} = 0.75
|- align="center"
| bgcolor="#F2F2F2" | 8
| bgcolor="#F2F2F2" | 32
| bgcolor="#F2F2F2" | {{val|4.3|e=9}}
| 2
| 2
| 2
| 2.9
| 93
| {{val|2.9|e=3}}
| {{val|9.3|e=3}}
| {{val|5.0|e=4}}
| {{val|7.7|e=4}}
| {{val|1.1|e=5}}
|- align="center" |
|- align="center" |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | (10) |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | (40) |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | ({{val|1.1|e=12}}) |
||
| 2 |
|||
| 2 |
|||
| 2 |
|||
| 47 |
|||
| {{val|1.5|e=3}} |
|||
| {{val|4.7|e=4}} |
|||
| {{val|1.5|e=5}} |
|||
| {{val|8.0|e=5}} |
|||
| {{val|1.2|e=6}} |
|||
| {{val|1.7|e=6}} |
|||
|- align="center" |
|||
| bgcolor="#F2F2F2" | (12) |
|||
| bgcolor="#F2F2F2" | (48) |
|||
| bgcolor="#F2F2F2" | ({{val|2.8|e=14}}) |
|||
| 2 |
|||
| 2 |
|||
| 24 |
|||
| {{val|7.5|e=2}} |
|||
| {{val|2.4|e=4}} |
|||
| {{val|7.5|e=5}} |
|||
| {{val|2.4|e=6}} |
|||
| {{val|1.3|e=7}} |
|||
| {{val|2.0|e=7}} |
|||
| {{val|2.8|e=7}} |
|||
|- align="center" |
|||
| bgcolor="#F2F2F2" | 16 |
|||
| bgcolor="#F2F2F2" | 64 |
|||
| bgcolor="#F2F2F2" | {{val|1.8|e=19}} |
|||
| 6.1 |
| 6.1 |
||
| 1.9 |
| {{val|1.9|e=2}} |
||
| 6.1 |
| {{val|6.1|e=3}} |
||
| 1.9 |
| {{val|1.9|e=5}} |
||
| 6.1 |
| {{val|6.1|e=6}} |
||
| 1.9 |
| {{val|1.9|e=8}} |
||
| 6.1 |
| {{val|6.1|e=8}} |
||
| 3.3 |
| {{val|3.3|e=9}} |
||
| 5.1 |
| {{val|5.1|e=9}} |
||
| 7.2 |
| {{val|7.2|e=9}} |
||
|- align="center" |
|- align="center" |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | (24) |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | (96) |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | ({{val|7.9|e=28}}) |
||
| {{val|4.0|e=5}} |
|||
| 2.6 × 10<sup>10</sup> |
|||
| {{val|1.3|e=7}} |
|||
| 8.2 × 10<sup>11</sup> |
|||
| {{val|4.0|e=8}} |
|||
| 2.6 × 10<sup>13</sup> |
|||
| {{val|1.3|e=10}} |
|||
| 8.2 × 10<sup>14</sup> |
|||
| {{val|4.0|e=11}} |
|||
| 2.6 × 10<sup>16</sup> |
|||
| {{val|1.3|e=13}} |
|||
| 8.3 × 10<sup>17</sup> |
|||
| {{val|4.0|e=13}} |
|||
| 2.6 × 10<sup>18</sup> |
|||
| {{val|2.1|e=14}} |
|||
| 1.4 × 10<sup>19</sup> |
|||
| {{val|3.3|e=14}} |
|||
| 2.2 × 10<sup>19</sup> |
|||
| {{val|4.7|e=14}} |
|||
| 3.1 × 10<sup>19</sup> |
|||
|- align="center" |
|- align="center" |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | 32 |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | 128 |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | {{val|3.4|e=38}} |
||
| {{val|2.6|e=10}} |
|||
| 4.8 × 10<sup>29</sup> |
|||
| {{val|8.2|e=11}} |
|||
| 1.5 × 10<sup>31</sup> |
|||
| {{val|2.6|e=13}} |
|||
| 4.8 × 10<sup>32</sup> |
|||
| {{val|8.2|e=14}} |
|||
| 1.5 × 10<sup>34</sup> |
|||
| {{val|2.6|e=16}} |
|||
| 4.8 × 10<sup>35</sup> |
|||
| {{val|8.3|e=17}} |
|||
| 1.5 × 10<sup>37</sup> |
|||
| {{val|2.6|e=18}} |
|||
| 4.8 × 10<sup>37</sup> |
|||
| {{val|1.4|e=19}} |
|||
| 2.6 × 10<sup>38</sup> |
|||
| {{val|2.2|e=19}} |
|||
| 4.0 × 10<sup>38</sup> |
|||
| {{val|3.1|e=19}} |
|||
| 5.7 × 10<sup>38</sup> |
|||
|- align="center" |
|- align="center" |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | (48) |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | (192) |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | ({{val|6.3|e=57}}) |
||
| {{val|1.1|e=20}} |
|||
| 8.9 × 10<sup>48</sup> |
|||
| {{val|3.5|e=21}} |
|||
| 2.8 × 10<sup>50</sup> |
|||
| {{val|1.1|e=23}} |
|||
| 8.9 × 10<sup>51</sup> |
|||
| {{val|3.5|e=24}} |
|||
| 2.8 × 10<sup>53</sup> |
|||
| {{val|1.1|e=26}} |
|||
| 8.9 × 10<sup>54</sup> |
|||
| {{val|3.5|e=27}} |
|||
| 2.8 × 10<sup>56</sup> |
|||
| {{val|1.1|e=28}} |
|||
| 8.9 × 10<sup>56</sup> |
|||
| {{val|6.0|e=28}} |
|||
| 4.8 × 10<sup>57</sup> |
|||
| {{val|9.3|e=28}} |
|||
| 7.4 × 10<sup>57</sup> |
|||
| {{val|1.3|e=29}} |
|||
| 1.0 × 10<sup>58</sup> |
|||
|- align="center" |
|- align="center" |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | 64 |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | 256 |
||
| bgcolor=" |
| bgcolor="#F2F2F2" | {{val|1.2|e=77}} |
||
| {{val|4.8|e=29}} |
|||
| 1.6 × 10<sup>68</sup> |
|||
| {{val|1.5|e=31}} |
|||
| 5.2 × 10<sup>69</sup> |
|||
| {{val|4.8|e=32}} |
|||
| 1.6 × 10<sup>71</sup> |
|||
| {{val|1.5|e=34}} |
|||
| 5.2 × 10<sup>72</sup> |
|||
| {{val|4.8|e=35}} |
|||
| 1.6 × 10<sup>74</sup> |
|||
| {{val|1.5|e=37}} |
|||
| 5.2 × 10<sup>75</sup> |
|||
| {{val|4.8|e=37}} |
|||
| 1.6 × 10<sup>76</sup> |
|||
| {{val|2.6|e=38}} |
|||
| 8.8 × 10<sup>76</sup> |
|||
| {{val|4.0|e=38}} |
|||
| 1.4 × 10<sup>77</sup> |
|||
| {{val|5.7|e=38}} |
|||
| 1.9 × 10<sup>77</sup> |
|||
|- align="center" |
|||
| bgcolor="#F2F2F2" | (96) |
|||
| bgcolor="#F2F2F2" | (384) |
|||
| bgcolor="#F2F2F2" | ({{val|3.9|e=115}}) |
|||
| {{val|8.9|e=48}} |
|||
| {{val|2.8|e=50}} |
|||
| {{val|8.9|e=51}} |
|||
| {{val|2.8|e=53}} |
|||
| {{val|8.9|e=54}} |
|||
| {{val|2.8|e=56}} |
|||
| {{val|8.9|e=56}} |
|||
| {{val|4.8|e=57}} |
|||
| {{val|7.4|e=57}} |
|||
| {{val|1.0|e=58}} |
|||
|- align="center" |
|||
| bgcolor="#F2F2F2" | 128 |
|||
| bgcolor="#F2F2F2" | 512 |
|||
| bgcolor="#F2F2F2" | {{val|1.3|e=154}} |
|||
| {{val|1.6|e=68}} |
|||
| {{val|5.2|e=69}} |
|||
| {{val|1.6|e=71}} |
|||
| {{val|5.2|e=72}} |
|||
| {{val|1.6|e=74}} |
|||
| {{val|5.2|e=75}} |
|||
| {{val|1.6|e=76}} |
|||
| {{val|8.8|e=76}} |
|||
| {{val|1.4|e=77}} |
|||
| {{val|1.9|e=77}} |
|||
|} |
|} |
||
[[File:birthday_attack_vs_paradox.svg|thumb|Comparison of the birthday problem (1) and birthday attack (2):{{parabreak}} |
|||
:''The white squares in this table show the number of hashes needed to achieve the given probability of collision (column) given a hashspace of a certain size in bits (row). (Using the birthday analogy: the "hash space size"(row) would be "365 days", the "probability of collision"(column) would be "50%", and the "required number of people" would be "26"(row-col intersection).) One could of course also use this chart to determine the minimum hash size required (given upper bounds on the hashes and probability of error), or the probability of collision (for fixed number of hashes and probability of error).<br/>For comparison, 10<sup>−18</sup> to 10<sup>−15</sup> is the uncorrectable bit error rate of a typical hard disk [http://arxiv.org/abs/cs/0701166]. In theory, [[MD5]], 128 bits, should stay within that range until about 820 billion documents, even if its possible outputs are many more.'' |
|||
In (1), collisions are found within one set, in this case, 3 out of 276 pairings of the 24 lunar astronauts.{{parabreak}} |
|||
In (2), collisions are found between two sets, in this case, 1 out of 256 pairings of only the first bytes of SHA-256 hashes of 16 variants each of benign and harmful contracts.]] |
|||
The lighter fields in this table show the number of hashes needed to achieve the given probability of collision (column) given a hash space of a certain size in bits (row). Using the birthday analogy: the "hash space size" resembles the "available days", the "probability of collision" resembles the "probability of shared birthday", and the "required number of hashed elements" resembles the "required number of people in a group". One could also use this chart to determine the minimum hash size required (given upper bounds on the hashes and probability of error), or the probability of collision (for fixed number of hashes and probability of error). |
|||
For comparison, {{val|e=-18}} to {{val|e=-15}} is the uncorrectable bit error rate of a typical hard disk.<ref>Jim Gray, Catharine van Ingen. [https://arxiv.org/abs/cs/0701166 Empirical Measurements of Disk Failure Rates and Error Rates]</ref> In theory, 128-bit hash functions, such as [[MD5]], should stay within that range until about {{val|8.2|e=11}} documents, even though their possible outputs are far more numerous.
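The larger entries of the table can be approximated with the square-root formula {{math|''n'' ≈ {{sqrt|2 · 2<sup>''b''</sup> · ln(1/(1 − ''p''))}}}}. The following Python sketch is only illustrative: the function name <code>hashes_for_collision</code> is an assumption made here, and the approximation is not reliable for the smallest table entries, where the count is bounded below by 2.

```python
import math

def hashes_for_collision(bits: int, p: float) -> float:
    """Approximate number of hashed elements needed for collision
    probability p in a hash space of 2**bits values."""
    space = 2.0 ** bits
    # -log1p(-p) equals ln(1 / (1 - p)) and stays accurate for tiny p.
    return math.sqrt(2.0 * space * -math.log1p(-p))

for bits in (64, 128, 256):
    low = hashes_for_collision(bits, 1e-12)
    half = hashes_for_collision(bits, 0.5)
    print(bits, f"{low:.1e}", f"{half:.1e}")
```

Using <code>math.log1p</code> avoids the loss of precision that would occur when evaluating {{math|ln(1 − ''p'')}} directly for probabilities as small as {{val|e=-18}}.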
|||
==An upper bound== |
|||
==An upper bound on the probability and a lower bound on the number of people== |
|||
The argument below is adapted from an argument of [[Paul Halmos]].<ref>In his autobiography, Halmos criticized the form in which the birthday paradox is often presented, in terms of numerical computation. He believed that it should be used as an example in the use of more abstract mathematical concepts. He wrote: |
|||
<blockquote>The reasoning is based on important tools that all students of mathematics should have ready access to. The birthday problem used to be a splendid illustration of the advantages of pure thought over mechanical manipulation; the inequalities can be obtained in a minute or two, whereas the multiplications would take much longer, and be much more subject to error, whether the instrument is a pencil or an old-fashioned desk computer. What [[calculator]]s do not yield is understanding, or mathematical facility, or a solid basis for more advanced, generalized theories.</blockquote> |
The argument below is adapted from an argument of [[Paul Halmos]].{{refn|group=nb|In his autobiography, Halmos criticized the form in which the birthday paradox is often presented, in terms of numerical computation. He believed that it should be used as an example in the use of more abstract mathematical concepts. He wrote: |
||
<blockquote>The reasoning is based on important tools that all students of mathematics should have ready access to. The birthday problem used to be a splendid illustration of the advantages of pure thought over mechanical manipulation; the inequalities can be obtained in a minute or two, whereas the multiplications would take much longer, and be much more subject to error, whether the instrument is a pencil or an old-fashioned desk computer. What [[calculator]]s do not yield is understanding, or mathematical facility, or a solid basis for more advanced, generalized theories.</blockquote>}} |
|||
As stated above, the probability that no two birthdays coincide is |
As stated above, the probability that no two birthdays coincide is |
||
:<math>1-p(n) = \bar p(n) = \prod_{k=1}^{n-1}\left(1-{k |
:<math>1-p(n) = \bar p(n) = \prod_{k=1}^{n-1}\left(1-\frac{k}{365}\right) .</math> |
||
As in earlier paragraphs, interest lies in the smallest |
As in earlier paragraphs, interest lies in the smallest {{mvar|n}} such that {{math|''p''(''n'') > {{sfrac|1|2}}}}; or equivalently, the smallest {{mvar|n}} such that {{math|''{{overline|p}}''(''n'') < {{sfrac|1|2}}}}. |
||
Using the inequality 1 |
Using the inequality {{math|1 − ''x'' < ''e''<sup>−''x''</sup>}} in the above expression we replace {{math|1 − {{sfrac|''k''|365}}}} with {{math|''e''<sup>{{frac|−''k''|365}}</sup>}}. This yields |
||
:<math>\bar p(n) = \prod_{k=1}^{n-1}\left(1-{k |
:<math>\bar p(n) = \prod_{k=1}^{n-1}\left(1-\frac{k}{365}\right) < \prod_{k=1}^{n-1}\left(e^{-\frac{k}{365}}\right) = e^{-\frac{n(n-1)}{730}} .</math> |
||
Therefore, the expression above is not only an approximation, but also an [[upper bound]] of '' |
Therefore, the expression above is not only an approximation, but also an [[upper bound]] of {{math|''{{overline|p}}''(''n'')}}. The inequality |
||
:<math> e^{- |
:<math> e^{-\frac{n(n-1)}{730}} < \frac{1}{2}</math> |
||
implies '' |
implies {{math|''{{overline|p}}''(''n'') < {{sfrac|1|2}}}}. Solving for {{mvar|n}} gives |
||
:<math>n^2-n > |
:<math>n^2-n > 730 \ln 2 .</math> |
||
Now, 730 |
Now, {{math|730 ln 2}} is approximately 505.997, which is barely below 506, the value of {{math|''n''<sup>2</sup> − ''n''}} attained when {{math|''n'' {{=}} 23}}. Therefore, 23 people suffice. Incidentally, solving {{math|''n''<sup>2</sup> − ''n'' {{=}} 730 ln 2}} for ''n'' gives the approximate formula of Frank H. Mathis cited above. |
||
Solving ''n''<sup>2</sup> − ''n'' = 2 · 365 · ln 2 for ''n'' gives, by the way, the approximate formula of Frank H. Mathis cited above. |
|||
This derivation only shows that ''at most'' 23 people are needed to ensure a birthday match |
This derivation only shows that ''at most'' 23 people are needed to ensure the chances of a birthday match are at least even; it leaves open the possibility that a smaller {{mvar|n}}, such as 22, could also work.
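The bound can be checked numerically. The sketch below (function names are chosen here for illustration) compares the exact no-match probability with the exponential upper bound for {{math|''n'' {{=}} 22}} and {{math|''n'' {{=}} 23}}.

```python
import math

def prob_no_match(n: int, d: int = 365) -> float:
    """Exact probability that n people all have distinct birthdays."""
    return math.prod(1 - k / d for k in range(1, n))

def exponential_bound(n: int, d: int = 365) -> float:
    """Upper bound exp(-n(n-1)/(2d)) on the no-match probability."""
    return math.exp(-n * (n - 1) / (2 * d))

for n in (22, 23):
    print(n, round(prob_no_match(n), 4), round(exponential_bound(n), 4))
```

For {{math|''n'' {{=}} 22}} the bound is above {{sfrac|1|2}}, so the inequality alone is inconclusive there; only the exact product shows that 22 people are not enough.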
||
== |
==Generalizations== |
||
=== Cast as a collision problem === |
|||
The birthday problem can be generalized as follows: given ''n'' random integers drawn from a [[Uniform distribution (discrete)|discrete uniform distribution]] with range [1,''d''], what is the probability ''p''(''n'';''d'') that at least two numbers are the same? (''d=365'' gives the usual birthday problem.) |
|||
===Arbitrary number of days=== |
|||
The generic results can be derived using the same arguments given above. |
|||
Given a year with {{mvar|d}} days, the '''generalized birthday problem''' asks for the minimal number {{math|''n''(''d'')}} such that, in a set of {{mvar|n}} randomly chosen people, the probability of a birthday coincidence is at least 50%. In other words, {{math|''n''(''d'')}} is the minimal integer {{mvar|n}} such that |
|||
:<math>p(n;d) = \begin{cases} 1-\prod_{k=1}^{n-1}\left(1-{k \over d}\right) & n\le d \\ 1 & n > d \end{cases}</math> |
|||
:<math>p(n;d) \approx 1 - e^{-n(n-1)/(2d)}</math> |
|||
:<math>p(n;d) \approx 1 - \left( \frac{d-1}{d} \right)^{n(n-1)/2}</math> |
|||
:<math>1-\left(1-\frac{1}{d}\right)\left(1-\frac{2}{d}\right)\cdots\left(1-\frac{n-1}{d}\right)\geq \frac{1}{2}.</math> |
|||
Conversely, if ''n(p;d)'' denotes the number of random integers drawn from [1,''d''] to obtain a probability ''p'' that at least two numbers are the same, then |
|||
:<math>n(p;d)\approx \sqrt{2d \cdot \ln\left({1 \over 1-p}\right)}.</math> |
|||
The classical birthday problem thus corresponds to determining {{math|''n''(365)}}. The first 99 values of {{math|''n''(''d'')}} are given here {{OEIS|id=A033810}}: |
|||
The birthday problem in this more generic sense applies to [[hash function]]s: the expected number of ''N''-[[bit]] hashes that can be generated before getting a collision is not 2<sup>''N''</sup>, but rather only 2<sup>''N''/2</sup>. This is exploited by [[birthday attack]]s on [[cryptographic hash function]]s and is the reason why a small number of collisions in a [[hash table]] are, for all practical purposes, inevitable. |
|||
:{| class="wikitable" style="text-align:center;" |
|||
The theory behind the birthday problem was used by Zoe Schnabel<ref>Z. E. Schnabel (1938) ''The Estimation of the Total Fish Population of a Lake'', [[American Mathematical Monthly]] '''45''', 348–352.</ref> under the name of [[mark and recapture|capture-recapture]] statistics to estimate the size of fish population in lakes. |
|||
|- |
|||
! scope="row" | {{mvar|d}} |
|||
| 1–2 || 3–5 || 6–9 || 10–16 || 17–23 || 24–32 || 33–42 || 43–54 || 55–68 || 69–82 || 83–99 |
|||
|- |
|||
! scope="row" | {{math|''n''(''d'')}} |
|||
| 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 || 10 || 11 || 12 |
|||
|} |
|||
A similar calculation shows that {{math|''n''(''d'')}} = 23 when {{mvar|d}} is in the range 341–372. |
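The values in the table can be reproduced by multiplying out the product directly. This is a minimal sketch, assuming floating-point arithmetic is accurate enough for the values of {{mvar|d}} considered; <code>min_people</code> is a name introduced here.

```python
def min_people(d: int, target: float = 0.5) -> int:
    """Smallest n whose collision probability is at least `target`
    when there are d equally likely days."""
    no_match = 1.0  # probability that the first n people are all distinct
    n = 1
    while 1.0 - no_match < target:
        # Person n+1 must avoid the n days already taken.
        no_match *= 1 - n / d
        n += 1
    return n

print([min_people(d) for d in (1, 3, 6, 10, 17, 24, 365)])
```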
|||
==== Generalization to multiple types ==== |
|||
A number of bounds and formulas for {{math|''n''(''d'')}} have been published.<ref>{{wikicite|ref={{Harvid|Brink|2012}}|reference=D. Brink, A (probably) exact solution to the Birthday Problem, Ramanujan Journal, 2012, [https://link.springer.com/article/10.1007/s11139-011-9343-9].}}</ref> |
|||
The basic problem considers all trials to be of one "type". The birthday problem has been generalized to consider an arbitrary number of types.<ref>M. C. Wendl (2003) ''[http://dx.doi.org/10.1016/S0167-7152(03)00168-8 Collision Probability Between Sets of Random Variables]'', Statistics and Probability Letters '''64'''(3), 249–254.</ref> In the simplest extension there are just two types, say ''m'' "men" and ''n'' "women", and the problem becomes characterizing the probability of a shared birthday between at least one man and one woman. (Shared birthdays between, say, two women do not count.) The probability of ''no'' (i.e. zero) shared birthdays here is |
|||
For any {{math|''d'' ≥ 1}}, the number {{math|''n''(''d'')}} satisfies<ref>{{Harvard citations|author=Brink|year=2012|nb=yes|loc=Theorem 2}}</ref> |
|||
:<math> |
:<math>\frac{3-2\ln2}{6}<n(d)-\sqrt{2d\ln2}\leq 9-\sqrt{86\ln2}.</math> |
||
These bounds are optimal in the sense that the sequence {{math|''n''(''d'') − {{sqrt|2''d'' ln 2}}}} |
|||
where ''d'' = 365 and ''S''<sub>2</sub> are [[Stirling number|Stirling numbers of the second kind]]. Consequently, the desired probability is 1 − ''p''<sub>0</sub>. |
|||
gets arbitrarily close to |
|||
:<math>\frac{3-2\ln2}{6} \approx 0.27,</math> |
|||
while it has |
|||
:<math>9-\sqrt{86\ln2}\approx 1.28</math> |
|||
as its maximum, taken for {{math|''d'' {{=}} 43}}. |
|||
The bounds are sufficiently tight to give the exact value of {{math|''n''(''d'')}} in most of the cases. For example, for {{math|''d'' {{=}} 365}} these bounds imply that {{math|22.7633 < ''n''(365) < 23.7736}} and 23 is the only integer in that range. In general, it follows from these bounds that {{math|''n''(''d'')}} always equals either
|||
This variation of the birthday problem is interesting because there is not a unique solution for the total number of people ''m'' + ''n''. For example, the usual 0.5 probability value is realized for both a 32-member group of 16 men and 16 women and a 49-member group of 43 women and 6 men. |
|||
:<math>\left\lceil\sqrt{2d\ln2}\,\right\rceil \quad\text{or}\quad \left\lceil\sqrt{2d\ln2}\,\right\rceil+1</math> |
|||
where {{math|⌈ · ⌉}} denotes the [[Floor and ceiling functions|ceiling function]]. |
|||
The formula |
|||
:<math>n(d) = \left\lceil\sqrt{2d\ln2}\,\right\rceil</math> |
|||
== Other birthday problems == |
|||
=== Reverse problem === |
|||
For a fixed probability ''p'': |
|||
* Find the greatest ''n'' for which the probability ''p''(''n'') is smaller than the given ''p'', or |
|||
* Find the smallest ''n'' for which the probability ''p''(''n'') is greater than the given ''p''. |
|||
holds for 73% of all integers {{mvar|d}}.<ref name=Brink>{{Harvard citations|author=Brink|year=2012|nb=yes|loc=Theorem 3}}</ref> The formula |
|||
Taking the above formula for ''d'' = 365 we have: |
|||
:<math>n( |
:<math>n(d) = \left\lceil\sqrt{2d\ln2}+\frac{3-2\ln2}{6}\right\rceil</math> |
||
holds for [[almost all]] {{mvar|d}}, i.e., for a set of integers {{mvar|d}} with [[asymptotic density]] 1.<ref name=Brink/> |
|||
==== Sample calculations ==== |
|||
The formula |
|||
{| class="wikitable" |
|||
|----- |
|||
! ''p'' || ''n'' |
|||
! ''n''↓ || ''p''(''n''↓) || ''n''↑ || ''p''(''n''↑) |
|||
|----- |
|||
| <span style="color:magenta">0.01</span> |
|||
| 0.14178√365 = <span style="color:magenta">2.70864</span> |
|||
| align="right" | 2 || 0.00274 || align="right" | 3 |
|||
| <span style="color:magenta">0.00820</span> |
|||
|----- |
|||
| 0.05 || 0.32029√365 = 6.11916 |
|||
| align="right" | 6 || 0.04046 || align="right" | 7 || 0.05624 |
|||
|----- |
|||
| <span style="color:magenta">0.1</span> |
|||
| 0.45904√365 = <span style="color:magenta"> 8.77002</span> |
|||
| align="right" | 8 || 0.07434 || align="right" | 9 |
|||
| <span style="color:magenta">0.09462</span> |
|||
|----- |
|||
| <span style="color:magenta">0.2</span> |
|||
| 0.66805√365 = <span style="color:magenta">12.76302</span> |
|||
| align="right" | 12 || 0.16702 || align="right" | 13 |
|||
| <span style="color:magenta">0.19441</span> |
|||
|----- |
|||
| 0.3 || 0.84460√365 = 16.13607 |
|||
| align="right" | 16 || 0.28360 || align="right" | 17 || 0.31501 |
|||
|----- |
|||
| 0.5 || 1.17741√365 = 22.49439 |
|||
| align="right" | 22 || 0.47570 || align="right" | 23 || 0.50730 |
|||
|----- |
|||
| 0.7 || 1.55176√365 = 29.64625 |
|||
| align="right" | 29 || 0.68097 || align="right" | 30 || 0.70632 |
|||
|----- |
|||
| 0.8 || 1.79412√365 = 34.27666 |
|||
| align="right" | 34 || 0.79532 || align="right" | 35 || 0.81438 |
|||
|----- |
|||
| 0.9 || 2.14597√365 = 40.99862 |
|||
| align="right" | 40 || 0.89123 || align="right" | 41 || 0.90315 |
|||
|----- |
|||
| 0.95 || 2.44775√365 = 46.76414 |
|||
| align="right" | 46 || 0.94825 || align="right" | 47 || 0.95477 |
|||
|----- |
|||
| <span style="color:magenta">0.99</span> |
|||
| 3.03485√365 = <span style="color:magenta">57.98081</span> |
|||
| align="right" | 57 |
|||
| <span style="color:magenta">0.99012</span> |
|||
| align="right" | 58 || 0.99166 |
|||
|} |
|||
Note: some values falling outside the bounds have been <span style="color:magenta">colored</span> to show that the approximation is '''not''' always exact. |
|||
:<math>n(d)=\left\lceil \sqrt{2d\ln2}+\frac{3-2\ln2}{6}+\frac{9-4(\ln2)^2}{72\sqrt{2d\ln2}}\right\rceil</math> |
|||
=== First match === |
|||
A related question is, as people enter a room one at a time, which one is most likely to be the first to have the same birthday as someone already in the room? That is, for what ''n'' is ''p''(''n'') − ''p''(''n'' − 1) maximum? The answer is 20—if there's a prize for first match, the best position in line is 20th. |
|||
holds for all {{math|''d'' ≤ {{val|e=18}}}}, but it is conjectured that there are infinitely many counterexamples to this formula.<ref name="ReferenceA">{{Harvard citations|author=Brink|year=2012|nb=yes|loc=Table 3, Conjecture 1}}</ref> |
|||
=== Same birthday as you === |
|||
[[Image:Birthday paradox.svg|thumb|right|290px|Comparing ''p''(''n'') = probability of a birthday match with ''q''(''n'') = probability of matching ''your'' birthday]] |
|||
The formula |
|||
Note that in the birthday problem, neither of the two people is chosen in advance. By way of contrast, the probability ''q''(''n'') that someone in a room of ''n'' other people has the same birthday as a particular person (for example, you), is given by |
|||
:<math>n(d)=\left\lceil \sqrt{2d\ln2}+\frac{3-2\ln2}{6}+\frac{9-4(\ln2)^2}{72\sqrt{2d\ln2}}-\frac{2(\ln2)^2}{135d}\right\rceil</math> |
|||
holds for all {{math|''d'' ≤ {{val|e=18}}}}, and it is conjectured that this formula holds for all {{mvar|d}}.<ref name="ReferenceA"/> |
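The successive formulas can be compared with a direct computation. In the sketch below, the helper names are assumptions made for illustration; the example {{math|''d'' {{=}} 43}} is the value at which the upper bound is attained, so the two shortest formulas fall one short there while the longer ones agree with the exact value.

```python
import math

LN2 = math.log(2)

def exact_n(d: int) -> int:
    """Smallest n with collision probability >= 1/2, by direct multiplication."""
    no_match, n = 1.0, 1
    while 1.0 - no_match < 0.5:
        no_match *= 1 - n / d
        n += 1
    return n

def expansion_terms(d: int) -> list:
    """The four successive terms appearing inside the ceiling functions."""
    root = math.sqrt(2 * d * LN2)
    return [
        root,
        (3 - 2 * LN2) / 6,
        (9 - 4 * LN2 ** 2) / (72 * root),
        -2 * LN2 ** 2 / (135 * d),
    ]

def ceiling_formula(d: int, terms: int) -> int:
    """Ceiling of the sum of the first `terms` terms (1 to 4)."""
    return math.ceil(sum(expansion_terms(d)[:terms]))

for d in (43, 365):
    print(d, exact_n(d), [ceiling_formula(d, t) for t in range(1, 5)])
```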
|||
===More than two people sharing a birthday=== |
|||
It is possible to extend the problem to ask how many people in a group are necessary for there to be a greater than 50% probability that at least 3, 4, 5, etc. of the group share the same birthday. |
|||
The first few values are as follows: 88 people are needed for a greater than 50% probability that at least 3 of them share a birthday, and 187 people are needed for at least 4 {{OEIS|A014088}}.<ref>{{cite web |title=Minimal number of people to give a 50% probability of having at least n coincident birthdays in one year. |url=https://oeis.org/A014088 |website=The On-line Encyclopedia of Integer Sequences |publisher=OEIS |access-date=17 February 2020}}</ref>
|||
===Probability of a shared birthday (collision)=== |
|||
The birthday problem can be generalized as follows: |
|||
:Given {{mvar|n}} random integers drawn from a [[Uniform distribution (discrete)|discrete uniform distribution]] with range {{math|[1,''d'']}}, what is the probability {{math|''p''(''n''; ''d'')}} that at least two numbers are the same? ({{math|''d'' {{=}} 365}} gives the usual birthday problem.)<ref>{{cite conference | title = Birthday Paradox for Multi-collisions| last1 = Suzuki| first1 = K. |
|||
| last2 = Tonien| first2 = D.|display-authors=et al| date = 2006| publisher = Springer| book-title = Lecture Notes in Computer Science, vol 4296 | location = Berlin |
|||
| id = Information Security and Cryptology – ICISC 2006| editor = Rhee M.S., Lee B. | doi = 10.1007/11927587_5}}</ref> |
|||
The generic results can be derived using the same arguments given above. |
|||
:<math>\begin{align} |
|||
p(n;d) &= \begin{cases} 1-\displaystyle\prod_{k=1}^{n-1}\left(1-\frac{k}{d}\right) & n\le d \\ 1 & n > d \end{cases} \\[8px] |
|||
& \approx 1 - e^{-\frac{n(n-1)}{2d}} \\ |
|||
& \approx 1 - \left( \frac{d-1}{d} \right)^\frac{n(n-1)}{2} |
|||
\end{align}</math> |
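The exact expression and its two approximations can be evaluated side by side. The following Python sketch uses function names chosen here for illustration.

```python
import math

def p_exact(n: int, d: int) -> float:
    """Exact probability of at least one repeat among n draws from d values."""
    if n > d:
        return 1.0  # pigeonhole principle
    return 1.0 - math.prod(1 - k / d for k in range(1, n))

def p_exponential(n: int, d: int) -> float:
    """Approximation 1 - exp(-n(n-1)/(2d))."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * d))

def p_pairs(n: int, d: int) -> float:
    """Approximation 1 - ((d-1)/d)^(n(n-1)/2), treating pairs as independent."""
    return 1.0 - ((d - 1) / d) ** (n * (n - 1) / 2)

for n in (10, 23, 50):
    print(n, round(p_exact(n, 365), 4), round(p_exponential(n, 365), 4),
          round(p_pairs(n, 365), 4))
```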
|||
Conversely, if {{math|''n''(''p''; ''d'')}} denotes the number of random integers drawn from {{math|[1,''d'']}} to obtain a probability {{mvar|p}} that at least two numbers are the same, then |
|||
:<math>n(p;d)\approx \sqrt{2d \cdot \ln\left(\frac{1}{1-p}\right)}.</math> |
|||
The birthday problem in this more generic sense applies to [[hash function]]s: the expected number of {{math|''N''}}-[[bit]] hashes that can be generated before getting a collision is not {{math|2<sup>''N''</sup>}}, but rather only {{math|2<sup>{{frac|''N''|2}}</sup>}}. This is exploited by [[birthday attack]]s on [[cryptographic hash function]]s and is the reason why a small number of collisions in a [[hash table]] are, for all practical purposes, inevitable. |
|||
The theory behind the birthday problem was used by Zoe Schnabel<ref>Z. E. Schnabel (1938) ''The Estimation of the Total Fish Population of a Lake'', [[American Mathematical Monthly]] '''45''', 348–352.</ref> under the name of [[mark and recapture|capture-recapture]] statistics to estimate the size of fish population in lakes. |
|||
====Generalization to multiple types of people==== |
|||
[[File:2d birthday.png|thumb|Plot of the probability of at least one shared birthday between at least one man and one woman]] |
|||
The basic problem considers all trials to be of one "type". The birthday problem has been generalized to consider an arbitrary number of types.<ref>[[Michael Christopher Wendl|M. C. Wendl]] (2003) ''[https://dx.doi.org/10.1016/S0167-7152(03)00168-8 Collision Probability Between Sets of Random Variables]'', Statistics and Probability Letters '''64'''(3), 249–254.</ref> In the simplest extension there are two types of people, say {{mvar|m}} men and {{mvar|n}} women, and the problem becomes characterizing the probability of a shared birthday between at least one man and one woman. (Shared birthdays between two men or two women do not count.) The probability of no shared birthdays here is |
|||
:<math>p_0 =\frac{1}{d^{m+n}} \sum_{i=1}^m \sum_{j=1}^n S_2(m,i) S_2(n,j) \prod_{k=0}^{i+j-1} (d - k)</math>
|||
where {{math|''d'' {{=}} 365}} and {{math|''S''<sub>2</sub>}} are [[Stirling numbers of the second kind]]. Consequently, the desired probability is {{math|1 − ''p''<sub>0</sub>}}. |
|||
This variation of the birthday problem is interesting because there is not a unique solution for the total number of people {{math|''m'' + ''n''}}. For example, the usual 50% probability value is realized for both a 32-member group of 16 men and 16 women and a 49-member group of 43 women and 6 men. |
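The double sum over Stirling numbers can be evaluated exactly with integer arithmetic. The sketch below is one possible implementation, not taken from the cited paper; all function names are assumptions made here. It reads the product in the formula as the falling factorial {{math|''d''(''d'' − 1)⋯(''d'' − ''i'' − ''j'' + 1)}}.

```python
from fractions import Fraction

def stirling2_row(m: int) -> list:
    """Return [S2(m, 0), ..., S2(m, m)] using
    S2(a, b) = b * S2(a-1, b) + S2(a-1, b-1)."""
    row = [1]  # S2(0, 0) = 1
    for a in range(1, m + 1):
        prev = row + [0]  # pad so that prev[a] = S2(a-1, a) = 0
        row = [0] + [b * prev[b] + prev[b - 1] for b in range(1, a + 1)]
    return row

def falling_factorial(d: int, r: int) -> int:
    """d * (d-1) * ... * (d-r+1)."""
    result = 1
    for k in range(r):
        result *= d - k
    return result

def prob_cross_match(m: int, n: int, d: int = 365) -> float:
    """Probability that at least one of m men shares a birthday
    with at least one of n women."""
    s_m, s_n = stirling2_row(m), stirling2_row(n)
    total = sum(
        s_m[i] * s_n[j] * falling_factorial(d, i + j)
        for i in range(1, m + 1)
        for j in range(1, n + 1)
    )
    p0 = Fraction(total, d ** (m + n))
    return float(1 - p0)

print(prob_cross_match(16, 16), prob_cross_match(6, 43))
```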
|||
==Other birthday problems== |
|||
===First match=== |
|||
A related question is, as people enter a room one at a time, which one is most likely to be the first to have the same birthday as someone already in the room? That is, for what {{mvar|n}} is {{math|''p''(''n'') − ''p''(''n'' − 1)}} maximum? The answer is 20—if there is a prize for first match, the best position in line is 20th.{{citation needed|date=September 2019}} |
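The position can be found by tabulating the probability that the {{mvar|n}}th person to enter is the first to match someone already present. This is a minimal sketch with names chosen for illustration.

```python
def first_match_distribution(d: int = 365) -> dict:
    """Map n to the probability that the n-th person to enter is the
    first one whose birthday matches someone already in the room."""
    dist = {}
    no_match = 1.0  # probability that the first n-1 people are all distinct
    for n in range(2, d + 2):
        # The n-th person must hit one of the n-1 distinct days already taken.
        dist[n] = no_match * (n - 1) / d
        no_match *= 1 - (n - 1) / d
    return dist

dist = first_match_distribution()
best_position = max(dist, key=dist.get)
print(best_position)
```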
|||
===Same birthday as you=== |
|||
[[Image:Birthday paradox.svg|thumb|right|upright=1.4|Comparing {{math|''p''(''n'')}} = probability of a birthday match with {{math|''q''(''n'')}} = probability of matching ''your'' birthday]] |
|||
In the birthday problem, neither of the two people is chosen in advance. By contrast, the probability {{math|''q''(''n'')}} that ''at least one other person'' in a room of {{mvar|n}} other people has the same birthday as a ''particular'' person (for example, you) is given by |
|||
: <math> q(n) = 1 - \left( \frac{365-1}{365} \right)^n </math> |
: <math> q(n) = 1 - \left( \frac{365-1}{365} \right)^n </math> |
||
and for general |
and for general {{mvar|d}} by |
||
: <math> q(n;d) = 1 - \left( \frac{d-1}{d} \right)^n. </math> |
: <math> q(n;d) = 1 - \left( \frac{d-1}{d} \right)^n. </math> |
||
In the standard case of ''d'' = 365 substituting ''n'' = 23 gives about 6.1%, which is less than 1 chance in 16. For a greater than 50% chance that one person in a roomful of |
In the standard case of {{math|''d'' {{=}} 365}}, substituting {{math|''n'' {{=}} 23}} gives about 6.1%, which is less than 1 chance in 16. For a greater than 50% chance that ''at least'' one other person in a roomful of {{mvar|n}} people has the same birthday as ''you'', {{mvar|n}} would need to be at least 253. This number is significantly higher than {{math|{{sfrac|365|2}} {{=}} 182.5}}: the reason is that it is likely that there are some birthday matches among the other people in the room. |
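Both figures can be confirmed with a few lines of Python; the variable and function names below are illustrative only.

```python
def q(n: int, d: int = 365) -> float:
    """Probability that at least one of n other people shares your birthday."""
    return 1 - ((d - 1) / d) ** n

# Smallest n for which the probability exceeds one half.
smallest = 1
while q(smallest) <= 0.5:
    smallest += 1

print(round(q(23), 4), smallest)
```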
||
=== Number of people with a shared birthday === |
|||
For any one person in a group of ''n'' people, the probability that this person shares a birthday with someone else is <math> q(n-1;d) </math>, as explained above. The expected number of people with a shared (non-unique) birthday can now be calculated easily by multiplying that probability by the number of people (''n''), so it is:
|||
: <math> n\left(1 - \left( \frac{d-1}{d} \right)^{n-1}\right) </math> |
|||
(This multiplication is valid because of the linearity of the [[expected value]] of indicator variables.) This implies that the expected number of people with a non-shared (unique) birthday is:
|||
: <math> n \left( \frac{d-1}{d} \right)^{n-1} </math> |
|||
Similar formulas can be derived for the expected number of people who share with three, four, etc. other people. |
|||
=== Number of people until every birthday is achieved === |
|||
The expected number of people needed until every birthday is achieved is the subject of the [[coupon collector's problem]]. For {{mvar|d}} equally likely birthdays it equals {{math|''dH<sub>d</sub>''}}, where {{math|''H<sub>d</sub>''}} is the {{mvar|d}}th [[harmonic number]]. For 365 possible dates (the birthday problem), the answer is approximately 2365.
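The figure for 365 dates can be reproduced by summing the harmonic series exactly. This is a small sketch; the function name is an assumption made here.

```python
from fractions import Fraction

def expected_people_to_cover(d: int) -> float:
    """Expected number of people until all d birthdays have occurred:
    d times the d-th harmonic number."""
    harmonic = sum(Fraction(1, k) for k in range(1, d + 1))
    return float(d * harmonic)

print(expected_people_to_cover(365))
```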
|||
===Near matches=== |
|||
Another generalization is to ask for the probability of finding at least one pair in a group of {{mvar|n}} people with birthdays within {{mvar|k}} calendar days of each other, if there are {{mvar|d}} equally likely birthdays.<ref name="abramson">M. Abramson and W. O. J. Moser (1970) ''More Birthday Surprises'', [[American Mathematical Monthly]] '''77''', 856–858</ref> |
|||
:<math> \begin{align} p(n,k,d) &= 1 - \frac{ (d - nk -1)! }{ d^{n-1} \bigl(d - n(k+1)\bigr)!}\end{align} </math> |
|||
It is not a coincidence that <math>253=\frac{23\times(23-1)}{2}</math>; a similar approximate pattern can be found using a number of possibilities different from 365, or a target probability different from 50%. |
|||
The number of people required so that the probability that some pair will have a birthday separated by {{mvar|k}} days or fewer will be higher than 50% is given in the following table: |
|||
=== Near matches === |
|||
Another generalization is to ask how many people are needed in order to have a better than 50% chance that two people have a birthday within one day of each other, or within two, three, etc., days of each other. This is a more difficult problem and requires use of the [[inclusion-exclusion principle]]. The number of people required so that the probability that some pair will have a birthday separated by fewer than ''k'' days will be higher than 50% is: |
|||
{| class="wikitable" style="text-align: center" |
:{| class="wikitable" style="text-align: center" |
||
! {{mvar|k}} !! {{mvar|n}}<br />for {{math|''d'' {{=}} 365}}
|||
! ''k'' !! # people required |
|||
|- |
|- |
||
| |
|0 || 23 |
||
|- |
|- |
||
| |
|1 || 14 |
||
|- |
|- |
||
| |
|2 || 11 |
||
|- |
|- |
||
| |
|3 || 9 |
||
|- |
|||
|4 || 8 |
|||
|- |
|- |
||
|5 || 8 |
|5 || 8 |
||
|- |
|- |
||
|6 || |
|6 || 7 |
||
|- |
|- |
||
|7 || 7 |
|7 || 7 |
||
|- |
|||
|8 || 7 |
|||
|} |
|} |
||
Thus in a group of just seven random people, it is more likely than not that two of them will have a birthday within a week of each other.<ref name="abramson" |
Thus in a group of just seven random people, it is more likely than not that two of them will have a birthday within a week of each other.<ref name="abramson"/> |
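The table can be recomputed from the closed-form expression for near matches. The sketch below rewrites the ratio of factorials as a product of {{math|''n'' − 1}} consecutive integers; treating the probability as 1 once {{math|''n''(''k'' + 1) > ''d''}} is an assumption made here for the case where the factorial is undefined, and the function names are illustrative.

```python
import math

def p_near(n: int, k: int, d: int = 365) -> float:
    """Probability that some pair among n people has birthdays
    within k days of each other, for d equally likely days."""
    if n * (k + 1) > d:
        return 1.0  # assumed: too many people to keep everyone k days apart
    # (d - nk - 1)! / (d - n(k+1))! written as a product of n-1 integers.
    numerator = math.prod(range(d - n * (k + 1) + 1, d - n * k))
    return 1.0 - numerator / d ** (n - 1)

def min_people_near(k: int, d: int = 365) -> int:
    """Smallest n for which the near-match probability exceeds 50%."""
    n = 1
    while p_near(n, k, d) <= 0.5:
        n += 1
    return n

print([min_people_near(k) for k in range(8)])
```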
||
=== Number of days with a certain number of birthdays === |
|||
=== Collision counting === |
|||
==== Number of days with at least one birthday ==== |
|||
The probability that the ''k''th integer randomly chosen from [1, ''d''] will repeat at least one previous choice equals ''q''(''k'' − 1; ''d'') above. The expected total number of times a selection will repeat a previous selection as ''n'' such integers are chosen equals |
|||
The expected number of different birthdays, i.e. the number of days that are at least one person's birthday, is: |
|||
:<math>d - d \left (\frac {d-1} {d} \right )^n </math> |
|||
This follows from the expected number of days that are no one's birthday: |
|||
:<math>d \left (\frac {d-1} {d} \right )^n </math> |
|||
which follows from the probability that a particular day is no one's birthday, {{math|{{pars|s=150%|{{sfrac|''d'' − 1|''d''}}}}{{su|p=''n''|b= }}}}, easily summed because of the linearity of the expected value. |
|||
For instance, with {{math|1={{var|d}} = 365}}, you should expect about 21 different birthdays when there are 22 people, or 46 different birthdays when there are 50 people. When there are 1000 people, there will be around 341 different birthdays (24 unclaimed birthdays). |
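The figures above can be checked numerically. The following is a minimal Python sketch of the closed form; the function name <code>expected_distinct_birthdays</code> is an illustrative choice and does not come from the article.

```python
def expected_distinct_birthdays(n: int, d: int = 365) -> float:
    """Expected number of days that are at least one person's birthday."""
    # A given day is missed by all n people with probability ((d - 1) / d) ** n,
    # so d times that probability is the expected number of unclaimed days.
    return d - d * ((d - 1) / d) ** n


for n in (22, 50, 1000):
    distinct = expected_distinct_birthdays(n)
    print(f"n = {n:4d}: {distinct:6.2f} distinct, {365 - distinct:6.2f} unclaimed")
```

The result is an expectation, so it is generally not a whole number; the prose figures are rounded.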
|||
|||
==== Number of days with at least two birthdays ==== |
|||
The above can be generalized from the distribution of the number of people with their birthday on any particular day, which is a [[binomial distribution]] with {{mvar|n}} trials and success probability {{math|{{sfrac|1|''d''}}}}. Multiplying the relevant probability by {{mvar|d}} then gives the expected number of days. For example, the expected number of shared days, i.e. days that are the birthday of at least two (not zero and not one) people, is:

<math display="block">d - d \left (\frac {d-1} {d} \right )^n - d \cdot \binom{n}{1} \left (\frac {1} {d} \right )^1\left (\frac {d-1} {d} \right )^{n-1} = d - d \left (\frac {d-1} {d} \right )^n - n \left (\frac {d-1} {d} \right )^{n-1} </math>
|||
===Number of people who repeat a birthday=== |
|||
The probability that the {{mvar|k}}th integer randomly chosen from {{math|[1,''d'']}} will repeat at least one previous choice equals {{math|''q''(''k'' − 1; ''d'')}} above. The expected total number of times a selection will repeat a previous selection as {{mvar|n}} such integers are chosen equals<ref>{{cite web|last1=Might|first1=Matt|title=Counting hash collisions with the birthday paradox|url=http://matt.might.net/articles/counting-hash-collisions/|website=Matt Might's blog|access-date=17 July 2015}}</ref>
|||
:<math>\sum_{k=1}^n q(k-1;d) = n - d + d \left (\frac {d-1} {d} \right )^n</math> |
|||
This can be seen to equal the number of people minus the expected number of different birthdays. |
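The identity between the sum and its closed form can be checked with a short Python sketch. It assumes that {{math|''q''(''n''; ''d'')}} has the form {{math|1 − ((''d'' − 1)/''d'')<sup>''n''</sup>}}, the probability that a new draw matches at least one of {{mvar|n}} earlier draws; that definition lies outside this section, and the function names are illustrative.

```python
def q(n: int, d: int = 365) -> float:
    """Assumed form of q(n; d): chance a new draw matches one of n earlier draws."""
    return 1 - ((d - 1) / d) ** n


def expected_repeats(n: int, d: int = 365) -> float:
    """Closed form: number of people minus expected number of distinct birthdays."""
    return n - d + d * ((d - 1) / d) ** n


n = 50
summed = sum(q(k - 1) for k in range(1, n + 1))
print(summed, expected_repeats(n))  # the two values agree up to rounding error
```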
|||
===Average number of people to get at least one shared birthday=== |
|||
In an alternative formulation of the birthday problem, one asks the ''average'' number of people required to find a pair with the same birthday. If we consider the probability function Pr[{{mvar|n}} people have at least one shared birthday], this ''average'' is determining the [[mean]] of the distribution, as opposed to the customary formulation, which asks for the [[median]]. The problem is relevant to several [[hash function|hashing algorithms]] analyzed by [[Donald Knuth]] in his book ''[[The Art of Computer Programming]]''. It may be shown<ref name="knuth73">{{cite book |first=D. E. |last=Knuth |title=The Art of Computer Programming |volume=3, Sorting and Searching |publisher=Addison-Wesley |location=Reading, Massachusetts |year=1973 |isbn=978-0-201-03803-3 }}</ref><ref name="flajolet95">{{cite journal |first1=P. |last1=Flajolet |first2=P. J. |last2=Grabner |first3=P. |last3=Kirschenhofer |first4=H. |last4=Prodinger |year=1995 |title=On Ramanujan's Q-Function |journal=Journal of Computational and Applied Mathematics |volume=58 |pages=103–116 |doi=10.1016/0377-0427(93)E0258-N |doi-access=free }}</ref> that if one samples uniformly, with replacement, from a population of size {{math|''M''}}, the number of trials required for the first repeated sampling of ''some'' individual has [[expected value]] {{math|{{overline|''n''}} {{=}} 1 + ''Q''(''M'')}}, where |
|||
: <math>Q(M)=\sum_{k=1}^M \frac{M!}{(M-k)! M^k}.</math> |
|||
The function

: <math>Q(M)= 1 + \frac{M-1}{M} + \frac{(M-1)(M-2)}{M^2} + \cdots + \frac{(M-1)(M-2) \cdots 1}{M^{M-1}}</math>

has been studied by [[Srinivasa Ramanujan]] and has [[asymptotic expansion]]:

: <math>Q(M)\sim\sqrt{\frac{\pi M}{2}}-\frac{1}{3}+\frac{1}{12}\sqrt{\frac{\pi}{2M}}-\frac{4}{135M}+\cdots.</math>

With {{math|1=''M'' = 365}} days in a year, the average number of people required to find a pair with the same birthday is {{math|1={{overline|''n''}} = 1 + ''Q''(''M'') ≈ 24.61659}}, somewhat more than 23, the number required for a 50% chance. In the best case, two people will suffice; at worst, the maximum possible number of {{math|1=''M'' + 1 = 366}} people is needed; but on average, only 25 people are required.
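The value quoted above can be reproduced by summing the series for {{math|''Q''(''M'')}} term by term. This is a minimal Python sketch; the name <code>ramanujan_q</code> is an illustrative choice, and floating-point arithmetic is assumed to be accurate enough for {{math|1=''M'' = 365}}.

```python
def ramanujan_q(m: int) -> float:
    """Sum of M! / ((M - k)! * M**k) for k = 1..M, built up term by term."""
    total = 0.0
    term = 1.0  # the k = 1 term is M! / ((M - 1)! * M) = 1
    for k in range(1, m + 1):
        total += term
        term *= (m - k) / m  # ratio of the (k + 1)-th term to the k-th term
    return total


print(1 + ramanujan_q(365))  # expected number of people until the first match
```

Building each term from the previous one avoids evaluating large factorials directly.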
||
An analysis using indicator random variables can provide a simpler but approximate treatment of this problem.<ref>{{Cite book|title=Introduction to Algorithms|last=Cormen |display-authors=etal }}</ref> Here {{mvar|k}} denotes the number of people in a room and {{mvar|n}} the number of days in a year. For each pair (''i'', ''j'') of the {{mvar|k}} people, with <math>1\leq i < j\leq k</math>, we define the indicator random variable ''X<sub>ij</sub>'' by

<math display="block">\begin{alignat}{2}
X_{ij} &
= I \{ \text{person }i\text{ and person }j\text{ have the same birthday} \} \\[10pt] &
= \begin{cases}
1, & \text{if person }i\text{ and person }j\text{ have the same birthday;} \\
0, & \text{otherwise.}
\end{cases}
\end{alignat}</math>

<math display="block">\begin{alignat}{2}
E[X_{ij}] &
= \Pr \{ \text{person }i\text{ and person }j\text{ have the same birthday} \} = \frac{1}{n}.
\end{alignat}</math>

Let ''X'' be a random variable counting the pairs of individuals with the same birthday.

<math display="block">X =\sum_{i=1}^k \sum_{j=i+1}^k X_{ij}</math>

<math display="block">\begin{alignat}{3}
E[X]
& = \sum_{i=1}^k \sum_{j=i+1}^k E[X_{ij}]\\[8pt]
& = \binom{k}{2} \frac{1}{n}\\[8pt]
& = \frac{k(k-1)}{2n}
\end{alignat}</math>

For {{math|1=''n'' = 365}}, if {{math|1=''k'' = 28}}, the expected number of pairs of individuals with the same birthday is {{sfrac|28 × 27|2 × 365}} ≈ 1.0356. Therefore, we can expect at least one matching pair with at least 28 people.

An ''informal'' demonstration of the problem can be made from the [[list of Prime Ministers of Australia]], in which [[Paul Keating]], the 24th Prime Minister, and [[Edmund Barton]], the first Prime Minister, share the same birthday, 18 January.

[[James K. Polk]] and [[Warren G. Harding]], the 11th and 29th Presidents of the United States, were both born on November 2.

[[John A. Macdonald|Sir John A. Macdonald]] and [[Jean Chrétien]], the 1st and 20th Prime Ministers of Canada, were both born on January 11.

Of the 73 male actors to win the [[Academy Award for Best Actor]], there are six pairs of actors who share the same birthday.<ref>They are Spencer Tracy and Gregory Peck (April 5), Rod Steiger and Adrien Brody (April 14), Paul Lukas and John Wayne (May 26), Emil Jannings and Philip Seymour Hoffman (July 23), Robert De Niro and Sean Penn (August 17) and Ben Kingsley and Anthony Hopkins (December 31).</ref>

Of the 67 actresses to win the [[Academy Award for Best Actress]], there are three pairs of actresses who share the same birthday.<ref>They are Jane Wyman and Diane Keaton (January 5), Joanne Woodward and Elizabeth Taylor (February 27) and Barbra Streisand and Shirley MacLaine (April 24).</ref>

Of the 61 directors to win the [[Academy Award for Best Director]], there are five pairs of directors who share the same birthday.<ref>They are Norman Taurog and Victor Fleming (February 23), William Wyler and Sydney Pollack (July 1), Robert Redford and Roman Polanski (August 18), William Friedkin and Richard Attenborough (August 29) and George Stevens and Steven Spielberg (December 18).</ref>
|||
Of the 52 people to serve as [[Prime Minister of the United Kingdom]], there are two pairs of men who share the same birthday.<ref>They are John Major and The Earl of Derby (March 29) and Spencer Perceval and The Viscount Goderich (November 1).</ref> |
|||
In the [[2014 FIFA World Cup]], each of the 32 squads had 23 players. An analysis of the official squad lists suggested that 16 squads had pairs of players sharing birthdays, and of these 5 squads had two pairs: Argentina, France, Iran, South Korea and Switzerland each had two pairs, and Australia, Bosnia and Herzegovina, Brazil, Cameroon, Colombia, Honduras, Netherlands, Nigeria, Russia, Spain and USA each with one pair.<ref>{{cite web |url=https://www.bbc.co.uk/news/magazine-27835311 |title=The birthday paradox at the World Cup |last1=Fletcher |first1=James |date=16 June 2014 |website=bbc.com |publisher=BBC |access-date=27 August 2015 }}</ref> |
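The World Cup count can be set against the uniform model. The sketch below, in Python, computes the probability of a shared birthday in one 23-player squad and the expected number of such squads among 32, assuming 365 equally likely birthdays and independent squads; the function name <code>p_shared</code> is illustrative.

```python
from math import comb, prod


def p_shared(n: int, d: int = 365) -> float:
    """Probability that at least two of n people share a birthday (uniform model)."""
    return 1 - prod((d - i) / d for i in range(n))


squads, squad_size = 32, 23
p = p_shared(squad_size)
print(f"P(shared birthday in one squad) = {p:.4f}")
print(f"expected squads with a shared birthday = {squads * p:.1f}")
print(f"expected matching pairs per squad = {comb(squad_size, 2) / 365:.3f}")
```

Under these assumptions the model predicts that roughly half of the squads contain a shared birthday, in line with the 16 reported.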
|||
Voracek, Tran and [[Anton Formann|Formann]] showed that the majority of people markedly overestimate the number of people that is necessary to achieve a given probability of people having the same birthday, and markedly underestimate the probability of people having the same birthday when a specific sample size is given.<ref>{{cite journal |last1=Voracek |first1=M. |last2=Tran |first2=U. S. |last3=Formann |first3=A. K. |year=2008 |title=Birthday and birthmate problems: Misconceptions of probability among psychology undergraduates and casino visitors and personnel |journal=Perceptual and Motor Skills |volume=106 |issue=1 |pages=91–103 |doi=10.2466/pms.106.1.91-103 |pmid=18459359 |s2cid=22046399 }}</ref> Further results showed that psychology students and women did better on the task than casino visitors/personnel or men, but were less confident about their estimates.

===Reverse problem===

The reverse problem is to find, for a fixed probability {{mvar|p}}, the greatest {{mvar|n}} for which the probability {{math|''p''(''n'')}} is smaller than the given {{mvar|p}}, or the smallest {{mvar|n}} for which the probability {{math|''p''(''n'')}} is greater than the given {{mvar|p}}.{{citation needed|date=September 2019}}

Taking the above formula for {{math|''d'' {{=}} 365}}, one has

:<math>n(p;365)\approx \sqrt{730\ln\left(\frac{1}{1-p}\right)}.</math>
|||
|||
The following table gives some sample calculations. |
|||
:{| class="wikitable" |
|||
|----- |
|||
! {{mvar|p}} || {{mvar|n}} |
|||
! {{math|''n''↓}} || {{math|''p''(''n''↓)}} || {{math|''n''↑}} || {{math|''p''(''n''↑)}} |
|||
|----- |
|||
| <span style="color:maroon">0.01</span> |
|||
| 0.14178{{sqrt|365}} = <span style="color:maroon">2.70864</span> |
|||
| align="right" | 2 || 0.00274 || align="right" | 3 |
|||
| <span style="color:maroon">0.00820</span> |
|||
|----- |
|||
| 0.05 || 0.32029{{sqrt|365}} = 6.11916 |
|||
| align="right" | 6 || 0.04046 || align="right" | 7 || 0.05624 |
|||
|----- |
|||
| <span style="color:maroon">0.1</span> |
|||
| 0.45904{{sqrt|365}} = <span style="color:maroon"> 8.77002</span> |
|||
| align="right" | 8 || 0.07434 || align="right" | 9 |
|||
| <span style="color:maroon">0.09462</span> |
|||
|----- |
|||
| <span style="color:maroon">0.2</span> |
|||
| 0.66805{{sqrt|365}} = <span style="color:maroon">12.76302</span> |
|||
| align="right" | 12 || 0.16702 || align="right" | 13 |
|||
| <span style="color:maroon">0.19441</span> |
|||
|----- |
|||
| 0.3 || 0.84460{{sqrt|365}} = 16.13607 |
|||
| align="right" | 16 || 0.28360 || align="right" | 17 || 0.31501 |
|||
|----- |
|||
| 0.5 || 1.17741{{sqrt|365}} = 22.49439 |
|||
| align="right" | 22 || 0.47570 || align="right" | 23 || 0.50730 |
|||
|----- |
|||
| 0.7 || 1.55176{{sqrt|365}} = 29.64625 |
|||
| align="right" | 29 || 0.68097 || align="right" | 30 || 0.70632 |
|||
|----- |
|||
| 0.8 || 1.79412{{sqrt|365}} = 34.27666 |
|||
| align="right" | 34 || 0.79532 || align="right" | 35 || 0.81438 |
|||
|----- |
|||
| 0.9 || 2.14597{{sqrt|365}} = 40.99862 |
|||
| align="right" | 40 || 0.89123 || align="right" | 41 || 0.90315 |
|||
|----- |
|||
| 0.95 || 2.44775{{sqrt|365}} = 46.76414 |
|||
| align="right" | 46 || 0.94825 || align="right" | 47 || 0.95477 |
|||
|----- |
|||
| <span style="color:maroon">0.99</span> |
|||
| 3.03485{{sqrt|365}} = <span style="color:maroon">57.98081</span> |
|||
| align="right" | 57 |
|||
| <span style="color:maroon">0.99012</span> |
|||
| align="right" | 58 || 0.99166 |
|||
|} |
|||
Some values falling outside the bounds have been <span style="color:maroon">colored</span> to show that the approximation is not always exact. |
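Rows of the table can be recomputed by evaluating the approximation and then the exact probability at the integers on either side of it. The Python sketch below does this for a few values of {{mvar|p}}; the names <code>approx_n</code> and <code>p_shared</code> are illustrative, and the uniform 365-day model is assumed.

```python
from math import ceil, floor, log, prod, sqrt


def approx_n(p: float, d: int = 365) -> float:
    """Approximate group size at which the shared-birthday probability reaches p."""
    return sqrt(2 * d * log(1 / (1 - p)))


def p_shared(n: int, d: int = 365) -> float:
    """Exact probability that at least two of n people share a birthday."""
    return 1 - prod((d - i) / d for i in range(n))


for p in (0.01, 0.5, 0.99):
    n = approx_n(p)
    low, high = floor(n), ceil(n)
    print(f"p = {p}: n = {n:.5f}, p({low}) = {p_shared(low):.5f}, p({high}) = {p_shared(high):.5f}")
```

A row is highlighted in the table when the target {{mvar|p}} does not fall between the two exact probabilities.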
|||
==Partition problem== |
|||
A related problem is the [[partition problem]], a variant of the [[knapsack problem]] from [[operations research]]. Some weights are put on a [[Weighing scale|balance scale]]; each weight is an integer number of grams randomly chosen between one gram and one million grams (one [[tonne]]). The question is whether one can usually (that is, with probability close to 1) transfer the weights between the left and right arms to balance the scale. (In case the sum of all the weights is an odd number of grams, a discrepancy of one gram is allowed.) If there are only two or three weights, the answer is very clearly no; although there are some combinations which work, the majority of randomly selected combinations of three weights do not. If there are very many weights, the answer is clearly yes. The question is, how many are just sufficient? That is, what is the number of weights such that it is equally likely for it to be possible to balance them as it is to be impossible? |
|||
Some people's intuition is that the answer is above {{val|100000}}. Most people's intuition is that it is in the thousands or tens of thousands, while others feel it should at least be in the hundreds. The correct answer is 23.{{Citation needed|date=December 2016}}
|||
The reason is that the correct comparison is to the number of partitions of the weights into left and right. There are {{math|2<sup>''N'' − 1</sup>}} different partitions for {{math|''N''}} weights, and the left sum minus the right sum can be thought of as a new random quantity for each partition. The distribution of the sum of weights is approximately [[normal distribution|Gaussian]], with a peak at {{math|{{val|500000}}''N''}} and width {{math|{{val|1000000}}{{sqrt|''N''}}}}, so that when {{math|2<sup>''N'' − 1</sup>}} is approximately equal to {{math|{{val|1000000}}{{sqrt|''N''}}}} the transition occurs. 2<sup>23 − 1</sup> is about 4 million, while the width of the distribution is only 5 million.<ref>{{cite journal |first1=C. |last1=Borgs |first2=J. |last2=Chayes |first3=B. |last3=Pittel |s2cid=6819493 |year=2001 |title=Phase Transition and Finite Size Scaling in the Integer Partition Problem |journal=Random Structures and Algorithms |volume=19 |issue=3–4 |pages=247–288 |doi=10.1002/rsa.10004 }}</ref> |
|||
==In fiction== |
|||
[[Arthur C. Clarke]]'s 1961 novel ''[[A Fall of Moondust]]'' contains a section where the main characters, trapped underground for an indefinite amount of time, are celebrating a birthday and find themselves discussing the validity of the birthday problem. As stated by a physicist passenger: "If you have a group of more than twenty-four people, the odds are better than even that two of them have the same birthday." Eventually, out of 22 present, it is revealed that two characters share the same birthday, May 23. |
|||
==Notes== |
|||
{{reflist|group=nb}} |
|||
==References==

{{Reflist|45em}}
||
==Bibliography==

* {{cite journal |first1=M. |last1=Abramson |first2=W. O. J. |last2=Moser |year=1970 |title=More Birthday Surprises |journal=[[American Mathematical Monthly]] |volume=77 |issue= 8|pages=856–858 |doi= 10.2307/2317022|jstor=2317022 |ref=none}}

* {{cite journal |first=D. |last=Bloom |year=1973 |title=A Birthday Problem |journal=[[American Mathematical Monthly]] |volume=80 |issue= 10|pages=1141–1142 |doi=10.2307/2318556 | jstor=2318556 }}

* M. Klamkin and D. Newman (1967) ''Extensions of the Birthday Surprise'', Journal of Combinatorial Theory '''3''', 279–282.
||
* {{cite book |first1=John G. |last1=Kemeny |first2=J. Laurie |last2=Snell |first3=Gerald |last3=Thompson |title=Introduction to Finite Mathematics |edition=First |year=1957 |ref=none}} |
|||
* [[Clay Shirky|Shirky, Clay]] ''Here Comes Everybody: The Power of Organizing Without Organizations'', (2008.) New York. 25–27. |
|||
* {{cite journal |first=E. H. |last=McKinney |year=1966 |title=Generalized Birthday Problem |journal=[[American Mathematical Monthly]] |volume=73 |issue= 5|pages=385–387 |doi= 10.2307/2315408|jstor=2315408|ref=none }} |
|||
{{refend}} |
|||
* {{cite journal |first=F.|last=Mosteller |title=Understanding the Birthday Problem |year=1962 |journal=[[The Mathematics Teacher]] |volume=55 |issue= 5|pages=322–325|doi=10.5951/MT.55.5.0322 |jstor=27956609}} Reprinted in {{cite book|title=Selected Papers of Frederick Mosteller|chapter=Understanding the Birthday Problem |series=Springer Series in Statistics|isbn=978-0-387-20271-6 |doi= 10.1007/978-0-387-44956-2_21|pages=349–353|year=2006 |last1=Mosteller |first1=Frederick }} |
|||
* {{cite book |author1-link=Leila Schneps |first1=Leila |last1=Schneps |author2-link=Coralie Colmez |first2=Coralie |last2=Colmez |title=Math on Trial. How Numbers Get Used and Abused in the Courtroom |publisher=Basic Books |year=2013 |isbn=978-0-465-03292-1 |chapter=Math error number 5. The case of Diana Sylvester: cold hit analysis |ref=none|title-link=Math on Trial}} |
|||
* {{cite book|author=Sy M. Blinder|title=Guide to Essential Math: A Review for Physics, Chemistry and Engineering Students|url=https://books.google.com/books?id=M7TCNAEACAAJ|year=2013|publisher=Elsevier|isbn=978-0-12-407163-6|pages=5–6|ref=none}} |
|||
==External links==

* [http://www.efgh.com/math/birthday.htm The Birthday Paradox accounting for leap year birthdays]

* [http://www.rsscse-edu.org.uk/tsj/wp-content/uploads/2011/03/matthews.pdf Coincidences: the truth is out there] Experimental test of the Birthday Paradox and other coincidences

* {{MathWorld | urlname=BirthdayProblem | title=Birthday Problem|ref=none}}

* http://planetmath.org/encyclopedia/BirthdayProblem.html

* [http://www.nestel.net/maple/bd/bd.html Maple vs. birthday paradox]

* [http://www.damninteresting.com/?p=402 A humorous article explaining the paradox]

* [http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_BirthdayExperiment SOCR EduMaterials activities birthday experiment]

* [http://betterexplained.com/articles/understanding-the-birthday-paradox/ Understanding the Birthday Problem (Better Explained)]

* [http://www.matifutbol.com/en/eurobirthdays.html Eurobirthdays 2012. A birthday problem.] A practical football example of the birthday paradox.

* {{cite web|last=Grime|first=James|title=23: Birthday Probability|url=http://www.numberphile.com/videos/23birthday.html|work=Numberphile|publisher=[[Brady Haran]]|access-date=2013-04-02|archive-url=https://web.archive.org/web/20170225140726/http://www.numberphile.com/videos/23birthday.html|archive-date=2017-02-25|url-status=dead|ref=none}}

* [https://www.wolframalpha.com/input/?i=birthday+paradox%2C+4+people%2C+100+possible+birthdays Computing the probabilities of the Birthday Problem at WolframAlpha]
|||
{{Portal bar|Mathematics}} |
|||
{{DEFAULTSORT:Birthday Problem}}

[[Category:Probability theory paradoxes]]

[[Category:Probability problems]]

[[Category:Applied probability]]

[[Category:Birthdays]]

[[Category:Mathematical problems]]

[[Category:Coincidence]]
|||
Latest revision as of 18:20, 24 November 2024
In probability theory, the birthday problem asks for the probability that, in a set of n randomly chosen people, at least two will share the same birthday. The birthday paradox refers to the counterintuitive fact that only 23 people are needed for that probability to exceed 50%.
The birthday paradox is a veridical paradox: it seems wrong at first glance but is, in fact, true. While it may seem surprising that only 23 individuals are required to reach a 50% probability of a shared birthday, this result is made more intuitive by considering that the birthday comparisons will be made between every possible pair of individuals. With 23 individuals, there are 23 × 22/2 = 253 pairs to consider, more than half the 365 / 366 days in a calendar year.
Real-world applications for the birthday problem include a cryptographic attack called the birthday attack, which uses this probabilistic model to reduce the complexity of finding a collision for a hash function, as well as calculating the approximate risk of a hash collision existing within the hashes of a given size of population.
The problem is generally attributed to Harold Davenport in about 1927, though he did not publish it at the time. Davenport did not claim to be its discoverer "because he could not believe that it had not been stated earlier".[1][2] The first publication of a version of the birthday problem was by Richard von Mises in 1939.[3]
Calculating the probability
From a permutations perspective, let A be the event that a group of 23 people has no repeated birthdays, and let B be the event that at least two of the 23 people share a birthday, so that P(B) = 1 − P(A). P(A) is the ratio of the number of ordered assignments of birthdays without repetition, 365!/(365 − 23)!, to the total number of ordered assignments with repetition allowed, 365<sup>23</sup>, which is the whole space of outcomes of the experiment. Both counts are permutations: the first without repetition, the second with repetition.
Another way the birthday problem can be solved is by asking for an approximate probability that in a group of n people at least two have the same birthday. For simplicity, leap years, twins, selection bias, and seasonal and weekly variations in birth rates[4] are generally disregarded, and instead it is assumed that there are 365 possible birthdays, and that each person's birthday is equally likely to be any of these days, independent of the other people in the group.
For independent birthdays, a uniform distribution of birthdays minimizes the probability of two people in a group having the same birthday. Any unevenness increases the likelihood of two people sharing a birthday.[5][6] However real-world birthdays are not sufficiently uneven to make much change: the real-world group size necessary to have a greater than 50% chance of a shared birthday is 23, as in the theoretical uniform distribution.[7]
The goal is to compute P(B), the probability that at least two people in the room have the same birthday. However, it is simpler to calculate P(A′), the probability that no two people in the room have the same birthday. Then, because B and A′ are the only two possibilities and are also mutually exclusive, P(B) = 1 − P(A′).
Here is the calculation of P(B) for 23 people. Let the 23 people be numbered 1 to 23. The event that all 23 people have different birthdays is the same as the event that person 2 does not have the same birthday as person 1, and that person 3 does not have the same birthday as either person 1 or person 2, and so on, and finally that person 23 does not have the same birthday as any of persons 1 through 22. Let these events be called Event 2, Event 3, and so on. Event 1 is the event of person 1 having a birthday, which occurs with probability 1. This conjunction of events may be computed using conditional probability: the probability of Event 2 is 364/365, as person 2 may have any birthday other than the birthday of person 1. Similarly, the probability of Event 3 given that Event 2 occurred is 363/365, as person 3 may have any of the birthdays not already taken by persons 1 and 2. This continues until finally the probability of Event 23 given that all preceding events occurred is 343/365. Finally, the principle of conditional probability implies that P(A′) is equal to the product of these individual probabilities:
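<math display="block">P(A') = \frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \times \cdots \times \frac{343}{365} \qquad (1)</math>
The terms of equation (1) can be collected to arrive at:
<math display="block">P(A') = \left(\frac{1}{365}\right)^{23} \times (365 \times 364 \times 363 \times \cdots \times 343) \qquad (2)</math>
Evaluating equation (2) gives P(A′) ≈ 0.492703.
Therefore, P(B) ≈ 1 − 0.492703 = 0.507297 (50.7297%).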
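This process can be generalized to a group of n people, where p(n) is the probability of at least two of the n people sharing a birthday. It is easier to first calculate the probability <math>\bar p(n)</math> that all n birthdays are different. According to the pigeonhole principle, <math>\bar p(n)</math> is zero when n > 365. When n ≤ 365:
<math display="block">\begin{align}
\bar p(n) &= 1 \times \left(1-\frac{1}{365}\right) \times \left(1-\frac{2}{365}\right) \times \cdots \times \left(1-\frac{n-1}{365}\right) \\
&= \frac{365 \times 364 \times \cdots \times (365-n+1)}{365^n} \\
&= \frac{365!}{365^n (365-n)!} = \frac{n!\cdot\binom{365}{n}}{365^n} = \frac{{}_{365}P_n}{365^n}
\end{align}</math>
where ! is the factorial operator, <math>\tbinom{365}{n}</math> is the binomial coefficient and <math>{}_kP_r</math> denotes permutation.
The equation expresses the fact that the first person has no one to share a birthday, the second person cannot have the same birthday as the first (364/365), the third cannot have the same birthday as either of the first two (363/365), and in general the nth birthday cannot be the same as any of the n − 1 preceding birthdays.
The event of at least two of the n persons having the same birthday is complementary to all n birthdays being different. Therefore, its probability p(n) is
<math display="block">p(n) = 1 - \bar p(n).</math>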
The following table shows the probability for some other values of n (for this table, the existence of leap years is ignored, and each birthday is assumed to be equally likely):
{| class="wikitable"
! n !! p(n)
|-
| 1 || 0.0%
|-
| 5 || 2.7%
|-
| 10 || 11.7%
|-
| 20 || 41.1%
|-
| 23 || 50.7%
|-
| 30 || 70.6%
|-
| 40 || 89.1%
|-
| 50 || 97.0%
|-
| 60 || 99.4%
|-
| 70 || 99.9%
|-
| 75 || 99.97%
|-
| 100 || 99.99997%
|-
| 200 || 99.9999999999999999999999999998%
|-
| 300 || (100 − 6×10<sup>−80</sup>)%
|-
| 350 || (100 − 3×10<sup>−129</sup>)%
|-
| 365 || (100 − 1.45×10<sup>−155</sup>)%
|-
| ≥ 366 || 100%
|}
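The smaller entries of the table can be recomputed with a direct product. This is a minimal Python sketch under the same assumptions as the table (365 equally likely birthdays, no leap years); the function name <code>p_shared</code> is illustrative.

```python
def p_shared(n: int, d: int = 365) -> float:
    """Probability that at least two of n people share one of d equally likely birthdays."""
    if n > d:
        return 1.0  # pigeonhole principle: a repeat is certain
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (d - i) / d  # person i + 1 avoids the i birthdays already taken
    return 1 - p_distinct


for n in (5, 10, 20, 23, 30, 50):
    print(f"n = {n:2d}: {100 * p_shared(n):.1f}%")
```

For large n the complement is far smaller than double-precision rounding error, so entries such as the one for n = 200 would need exact rational or arbitrary-precision arithmetic instead.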
Approximations
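The Taylor series expansion of the exponential function (the constant e ≈ 2.718281828)
<math display="block">e^x = 1 + x + \frac{x^2}{2!} + \cdots</math>
provides a first-order approximation for <math>e^x</math> for <math>|x| \ll 1</math>:
<math display="block">e^x \approx 1 + x.</math>
To apply this approximation to the first expression derived for <math>\bar p(n)</math>, set x = −a/365. Thus,
<math display="block">e^{-a/365} \approx 1 - \frac{a}{365}.</math>
Then, replace a with non-negative integers for each term in the formula of <math>\bar p(n)</math> until a = n − 1, for example, when a = 1,
<math display="block">e^{-1/365} \approx 1 - \frac{1}{365}.</math>
The first expression derived for <math>\bar p(n)</math> can be approximated as
<math display="block">\begin{align}
\bar p(n) &\approx 1 \cdot e^{-1/365} \cdot e^{-2/365} \cdots e^{-(n-1)/365} \\
&= e^{-(1+2+\cdots+(n-1))/365} \\
&= e^{-\frac{n(n-1)/2}{365}} = e^{-\frac{n(n-1)}{730}}.
\end{align}</math>
Therefore,
<math display="block">p(n) = 1 - \bar p(n) \approx 1 - e^{-\frac{n(n-1)}{730}}.</math>
An even coarser approximation is given by
<math display="block">p(n) \approx 1 - e^{-\frac{n^2}{730}},</math>
which, as the graph illustrates, is still fairly accurate.
According to the approximation, the same approach can be applied to any number of "people" and "days". If rather than 365 days there are d, if there are n persons, and if n ≪ d, then using the same approach as above we achieve the result that if p(n, d) is the probability that at least two out of n people share the same birthday from a set of d available days, then:
<math display="block">\begin{align}
p(n,d) &\approx 1 - e^{-\frac{n(n-1)}{2d}} \\
&\approx 1 - e^{-\frac{n^2}{2d}}.
\end{align}</math>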
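The quality of the two exponential approximations can be inspected by tabulating them next to the exact value. The Python sketch below assumes the forms 1 − e<sup>−n(n−1)/(2d)</sup> and 1 − e<sup>−n²/(2d)</sup>; the function names are illustrative.

```python
from math import exp, prod


def p_exact(n: int, d: int = 365) -> float:
    """Exact probability of at least one shared birthday among n people."""
    return 1 - prod((d - i) / d for i in range(n))


def p_taylor(n: int, d: int = 365) -> float:
    """First-order (Taylor) approximation: 1 - exp(-n(n - 1) / (2d))."""
    return 1 - exp(-n * (n - 1) / (2 * d))


def p_coarse(n: int, d: int = 365) -> float:
    """Coarser approximation: 1 - exp(-n**2 / (2d))."""
    return 1 - exp(-n * n / (2 * d))


for n in (10, 23, 40, 60):
    print(f"n = {n:2d}: exact {p_exact(n):.4f}, Taylor {p_taylor(n):.4f}, coarse {p_coarse(n):.4f}")
```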
Simple exponentiation
The probability of any two people not having the same birthday is 364/365. In a room containing n people, there are $\binom{n}{2} = n(n-1)/2$ pairs of people, i.e. $\binom{n}{2}$ events. The probability of no two people sharing the same birthday can be approximated by assuming that these events are independent and hence by multiplying their probabilities together. Being independent would be equivalent to picking, with replacement, any pair of people in the world, not just in a room. In short, 364/365 can be multiplied by itself $\binom{n}{2}$ times, which gives us
$$\bar{p}(n) \approx \left(\frac{364}{365}\right)^{\binom{n}{2}}.$$
Since this is the probability of no one having the same birthday, the probability of someone sharing a birthday is
$$p(n) \approx 1 - \left(\frac{364}{365}\right)^{\binom{n}{2}}.$$
And for the group of 23 people, the probability of sharing is
$$p(23) \approx 1 - \left(\frac{364}{365}\right)^{\binom{23}{2}} = 1 - \left(\frac{364}{365}\right)^{253} \approx 0.500477.$$
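The pairwise-independence estimate is easy to evaluate directly. A short Python sketch follows; the function name is illustrative, and the independence of pairs is the approximation described above, not an exact property.

```python
import math


def pairwise_approximation(n: int, d: int = 365) -> float:
    """Approximate shared-birthday probability, treating all pairs as independent."""
    pairs = math.comb(n, 2)  # number of pairs among n people
    return 1.0 - ((d - 1) / d) ** pairs


print(math.comb(23, 2), round(pairwise_approximation(23), 6))
```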
Poisson approximation
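Applying the Poisson approximation for the binomial on the group of 23 people,
$$\operatorname{Poi}\left(\frac{\binom{23}{2}}{365}\right) = \operatorname{Poi}\left(\frac{253}{365}\right) \approx \operatorname{Poi}(0.6932),$$
so
$$\Pr(X > 0) = 1 - \Pr(X = 0) \approx 1 - e^{-0.6932} \approx 1 - 0.499998 = 0.500002.$$
The result is over 50%, as in the previous descriptions. This approximation is the same as the one above based on the Taylor expansion that uses $e^x \approx 1 + x$.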
Square approximation
A good rule of thumb which can be used for mental calculation is the relation
$$p(n) \approx \frac{n^2}{2d},$$
which can also be written as
$$n \approx \sqrt{2d \times p(n)},$$
which works well for probabilities less than or equal to 1/2. In these equations, d is the number of days in a year.
For instance, to estimate the number of people required for a 1/2 chance of a shared birthday, we get
$$n \approx \sqrt{2 \times 365 \times \tfrac{1}{2}} = \sqrt{365} \approx 19,$$
which is not too far from the correct answer of 23.
Approximation of number of people
This can also be approximated using the following formula for the number of people necessary to have at least a 1/2 chance of matching:
$$n \geq \frac{1}{2} + \sqrt{\frac{1}{4} + 2 \times \ln(2) \times 365} = 22.999943.$$
This is a result of the good approximation that an event with 1/k probability will have a 1/2 chance of occurring at least once if it is repeated k ln 2 times.[8]
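As a quick numerical check, the closed-form estimate for the number of people needed for an even chance can be evaluated for any number of days d. This is a small Python sketch; the function name and the generalization from 365 to an arbitrary d are assumptions made for illustration.

```python
import math


def people_for_even_chance(d: int = 365) -> float:
    """Estimate of the group size giving a 1/2 chance of a match:
    the positive root of n^2 - n = 2 d ln 2."""
    return 0.5 + math.sqrt(0.25 + 2 * math.log(2) * d)


estimate = people_for_even_chance()
print(round(estimate, 6), math.ceil(estimate))
```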
Probability table
Number of hashed elements such that the probability of at least one hash collision is ≥ p:

| Length of hex string | No. of bits (b) | Hash space size (2^b) | p = 10^−18 | p = 10^−15 | p = 10^−12 | p = 10^−9 | p = 10^−6 | p = 0.001 | p = 0.01 | p = 0.25 | p = 0.50 | p = 0.75 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 32 | 4.3×10^9 | 2 | 2 | 2 | 2.9 | 93 | 2.9×10^3 | 9.3×10^3 | 5.0×10^4 | 7.7×10^4 | 1.1×10^5 |
| (10) | (40) | (1.1×10^12) | 2 | 2 | 2 | 47 | 1.5×10^3 | 4.7×10^4 | 1.5×10^5 | 8.0×10^5 | 1.2×10^6 | 1.7×10^6 |
| (12) | (48) | (2.8×10^14) | 2 | 2 | 24 | 7.5×10^2 | 2.4×10^4 | 7.5×10^5 | 2.4×10^6 | 1.3×10^7 | 2.0×10^7 | 2.8×10^7 |
| 16 | 64 | 1.8×10^19 | 6.1 | 1.9×10^2 | 6.1×10^3 | 1.9×10^5 | 6.1×10^6 | 1.9×10^8 | 6.1×10^8 | 3.3×10^9 | 5.1×10^9 | 7.2×10^9 |
| (24) | (96) | (7.9×10^28) | 4.0×10^5 | 1.3×10^7 | 4.0×10^8 | 1.3×10^10 | 4.0×10^11 | 1.3×10^13 | 4.0×10^13 | 2.1×10^14 | 3.3×10^14 | 4.7×10^14 |
| 32 | 128 | 3.4×10^38 | 2.6×10^10 | 8.2×10^11 | 2.6×10^13 | 8.2×10^14 | 2.6×10^16 | 8.3×10^17 | 2.6×10^18 | 1.4×10^19 | 2.2×10^19 | 3.1×10^19 |
| (48) | (192) | (6.3×10^57) | 1.1×10^20 | 3.5×10^21 | 1.1×10^23 | 3.5×10^24 | 1.1×10^26 | 3.5×10^27 | 1.1×10^28 | 6.0×10^28 | 9.3×10^28 | 1.3×10^29 |
| 64 | 256 | 1.2×10^77 | 4.8×10^29 | 1.5×10^31 | 4.8×10^32 | 1.5×10^34 | 4.8×10^35 | 1.5×10^37 | 4.8×10^37 | 2.6×10^38 | 4.0×10^38 | 5.7×10^38 |
| (96) | (384) | (3.9×10^115) | 8.9×10^48 | 2.8×10^50 | 8.9×10^51 | 2.8×10^53 | 8.9×10^54 | 2.8×10^56 | 8.9×10^56 | 4.8×10^57 | 7.4×10^57 | 1.0×10^58 |
| 128 | 512 | 1.3×10^154 | 1.6×10^68 | 5.2×10^69 | 1.6×10^71 | 5.2×10^72 | 1.6×10^74 | 5.2×10^75 | 1.6×10^76 | 8.8×10^76 | 1.4×10^77 | 1.9×10^77 |
The entries in this table show the number of hashes needed to achieve the given probability of collision (column) given a hash space of a certain size in bits (row). Using the birthday analogy: the "hash space size" resembles the "available days", the "probability of collision" resembles the "probability of shared birthday", and the "required number of hashed elements" resembles the "required number of people in a group". One could also use this chart to determine the minimum hash size required (given upper bounds on the number of hashes and the probability of error), or the probability of collision (for a fixed number of hashes and a fixed hash size).
For comparison, $10^{-18}$ to $10^{-15}$ is the uncorrectable bit error rate of a typical hard disk.[9] In theory, 128-bit hash functions, such as MD5, should stay within that range until about $8.2 \times 10^{11}$ documents, even though their possible outputs are many more.
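Entries of this kind can be estimated with the square-root approximation n ≈ √(2 · 2^b · ln(1/(1 − p))). The Python sketch below assumes that approximation; it is not necessarily how the table was produced, and the function name is illustrative. Very small results (below 2) are not meaningful, since at least two hashes are needed for any collision.

```python
import math


def hashes_for_collision_probability(bits: int, p: float) -> float:
    """Approximate number of uniformly random `bits`-bit hashes needed
    for a collision probability of about p."""
    space = 2.0 ** bits
    # -log1p(-p) equals ln(1 / (1 - p)) but stays accurate for tiny p
    return math.sqrt(2.0 * space * -math.log1p(-p))


print(f"{hashes_for_collision_probability(128, 1e-15):.1e}")
print(f"{hashes_for_collision_probability(32, 0.5):.1e}")
```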
An upper bound on the probability and a lower bound on the number of people
The argument below is adapted from an argument of Paul Halmos.[nb 1]
As stated above, the probability that no two birthdays coincide is
$$\bar{p}(n) = 1 - p(n) = \prod_{k=1}^{n-1} \left(1 - \frac{k}{365}\right).$$
As in earlier paragraphs, interest lies in the smallest n such that p(n) > 1/2; or equivalently, the smallest n such that $\bar{p}(n)$ < 1/2.
Using the inequality $1 - x < e^{-x}$ in the above expression, we replace 1 − k/365 with $e^{-k/365}$. This yields
$$\bar{p}(n) = \prod_{k=1}^{n-1} \left(1 - \frac{k}{365}\right) < \prod_{k=1}^{n-1} e^{-k/365} = e^{-n(n-1)/730}.$$
Therefore, the expression above is not only an approximation, but also an upper bound of $\bar{p}(n)$. The inequality
$$e^{-n(n-1)/730} < \frac{1}{2}$$
implies $\bar{p}(n)$ < 1/2. Solving for n gives
$$n^2 - n > 730 \ln 2.$$
Now, 730 ln 2 is approximately 505.997, which is barely below 506, the value of n² − n attained when n = 23. Therefore, 23 people suffice. Incidentally, solving n² − n = 730 ln 2 for n gives the approximate formula of Frank H. Mathis cited above.
This derivation only shows that at most 23 people are needed to ensure the chances of a birthday match are at least even; it leaves open the possibility that n = 22 or fewer could also work.
Generalizations
Arbitrary number of days
Given a year with d days, the generalized birthday problem asks for the minimal number n(d) such that, in a set of n randomly chosen people, the probability of a birthday coincidence is at least 50%. In other words, n(d) is the minimal integer n such that
$$1 - \left(1 - \frac{1}{d}\right)\left(1 - \frac{2}{d}\right) \cdots \left(1 - \frac{n-1}{d}\right) \geq \frac{1}{2}.$$
The classical birthday problem thus corresponds to determining n(365). The first 99 values of n(d) are given here (sequence A033810 in the OEIS):

| d | 1–2 | 3–5 | 6–9 | 10–16 | 17–23 | 24–32 | 33–42 | 43–54 | 55–68 | 69–82 | 83–99 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| n(d) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
A similar calculation shows that n(d) = 23 when d is in the range 341–372.
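A number of bounds and formulas for n(d) have been published.[10] For any d ≥ 1, the number n(d) satisfies[11]
$$\frac{3 - 2\ln 2}{6} < n(d) - \sqrt{2d \ln 2} \leq 9 - \sqrt{86 \ln 2}.$$
These bounds are optimal in the sense that the sequence $n(d) - \sqrt{2d \ln 2}$ gets arbitrarily close to
$$\frac{3 - 2\ln 2}{6} \approx 0.27,$$
while it has
$$9 - \sqrt{86 \ln 2} \approx 1.28$$
as its maximum, taken for d = 43.
The bounds are sufficiently tight to give the exact value of n(d) in most cases. For example, for d = 365 these bounds imply that 22.7633 < n(365) < 23.7736, and 23 is the only integer in that range. In general, it follows from these bounds that n(d) always equals either
$$\left\lceil \sqrt{2d \ln 2} \right\rceil \quad \text{or} \quad \left\lceil \sqrt{2d \ln 2} \right\rceil + 1,$$
where ⌈ · ⌉ denotes the ceiling function. The formula
$$n(d) = \left\lceil \sqrt{2d \ln 2} \right\rceil$$
holds for 73% of all integers d.[12] The formula
$$n(d) = \left\lceil \sqrt{2d \ln 2} + \frac{3 - 2\ln 2}{6} \right\rceil$$
holds for almost all d, i.e., for a set of integers d with asymptotic density 1.[12]
The formula
$$n(d) = \left\lceil \sqrt{2d \ln 2} + \frac{3 - 2\ln 2}{6} + \frac{9 - 4(\ln 2)^2}{72\sqrt{2d \ln 2}} \right\rceil$$
holds for all d ≤ $10^{18}$, but it is conjectured that there are infinitely many counterexamples to this formula.[13]
The formula
$$n(d) = \left\lceil \sqrt{2d \ln 2} + \frac{3 - 2\ln 2}{6} + \frac{9 - 4(\ln 2)^2}{72\sqrt{2d \ln 2}} - \frac{2(\ln 2)^2}{135d} \right\rceil$$
holds for all d ≤ $10^{18}$, and it is conjectured that this formula holds for all d.[13]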
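The definition of n(d) can be checked by brute force for small d and compared with the simplest ceiling estimate ⌈√(2d ln 2)⌉. The Python sketch below uses exact rational arithmetic to avoid rounding issues at the 1/2 threshold; the function names are illustrative, and only the range of d that is looped over is actually verified.

```python
import math
from fractions import Fraction


def min_people(d: int) -> int:
    """Smallest n such that the probability of a shared birthday among
    n people, with d equally likely days, is at least 1/2."""
    all_distinct = Fraction(1)
    n = 1
    while 1 - all_distinct < Fraction(1, 2):
        # the next person must avoid the n days already taken
        all_distinct *= Fraction(d - n, d)
        n += 1
    return n


def ceiling_estimate(d: int) -> int:
    """The estimate ceil(sqrt(2 d ln 2))."""
    return math.ceil(math.sqrt(2 * d * math.log(2)))


for d in (43, 99, 365):
    print(d, min_people(d), ceiling_estimate(d))
```

In this sketch the exact value is either the estimate or the estimate plus one, in line with the bounds quoted above.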
More than two people sharing a birthday
It is possible to extend the problem to ask how many people in a group are necessary for there to be a greater than 50% probability that at least 3, 4, 5, etc. of the group share the same birthday.
The first few values are as follows: a greater than 50% probability of 3 people sharing a birthday requires 88 people, and a greater than 50% probability of 4 people sharing a birthday requires 187 people (sequence A014088 in the OEIS).[14]
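The three-person case can be computed exactly by counting the assignments in which every day is used by at most two people. The Python sketch below uses a counting argument chosen here for illustration (it is not taken from the cited sources): pick which days hold a pair, which hold a single person, and then distribute the people.

```python
from math import factorial


def prob_no_triple(n: int, d: int = 365) -> float:
    """Probability that no day is the birthday of three or more of n people."""
    total = 0
    for pairs in range(n // 2 + 1):
        singles = n - 2 * pairs          # days with exactly one birthday
        empty = d - pairs - singles      # days with no birthday
        if empty < 0:
            continue
        # choose which days are pair-days, single-days and empty days
        ways_days = factorial(d) // (
            factorial(pairs) * factorial(singles) * factorial(empty)
        )
        # assign people to the chosen days; each pair is unordered
        ways_people = factorial(n) // 2 ** pairs
        total += ways_days * ways_people
    return total / d ** n


def min_people_for_triple(d: int = 365) -> int:
    """Smallest n with a greater than 50% chance that three people share a birthday."""
    n = 1
    while 1.0 - prob_no_triple(n, d) <= 0.5:
        n += 1
    return n


print(min_people_for_triple())
```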
Probability of a shared birthday (collision)
The birthday problem can be generalized as follows:
- Given n random integers drawn from a discrete uniform distribution with range [1,d], what is the probability p(n; d) that at least two numbers are the same? (d = 365 gives the usual birthday problem.)[15]
The generic results can be derived using the same arguments given above, namely
$$p(n; d) = 1 - \prod_{k=1}^{n-1} \left(1 - \frac{k}{d}\right) \approx 1 - e^{-n(n-1)/(2d)} \quad \text{for } n \leq d,$$
with p(n; d) = 1 for n > d.
Conversely, if n(p; d) denotes the number of random integers drawn from [1,d] to obtain a probability p that at least two numbers are the same, then
$$n(p; d) \approx \sqrt{2d \cdot \ln\left(\frac{1}{1-p}\right)}.$$
The birthday problem in this more generic sense applies to hash functions: the expected number of N-bit hashes that can be generated before getting a collision is not $2^N$, but rather only $2^{N/2}$. This is exploited by birthday attacks on cryptographic hash functions and is the reason why a small number of collisions in a hash table are, for all practical purposes, inevitable.
The theory behind the birthday problem was used by Zoe Schnabel[16] under the name of capture-recapture statistics to estimate the size of fish populations in lakes.
Generalization to multiple types of people
The basic problem considers all trials to be of one "type". The birthday problem has been generalized to consider an arbitrary number of types.[17] In the simplest extension there are two types of people, say m men and n women, and the problem becomes characterizing the probability of a shared birthday between at least one man and one woman. (Shared birthdays between two men or two women do not count.) The probability of no shared birthdays here is
$$p_0 = \frac{1}{d^{m+n}} \sum_{i=1}^{m} \sum_{j=1}^{n} S_2(m, i)\, S_2(n, j) \prod_{k=0}^{i+j-1} (d - k),$$
where d = 365 and $S_2$ are Stirling numbers of the second kind. Consequently, the desired probability is 1 − $p_0$.
This variation of the birthday problem is interesting because there is not a unique solution for the total number of people m + n. For example, the usual 50% probability value is realized for both a 32-member group of 16 men and 16 women and a 49-member group of 43 women and 6 men.
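The two group compositions mentioned above can be checked numerically. The Python sketch below sums over the number i of distinct birthdays among the men and the number j among the women, weighting by Stirling numbers of the second kind and by the number of ways to give the i + j birthday groups distinct days. The function names are illustrative, and `math.perm` requires Python 3.8 or later.

```python
from fractions import Fraction
from functools import lru_cache
from math import perm


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind: partitions of n items into k blocks."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def prob_mixed_match(m: int, n: int, d: int = 365) -> Fraction:
    """Probability that at least one of m men shares a birthday with
    at least one of n women (d equally likely days)."""
    no_match = sum(
        stirling2(m, i) * stirling2(n, j) * perm(d, i + j)
        for i in range(1, m + 1)
        for j in range(1, n + 1)
    )
    return 1 - Fraction(no_match, d ** (m + n))


print(float(prob_mixed_match(16, 16)), float(prob_mixed_match(43, 6)))
```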
Other birthday problems
First match
A related question is, as people enter a room one at a time, which one is most likely to be the first to have the same birthday as someone already in the room? That is, for what n is p(n) − p(n − 1) maximum? The answer is 20: if there is a prize for first match, the best position in line is 20th.[citation needed]
Same birthday as you
In the birthday problem, neither of the two people is chosen in advance. By contrast, the probability q(n) that at least one other person in a room of n other people has the same birthday as a particular person (for example, you) is given by
$$q(n) = 1 - \left(\frac{365 - 1}{365}\right)^n$$
and for general d by
$$q(n; d) = 1 - \left(\frac{d - 1}{d}\right)^n.$$
In the standard case of d = 365, substituting n = 23 gives about 6.1%, which is less than 1 chance in 16. For a greater than 50% chance that at least one other person in a roomful of n people has the same birthday as you, n would need to be at least 253. This number is significantly higher than 365/2 = 182.5: the reason is that it is likely that there are some birthday matches among the other people in the room.
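The contrast with the ordinary birthday problem is easy to verify numerically. This is a minimal Python sketch in which each of the n other people independently misses your birthday with probability (d − 1)/d; the function names are illustrative.

```python
def prob_matches_you(n: int, d: int = 365) -> float:
    """Probability that at least one of n other people shares your birthday."""
    return 1.0 - ((d - 1) / d) ** n


def min_others_for_even_chance(d: int = 365) -> int:
    """Smallest n for which prob_matches_you(n, d) exceeds 1/2."""
    n = 0
    while prob_matches_you(n, d) <= 0.5:
        n += 1
    return n


print(round(prob_matches_you(23), 4), min_others_for_even_chance())
```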
Number of people with a shared birthday
For any one person in a group of n people, the probability that they share a birthday with someone else is q(n − 1; d), as explained above. The expected number of people with a shared (non-unique) birthday can now be calculated easily by multiplying that probability by the number of people (n), so it is:
$$n\left(1 - \left(\frac{d-1}{d}\right)^{n-1}\right).$$
(This multiplication can be done this way because of the linearity of the expected value of indicator variables.) This implies that the expected number of people with a non-shared (unique) birthday is:
$$n\left(\frac{d-1}{d}\right)^{n-1}.$$
Similar formulas can be derived for the expected number of people who share with three, four, etc. other people.
Number of people until every birthday is achieved
Finding the expected number of people needed until every birthday is achieved is known as the coupon collector's problem. It can be calculated as $nH_n$, where $H_n$ is the nth harmonic number. For 365 possible dates (the birthday problem), the answer is about 2365.
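The coupon-collector value quoted above can be reproduced by summing the harmonic series directly. A short Python sketch, with an illustrative function name:

```python
def expected_people_for_all_birthdays(d: int = 365) -> float:
    """Expected number of people until all d birthdays have appeared: d * H_d."""
    harmonic = sum(1.0 / k for k in range(1, d + 1))
    return d * harmonic


print(round(expected_people_for_all_birthdays(), 2))
```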
Near matches
Another generalization is to ask for the probability of finding at least one pair in a group of n people with birthdays within k calendar days of each other, if there are d equally likely birthdays.[18]
The number of people required so that the probability that some pair will have a birthday separated by k days or fewer will be higher than 50% is given in the following table:

| k | n for d = 365 |
|---|---|
| 0 | 23 |
| 1 | 14 |
| 2 | 11 |
| 3 | 9 |
| 4 | 8 |
| 5 | 8 |
| 6 | 7 |
| 7 | 7 |
Thus in a group of just seven random people, it is more likely than not that two of them will have a birthday within a week of each other.[18]
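The table can be reproduced with a closed form for the near-match probability, 1 − (d − nk − 1)! / (d^(n−1) (d − n(k + 1))!), which treats the year as circular. The formula is not stated in the text above, so its use in this Python sketch is an assumption made for illustration (it is commonly attributed to the cited paper), and the function names are hypothetical.

```python
from math import perm


def prob_near_match(n: int, k: int, d: int = 365) -> float:
    """Probability that some pair among n people has birthdays within
    k days of each other (circular year of d equally likely days)."""
    if n * (k + 1) > d:
        return 1.0  # too many people to keep everyone more than k days apart
    # (d - n*k - 1)! / (d - n*(k+1))! is a product of n - 1 consecutive integers
    spaced_out = perm(d - n * k - 1, n - 1)
    return 1.0 - spaced_out / d ** (n - 1)


def min_people_near_match(k: int, d: int = 365) -> int:
    """Smallest n for which the near-match probability exceeds 1/2."""
    n = 1
    while prob_near_match(n, k, d) <= 0.5:
        n += 1
    return n


print([min_people_near_match(k) for k in range(8)])
```

Setting k = 0 reduces the expression to the ordinary birthday problem.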
Number of days with a certain number of birthdays
Number of days with at least one birthday
The expected number of different birthdays, i.e. the number of days that are at least one person's birthday, is:
$$d - d\left(\frac{d-1}{d}\right)^n.$$
This follows from the expected number of days that are no one's birthday:
$$d\left(\frac{d-1}{d}\right)^n,$$
which follows from the probability that a particular day is no one's birthday, $\left(\frac{d-1}{d}\right)^n$, easily summed because of the linearity of the expected value.
For instance, with d = 365, you should expect about 21 different birthdays when there are 22 people, or 46 different birthdays when there are 50 people. When there are 1000 people, there will be around 341 different birthdays (24 unclaimed birthdays).
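These figures follow from multiplying the number of days by the probability that a given day is someone's birthday. A small Python sketch, with an illustrative function name:

```python
def expected_distinct_birthdays(n: int, d: int = 365) -> float:
    """Expected number of days that are the birthday of at least one of n people."""
    prob_day_unclaimed = ((d - 1) / d) ** n
    return d * (1.0 - prob_day_unclaimed)


for n in (22, 50, 1000):
    print(n, round(expected_distinct_birthdays(n), 1))
```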
Number of days with at least two birthdays
The above can be generalized from the distribution of the number of people with their birthday on any particular day, which is a binomial distribution with probability 1/d. Multiplying the relevant probability by d will then give the expected number of days. For example, the expected number of days which are shared, i.e. which are at least two (i.e. not zero and not one) people's birthday, is:
$$d - d\left(\frac{d-1}{d}\right)^n - d \binom{n}{1} \left(\frac{1}{d}\right)^1 \left(\frac{d-1}{d}\right)^{n-1} = d - d\left(\frac{d-1}{d}\right)^n - n\left(\frac{d-1}{d}\right)^{n-1}.$$
Number of people who repeat a birthday
The probability that the kth integer randomly chosen from [1,d] will repeat at least one previous choice equals q(k − 1; d) above. The expected total number of times a selection will repeat a previous selection as n such integers are chosen equals[19]
$$\sum_{k=1}^{n} q(k-1; d) = n - d + d\left(\frac{d-1}{d}\right)^n.$$
This can be seen to equal the number of people minus the expected number of different birthdays.
Average number of people to get at least one shared birthday
In an alternative formulation of the birthday problem, one asks the average number of people required to find a pair with the same birthday. If we consider the probability function Pr[n people have at least one shared birthday], this average determines the mean of the distribution, as opposed to the customary formulation, which asks for the median. The problem is relevant to several hashing algorithms analyzed by Donald Knuth in his book The Art of Computer Programming. It may be shown[20][21] that if one samples uniformly, with replacement, from a population of size M, the number of trials required for the first repeated sampling of some individual has expected value n = 1 + Q(M), where
$$Q(M) = \sum_{k=1}^{M} \frac{M!}{(M-k)!\, M^k}.$$
The function
$$Q(M) = 1 + \frac{M-1}{M} + \frac{(M-1)(M-2)}{M^2} + \cdots + \frac{(M-1)(M-2) \cdots 1}{M^{M-1}}$$
has been studied by Srinivasa Ramanujan and has asymptotic expansion:
$$Q(M) \sim \sqrt{\frac{\pi M}{2}} - \frac{1}{3} + \frac{1}{12}\sqrt{\frac{\pi}{2M}} - \frac{4}{135M} + \cdots.$$
With M = 365 days in a year, the average number of people required to find a pair with the same birthday is n = 1 + Q(M) ≈ 24.61659, somewhat more than 23, the number required for a 50% chance. In the best case, two people will suffice; at worst, the maximum possible number of M + 1 = 366 people is needed; but on average, only 25 people are required.
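The mean can be evaluated by summing the series for Q(M) term by term, where each term is the previous one multiplied by (M − k + 1)/M. The Python sketch below also evaluates the first four terms of the asymptotic series for comparison; the function names are illustrative.

```python
import math


def ramanujan_q(m: int) -> float:
    """Q(M) = sum over k = 1..M of M! / ((M - k)! * M^k)."""
    total = 0.0
    term = 1.0
    for k in range(1, m + 1):
        term *= (m - k + 1) / m  # ratio between consecutive terms
        total += term
    return total


def ramanujan_q_asymptotic(m: int) -> float:
    """First four terms of the asymptotic expansion of Q(M)."""
    return (
        math.sqrt(math.pi * m / 2)
        - 1 / 3
        + math.sqrt(math.pi / (2 * m)) / 12
        - 4 / (135 * m)
    )


print(round(1 + ramanujan_q(365), 5), round(1 + ramanujan_q_asymptotic(365), 5))
```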
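An analysis using indicator random variables can provide a simpler but approximate analysis of this problem.[22] For each pair (i, j) of the k people in a room, we define the indicator random variable $X_{ij}$, for 1 ≤ i < j ≤ k, by
$$X_{ij} = \begin{cases} 1, & \text{if person } i \text{ and person } j \text{ have the same birthday;} \\ 0, & \text{otherwise.} \end{cases}$$
With n equally likely days, each pair shares a birthday with probability 1/n, so $E[X_{ij}] = 1/n$.
Let X be a random variable counting the pairs of individuals with the same birthday. By linearity of expectation,
$$X = \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} X_{ij}, \qquad E[X] = \binom{k}{2} \frac{1}{n} = \frac{k(k-1)}{2n}.$$
For n = 365, if k = 28, the expected number of pairs of individuals with the same birthday is (28 × 27)/(2 × 365) ≈ 1.0356. Therefore, we can expect at least one matching pair with at least 28 people.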
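The threshold of 28 can be confirmed by evaluating the expected number of matching pairs for increasing group sizes. A brief Python sketch, keeping the convention of this passage that n is the number of days and k the number of people; the function names are illustrative.

```python
from math import comb


def expected_matching_pairs(k: int, n: int = 365) -> float:
    """Expected number of pairs among k people who share a birthday (n days)."""
    return comb(k, 2) / n


def min_people_for_one_expected_pair(n: int = 365) -> int:
    """Smallest k for which the expected number of matching pairs reaches 1."""
    k = 2
    while expected_matching_pairs(k, n) < 1.0:
        k += 1
    return k


print(round(expected_matching_pairs(28), 4), min_people_for_one_expected_pair())
```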
In the 2014 FIFA World Cup, each of the 32 squads had 23 players. An analysis of the official squad lists suggested that 16 squads had pairs of players sharing birthdays, and of these, 5 squads had two pairs: Argentina, France, Iran, South Korea and Switzerland each had two pairs, while Australia, Bosnia and Herzegovina, Brazil, Cameroon, Colombia, Honduras, Netherlands, Nigeria, Russia, Spain and USA each had one pair.[23]
Voracek, Tran and Formann showed that the majority of people markedly overestimate the number of people that is necessary to achieve a given probability of people having the same birthday, and markedly underestimate the probability of people having the same birthday when a specific sample size is given.[24] Further results showed that psychology students and women did better on the task than casino visitors/personnel or men, but were less confident about their estimates.
Reverse problem
The reverse problem is to find, for a fixed probability p, the greatest n for which the probability p(n) is smaller than the given p, or the smallest n for which the probability p(n) is greater than the given p.[citation needed]
Taking the above formula for d = 365, one has
$$n(p; 365) \approx \sqrt{730 \ln\left(\frac{1}{1-p}\right)}.$$
The following table gives some sample calculations.
| p | n | n↓ | p(n↓) | n↑ | p(n↑) |
|---|---|---|---|---|---|
| 0.01 | 0.14178√365 = 2.70864 | 2 | 0.00274 | 3 | 0.00820 |
| 0.05 | 0.32029√365 = 6.11916 | 6 | 0.04046 | 7 | 0.05624 |
| 0.1 | 0.45904√365 = 8.77002 | 8 | 0.07434 | 9 | 0.09462 |
| 0.2 | 0.66805√365 = 12.76302 | 12 | 0.16702 | 13 | 0.19441 |
| 0.3 | 0.84460√365 = 16.13607 | 16 | 0.28360 | 17 | 0.31501 |
| 0.5 | 1.17741√365 = 22.49439 | 22 | 0.47570 | 23 | 0.50730 |
| 0.7 | 1.55176√365 = 29.64625 | 29 | 0.68097 | 30 | 0.70632 |
| 0.8 | 1.79412√365 = 34.27666 | 34 | 0.79532 | 35 | 0.81438 |
| 0.9 | 2.14597√365 = 40.99862 | 40 | 0.89123 | 41 | 0.90315 |
| 0.95 | 2.44775√365 = 46.76414 | 46 | 0.94825 | 47 | 0.95477 |
| 0.99 | 3.03485√365 = 57.98081 | 57 | 0.99012 | 58 | 0.99166 |

Some values fall outside the bounds, showing that the approximation is not always exact: p(n↑) is still below p for p = 0.01, 0.1 and 0.2, and p(n↓) already exceeds p for p = 0.99.
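A row of the table can be recomputed by evaluating the square-root estimate and then the exact probability at the integers just below and above it. The Python sketch below assumes d = 365 equally likely days; the function names are illustrative.

```python
import math


def estimated_people(p: float, d: int = 365) -> float:
    """Approximate n with shared-birthday probability p: sqrt(2 d ln(1/(1-p)))."""
    return math.sqrt(2 * d * math.log(1 / (1 - p)))


def exact_probability(n: int, d: int = 365) -> float:
    """Exact probability that at least two of n people share a birthday."""
    return 1.0 - math.prod((d - k) / d for k in range(n))


for p in (0.01, 0.5, 0.99):
    n = estimated_people(p)
    below, above = math.floor(n), math.ceil(n)
    print(p, round(n, 5),
          below, round(exact_probability(below), 5),
          above, round(exact_probability(above), 5))
```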
Partition problem
A related problem is the partition problem, a variant of the knapsack problem from operations research. Some weights are put on a balance scale; each weight is an integer number of grams randomly chosen between one gram and one million grams (one tonne). The question is whether one can usually (that is, with probability close to 1) transfer the weights between the left and right arms to balance the scale. (In case the sum of all the weights is an odd number of grams, a discrepancy of one gram is allowed.) If there are only two or three weights, the answer is very clearly no; although there are some combinations which work, the majority of randomly selected combinations of three weights do not. If there are very many weights, the answer is clearly yes. The question is, how many are just sufficient? That is, what is the number of weights such that it is equally likely for it to be possible to balance them as it is to be impossible?
People's intuitions vary widely: some guess that the answer is above 100000, most guess that it is in the thousands or tens of thousands, and others feel it should at least be in the hundreds. The correct answer is 23.[citation needed]
The reason is that the correct comparison is to the number of partitions of the weights into left and right. There are $2^{N-1}$ different partitions for N weights, and the left sum minus the right sum can be thought of as a new random quantity for each partition. The distribution of the sum of weights is approximately Gaussian, with a peak at 500000N and width 1000000√N, so that the transition occurs when $2^{N-1}$ is approximately equal to 1000000√N. For N = 23, $2^{23-1}$ is about 4 million, while the width of the distribution is only about 5 million.[25]
In fiction
Arthur C. Clarke's 1961 novel A Fall of Moondust contains a section where the main characters, trapped underground for an indefinite amount of time, are celebrating a birthday and find themselves discussing the validity of the birthday problem. As stated by a physicist passenger: "If you have a group of more than twenty-four people, the odds are better than even that two of them have the same birthday." Eventually, out of 22 present, it is revealed that two characters share the same birthday, May 23.
Notes
[nb 1] In his autobiography, Halmos criticized the form in which the birthday paradox is often presented, in terms of numerical computation. He believed that it should be used as an example in the use of more abstract mathematical concepts. He wrote:
The reasoning is based on important tools that all students of mathematics should have ready access to. The birthday problem used to be a splendid illustration of the advantages of pure thought over mechanical manipulation; the inequalities can be obtained in a minute or two, whereas the multiplications would take much longer, and be much more subject to error, whether the instrument is a pencil or an old-fashioned desk computer. What calculators do not yield is understanding, or mathematical facility, or a solid basis for more advanced, generalized theories.
References
1. David Singmaster, Sources in Recreational Mathematics: An Annotated Bibliography, Eighth Preliminary Edition, 2004, section 8.B
2. H.S.M. Coxeter, "Mathematical Recreations and Essays, 11th edition", 1940, p 45, as reported in I. J. Good, Probability and the weighing of evidence, 1950, p. 38
3. Richard Von Mises, "Über Aufteilungs- und Besetzungswahrscheinlichkeiten", Revue de la faculté des sciences de l'Université d'Istanbul 4:145-163, 1939, reprinted in Frank, P.; Goldstein, S.; Kac, M.; Prager, W.; Szegö, G.; Birkhoff, G., eds. (1964). Selected Papers of Richard von Mises. Vol. 2. Providence, Rhode Island: Amer. Math. Soc. pp. 313–334.
4. see Birthday#Distribution through the year
5. (Bloom 1973)
6. Steele, J. Michael (2004). The Cauchy‑Schwarz Master Class. Cambridge: Cambridge University Press. pp. 206, 277. ISBN 9780521546775.
7. Mario Cortina Borja; John Haigh (September 2007). "The Birthday Problem". Significance. 4 (3). Royal Statistical Society: 124–127. doi:10.1111/j.1740-9713.2007.00246.x.
8. Mathis, Frank H. (June 1991). "A Generalized Birthday Problem". SIAM Review. 33 (2): 265–270. doi:10.1137/1033051. ISSN 0036-1445. JSTOR 2031144. OCLC 37699182.
9. Jim Gray, Catharine van Ingen. Empirical Measurements of Disk Failure Rates and Error Rates
10. D. Brink, A (probably) exact solution to the Birthday Problem, Ramanujan Journal, 2012.
11. Brink 2012, Theorem 2
12. Brink 2012, Theorem 3
13. Brink 2012, Table 3, Conjecture 1
14. "Minimal number of people to give a 50% probability of having at least n coincident birthdays in one year". The On-line Encyclopedia of Integer Sequences. OEIS. Retrieved 17 February 2020.
15. Suzuki, K.; Tonien, D.; et al. (2006). "Birthday Paradox for Multi-collisions". In Rhee M.S., Lee B. (ed.). Lecture Notes in Computer Science, vol 4296. Berlin: Springer. doi:10.1007/11927587_5. Information Security and Cryptology – ICISC 2006.
16. Z. E. Schnabel (1938) The Estimation of the Total Fish Population of a Lake, American Mathematical Monthly 45, 348–352.
17. M. C. Wendl (2003) Collision Probability Between Sets of Random Variables, Statistics and Probability Letters 64(3), 249–254.
18. M. Abramson and W. O. J. Moser (1970) More Birthday Surprises, American Mathematical Monthly 77, 856–858
19. Might, Matt. "Collision hash collisions with the birthday paradox". Matt Might's blog. Retrieved 17 July 2015.
20. Knuth, D. E. (1973). The Art of Computer Programming. Vol. 3, Sorting and Searching. Reading, Massachusetts: Addison-Wesley. ISBN 978-0-201-03803-3.
21. Flajolet, P.; Grabner, P. J.; Kirschenhofer, P.; Prodinger, H. (1995). "On Ramanujan's Q-Function". Journal of Computational and Applied Mathematics. 58: 103–116. doi:10.1016/0377-0427(93)E0258-N.
22. Cormen; et al. Introduction to Algorithms.
23. Fletcher, James (16 June 2014). "The birthday paradox at the World Cup". bbc.com. BBC. Retrieved 27 August 2015.
24. Voracek, M.; Tran, U. S.; Formann, A. K. (2008). "Birthday and birthmate problems: Misconceptions of probability among psychology undergraduates and casino visitors and personnel". Perceptual and Motor Skills. 106 (1): 91–103. doi:10.2466/pms.106.1.91-103. PMID 18459359. S2CID 22046399.
25. Borgs, C.; Chayes, J.; Pittel, B. (2001). "Phase Transition and Finite Size Scaling in the Integer Partition Problem". Random Structures and Algorithms. 19 (3–4): 247–288. doi:10.1002/rsa.10004. S2CID 6819493.
Bibliography
- Abramson, M.; Moser, W. O. J. (1970). "More Birthday Surprises". American Mathematical Monthly. 77 (8): 856–858. doi:10.2307/2317022. JSTOR 2317022.
- Bloom, D. (1973). "A Birthday Problem". American Mathematical Monthly. 80 (10): 1141–1142. doi:10.2307/2318556. JSTOR 2318556.
- Kemeny, John G.; Snell, J. Laurie; Thompson, Gerald (1957). Introduction to Finite Mathematics (First ed.).
- McKinney, E. H. (1966). "Generalized Birthday Problem". American Mathematical Monthly. 73 (5): 385–387. doi:10.2307/2315408. JSTOR 2315408.
- Mosteller, F. (1962). "Understanding the Birthday Problem". The Mathematics Teacher. 55 (5): 322–325. doi:10.5951/MT.55.5.0322. JSTOR 27956609. Reprinted in Mosteller, Frederick (2006). "Understanding the Birthday Problem". Selected Papers of Frederick Mosteller. Springer Series in Statistics. pp. 349–353. doi:10.1007/978-0-387-44956-2_21. ISBN 978-0-387-20271-6.
- Schneps, Leila; Colmez, Coralie (2013). "Math error number 5. The case of Diana Sylvester: cold hit analysis". Math on Trial. How Numbers Get Used and Abused in the Courtroom. Basic Books. ISBN 978-0-465-03292-1.
- Sy M. Blinder (2013). Guide to Essential Math: A Review for Physics, Chemistry and Engineering Students. Elsevier. pp. 5–6. ISBN 978-0-12-407163-6.
External links
- The Birthday Paradox accounting for leap year birthdays
- Weisstein, Eric W. "Birthday Problem". MathWorld.
- A humorous article explaining the paradox
- SOCR EduMaterials activities birthday experiment
- Understanding the Birthday Problem (Better Explained)
- Eurobirthdays 2012. A birthday problem. A practical football example of the birthday paradox.
- Grime, James. "23: Birthday Probability". Numberphile. Brady Haran. Archived from the original on 2017-02-25. Retrieved 2013-04-02.
- Computing the probabilities of the Birthday Problem at WolframAlpha