Birthday attack

A birthday attack is a type of cryptographic attack that exploits the mathematics behind the birthday problem in probability theory. It can be used to abuse communication between two or more parties. The attack relies on the fact that a collision among randomly chosen values drawn from a fixed set of possible outputs (the pigeonholes) is far more likely than intuition suggests, as described by the birthday problem (or birthday paradox).

Understanding the problem

As an example, consider the scenario in which a teacher with a class of 30 students asks for everybody's birthday, to determine whether any two students have the same birthday (corresponding to a hash collision as described below; for simplicity, ignore February 29). Intuitively, this chance may seem small. If the teacher picked a specific day (say September 16), then the chance that at least one student was born on that specific day is $1-(364/365)^{30}$ , about 7.9%. However, the probability that at least one student has the same birthday as any other student is around 70% (using the formula $1-365!/((365-n)!\cdot 365^{n})$ for n = 30^[1]).
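The two probabilities in the classroom example can be checked numerically. The following is a minimal sketch; the function names are illustrative and not taken from any source, and it assumes 365 equally likely birthdays.

```python
from math import prod

def p_specific_day(n: int, days: int = 365) -> float:
    """Probability that at least one of n people was born on one fixed day."""
    return 1 - ((days - 1) / days) ** n

def p_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday.

    Equivalent to 1 - days! / ((days - n)! * days**n), written as a
    product so that no huge factorials are needed.
    """
    return 1 - prod((days - k) / days for k in range(n))

if __name__ == "__main__":
    print(f"specific day, n=30: {p_specific_day(30):.3f}")
    print(f"any shared day, n=30: {p_shared_birthday(30):.3f}")
```

For a class of 30 the first value is about 0.079 and the second about 0.706, matching the figures quoted above.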

Mathematics

Given a function $f$ , the goal of the attack is to find two different inputs $x_{1},x_{2}$ such that $f(x_{1})=f(x_{2})$ . Such a pair $x_{1},x_{2}$ is called a collision. The method used to find a collision is simply to evaluate the function $f$ for different input values that may be chosen randomly or pseudorandomly until the same result is found more than once. Because of the birthday problem, this method can be rather efficient. Specifically, if a function $f(x)$ yields any of $H$ different outputs with equal probability and $H$ is sufficiently large, then we expect to obtain a pair of different arguments $x_{1}$ and $x_{2}$ with $f(x_{1})=f(x_{2})$ after evaluating the function for about $1.25{\sqrt {H}}$ different arguments on average.
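The search procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function $f$ is taken to be SHA-256 truncated to 24 bits (a deliberately weak stand-in so the search finishes quickly), the inputs are successive counter values rather than random draws, and the names `truncated_hash` and `find_collision` are hypothetical.

```python
import hashlib
import itertools

def truncated_hash(data: bytes, bits: int = 24) -> int:
    """Assumed toy function f: the top `bits` bits of SHA-256(data)."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int = 24):
    """Evaluate f on distinct inputs until some output repeats.

    Returns (x1, x2, evaluations) with x1 != x2 and f(x1) == f(x2).
    """
    seen = {}  # maps each output seen so far to the input that produced it
    for counter in itertools.count():
        x = counter.to_bytes(8, "big")
        h = truncated_hash(x, bits)
        if h in seen:
            return seen[h], x, counter + 1
        seen[h] = x

if __name__ == "__main__":
    x1, x2, evaluations = find_collision(24)
    print(x1.hex(), x2.hex(), evaluations)
```

With $H = 2^{24}$ possible outputs, the estimate $1.25\sqrt{H}$ suggests roughly five thousand evaluations on average, far fewer than the $2^{24}$ outputs themselves. The table of previously seen outputs is the price paid for this speed: memory grows with the number of evaluations.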

We consider the following experiment. From a set of H values we choose n values uniformly at random thereby allowing repetitions. Let p(n; H) be the probability that during this experiment at least one value is chosen more than once. This probability can be approximated as

$$p(n;H)\approx 1-e^{-n(n-1)/(2H)}\approx 1-e^{-n^{2}/(2H)}.$$
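To get a feel for how close the two approximations are, they can be compared with the exact probability $1-\prod_{k=0}^{n-1}(1-k/H)$. The sketch below is illustrative only; the function names are assumptions, and the classic birthday case $H = 365$, $n = 23$ is used as the example.

```python
import math

def p_exact(n: int, H: int) -> float:
    """Exact probability of at least one repeat among n uniform draws from H values."""
    no_repeat = 1.0
    for k in range(n):
        no_repeat *= (H - k) / H
    return 1 - no_repeat

def p_approx_fine(n: int, H: int) -> float:
    """First approximation: 1 - exp(-n(n-1)/(2H))."""
    return -math.expm1(-n * (n - 1) / (2 * H))

def p_approx_coarse(n: int, H: int) -> float:
    """Second approximation: 1 - exp(-n^2/(2H))."""
    return -math.expm1(-n * n / (2 * H))

if __name__ == "__main__":
    n, H = 23, 365
    print(p_exact(n, H), p_approx_fine(n, H), p_approx_coarse(n, H))
```

For these values the three results agree to within about one percentage point, and the agreement improves as $H$ grows relative to $n$.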

Let n(p; H) be the smallest number of values we have to choose such that the probability of finding a collision is at least p. By inverting the expression above, we find the following approximation

$$n(p;H)\approx {\sqrt {2H\ln {\frac {1}{1-p}}}},$$

and assigning a 0.5 probability of collision we arrive at

$$n(0.5;H)\approx 1.1774{\sqrt {H}}.$$

Let Q(H) be the expected number of values we have to choose before finding the first collision. This number can be approximated by

$$Q(H)\approx {\sqrt {{\frac {\pi }{2}}H}}.$$
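The approximation for Q(H) can be checked empirically with a small Monte Carlo experiment. The sketch below is an illustration, not a reference implementation: the function names, the number of trials and the fixed seed are all assumptions, and the count includes the draw that produces the collision.

```python
import math
import random

def draws_until_collision(H: int, rng: random.Random) -> int:
    """Draw uniformly from range(H) until a value repeats; return the number of draws."""
    seen = set()
    while True:
        value = rng.randrange(H)
        if value in seen:
            return len(seen) + 1  # count the colliding draw as well
        seen.add(value)

def estimate_q(H: int, trials: int = 2000, seed: int = 0) -> float:
    """Average number of draws to the first collision over `trials` runs."""
    rng = random.Random(seed)
    total = sum(draws_until_collision(H, rng) for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    H = 10_000
    print("simulated:", estimate_q(H))
    print("sqrt(pi*H/2):", math.sqrt(math.pi * H / 2))
```

For H = 10,000 the formula gives about 125, and the simulated average should land within a few percent of that value.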

As an example, if a 64-bit hash is used, there are approximately 1.8 × 10¹⁹ different outputs. If these are all equally probable (the best case), then it would take 'only' approximately 5.1 × 10⁹ attempts to find a collision by brute force with probability 50%. This value is called the birthday bound^[2] and for n-bit codes it can be approximated as $2^{n/2}$.^[3] Other examples are as follows:

Number of hashes needed for a desired probability p of random collision (rounded)
Bits	Possible outputs H (rounded)	p = 10⁻¹⁸	p = 10⁻¹⁵	p = 10⁻¹²	p = 10⁻⁹	p = 10⁻⁶	p = 0.1%	p = 1%	p = 25%	p = 50%	p = 75%
16	6.6 × 10⁴	2	2	2	2	2	11	36	1.9 × 10²	3.0 × 10²	4.3 × 10²
32	4.3 × 10⁹	2	2	2	2.9	93	2.9 × 10³	9.3 × 10³	5.0 × 10⁴	7.7 × 10⁴	1.1 × 10⁵
64	1.8 × 10¹⁹	6.1	1.9 × 10²	6.1 × 10³	1.9 × 10⁵	6.1 × 10⁶	1.9 × 10⁸	6.1 × 10⁸	3.3 × 10⁹	5.1 × 10⁹	7.2 × 10⁹
128	3.4 × 10³⁸	2.6 × 10¹⁰	8.2 × 10¹¹	2.6 × 10¹³	8.2 × 10¹⁴	2.6 × 10¹⁶	8.3 × 10¹⁷	2.6 × 10¹⁸	1.4 × 10¹⁹	2.2 × 10¹⁹	3.1 × 10¹⁹
256	1.2 × 10⁷⁷	4.8 × 10²⁹	1.5 × 10³¹	4.8 × 10³²	1.5 × 10³⁴	4.8 × 10³⁵	1.5 × 10³⁷	4.8 × 10³⁷	2.6 × 10³⁸	4.0 × 10³⁸	5.7 × 10³⁸
384	3.9 × 10¹¹⁵	8.9 × 10⁴⁸	2.8 × 10⁵⁰	8.9 × 10⁵¹	2.8 × 10⁵³	8.9 × 10⁵⁴	2.8 × 10⁵⁶	8.9 × 10⁵⁶	4.8 × 10⁵⁷	7.4 × 10⁵⁷	1.0 × 10⁵⁸
512	1.3 × 10¹⁵⁴	1.6 × 10⁶⁸	5.2 × 10⁶⁹	1.6 × 10⁷¹	5.2 × 10⁷²	1.6 × 10⁷⁴	5.2 × 10⁷⁵	1.6 × 10⁷⁶	8.8 × 10⁷⁶	1.4 × 10⁷⁷	1.9 × 10⁷⁷

The table shows the number of hashes n(p) needed to achieve the given probability of success, assuming all hashes are equally likely. For comparison, 10⁻¹⁸ to 10⁻¹⁵ is the uncorrectable bit error rate of a typical hard disk. In theory, the collision probability of 128-bit values such as MD5 hashes or UUIDs should stay within that range until about 820 billion documents have been hashed, even though the number of possible outputs is far larger.

It is easy to see that if the outputs of the function are distributed unevenly, then a collision can be found even faster. The notion of 'balance' of a hash function quantifies the resistance of the function to birthday attacks and allows the vulnerability of popular hashes such as MD and SHA to be estimated (Bellare and Kohno, 2004).

The subexpression $\ln {\frac {1}{1-p}}$ in the equation for $n(p;H)$ is not computed accurately for small $p$ when directly translated into common programming languages as log(1/(1-p)), due to loss of significance. When log1p is available (as it is in C99), the equivalent expression -log1p(-p) should be used instead.^[4] If this is not done, the first column of the above table is computed as zero, and several items in the second column do not have even one correct significant digit.
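The effect is easy to reproduce in double-precision arithmetic. The following sketch uses Python's `math.log` and `math.log1p`; the function names `n_naive` and `n_accurate` are illustrative only.

```python
import math

def n_naive(p: float, H: int) -> float:
    """Direct translation of sqrt(2H ln(1/(1-p))); loses precision for tiny p."""
    return math.sqrt(2 * H * math.log(1 / (1 - p)))

def n_accurate(p: float, H: int) -> float:
    """Same formula, using -log1p(-p) in place of log(1/(1-p))."""
    return math.sqrt(-2 * H * math.log1p(-p))

if __name__ == "__main__":
    H = 2 ** 128
    for p in (1e-18, 1e-15, 1e-12):
        print(p, n_naive(p, H), n_accurate(p, H))
```

For p = 10⁻¹⁸ the subtraction 1 - p rounds to exactly 1.0 in double precision, so the naive version returns zero, while the log1p version gives roughly 2.6 × 10¹⁰, in line with the 128-bit row of the table.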

Digital signature susceptibility

Digital signatures can be susceptible to a birthday attack. A message $m$ is typically signed by first computing $f(m)$, where $f$ is a cryptographic hash function, and then using some secret key to sign $f(m)$. Suppose Mallory wants to trick Bob into signing a fraudulent contract. Mallory prepares a fair contract $m$ and a fraudulent one $m'$. She then finds a number of positions where $m$ can be changed without changing the meaning, such as inserting commas, empty lines, one versus two spaces after a sentence, replacing synonyms, etc. By combining these changes, she can create a huge number of variations on $m$ which are all fair contracts.

In a similar manner, Mallory also creates a huge number of variations on the fraudulent contract $m'$. She then applies the hash function to all these variations until she finds a version of the fair contract and a version of the fraudulent contract which have the same hash value, $f(m)=f(m')$. She presents the fair version to Bob for signing. After Bob has signed, Mallory takes the signature and attaches it to the fraudulent contract. This signature then "proves" that Bob signed the fraudulent contract.

The probabilities differ slightly from the original birthday problem, as Mallory gains nothing by finding two fair or two fraudulent contracts with the same hash. Mallory's strategy is to generate pairs of one fair and one fraudulent contract. The birthday problem equations apply where $n$ is the number of pairs. The number of hashes Mallory actually generates is $2n$.

To avoid this attack, the output length of the hash function used for a signature scheme can be chosen large enough so that the birthday attack becomes computationally infeasible, i.e. about twice as many bits as are needed to prevent an ordinary brute-force attack.
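Mallory's search can be illustrated with a toy experiment. Everything in the sketch below is an assumption made for demonstration: the hash is SHA-256 truncated to 16 bits (far too short for real use), the contract texts are invented, the only "harmless change" is one versus two spaces after a word, and the function names are hypothetical.

```python
import hashlib
from itertools import product

def short_hash(text: str, bits: int = 16) -> int:
    """Toy hash: the top `bits` bits of SHA-256 (assumed stand-in for f)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def variations(text: str, positions: int):
    """Yield 2**positions variants: one or two spaces after each of the first words."""
    words = text.split()
    for choice in product((" ", "  "), repeat=positions):
        parts = []
        for i, word in enumerate(words[:-1]):
            parts.append(word)
            parts.append(choice[i] if i < positions else " ")
        parts.append(words[-1])
        yield "".join(parts)

def find_cross_collision(fair: str, fraudulent: str, positions: int = 12, bits: int = 16):
    """Return (fair_variant, fraudulent_variant) with equal short hashes, or None."""
    table = {short_hash(v, bits): v for v in variations(fair, positions)}
    for candidate in variations(fraudulent, positions):
        match = table.get(short_hash(candidate, bits))
        if match is not None:
            return match, candidate
    return None

FAIR = "Bob agrees to pay Mallory the sum of 10 dollars on the first day of each month"
FRAUD = "Bob agrees to pay Mallory the sum of 10000 dollars on the first day of each month"

if __name__ == "__main__":
    print(find_cross_collision(FAIR, FRAUD))
```

With 4,096 variants of each contract and only 65,536 possible hash values, a cross collision is all but certain. Only collisions between a fair and a fraudulent variant count, which is why the table holds fair variants only and is probed with fraudulent ones.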

Pollard's rho algorithm for logarithms is an example of an algorithm that uses a birthday attack to compute discrete logarithms.

Birthday attacks are often discussed as a potential weakness of the Internet's Domain Name System (DNS).^[5]

See also

Collision attack
Meet-in-the-middle attack

Notes



[1] "Math Forum: Ask Dr. Math FAQ: The Birthday Problem".

[2] See upper and lower bounds.

[3] Jacques Patarin, Audrey Montreuil (2005). "Benes and Butterfly schemes revisited" (PostScript, PDF). Université de Versailles. Retrieved 2007-03-15.

[4] "Compute log(1+x) accurately for small values of x".

[5] DNS Cache Poisoning — The Next Generation

References


Mihir Bellare, Tadayoshi Kohno: Hash Function Balance and Its Impact on Birthday Attacks. EUROCRYPT 2004: pp401–418
Applied Cryptography, 2nd ed. by Bruce Schneier

External links

"What is a digital signature and what is authentication?" from RSA Security's crypto FAQ.
"Birthday Attack" X5 Networks Crypto FAQs


