In choosing a given key size or hash width,
we need to consider the possibility of **collision attacks**.
Collision attacks essentially stem from the observation that if you take two sets of
random values from some range, then the size of each set need only be around the *square root*
of the number of potential values for there to be a good chance of at least one *collision*,
i.e. a value present in both sets. For example, let's say you have a random number generator
that generates numbers uniformly between 1 and 65536 (=2^{16}). Now, you generate 256 numbers
and call them Group A. And then you generate another 256 numbers and call them Group B. There'll
be a greater than 60% chance that at least one number in Group A is also in Group B.
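
That figure is easy to sanity-check empirically. The sketch below (plain Python, using the group size and range from the example above; the trial count and seed are arbitrary choices) simulates the two groups and reports how often they intersect:

```python
import random

random.seed(42)  # arbitrary seed, purely for reproducibility

def groups_collide(n=65536, k=256):
    """One trial: draw two groups of k values uniformly from 1..n;
    return True if at least one value appears in both groups."""
    group_a = {random.randint(1, n) for _ in range(k)}
    group_b = {random.randint(1, n) for _ in range(k)}
    return bool(group_a & group_b)

trials = 5000
hits = sum(groups_collide() for _ in range(trials))
print(f"collision rate over {trials} trials: {hits / trials:.3f}")
```

The observed rate sits around 0.63, closely matching the analytic figure 1 - (1 - 256/65536)^{256} ≈ 63%.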

At present, collision attacks are not generally practical against 128-bit keys and hashes, but they are arguably going to become practical quite soon, certainly within the required "confidentiality lifetime" of many people's data.

A common variant is the **birthday attack**, an instance of the more general
**birthday problem**. It usually refers to observing repetitions that
are likely to occur given enough applications of a particular function or algorithm.
The name stems from the observation that if you pick 23 people
at random, there's a better than 50% chance that some pair will share the same birthday.
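
The 23-person figure follows directly from multiplying out the chance that all birthdays are distinct. A minimal check, ignoring leap years and assuming birthdays are uniformly distributed:

```python
from math import prod

def birthday_collision_prob(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday,
    assuming birthdays are uniform over `days` days."""
    all_distinct = prod((days - i) / days for i in range(people))
    return 1 - all_distinct

print(f"22 people: {birthday_collision_prob(22):.4f}")  # just under 50%
print(f"23 people: {birthday_collision_prob(23):.4f}")  # just over 50% (~0.507)
```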

The phenomenon extends most obviously to hash functions. Let's say we
use a 128-bit MD5 hash to verify the integrity of a message. There are 2^{128}
possible hash codes. But after we have hashed "only" 2^{64} random messages
(the square root of 2^{128}), then just by chance it is more likely
than not that some pair of those messages will have the same hash code. With
SHA-1, a 160-bit hash, a pair only becomes likely after around 2^{80}
messages. (Note that we're just talking about collisions that naturally would occur
by random chance; we're not referring to active attacks against these particular hash functions.)
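
Hashing 2^{64} messages isn't something we can demonstrate, but the same square-root behaviour shows up at any hash width. The sketch below truncates SHA-256 to 32 bits as a stand-in toy hash, so a collision typically turns up after roughly sqrt(2^{32}) = 2^{16} ≈ 65,000 messages rather than anywhere near 2^{32}:

```python
import hashlib

def truncated_hash(msg: bytes, bits: int = 32) -> int:
    """First `bits` bits of SHA-256, standing in for a toy 32-bit hash."""
    return int.from_bytes(hashlib.sha256(msg).digest()[: bits // 8], "big")

# Hash distinct messages until two of them collide in the truncated output.
# By the birthday bound we expect this around sqrt(2^32) = 2^16 messages.
seen = {}
count = 0
while True:
    msg = str(count).encode()
    h = truncated_hash(msg)
    if h in seen:
        print(f"collision after {count + 1} messages: {seen[h]!r} vs {msg!r}")
        break
    seen[h] = msg
    count += 1
```

The exact count depends on the messages hashed, but it reliably lands many orders of magnitude below the 2^{32} output space.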

A similar phenomenon occurs if we encrypt a large number of blocks using
certain block cipher modes. In OFB *(output feedback)* mode with a 128-bit block cipher,
after encrypting on the order of 2^{64} blocks, it's likely that we'll
re-use part of the previous cipher stream. At present, this is an inconceivably large
amount of data; in 20 or 30 years, it may well be a realistic amount of data to encrypt with
a given key.

In a meet-in-the-middle attack, Group A is typically some pre-calculated set of
encrypted blocks, and Group B is a set of "real" encrypted blocks. The idea is to
use the pre-calculated set of blocks to try to find at least *one* "real" encrypted
block that you can decrypt. A typical version of the attack is as follows:

- we **pre-calculate** the encrypted version of some known or suspected block of plaintext (for example, some known file header, known part of a connection protocol etc) for about 2^{n/2} possible keys, where the key length is n bits and there are thus 2^{n} possible keys; note that this is the *square root* of the possible number of keys, and much, much fewer than *half* the keys that we'd expect for a brute-force attack;
- then, we sit and make lots of *observations*: scan lots of encrypted conversations, scan lots of encrypted files etc; after an average of 2^{n/2} observations, we'd expect a *collision* (i.e. for one of our pre-calculated encrypted blocks to occur) with over 60% certainty (assuming our prediction of the plaintext was correct, of course);
- you can trade more pre-calculated ciphertexts for fewer required observations and vice versa, or just trade fewer of either for a smaller percentage chance of getting a collision.
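
As a toy illustration of this pre-calculate/observe pattern, the sketch below shrinks everything to a 16-bit keyspace and uses a truncated keyed hash as a stand-in "cipher" (the names and sizes here are invented for the demo, not a real construction). With 2^{8} = 256 pre-calculated table entries, a matching observation turns up after about 2^{16}/2^{8} = 256 observations on average:

```python
import hashlib
import random

def toy_encrypt(key: int, block: bytes) -> bytes:
    """Stand-in for a block cipher with a 16-bit key: a truncated keyed hash.
    Illustration only -- not a real or secure cipher."""
    return hashlib.sha256(key.to_bytes(2, "big") + block).digest()[:8]

KNOWN_PLAINTEXT = b"HDR0"   # the known/suspected block, e.g. a file header
KEYSPACE = 2 ** 16          # n = 16 bits, so 2^(n/2) = 256 pre-calculations
random.seed(1)              # arbitrary seed for reproducibility

# Step 1: pre-calculate the known block's ciphertext under 256 random keys.
table = {}
for _ in range(256):
    k = random.randrange(KEYSPACE)
    table[toy_encrypt(k, KNOWN_PLAINTEXT)] = k

# Step 2: "observe" ciphertexts of the known block under fresh victim keys
# until one matches the table; that victim's key has then been recovered.
observations = 0
while True:
    observations += 1
    victim_key = random.randrange(KEYSPACE)
    ct = toy_encrypt(victim_key, KNOWN_PLAINTEXT)
    if ct in table:
        break
print(f"recovered key {table[ct]} after {observations} observations")
```

Scaled up to a 56- or 128-bit keyspace the shape of the attack is identical; only the exponents change.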

Collision attacks are particularly deadly for DES, with its 56-bit key size and
small block size.
If you pre-calculate 2^{30} keys (about a billion, or about 8 GB of data given DES's
8-byte blocks), then after 2^{20} observations (about a million), there is around a 1.5% chance
of a collision; pre-calculating 2^{32} keys gives roughly the same chance after
2^{18} observations (around 260,000). The pre-calculation is very feasible, the observations
are not an unreasonable number of encryptions in some applications, and together they are part
of the reason why you shouldn't entrust data of any value to DES.
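
These odds follow from a simple independence approximation (chance per observation = pre-calculated keys / total keys); a quick check of the DES figures, which lands at about 1.5% in both configurations:

```python
def collision_chance(precalculated: int, observations: int, key_bits: int) -> float:
    """Approximate chance that at least one observation matches the
    pre-calculated table, treating each observation as an independent
    draw from 2**key_bits equally likely keys."""
    p_single = precalculated / 2 ** key_bits
    return 1 - (1 - p_single) ** observations

# The DES figures from the text (56-bit keys):
print(f"2^30 keys, 2^20 observations: {collision_chance(2**30, 2**20, 56):.4f}")
print(f"2^32 keys, 2^18 observations: {collision_chance(2**32, 2**18, 56):.4f}")
```

Note the trade-off in action: the product of table size and observations is what matters, so 2^{30} x 2^{20} and 2^{32} x 2^{18} give essentially the same chance.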

At present, scaling these calculations up to 128-bit key sizes (and a 128-bit
block size) sounds infeasible. Working with 2^{62} pre-calculations
and 2^{62} observations gives us around a 6% chance of a collision,
which might be just about the cut-off point for "worth an expensive attack" for
some adversaries.
But storing 2^{62} 16-byte blocks requires around 70 million terabytes
(and we can't recompute them on the fly fast enough to avoid storing them).
And we're talking about 4 billion billion observations. For now, this doesn't
constitute a practical attack. But how far-fetched will
these numbers appear in 10 years, 20 years, 30 years...?
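
For concreteness, here is the arithmetic behind those last figures, assuming 16-byte (128-bit) blocks and decimal terabytes:

```python
precalculated = observations = 2 ** 62
keyspace = 2 ** 128   # 128-bit keys

# Rough collision chance: expected matches = precalculated * observations / keyspace.
chance = precalculated * observations / keyspace
print(f"collision chance: {chance:.2%}")   # 6.25%

# Storage for the pre-calculated blocks at 16 bytes each (1 TB = 10^12 bytes).
terabytes = precalculated * 16 / 1e12
print(f"storage: about {terabytes / 1e6:.0f} million TB")
```

The chance printed here is the expected number of matches, which for small values is a good approximation to the probability of at least one match.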