Strong and secure hash functions
In its broadest sense, a hash function takes some data and turns it into a
hash code which in some sense "represents" that data. A common use that we looked at is creating a
hash map (or hash table), whereby the hash code
allows us to efficiently "narrow down the search" for an item. Java's HashMap uses
32-bit hash codes (the int returned by hashCode()). This is what we might call a weak hash function:
- when hashing a moderate number of items, collisions are
expected: that is, in the normal course of events, multiple items will resolve
to the same hash code, and we must deal with this case (HashMap does so by
storing the actual keys and comparing them with equals() once we get a match on hash code);
- relatedly, it is trivial to deliberately construct two objects that
have the same hash code.
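As a concrete illustration of both points (the class name here is just for the example; the colliding strings follow from the published String.hashCode() formula, h = s[0]*31 + s[1] for a two-character string):

```java
import java.util.HashMap;
import java.util.Map;

public class WeakHashDemo {
    public static void main(String[] args) {
        // Two distinct strings with identical 32-bit hash codes:
        // 'A'*31 + 'a' = 65*31 + 97 = 2112, and 'B'*31 + 'B' = 66*31 + 66 = 2112.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112

        // HashMap copes with the collision: after matching on hash code it
        // compares keys with equals(), so the two keys remain separate entries.
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.size()); // 2
    }
}
```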
In this section we will look at two other categories of hash code:
- we'll say that a strong hash code is one where in
typical use, accidentally stumbling upon items that share the same hash code
is so unlikely that we can assume it won't happen for our purposes;
- a secure hash code is one where we cannot find two
items with the same hash code, even if we deliberately try to (not because such
collisions don't exist, but because it is simply too much effort to find one).
Secure hash codes are also strong hash codes, though not necessarily vice versa.
In reality, "strength" is a continuum rather than a set of distinct categories,
and "security" refers to the upper end of that continuum.
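For a sketch of the secure end of that continuum, we can ask the JDK's MessageDigest for SHA-256, a 256-bit hash function for which no practical way of finding a collision is currently known (the class name and the input string are just for the example):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SecureHashDemo {
    public static void main(String[] args) throws Exception {
        // SHA-256 produces a 256-bit (32-byte) hash code.
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        byte[] hash = sha256.digest("hello".getBytes(StandardCharsets.UTF_8));

        System.out.println("Width in bits: " + (hash.length * 8)); // 256

        // Render the hash code in the usual hexadecimal form.
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        System.out.println("SHA-256(\"hello\") = " + hex);
    }
}
```

Note the contrast with hashCode(): the output is 256 bits rather than 32, and the bits bear no discernible relationship to the input.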
The strength and security of a hash code are dictated by a combination of its width
(the number of bits it has) and its quality: the
ability of the function to distribute bits "randomly". But even if we can't find patterns
in the hash codes that would allow us to calculate two items with identical
hash codes, if the number of possible hash codes is small enough, we can just try hashing
a huge number of items until we find a collision by trial and error. (By the birthday
paradox, a collision among n-bit hash codes is expected after roughly 2^(n/2) random
attempts: so a 64-bit hash gives way after around 2^32 tries, well within reach of
trial and error, while a 256-bit hash requires around 2^128.) Thus, for a hash
function to be secure generally implies that it is wide enough that finding a
collision by trial and error is not feasible.
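To see how width alone caps security, the following sketch (the class name and the choice of inputs are illustrative) truncates SHA-256 to 32 bits, giving a hash of high quality but small width, and then finds a collision by trial and error. Typically only tens of thousands of attempts are needed:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

public class TruncatedCollisionDemo {
    public static void main(String[] args) throws Exception {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        // Maps each truncated hash code seen so far to the input that produced it.
        Map<Integer, Integer> seen = new HashMap<>();
        for (int i = 0; ; i++) {
            byte[] full = sha256.digest(
                    String.valueOf(i).getBytes(StandardCharsets.UTF_8));
            // Keep only the first 4 bytes: a 32-bit "hash code" whose bits are
            // still well distributed, but whose width is small.
            int truncated = ((full[0] & 0xFF) << 24) | ((full[1] & 0xFF) << 16)
                          | ((full[2] & 0xFF) << 8)  |  (full[3] & 0xFF);
            Integer previous = seen.put(truncated, i);
            if (previous != null) {
                // Expected after roughly 2^16 attempts by the birthday paradox.
                System.out.println("Inputs " + previous + " and " + i
                        + " share the truncated hash code " + truncated);
                return;
            }
        }
    }
}
```

No pattern in SHA-256 is being exploited here: the small width alone makes brute-force search cheap, which is why security requires width as well as quality.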