Strong and secure hash functions
In its broadest sense, a hash function takes some data and turns it into a
hash code which in some sense "represents" that data. A common use that we looked at is creating a
hash map (or hash table), whereby the hash code
allows us to efficiently "narrow down the search" for an item. Java's HashMap uses
32-bit hash codes (the int returned by hashCode()). This is what we might call a weak hash function:
- when hashing a moderate number of items, collisions are
expected: that is, in the normal course of events, multiple items will resolve
to the same hash code, and we must deal with this case (HashMap does so by
storing the actual keys and comparing them with equals() once we get a match on hash code);
- relatedly, it is trivial to deliberately construct two objects that
have the same hash code.
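As a concrete illustration of both points (the class name here is just for the example; the colliding strings follow from the published String.hashCode() formula, h = s[0]*31 + s[1] for a two-character string):

```java
import java.util.HashMap;
import java.util.Map;

public class WeakHashDemo {
    public static void main(String[] args) {
        // Two distinct strings with identical 32-bit hash codes:
        // 'A'*31 + 'a' = 65*31 + 97 = 2112, and 'B'*31 + 'B' = 66*31 + 66 = 2112.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112

        // HashMap copes with the collision: after matching on hash code it
        // compares keys with equals(), so the two keys remain separate entries.
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.size()); // 2
    }
}
```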
In this section we will look at two other categories of hash code:
- we'll say that a strong hash code is one where in
typical use, accidentally stumbling upon items that share the same hash code
is so unlikely that we can assume it won't happen for our purposes;
- a secure hash code is one where we cannot find two
items with the same hash code, even if we deliberately try to (not because such
collisions don't exist, but because it is simply too much effort to find one).
Secure hash codes are also strong hash codes, though not necessarily vice versa.
In reality, "strength" is a continuum rather than a set of distinct categories,
and "security" refers to the upper end of that continuum.
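For a sketch of the secure end of that continuum, we can ask the JDK's MessageDigest for SHA-256, a 256-bit hash function for which no practical way of finding a collision is currently known (the class name and the input string are just for the example):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SecureHashDemo {
    public static void main(String[] args) throws Exception {
        // SHA-256 produces a 256-bit (32-byte) hash code.
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        byte[] hash = sha256.digest("hello".getBytes(StandardCharsets.UTF_8));

        System.out.println("Width in bits: " + (hash.length * 8)); // 256

        // Render the hash code in the usual hexadecimal form.
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        System.out.println("SHA-256(\"hello\") = " + hex);
    }
}
```

Note the contrast with hashCode(): the output is 256 bits rather than 32, and the bits bear no discernible relationship to the input.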
The strength and security of a hash code are dictated by a combination of its width
(the number of bits it has) and its quality: the
ability of the function to distribute bits "randomly". But even if we can't find patterns
in the hash codes that would allow us to calculate two items with identical
hash codes, if the number of possible hash codes is small enough, we can just try hashing
a huge number of items until we find a collision by trial and error. (By the birthday
paradox, a collision among n-bit hash codes is expected after roughly 2^(n/2) random
attempts: so a 64-bit hash gives way after around 2^32 tries, well within reach of
trial and error, while a 256-bit hash requires around 2^128.) Thus, for a hash
function to be secure generally implies that it is wide enough that finding a
collision by trial and error is not feasible.
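To see how width alone caps security, the following sketch (the class name and the choice of inputs are illustrative) truncates SHA-256 to 32 bits, giving a hash of high quality but small width, and then finds a collision by trial and error. Typically only tens of thousands of attempts are needed:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

public class TruncatedCollisionDemo {
    public static void main(String[] args) throws Exception {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        // Maps each truncated hash code seen so far to the input that produced it.
        Map<Integer, Integer> seen = new HashMap<>();
        for (int i = 0; ; i++) {
            byte[] full = sha256.digest(
                    String.valueOf(i).getBytes(StandardCharsets.UTF_8));
            // Keep only the first 4 bytes: a 32-bit "hash code" whose bits are
            // still well distributed, but whose width is small.
            int truncated = ((full[0] & 0xFF) << 24) | ((full[1] & 0xFF) << 16)
                          | ((full[2] & 0xFF) << 8)  |  (full[3] & 0xFF);
            Integer previous = seen.put(truncated, i);
            if (previous != null) {
                // Expected after roughly 2^16 attempts by the birthday paradox.
                System.out.println("Inputs " + previous + " and " + i
                        + " share the truncated hash code " + truncated);
                return;
            }
        }
    }
}
```

No pattern in SHA-256 is being exploited here: the small width alone makes brute-force search cheap, which is why security requires width as well as quality.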