Advanced use of hash codes in Java

In this section, I want to go beyond the standard Java collections framework for some fairly specialised uses. In most cases, the standard collections will serve you extremely well. In particular, HashMap is a very versatile class, and its Java 5 brother ConcurrentHashMap provides an easy-to-use, high-concurrency data structure that suits many uses.

Hash Collision Calculatorgreat free widgetsWidgetbox

However, the 32-bit hash codes used in Java collections such as HashMap, HashSet etc do have some limitations. A key limitation is that in the standard Java implementation, hash codes do not uniquely identify an object. They simply narrow down the choice of matching items, but it is expected that in normal use, there is a good chance that several objects will share the same hash code. When looking for a key in a map or set, the fields of the actual key object must therefore be compared to confirm a match.

In some cases, we need to store the keys in a hash map or the items in a set. For example, if we want to be able to iterate over the keys or objects, then keeping them in the hash map or set is pretty much the only option. But if all we require are lookups and testing for the presence of those keys or objects, then in some cases we can rely on just the hash codes without storing the corresponding keys or objects in the map or set.

As we'll see over the next few pages, the main principle for this to work is to use stronger hash codes which not only narrow down the choice of keys or objects, but can uniquely identify them to all intents and purposes. This has a few implications:

we can create hash map structures which do not need to store the key objects, but can rely on the (stronger) hash code alone: we will look at this principle in more detail over the next few pages, starting with an example of a strong, 64-bit hash function;
we can implement a much more compact hash set as only the hash codes need to be stored and not the objects themselves;
if we are able to accept a given error rate, we can actually implement highly compact sets using just hash codes.

To understand how these works, it's first worth understanding a few brief statistics about hash codes.

If you enjoy this Java programming article, please share with friends and colleagues. Follow the author on Twitter for the latest news and rants. Follow @BitterCoffey