Advanced use of hash codes in Java

In this section, I want to go beyond the standard Java collections framework for some fairly specialised uses. In most cases, the standard collections will serve you extremely well. In particular, HashMap is a very versatile class, and its Java 5 brother ConcurrentHashMap provides an easy-to-use, high-concurrency data structure that suits many uses.

However, the 32-bit hash codes used in Java collections such as HashMap, HashSet etc do have some limitations. A key limitation is that in the standard Java implementation, hash codes do not uniquely identify an object. They simply narrow down the choice of matching items, but it is expected that in normal use, there is a good chance that several objects will share the same hash code. When looking for a key in a map or set, the fields of the actual key object must therefore be compared to confirm a match.

In some cases, we need to store the keys in a hash map or the items in a set. For example, if we want to be able to iterate over the keys or objects, then keeping them in the hash map or set is pretty much the only option. But if all we require are lookups and testing for the presence of those keys or objects, then in some cases we can rely on just the hash codes without storing the corresponding keys or objects in the map or set.

As we'll see over the next few pages, the main principle for this to work is to use stronger hash codes which not only narrow down the choice of keys or objects, but can uniquely identify them to all intents and purposes. This has a few implications:

To understand how these works, it's first worth understanding a few brief statistics about hash codes.