However, the 32-bit hash codes used by Java collections such as HashMap and HashSet do
have some limitations. A key limitation is that, in the standard Java implementation,
hash codes do not uniquely identify an object. They simply narrow down the choice of matching
items; in normal use, there is a good chance that several objects will
share the same hash code. When looking up a key in a map or set, the fields of the actual
key object must therefore be compared to confirm a match.
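To illustrate, "Aa" and "BB" are a well-known pair of distinct strings that share the same 32-bit hash code under Java's String.hashCode(); a HashMap still keeps them apart because equals() is consulted after the hash code narrows the search:

```java
import java.util.HashMap;
import java.util.Map;

public class HashCollisionDemo {
    public static void main(String[] args) {
        // Distinct strings, identical 32-bit hash codes (both 2112)
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        // The HashMap holds both entries: the hash code selects the
        // bucket, then equals() on the key confirms the exact match
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```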
In some cases, we genuinely need to store the keys in a hash map or the items in a set:
for example, if we want to be able to iterate over the keys or objects, then
keeping them in the map or set is pretty much the only option. But if all we require are
lookups and tests for the presence of those keys or objects, then in some cases
we can rely on the hash codes alone, without storing the corresponding keys or objects
in the map or set.
As we'll see over the next few pages, the key principle that makes this work is to use stronger hash codes that not only
narrow down the choice of keys or objects, but identify them uniquely to
all intents and purposes. This has a few implications:
- we can create hash map structures which do not need to store the key objects,
but can rely on the (stronger) hash code alone: we will look at this principle in
more detail over the next few pages, starting with an example of a
strong, 64-bit hash function;
- we can implement a much more compact hash set as only the hash codes need to be stored
and not the objects themselves;
- if we are able to accept a given error rate, we can actually implement highly
compact sets using just hash codes.
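The second idea above can be sketched as a set that stores only 64-bit codes and never the objects themselves. This is only a sketch: the strongHash method below widens String.hashCode() with a splitmix64-style finalizer purely for illustration (it is no stronger than the underlying 32-bit code), whereas a properly strong 64-bit hash function is the subject of the next pages:

```java
import java.util.HashSet;
import java.util.Set;

public class HashOnlySet {
    // Illustrative stand-in for a strong 64-bit hash: a splitmix64-style
    // bit mix applied to String.hashCode(). NOTE: this inherits the
    // 32-bit code's collision rate and is for structure only.
    static long strongHash(String s) {
        long z = s.hashCode() * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Only the 64-bit codes are stored, not the strings themselves,
    // so memory use is fixed per item regardless of string length
    private final Set<Long> codes = new HashSet<>();

    void add(String s)         { codes.add(strongHash(s)); }
    boolean contains(String s) { return codes.contains(strongHash(s)); }

    public static void main(String[] args) {
        HashOnlySet set = new HashOnlySet();
        set.add("apple");
        set.add("banana");
        System.out.println(set.contains("apple"));  // true
        System.out.println(set.contains("cherry")); // false (no collision here)
    }
}
```

Note the trade-off: such a set can only answer membership queries; it cannot return or iterate over the original strings, and a (very rare) hash collision would make contains() report a false positive.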
To understand how these techniques work, it's first worth looking at some brief statistics about hash codes.