A hash table in Java with 64-bit hash codes
On the previous pages, we created a 64-bit hash function.
We can now use this to create a hash map in which we will not store the key objects, but only the
hash codes of the keys and the values the corresponding keys are mapped to.
In principle, the procedure is as follows— note the structural similarity with the standard
Java HashMap implementation:
- we have a number of buckets; the bucket number is taken from the lower bits
of the hash code;
- each bucket contains a linked list of items, consisting of hash code/value pairs.
One way of implementing this would be to essentially follow the structure of the
standard Java HashMap implementation, with each bucket consisting of a list of
"entry" objects, à la HashMap.Entry,
encapsulating each (hash code, value) pair. In the example implementation
here, we will favour the approach used by the Numerical Recipes authors of essentially
"rolling our own" object allocation. The approach is as follows:
- instead of lists of "entry" objects, we have two arrays: one for the hash codes
and one for the corresponding values; elements in these two arrays sharing the same
index in effect form an entry object;
- from this array pair, we can "allocate" entries to lists by using an additional
array of "next pointers";
- finally, a fourth array forms our "bucket table", which contains the index of the
start of the list for each bucket.
In our example, we'll write a CompactHashMap class that maps
CharSequences to objects of a given type. We'll also make our class
generic:
public class CompactMap<E> {
private int[] table;
private int[] nextPtrs;
private long[] hashValues;
private E[] elements;
private int nextHashValuePos;
private int hashMask;
public CompactMap(int maxElements) {
int sz = 128;
int desiredTableSize = maxElements * 4 / 3;
while (sz < desiredTableSize) sz <<= 1;
this.table = new int[sz];
this.nextPtrs = new int[maxValues];
this.hashValues = new long[maxValues];
this.elements = (E[]) new Object[sz];
Arrays.fill(table, -1);
this.hashMask = sz-1;
}
public E get(CharSequence key) {
...
}
public E put(CharSequence key, E value) {
...
}
}
The potential advantage of structuring the data in this way is that it is
less "object heavy" than the standard Java implementation. This probably will put
less strain on the garbage collector for relatively large maps that stay around
in memory for most of the lifetime of the application (which is a typical use
of hash maps). For small, temporary maps, Java's use of small objects may actually
be beneficial, but we leave it as a future exercise to test this hypothesis.
On the next page we look at how
to implement the get()
and put() methods.
If you enjoy this Java programming article, please share with friends and colleagues. Follow the author on Twitter for the latest news and rants.
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.