A strong 64-bit hash function in Java (ctd)

(Continued from strong hash code implementation.)

Again, any method for producing 256 medium-quality random numbers would generally suffice. The Numerical Recipes authors suggest a method based on an XORShift generator. They start with a number with roughly half its bits set, then use this to seed an XORShift generator, picking every 32nd number from the series to fill the next element in the table. An equivalent implementation in Java would be as follows:

private static final createLookupTable() {
  byteTable = new long[256];
  long h = 0x544B2FBACAAF1684L;
  for (int i = 0; i < 256; i++) {
    for (int j = 0; j < 31; j++) {
      h = (h >>> 7) ^ h;
      h = (h << 11) ^ h;
      h = (h >>> 10) ^ h;
    byteTable[i] = h;
  return byteTable;

Using this scheme, we can produce a hash code for any object provided that we can produce a byte representation of that object. A common object type to key on in a hash table would be a String. One option for running the hash function on a string would be to call its getBytes() method and then use the method we have already created. Alternatively, we may wish to hard-code a string— or indeed any CharSequence— as a special case as follows:

public static long hash(CharSequence cs) {
  long h = HSTART;
  final long hmult = HMULT;
  final long[] ht = byteTable;
  final int len = cs.length();
  for (int i = 0; i < len; i++) {
    char ch = cs.charAt(i);
    h = (h * hmult) ^ ht[ch & 0xff];
    h = (h * hmult) ^ ht[(ch >>> 8) & 0xff];
  return h;

How to convert other objects to a byte array in order to run the hash function

For other types of objects, a possible strategy is to use a NIO buffer to convert the fields of the object into a byte array. In a similar way to the CharSequence hash function shown above, we could also effectively "hard code" how bytes are pulled out of the field values. However, this is likely to be more complicated and error-prone.

Next: using 64-bit hash codes to create a hash table

Having seen how to implement the hash function itself, we now look at how to take this and implement a hash table that stores only the hashes of the keys.