ConcurrentHashMap: usage and functionality

On the previous page, we saw how ConcurrentHashMap offers a means of improving concurrency beyond that of normal hash maps. In many cases, ConcurrentHashMap can be used as a drop-in replacement for a synchronized HashMap, and offers a means of avoiding synchronization in the traditional sense. (A couple of subtle differences are that ConcurrentHashMap will generally take up more memory, and that it accepts neither null keys nor null values.) Let's consider a web server that counts the number of instances of particular queries. We'll hold a map of query strings to integers and define an incrementCount() method which we can call at the moment of serving a particular query:

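import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public final class MyServlet extends MyAbstractServlet {
  // Each individual call on this map is synchronized by the wrapper
  private final Map<String,Integer> queryCounts =
    Collections.synchronizedMap(new HashMap<String,Integer>(1000));

  private void incrementCount(String q) {
    // get() and put() are each thread-safe, but the pair is not atomic
    Integer cnt = queryCounts.get(q);
    if (cnt == null) {
      queryCounts.put(q, 1);
    } else {
      queryCounts.put(q, cnt + 1);
    }
  }
}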

In this example, we're using a plain old HashMap wrapped up in a synchronization wrapper. Recall that wrapping the map with Collections.synchronizedMap(...) makes it safe to access the map concurrently: each call to get(), put(), size(), containsKey() etc. will synchronize on the map during the call. (One problem that we'll see in a minute is that iterating over the map does still require explicit synchronization.)
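
To make the point about iteration concrete, here is a minimal sketch of what "explicit synchronization" looks like with a synchronized map. The class and method names (SynchronizedIterationSketch, totalCount, demo) are made up for illustration and are not part of the servlet above:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class SynchronizedIterationSketch {
  // Sums all the counts. The lock on the wrapper is held for the whole
  // traversal, so no other thread can modify the map while we iterate.
  static int totalCount(Map<String,Integer> syncMap) {
    int total = 0;
    synchronized (syncMap) {
      for (Map.Entry<String,Integer> e : syncMap.entrySet()) {
        total += e.getValue();
      }
    }
    return total;
  }

  // Hypothetical usage with two made-up query strings.
  static int demo() {
    Map<String,Integer> counts =
      Collections.synchronizedMap(new HashMap<String,Integer>());
    counts.put("java", 2);
    counts.put("threads", 3);
    return totalCount(counts);
  }
}
```

While the loop runs, every other thread that calls get() or put() on the map has to wait, which is the cost we come back to later.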

Note that this doesn't make incrementCount() atomic, but it does make it safe. That is, concurrent calls to incrementCount() will never leave the map in a corrupted state. But they might 'miss a count' from time to time. For example, two threads could concurrently read a current value of, say, 2 for a particular query, both independently increment it to 3, and both set it to 3, when in fact two queries have been made. Generally in the context of counting queries, we'd probably live with this: it's quite unlikely that two clients are making the selfsame query at exactly the same time, and even if they were, we wouldn't really care about missing the odd count here and there in order to improve performance.
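
The 'missed count' is easier to see if we replay the unlucky interleaving by hand. The sketch below does this in a single thread, so it is only a simulation of what two threads (called A and B here purely for illustration) might do, but the final value is the same:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class LostUpdateSketch {
  // Replays, step by step, two overlapping calls to incrementCount().
  static int replayUnluckyInterleaving() {
    Map<String,Integer> counts =
      Collections.synchronizedMap(new HashMap<String,Integer>());
    counts.put("q", 2);

    Integer readByA = counts.get("q");   // thread A reads 2
    Integer readByB = counts.get("q");   // thread B also reads 2
    counts.put("q", readByA + 1);        // thread A writes 3
    counts.put("q", readByB + 1);        // thread B writes 3 again

    return counts.get("q");              // 3, although two queries were served
  }
}
```

Each individual get() and put() was perfectly safe; the problem is only that nothing ties the read to the write that follows it.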

In this example, we can improve concurrency in a single line by replacing our synchronized hash map with a ConcurrentHashMap:

  private final Map<String,Integer> queryCounts =
    new ConcurrentHashMap<String,Integer>(1000);

Note that our incrementCount() will still have the same semantics: that is, it will never leave the map in an inconsistent state, but it could still miss a count in an unlucky case.

Truly atomic updates

So what if we want truly atomic updates: that is, to make incrementCount() never miss a count? To do this with a traditional HashMap, we could synchronize on the map during the entire incrementCount() method, with a potential impact on throughput. With ConcurrentHashMap, we can take advantage of its concurrent update facility. ConcurrentHashMap implements the following interface:

public interface ConcurrentMap<K, V> extends Map<K, V> {
  V putIfAbsent(K key, V value);
  boolean remove(Object key, Object value);
  boolean replace(K key, V oldValue, V newValue);
  V replace(K key, V value);
}
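
As a quick illustration of how these four methods behave, here is a small single-threaded sketch. The class name and the keys are invented for the example; the return values noted in the comments follow from the ConcurrentMap contract:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ConcurrentMapMethodsSketch {
  // Calls each method in turn and records what it returned.
  static String demo() {
    ConcurrentMap<String,Integer> m = new ConcurrentHashMap<String,Integer>();
    StringBuilder out = new StringBuilder();
    out.append(m.putIfAbsent("q", 1)).append(',');  // null: no mapping before, 1 is stored
    out.append(m.putIfAbsent("q", 5)).append(',');  // 1: already mapped, left unchanged
    out.append(m.replace("q", 7, 8)).append(',');   // false: current value is 1, not 7
    out.append(m.replace("q", 1, 2)).append(',');   // true: value was 1, now 2
    out.append(m.replace("q", 10)).append(',');     // 2: previous value, now 10
    out.append(m.replace("other", 1)).append(',');  // null: key absent, nothing is added
    out.append(m.remove("q", 3)).append(',');       // false: current value is 10, not 3
    out.append(m.remove("q", 10));                  // true: mapping removed
    return out.toString();
  }
}
```

In each case the check and the update happen as one atomic step, which is what lets us build lock-free updates on top of them.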

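In our case, the interesting methods are putIfAbsent() and the three-argument replace(), which are effectively compare-and-set operations for a map. So we can implement our incrementCount() method as follows. Note that we do now need to change the declared type of our queryCounts map to ConcurrentMap. Note also that, because ConcurrentHashMap does not accept null values, we cannot pass null to replace() as the "old" value (it would throw a NullPointerException): the very first count for a query has to be added with putIfAbsent() instead:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class MyServlet extends MyAbstractServlet {
  private final ConcurrentMap<String,Integer> queryCounts =
    new ConcurrentHashMap<String,Integer>(1000);

  private void incrementCount(String q) {
    while (true) {
      Integer oldVal = queryCounts.get(q);
      if (oldVal == null) {
        // First sighting: succeeds only if no other thread added the key first
        if (queryCounts.putIfAbsent(q, 1) == null) return;
      } else if (queryCounts.replace(q, oldVal, oldVal + 1)) {
        // Succeeds only if the count is still oldVal
        return;
      }
      // Another thread got in first: loop round and try again
    }
  }
}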

This code is very similar to the code to update an AtomicInteger: we read the current value of the count, calculate the new count, and then say to the ConcurrentHashMap: "please map this key to this new value, if and only if the previously mapped value was this". If the call returns false to say that we were wrong about the previously mapped value, indicating in effect that another thread has "snuck in", then we simply loop round and try again. As with AtomicInteger updates, this is very efficient because we rarely expect another thread to sneak in, and where it does, we can keep hold of the CPU rather than having to sleep while the other thread releases the lock.
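
For comparison, here is a minimal sketch of the AtomicInteger update loop that this pattern mirrors. The class and method names are hypothetical, and in real code AtomicInteger.incrementAndGet() would do the same job for us; the loop is spelled out only to show the parallel:

```java
import java.util.concurrent.atomic.AtomicInteger;

final class AtomicIncrementSketch {
  // Read, compute, then compare-and-set; retry if another thread got in first.
  static int increment(AtomicInteger counter) {
    int oldVal, newVal;
    do {
      oldVal = counter.get();
      newVal = oldVal + 1;
    } while (!counter.compareAndSet(oldVal, newVal));
    return newVal;
  }
}
```

The shape is the same in both cases: no thread ever blocks, and a thread that loses the race simply repeats a very cheap read-and-compute step.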

Iterating over the map

In our case of counting web queries so far, you may be wondering "what's the big deal?" Of course, there is the argument that on a busy server, anything that helps improve throughput is a big deal. But here, most of the operations on the map are very quick and occur only once per query, so the map won't be highly contended. A bigger benefit comes when we want to iterate over the map.
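
As a small taste of that benefit, the sketch below iterates over a ConcurrentHashMap with no locking at all, and even adds an entry part-way through. It relies on the documented behaviour that ConcurrentHashMap iterators are "weakly consistent" and never throw ConcurrentModificationException; the class name and keys are invented for the example:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class UnlockedIterationSketch {
  // Modifies the map while iterating over it, without any synchronization.
  static int demo() {
    ConcurrentMap<String,Integer> counts = new ConcurrentHashMap<String,Integer>();
    counts.put("java", 2);
    counts.put("threads", 3);

    for (String q : counts.keySet()) {
      // With a plain HashMap, adding a key here would normally make the
      // iterator fail with ConcurrentModificationException.
      counts.put("added-during-iteration", 1);
    }
    return counts.size();   // the two original keys plus the new one
  }
}
```

Whether the loop itself gets to see the newly added key is not guaranteed either way, so code that iterates like this should only rely on seeing a reasonably up-to-date view, not an exact snapshot.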
