ConcurrentHashMap: usage and functionality
On the previous page, we saw how the ConcurrentHashMap
offers a means of improving concurrency beyond that of normal hash maps.
In many cases, ConcurrentHashMap can be used as a drop-in
replacement for a synchronized HashMap, and offers a means of avoiding synchronization
in the traditional sense.
(A couple of subtle differences are that ConcurrentHashMap will generally take up more
memory, and that it cannot take null as a key.)
Let's consider a web server that counts the number of instances
of particular queries. We'll hold a map of query strings to integers and define an incrementCount()
method which we can call at the moment of serving a particular query:
public final class MyServlet extends MyAbstractServlet {
private final Map<String,Integer> queryCounts =
Collections.synchronizedMap(new HashMap<String,Integer>(1000));
private void incrementCount(String q) {
Integer cnt = queryCounts.get(q);
if (cnt == null) {
queryCounts.put(q, 1);
} else {
queryCounts.put(q, cnt + 1);
}
}
}
In this example, we're using a plain old HashMap wrapped up in a synchronization wrapper.
Recall that wrapping the map with Collections.synchronizedMap(...) makes it safe to access
the map concurrently: each call to get(), put(), size(),
containsKey() etc will synchronize on the map during the call. (One problem that we'll
see in a minute is that iterating over the map does still require explicit synchronization.)
Note that this doesn't make incrementCount() atomic, but
it does make it safe. That is, concurrent calls to incrementCount()
will never leave the map in a corrupted state. But they might 'miss a count' from time to time.
For example, two threads could concurrently read a current value of, say, 2 for a particular
query, both independently increment it to 3, and both set it to 3, when in fact two queries
have been made. Generally in the context of counting queries, we'd probably live with this: it's
quite unlikely that two clients are making the selfsame query at exactly the same time, and even if they
were, we wouldn't really care about missing the odd count here and there in order to
improve performance.
In this example, we can improve concurrency in a single line by replacing our synchronized
hash map with a ConcurrentHashMap:
private final Map<String,Integer> queryCounts =
new ConcurrentHashMap<String,Integer>(1000);
Note that our incrementCount() will still have the same semantics: that is,
it will never leave the map in an inconsistent state, but it could still miss a count in
an unlucky case.
Truly atomic updates
So what if we want truly atomic updates: that is, to make incrementCount()
never miss a count? To do this with a traditional HashMap, we could synchronize on the
map during the entire incrementCount() method, with a potential impact on throughput.
With ConcurrentHashMap, we can take advantage of its concurrent update facility.
ConcurrentHashMap implements the following interface:
public interface ConcurrentMap<K, V> extends Map<K, V> {
V putIfAbsent(K key, V value);
boolean remove(Object key, Object value);
boolean replace(K key, V oldValue, V newValue);
V replace(K key, V value);
}
In our case, the interesting methods are the replace() methods, which are
effectively compare-and-set operations for a map. So we can implement our incrementCount()
method as follows. Note that we do now need to change the signature of our queryCounts
map and declare it as a ConcurrentMap:
public final class MyServlet extends MyAbstractServlet {
private final ConcurrentMap<String,Integer> queryCounts =
new ConcurrentHashMap<String,Integer>(1000);
private void incrementCount(String q) {
Integer oldVal, newVal;
do {
oldVal = queryCounts.get(q);
newVal = (oldVal == null) ? 1 : (oldVal + 1);
} while (!queryCounts.replace(q, oldVal, newVal));
}
}
This code is very similar to the code to update an AtomicInteger: we read the
current value of the count, calculate the new count, and then say to the ConcurrentHashMap:
"please map this key to this new value, if and only if the previously mapped value was this". If the
call returns false to say that we were wrong about the previously mapped value, indicating
in effect that another thread has "snuck in", then we simply loop round and try again. As with
AtomicInteger updates, this is very efficient because we rarely expect another thread
to sneak in, and where it does, we can keep hold of the CPU rather than having to sleep while the
other thread releases the lock.
Iterating over the map
In our case of counting web queries so far, you may be wondering "what's the big deal"?
Of course, there is the argument that on a busy server, anything that helps improve throughput is
a big deal. But in this case, most of the operations on the map are very quick and occur only
once per query, so the map won't be highly contended. In this case, a bigger benefit comes when
we want to iterate over the map.
If you enjoy this Java programming article, please share with friends and colleagues. Follow the author on Twitter for the latest news and rants.
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.