ConcurrentHashMap
The Java ConcurrentHashMap is a version of the standard Java HashMap that is optimised
for efficient access by multiple threads at the same time. Maps
(also referred to as dictionaries) are an important data structure used in many applications where we need to associate
one piece of data with another. We usually talk about associating keys with values. This type of association
crops up in a range of cases such as:
- Caches and lookups: for example, after reading the contents of
a given file or database table, we could associate the file name with
its contents and hold it in a HashMap representing an in-memory cache (similarly, we could associate a database key with a representation of the row data,
associate a web server session ID with user data...);
- Dictionaries: for example, we could associate
locale abbreviations with a language name;
- Sparse arrays: by mapping integers to values, we
in effect create an array which does not waste space on blank elements.
Many of these cases are precisely the types of data that we need to work with, for example, in a multi-threaded server application.
Ordinarily, a HashMap is not thread-safe: if multiple threads attempt to
access the same map simultaneously, there is a risk that the map will become corrupted or that threads will not read the correct data.
The Java synchronized keyword provides a means to add thread-safety.
But under high contention, using synchronized is potentially inefficient, because every thread must wait for the same lock on the whole map. And as discussed, maps are an extremely
useful data structure that we frequently want to use in applications where performance is a concern.
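To make the synchronized approach concrete, here is a minimal sketch, assuming a simple hit counter shared between threads. The class name SynchronizedMapSketch, the method countHits() and the key "page" are made up for illustration. It wraps a HashMap with Collections.synchronizedMap() and holds the wrapper's lock around the read-then-write update, so every thread queues for the same single lock:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedMapSketch {

    /** Starts the given number of threads; each records hitsPerThread hits for one key. */
    static int countHits(int threads, int hitsPerThread) {
        // Every call on this wrapper locks the whole map
        Map<String, Integer> hits = Collections.synchronizedMap(new HashMap<>());

        Runnable task = () -> {
            for (int i = 0; i < hitsPerThread; i++) {
                // The get-then-put sequence must be one critical section,
                // so we hold the wrapper's lock across both calls
                synchronized (hits) {
                    Integer old = hits.get("page");
                    hits.put("page", old == null ? 1 : old + 1);
                }
            }
        };

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(task);
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return hits.getOrDefault("page", 0);
    }

    public static void main(String[] args) {
        System.out.println(countHits(4, 1000)); // prints 4000
    }
}
```

The result is correct, but only one thread can touch the map at any moment, however many keys it holds.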
How ConcurrentHashMap works to overcome synchronization overhead
The ConcurrentHashMap improves concurrent performance by taking advantage of
how HashMaps store their data: the data is distributed into different "buckets"
in memory. When we call put() or get() on a ConcurrentHashMap, the map therefore only needs to be temporarily
locked on the specific bucket of data being accessed, rather than on the whole map. (Whereas if we synchronized on the
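get() and put() methods, we would lock on the entire map and hence reduce throughput.) In fact,
the ConcurrentHashMap goes further than a single lock per map (current JDK versions combine
compare-and-set operations with locks on individual buckets rather than using a ReadWriteLock) so that: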
- writing to a ConcurrentHashMap locks only a portion of the map;
- reads can generally occur without locking.
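The sketch below shows what this looks like from the caller's side. It does not expose the map's internal locking; it only illustrates that several threads can call put() and get() on one ConcurrentHashMap with no external synchronization. The class name ConcurrentPutSketch and the method fillConcurrently() are hypothetical names chosen for this example:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentPutSketch {

    /** Each thread writes its own range of keys; returns the final map size. */
    static int fillConcurrently(int threads, int keysPerThread) {
        ConcurrentMap<Integer, String> map = new ConcurrentHashMap<>();

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * keysPerThread; // first key owned by this thread
            workers[t] = new Thread(() -> {
                for (int i = 0; i < keysPerThread; i++) {
                    // No synchronized block: the map handles its own thread-safety
                    map.put(base + i, "value-" + (base + i));
                    map.get(base + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 1000)); // prints 4000
    }
}
```

Because the threads mostly touch different buckets, they rarely have to wait for one another.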
When to use ConcurrentHashMap
ConcurrentHashMap is recommended instead of a standard HashMap whenever the map will be modified by multiple
threads:
- it provides improved throughput compared to a synchronized HashMap;
- it allows concurrent reads and modifications to be performed safely;
- it provides atomic operations for query-then-update ("get-set") operations;
- it provides memory visibility guarantees
(data added to the map by one thread
is "safely published" to other threads).
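As an illustration of the atomic query-then-update operations mentioned above, here is a minimal sketch using two standard ConcurrentMap methods, merge() and putIfAbsent(). The class name AtomicUpdateSketch, its method names and the keys "page" and "config" are assumptions made up for the example:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AtomicUpdateSketch {

    /** Counts hits from several threads using an atomic read-modify-write. */
    static int countWithMerge(int threads, int hitsPerThread) {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < hitsPerThread; i++) {
                    // Stores 1 if the key is absent, otherwise adds 1 to the
                    // existing value, as a single atomic step
                    counts.merge("page", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counts.getOrDefault("page", 0);
    }

    /** putIfAbsent() only stores a value when the key has no mapping yet. */
    static String firstWriterWins() {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        cache.putIfAbsent("config", "first");
        cache.putIfAbsent("config", "second"); // ignored: key already mapped
        return cache.get("config");
    }

    public static void main(String[] args) {
        System.out.println(countWithMerge(4, 1000)); // prints 4000
        System.out.println(firstWriterWins());       // prints first
    }
}
```

Note that a separate get() followed by put() would not be atomic even on a ConcurrentHashMap; the combined methods are what make the update safe.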
In general, there is little downside to using ConcurrentHashMap other than that it will usually consume
more memory than the equivalent standard HashMap.
Next: throughput and scalability of ConcurrentHashMap vs synchronized HashMap
The benefits of ConcurrentHashMap over a regular synchronized
HashMap become clearly apparent when we run a small experiment
to simulate what might happen in the case of a map used as a frequently-accessed
cache on a moderately busy server.
On the next page, we discuss the scalability
of ConcurrentHashMap in the light of such a simulation.
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.