Java copy-on-write collections

Along with ConcurrentHashMap, Java 5 introduces a copy-on-write implementation of a list: CopyOnWriteArrayList. It also introduces CopyOnWriteArraySet, essentially a convenience wrapper around the list.


An instance of CopyOnWriteArrayList behaves as a List implementation that allows multiple concurrent reads, and for reads to occur concurrently with a write. The way it does this is to make a brand new copy of the list every time it is altered.

During a write operation, the array must be locked completely against other writes. (The standard implementation uses a ReentrantLock.) However, this does mean that as mentioned, operations affecting multiple locations can be atomic. That is, if one thread adds several items to the list with addAll() while another thread calls size(), the thread reading the size will get a value that either reflects or not the number of elements added in addAll(): there'll be no possibility of an intermediate value returned (provided of course that these are the only two threads accessing the list!).

CopyOnWriteArrayList is designed for cases where:

The first two are fairly obvious: if too great, then either the size of the array or the frequency of writes can outweigh the benefit of not synchrozing.

The last point is possibly more subtle, but what it boils down to is this: if you only need the functionality of an array (that is, you don't need the list to grow arbitrarily, you don't need inserts etc), then the atomic array classes generally give better throughput. To illustrate this, the following graph shows overall throughput as we increase the number of reader threads concurrently reading from random positions in a shared 20-element array of integers. (There is always one writer thread, which continually writes to random positions.) Each coloured line represents the test run on an array synchronized in a different way. The "AtomicTest" array was an AtomicIntegerArray. The graph shows that throughput is generally a little better with the atomic array (after all, it is being used for what it was designed for).

On the other hand, what this graph also shows is that CopyOnWriteArrayList still gives pretty good read throughput, and beats hands down any other method that locks the array on every read.


Another class, CopyOnWriteArraySet is build on top of CopyOnWriteArrayList. Like its list counterpart, it is designed for cases where the set contains only a few elements and where reads greatly outnumber writes. Note that the performance characteristics differ from HashSet in a couple of ways:

The second point comes from the fact that CopyOnWriteArraySet relies on the putIfAbsent() method provided by CopyOnWriteArrayList. This, as an atomic operation, checks whether an equal element is already in the list and adds it only if it is not. This method, and thus the set implementation, has the subtle characteristic that a copy of the array is made (and then thrown away) even if the item isn't added. Of course, that doesn't make the already-present case any more expensive than the not-present case, but it means that the already-present case isn't generally much cheaper either. (Both cases also perform similarly with HashSet, but with the latter class, both are generally much less expensive since only a tiny subset of the set is scanned when inserting or retrieving.)