Context and process switching

Following our discussion of thread scheduling and Java, we now look in more detail at context switching. Roughly speaking, this is the procedure that takes place when the system switches between threads running on the available CPUs.

Switching between threads will have some overhead:

A context switch typically appears to cost somewhere between 1 and 10 microseconds (i.e. between a thousandth and a hundredth of a millisecond). The fastest case is a switch between threads of the same process with little memory contention; the slowest is a switch between different processes. So the following are acceptable:

So the worst case is generally where we have several "juggling" threads, each of which does only a tiny amount of work every time it is switched in (but does do some work, thus hitting memory and contending with the others for resources) before being switched out again.

What causes too many slow context switches in Java?

Every time we deliberately change a thread's status or attributes (e.g. by sleeping, waiting on an object, changing the thread's priority etc), we will generally cause a context switch. But usually we don't do those things often enough per second for it to matter. Typically, excessive context switching comes from contention on shared resources, particularly synchronized locks:

The second case is generally worse, because the juggling threads, each time they make a tiny bit of progress, fight for shared CPU cache, thus making each other less efficient each time they're switched in.

Avoiding contention and context switches in Java

Before hacking at your code, a first course of action is to upgrade your JVM, particularly if you are not yet using Java 6. Most new JVM releases have come with improved synchronization optimisations.

Then, a high-level way to avoid synchronized lock contention is generally to use the various classes from the Java 5 concurrency framework (see the java.util.concurrent package). For example, compared with a HashMap wrapped in appropriate synchronization, a ConcurrentHashMap can easily double the throughput with 4 threads and treble it with 8 threads (see the aforementioned link for some ConcurrentHashMap performance measurements). An alternative to synchronized that often gives better concurrency is offered by the various explicit lock classes (such as ReentrantLock).
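As a rough illustration of the ConcurrentHashMap approach, the sketch below has several threads update counts in one shared map without any synchronized block. The class name ConcurrentCountSketch, the method countInParallel and its parameters are made up for this example, and it assumes Java 8 or later (for lambdas and Map.merge); it makes no attempt to measure throughput.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example class: not part of any library.
public class ConcurrentCountSketch {

    // Each of 'threads' workers adds 1 to every key, 'repeats' times over.
    public static Map<String, Integer> countInParallel(String[] keys, int threads, int repeats) {
        // No external synchronization: the map handles concurrent updates itself.
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < repeats; i++) {
                    for (String key : keys) {
                        // merge() performs the read-modify-write atomically per key
                        counts.merge(key, 1, Integer::sum);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for all workers so that the returned counts are final
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countInParallel(new String[] {"a", "b"}, 4, 1000));
    }
}
```

With 4 threads and 1000 repeats, each key should end up with a count of 4000, since no update is lost. The equivalent code on a plain HashMap would need a lock around every update, and that lock is exactly where threads would contend.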

At a lower level, solutions include holding on to locks for less time and, as part of this, reducing the "housekeeping" involved in managing a lock. The Java 5 atomic classes such as AtomicInteger effectively provide a way to access a shared variable with "less housekeeping", thus improving throughput.
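A minimal sketch of the atomic-class approach might look like the following. The class AtomicCounterSketch and its methods are hypothetical names chosen for illustration, and the lambda syntax assumes Java 8 or later, although AtomicInteger itself has been available since Java 5.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class: a shared counter with no synchronized block.
public class AtomicCounterSketch {

    private final AtomicInteger hits = new AtomicInteger();

    // Atomically adds 1 and returns the new value; no lock is acquired.
    public int recordHit() {
        return hits.incrementAndGet();
    }

    public int current() {
        return hits.get();
    }

    // Starts 'threads' workers that each record 'perThread' hits,
    // waits for them all, then returns the final count.
    public static int hammer(int threads, int perThread) {
        AtomicCounterSketch counter = new AtomicCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.recordHit();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.current();
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 10000));
    }
}
```

Calling hammer(4, 10000) should return 40000. A version using a plain int field would need a synchronized method to give the same guarantee, so every increment would go through lock acquisition and release.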