Performance of the Java sorting algorithm (ctd)

On the previous page, we consider some features of the Java sort algorithm which may affect its performance and suitability with a given set of data. Now without further ado, let's dive in and look at some actual performance measurements of Collections.sort(). Figure 1 shows the mean sort time for an n-element list of random integers on my test system. (Each measurement is actually the observed time in nanoseconds to sort 50,000 lists of the given size pre-filled with random integers, but it's the relative performance that really interests us.)

For comparison, I also measured the time to sort using a simple insertion sort. In each run, two copies of the 50,000 test lists were thus made. Then one copy was sorted with Collections.sort() and the other with our insertion sort method.

The graph shows that compared to the insertion sort, Java's mergesort scales better overall. But it also involves some more complex operations:

Although the insertion sort scales poorly overall, for small amounts of data (up to about 30-35 items on our test system), it actually performs better. This is because the insertion sort (a) sorts everything in situ (doesn't need to copy objects into an array or take a clone); (b) runs in a loop comprising simple operations rather than making recusrive calls. After a certin list size, we hit a point where the "extra hoops" needed for performing the mergesort pay off. At some point, the algorithm that involves individually more complex operations but scales better wins out over the simpler but less scalable one.

So what does this mean in practice?

In a few, specific cases, Java's default sort algorithm may not be the most suitable. If you are sorting very small lists a very large number of times, then implementing a non-recursing, in situ sort such as an insertion sort may give better performance1

Sorting in parallel

One potential way of improving sort performance on large-ish data sets is to sort in parallel on multiprocessor machines. In the Java threading section of this web site, we discuss using CyclicBarrier to coordinate a parallel sort, which the data to be sorted is divided amongst various threads.

1. As a real-life example, I actually first hit upon this problem while writing the AI engine for a board game. Part of the algorithm involved repeatedly sorting small lists of candidate moves in order to pick the best ones. On profiling the code, I found that the majority of time was being spent inside the call to clone the array prior to sorting!