Performance of the Java sorting algorithm

Generally, Java's standard sorting method (Arrays.sort() and its derivative Collections.sort()) does what you would expect from a library routine: it is general-purpose, correct, well-tested and provides adequate performance in many cases. The purpose of this page is therefore not to advocate that everyone goes off and writes their own sorting routine. However, sometimes it is useful to know what performance you can expect from the library routine, and what the few specialised cases are where it's worth replacing it.

Java's sorting algorithm

The sorting algorithm used (in Sun's implementation at least) is a version of mergesort1. Mergesort is based on a "divide and conquer" strategy: we recursively divide the list into two halves and sort each half, then merge the two sorted halves back together. As we break the data down into smaller and smaller parts, we eventually reach a critical size where it's not worth dividing any more, and we sort in situ. Java uses an insertion sort for these 'atomic' sorts. Together, this gives Java's sort routine the following characteristics:

  • The sort is stable: i.e. if two elements are equal, their order won't be swapped after sorting (with some other algorithms, the order of equal elements is effectively random).
  • Because of the requirement to merge lists together into a new list, the algorithm operates on a copy of the data: when you call Arrays.sort(), the first thing the method does is take a clone of the array. Taking a clone of the array means taking a copy of the list of references, not of the actual data. The cost is generally negligible for large-ish arrays, but relatively considerable for sorting very small sets of data.
  • The underlying insertion sort has a worst case where the input data is in reverse order; thus in principle, it is possible to hit an unlucky case consisting of a series of sub-lists in reverse order.
  • In terms of recursion, this hybrid algorithm improves on some "textbook" cases of recursing right down until the sublist size is 1 or 2 (but at the expense of a less constant sort time as mentioned in the previous point).
  • In most practical cases, doubling the size of the list more or less doubles2 the time taken to sort.

You might have read the last point and thought "well, duh!". But it turns out that more-or-less doubling the sort time as the data size is doubled is actually good going. Various less efficient sorting algorithms (including the insertion sort on its own) don't manage this, or are more likely to hit "worse cases", where the time effectively quadruples as the list size doubles.

Sort performance in practice

Holding this in our head, on the next page we look at some observed measurements of performance of Java's sorting algorithm.

1. I understand that a different algorithm will be used in Java 7. Even more reason not to rewrite your code on the basis of the particular algorithm used in today's version of the library.
2. As we'll see, mergesort is usually analysed as an O n log n sort. In practical terms, this means that when we double the number of elements, the sort time is actually "slightly worse than double" (multiplying by 2.2 gives a good guide in many cases).

If you enjoy this Java programming article, please share with friends and colleagues. Follow the author on Twitter for the latest news and rants.

Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.