CyclicBarrier example: a parallel sort algorithm

As an example of using the Java CyclicBarrier class, we consider a parallel sort. There are various ways to sort data in parallel, but the approach we consider here lends itself to using a CyclicBarrier. It also has the advantage that we're essentially building on available library routines: we divide the work into a sublist per thread, and then each thread uses the standard Collections.sort() to sort its sublist.

Parallel sort algorithm with CyclicBarrier

Essentially, we: (a) split the data into one "bucket" per thread, where each bucket contains an approximately equal number of values, and each bucket covers values within a particular range; (b) each thread sorts its bucket; (c) the buckets are then put together in order: since each bucket covered values within a particular range, simply appending the buckets in thread order results in a completely sorted list.

The actual sorting of the buckets is obviously parallelisable. We can also parallelise the process of putting the data into buckets: for this, each thread goes through an equal-sized portion of the list and, from that sublist, decides which bucket to put each piece of data into.

Split points

The slightly more subtle problem— and subtly parallelisable problem— is deciding on values for the "split points". For example, if we want to sort the data using two threads, we need to find some value of x so that putting values less than x into bucket 1 and values more than or equal to x into bucket 2 will result in roughly equal-sized buckets. In this case, x is the "split point". Wtih 3 threads, we need 2 split points; 4 threads needs 3 split points etc.

If we know in advance that our data is evenly distributed across the possible range, then we can decide on split points with a simple division. For example, with 2 threads, if each piece of data can take a value between 0 and 100,000 and all values have equal probability, then we can set x to be 50,000, and we know that the buckets are likely to be roughly equal in size. But what if the distirbution is more subtle: say, values below 20,000 are twice as likely, or values are more probable the closer they are to some number?

One method that would work, but which would be utterly pointless, would be to sort the data (in a single thread), and then take x to be the value halfway through the list: that way, we obviously would know that half of the values were smaller than x. And scaling to larger numbers of threads, taking split points at equal intervals in the sorted list, we'd know for sure that an equal number of numbers in the list had values between the given split points.

Now obviously, there's little point on deciding on how to sort a list in parralel by first sorting it in a single thread! But what we can do is sort a small random sample of the list and take the split points from that. And although we sort the sample list in a single thread1, we can pick the random sample in parallel: each thread chooses, say, 16 random values from a portion of the list.

Next

On the next page, we summarise the stages we've just described, and then look at how to code them using a Java CyclicBarrier.


1. Actually, using a construct called a sorting network, we can sort a smallish list in parallel, and this is the approach suggested by Herlihy & Shavit (2008), The Art of Multiprocessor Programming, pp. 290-291. Largely for simplicity of illustration (but also because I'm skeptical that it's worth the effort in this case), I choose to sort the sample list in a single thread.