The Java Collator class

When applied to localisation, the term collation generally refers to the conventions for ordering strings in a particular language and, by extension, for when to consider two strings to be equal.

When working purely with English texts, it is common not to think very much about collation at all. Most programmers take the order of strings to be that produced by Collections.sort(), or that determined by the ASCII (strictly, UTF-16 code unit) values of their characters; two strings are considered equal if String.equals() deems them to be so, or if the underlying byte values of the strings are identical. But there are cases when this clearly isn't adequate, even in English. Consider the following example, where we use the simple Collections.sort() method to sort three of my favourite words:

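List<String> list = new ArrayList<>(Arrays.asList("caffeine", "café", "cafeteria"));
Collections.sort(list);
System.out.println(list);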

When we run this, we get the following output:

[cafeteria, caffeine, café]

Depending on the sorting convention we want to use, there are at least two orderings of these words that would be considered acceptable, for example in dictionaries. But sadly the above, with the letter e (albeit with an accent) following the letter f, is not generally one of them...

Correcting the sort order: introducing the Collator

If we're prepared to accept some default behaviour, then fixing the sort order is actually very simple. We obtain an instance of a Collator object and then pass this to the Collections.sort() method:

Collator coll = Collator.getInstance(); 
Collections.sort(list, coll);

With this modification, the sort() method puts the word café in a more conventionally acceptable place, before the word caffeine (and in fact before the word cafeteria, treating it as though it were spelt without the accent).
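
To see the two orderings side by side, the snippets above can be combined into a small self-contained sketch. The class name CollatorSortDemo and its helper methods are made up for illustration, and Locale.ENGLISH is passed explicitly (rather than calling the no-argument getInstance()) purely so that the result does not depend on the default locale of the machine running it.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class CollatorSortDemo {

    // The three words from the example; \u00e9 is "é", escaped so that the
    // source compiles the same way whatever the file encoding.
    static List<String> words() {
        return new ArrayList<>(Arrays.asList("caffeine", "caf\u00e9", "cafeteria"));
    }

    // Natural String ordering: compares UTF-16 code units one by one.
    static List<String> sortedNaturally() {
        List<String> list = words();
        Collections.sort(list);
        return list;
    }

    // Ordering supplied by a Collator for the given locale (an assumption
    // made here for repeatability; the article uses the default locale).
    static List<String> sortedWithCollator(Locale locale) {
        List<String> list = words();
        Collator coll = Collator.getInstance(locale);
        Collections.sort(list, coll);
        return list;
    }

    public static void main(String[] args) {
        System.out.println(sortedNaturally());
        System.out.println(sortedWithCollator(Locale.ENGLISH));
    }
}
```

With English collation rules, the first line printed should be the unsatisfactory order shown earlier, and the second should place café first, followed by cafeteria and caffeine.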

Configuring the collator

So what is a Collator? A collator is essentially an object that knows how to sort and compare strings. The default is appropriate for many applications. However, the collator's behaviour can be customised slightly:
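
As one illustration of such customisation, java.text.Collator lets us set a comparison "strength" via setStrength(), which controls how small a difference between two strings must be before it is ignored. The sketch below is only a minimal example under stated assumptions: the class name CollatorStrengthDemo and the helper sameAt() are hypothetical, and Locale.ENGLISH is chosen so that the behaviour does not vary with the default locale.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthDemo {

    // Returns true if the collator treats the two strings as equal
    // when configured with the given strength constant.
    static boolean sameAt(int strength, String a, String b) {
        Collator coll = Collator.getInstance(Locale.ENGLISH);
        coll.setStrength(strength);
        return coll.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String plain = "cafe";
        String accented = "caf\u00e9"; // "café", escaped for encoding safety

        // PRIMARY strength looks only at base letters, so the accent is ignored.
        System.out.println(sameAt(Collator.PRIMARY, plain, accented));  // true

        // TERTIARY strength also takes accents and case into account.
        System.out.println(sameAt(Collator.TERTIARY, plain, accented)); // false
    }
}
```

A lower strength is useful when we want a forgiving notion of equality (for example, when matching search terms), while a higher strength is usually what we want for producing a stable, fully determined sort order.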

Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.