[an error occurred while processing this directive]

How to choose which Java collection class to use?

The Java Collections API provides a whole host of data structures, especially since the API was expanded in Java 5 (and again slightly in Java 6) to include concurrent collections. At first, the array of choices can be a little daunting: should I use a HashMap or a LinkedHashMap? When should I use a list or a HashSet? When should I use a TreeMap rather than a HashMap? But with a bit of guidance, the choice needn't be quite so daunting. There are also a few cases where it's difficult to decide because the choice is very arguable. And in other cases, having a clear set of rules of thumb can guide you to an appropriate decision.

Basic approach to choosing a collection

The overall approach I'd suggest for choosing is as follows:

  1. choose the general type of organisation that your data needs to have (e.g. map or list); without too much thought, this is usually fairly clear;
  2. then, choose the implementation of that type that has the minimum functionality that you actually require (e.g. don't choose a sorted structure if you don't actually need the data to be sorted).

In general, the algorithm that underlies each collection class is designed to be a good tradeoff between efficiency and certain minimal requirements. So as long as you understand the minimal requirements that a given class is designed to provide, you shouldn't need to get too bogged down in the actual algorithms (though if you are interested in algorithms, the source code to all the collections classes is available and they make fascinating case studies, of course).

1. Basic collection types

The first part of the decision is choosing what "basic category" of organisation or functionality your data needs to have. The broad types are as follows:

Collection typeFunctionalityTypical uses
  • Essentially a variable-size array;
  • You can usually add/remove items at any arbitrary position;
  • The order of the items is well defined (i.e. you can say what position a given item goes in in the list).
Most cases where you just need to store or iterate through a "bunch of things" and later iterate through them.
  • Things can be "there or not"— when you add items to a set, there's no notion of how many times the item was added, and usually no notion of ordering.
  • Remembering "which items you've already processed", e.g. when doing a web crawl;
  • Making other yes-no decisions about an item, e.g. "is the item a word of English", "is the item in the database?" , "is the item in this category?" etc.
  • Stores an association or mapping between "keys" and "values"
Used in cases where you need to say "for a given X, what is the Y"? It is often useful for implementing in-memory caches or indexes. For example:
  • For a given user ID, what is their cached name/User object?
  • For a given IP address, what is the cached country code?
  • For a given string, how many instances have I seen?
  • Like a list, but where you only ever access the ends of the list (typically, you add to one end and remove from the other).
  • Often used in managing tasks performed by different threads in an application (e.g. one thread receives incomming connections and puts them on a queue; other "worker" threads take connections off the queue for processing);
  • For traversing hierarchical structures such as a filing system, or in general where you need to remember "what data to process next", whilst also adding to that list of data;
  • Related to the previous point, queues crop up in various algorithms, e.g. build the encoding tree for Huffman compression.

The first three of these are really "bread and butter" collection types. On the other hand, it's probably fair to say that queues will be used less often by many programmers. Even if you are writing a multithreaded server or application involving multi-threaded job processing, you can often use the Java 5 Executor framework (which uses queues underlyingly) without needing to manipulate queues directly.

To add to the confusion, there's actually a slight overlap between lists and queues (what we often refer to as a linked list is in effect a type of queue). But for now, we'll go with the categories above and come back to the issue on the next page.

Once you've decided on the basic collection type that is most appropriate for your use, you need to select an appropriate class that implements a collection of that type. On the next page, we look specifically at how to choose which collection class to use of a given type.