The Java Stream API

Java 8 introduces the notion of streams and the Stream API. A Stream defines the logic for iterating through a sequence of data such as a collection (e.g. a List, Set etc) or other data source. A Stream is effectively an "iterator" that can include logic to define how to iterate through the data items in question. For example, it can include transformations such as filtering and sorting. These transformations are often defined with the help of lambda expressions.

In order to understand what a Stream is, it is first worth considering a "normal" Iterator for comparison. Prior to Java 8, we could have obtained an Iterator over a collection of objects as follows:

	
List<String> list = ...some list...
Iterator<String> it = list.iterator();

At this point, no objects have actually been retrieved from the list. But the caller can now retrieve sequential objects by calling it.next():

	
while (it.hasNext()) {
  String str = it.next();
  System.out.println(str);
}

As you are probably aware, this is essentially the code that is generated under the hood when we write, more idiomatically, the following:

	
for (String str : list) {
  System.out.println(str);
}

Part of the definition of the Iterator is that it will determine, on each successive call to it.next(), which actually is the "next" object to retrieve. In general, iteration is a once-only operation: once we have called it.next(), there is no way to "go back" and retrieve a previous item again (though this is possible with a ListIterator). The Iterator remains tightly bound to the specific collection that it came from, and hence can also be destructive: in principle, calling it.remove() will remove the last element fetched.
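The Iterator behaviour described above can be tried out with a short sketch. The class and method names (IteratorDemo, removeEmpty, forwardThenBack) are illustrative only and are not part of the Java library; Iterator, ListIterator and their methods are standard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;

class IteratorDemo {

    // Removes every empty string, using the Iterator's own remove() method.
    static List<String> removeEmpty(List<String> source) {
        List<String> list = new ArrayList<>(source); // mutable copy of the input
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            if (it.next().isEmpty()) {
                it.remove(); // removes the element just returned by next()
            }
        }
        return list;
    }

    // Steps forward one element, then back again, which a plain Iterator cannot do.
    static String forwardThenBack(List<String> list) {
        ListIterator<String> it = list.listIterator();
        it.next();            // move past the first element
        return it.previous(); // returns the first element again
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ab", "", "cd");
        System.out.println(removeEmpty(words));     // prints [ab, cd]
        System.out.println(forwardThenBack(words)); // prints ab
    }
}
```

Note that removeEmpty() works on a copy: the list returned by Arrays.asList() is fixed-size, so calling it.remove() on it directly would throw an UnsupportedOperationException.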

Defining a Stream

With the above in mind, let us take a look at the equivalent Stream:

Stream<String> stream = list.stream();

The Stream shares some properties with the Iterator. Importantly, the Stream simply represents an "intention" to retrieve items and will define the order in which they are fetched, but no items have actually been retrieved from the list at this point. One way of seeing things, if you are used to working with databases, is that a Stream defines a query, but doesn't execute that query until we call a specific method to do so.

Keeping with our query analogy, what makes Streams powerful is that we can extend our declaration as follows:

Stream<String> stream = list.stream().filter(s -> s.length() == 2);

We have now declared a Stream that, when invoked, will iterate over all of the two-character strings in the list. This filter criterion is built into the actual definition of the Stream. No strings have actually been retrieved or had length() called on them yet, but this will happen once we "execute the query": in other words, when we call a specific method that actually iterates through the items in the stream. (These methods are called terminal operations and we will look at them in a moment.)
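One way to observe this deferred behaviour is to count how often the filter is actually applied. The following is a minimal sketch: the class LazyStreamDemo, the method countPredicateCalls and the counter are hypothetical names introduced purely for illustration, and collect(Collectors.toList()) stands in for "a method that actually iterates through the items".

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyStreamDemo {

    // Returns {filter calls before the pipeline is run, filter calls afterwards}.
    static int[] countPredicateCalls(List<String> list) {
        AtomicInteger calls = new AtomicInteger(); // illustrative counter only

        Stream<String> stream = list.stream().filter(s -> {
            calls.incrementAndGet(); // record each time the filter is applied
            return s.length() == 2;
        });
        int before = calls.get(); // the stream has only been declared so far

        stream.collect(Collectors.toList()); // this call actually runs the pipeline
        int after = calls.get();

        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] result = countPredicateCalls(Arrays.asList("ab", "abc", "cd", "e"));
        System.out.println("before=" + result[0] + ", after=" + result[1]);
        // prints before=0, after=4
    }
}
```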

We can extend the stream definition further with multiple criteria and transformations. For example, the following defines a stream of the "top 10 two-character strings sorted alphabetically":

Stream<String> stream = list.stream().filter(s -> s.length() == 2).sorted().limit(10);
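To see what such a declaration produces, it can be wrapped in a small method that also collects the results into a List. This is only a sketch: TopTwoCharDemo, firstTwoCharStrings and the max parameter are assumed names, not part of the Stream API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TopTwoCharDemo {

    // Returns at most 'max' two-character strings, in natural (alphabetical) order.
    static List<String> firstTwoCharStrings(List<String> list, int max) {
        return list.stream()
                .filter(s -> s.length() == 2) // keep only two-character strings
                .sorted()                     // natural ordering of String
                .limit(max)                   // stop after 'max' items
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("zz", "a", "mm", "abc", "aa");
        System.out.println(firstTwoCharStrings(list, 2)); // prints [aa, mm]
    }
}
```

The order of the calls matters here: because sorted() comes before limit(), the limit applies to the sorted sequence rather than to the first matches encountered in the list.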

Streams, then, allow us to build up potentially complex iteration operations or "queries" against a collection of data in Java, taking advantage of lambda expressions to define just the actual logic we require and minimising the amount of "boilerplate" code that we need to write. We will see further stream methods later. But first, let us look at how we actually complete the process and retrieve the desired items from the stream.

Stream termination: collection, reduction and consumption

A Stream defines a "query" or "intention" to retrieve values from a particular collection or data source. As we have seen, a Stream can be defined by chaining together a number of so-called intermediate operations, such as sorting and filtering. But as we have stated, defining the stream does not actually iterate through it. To actually iterate through the Stream and obtain a result, we apply a terminal operation.

A terminal operation is one that actually invokes or iterates through a Stream to produce a result. It is the final stage in the "pipeline". For example:

Stream<String> stream = list.stream();
long noStrings = stream.count();

Calling the count() method on the stream is what actually initiates the "pipeline" of calls to retrieve successive items, apply any filters or other intermediate operations that have been specified, and count the items that result.
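As a small illustration of count() combined with a filter, consider the sketch below. The names CountDemo and countTwoCharStrings are hypothetical. Note that count() returns a long rather than an int.

```java
import java.util.Arrays;
import java.util.List;

class CountDemo {

    // Counts the strings that pass the filter; count() is the terminal operation.
    static long countTwoCharStrings(List<String> list) {
        return list.stream()
                .filter(s -> s.length() == 2)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countTwoCharStrings(Arrays.asList("ab", "abc", "cd"))); // prints 2
    }
}
```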

Terminal operations such as count() fall into three broad categories. The table below summarises these categories with examples:

Terminator category | Description | Examples
Reduction | Values generated by the stream are merged into a single aggregate value (e.g. a count or maximum). | stream.count(), stream.min(comparator), stream.findFirst()
Collection | Values are combined/appended to a collection or other mutable object (e.g. a string or list with all values appended). | stream.collect(Collectors.toList()), stream.collect(Collectors.joining(","))
Consumption | All values generated are passed to a Consumer which processes them individually. | stream.forEach(System.out::println)

The process of collection is also sometimes termed mutable reduction, and in reality there is some overlap between the processes of reduction and collection. In the strictest definition of reduction, the reduction function replaces the current aggregate value with a new immutable aggregate value for each item in the stream as they are processed; in the case of collection, the same, mutable aggregate object (such as a List) may have items in the stream appended to it as they are collected or aggregated.
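The distinction can be sketched with the two corresponding Stream methods, reduce() and the three-argument form of collect(). The class and method names below are illustrative assumptions. In totalLength() each step yields a new Integer value, whereas in copyToList() the same ArrayList is mutated as items arrive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReduceVsCollectDemo {

    // Reduction: each step replaces the running total with a new value.
    static int totalLength(List<String> list) {
        return list.stream()
                .map(String::length)
                .reduce(0, Integer::sum);
    }

    // Mutable reduction (collection): items are appended to one mutable container.
    static List<String> copyToList(List<String> list) {
        List<String> result = list.stream().collect(
                () -> new ArrayList<String>(),        // creates the container
                (acc, s) -> acc.add(s),               // adds one item to it
                (left, right) -> left.addAll(right)); // merges two containers (used by parallel streams)
        return result;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("ab", "abc", "d");
        System.out.println(totalLength(list)); // prints 6
        System.out.println(copyToList(list));  // prints [ab, abc, d]
    }
}
```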

As demonstrated in the above examples, values in the stream are generally collected via a Collector, and in practice the collector will usually be created via one of the static methods provided by the Collectors class.
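The following sketch shows one terminal operation from each category applied to the same list. TerminalOpsDemo and its method names are hypothetical; the Stream, Collectors and Comparator calls are standard. Note that min() takes a Comparator and returns an Optional, since the stream may be empty.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class TerminalOpsDemo {

    // Reduction: the smallest string in natural order, if there is one.
    static Optional<String> smallest(List<String> list) {
        return list.stream().min(Comparator.naturalOrder());
    }

    // Collection: all values appended into a single comma-separated String.
    static String joined(List<String> list) {
        return list.stream().collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("pear", "apple", "fig");
        System.out.println(smallest(list).get()); // prints apple
        System.out.println(joined(list));         // prints pear,apple,fig

        // Consumption: each value is passed to a Consumer in turn.
        list.stream().forEach(System.out::println);
    }
}
```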

Writing the entire stream pipeline in a single statement

In simple cases, it is idiomatic to chain the stream construction, filtering and termination together so that the entire "pipeline" is written as a single Java statement. For ease of reading, many programmers like to place line breaks between individual calls in the chain. For example:

List<String> shortStrings = strings.stream()
    .filter(s -> s.length() < 5)
    .collect(Collectors.toList());
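For comparison, the sketch below writes the same pipeline once with each stage held in its own variable and once as a single chained statement. Both forms give the same result. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PipelineDemo {

    // Each stage of the pipeline is assigned to a separate variable.
    static List<String> stepByStep(List<String> strings) {
        Stream<String> all = strings.stream();
        Stream<String> filtered = all.filter(s -> s.length() < 5);
        return filtered.collect(Collectors.toList());
    }

    // The same pipeline written as one chained statement.
    static List<String> chained(List<String> strings) {
        return strings.stream()
                .filter(s -> s.length() < 5)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("tree", "forest", "leaf", "branch");
        System.out.println(stepByStep(strings)); // prints [tree, leaf]
        System.out.println(chained(strings));    // prints [tree, leaf]
    }
}
```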

How to obtain a Stream: using lambdas with other data sources

The most common way to obtain a Stream over some data is by calling stream() on a collection such as a List. But other convenience methods have been added to the Java library to obtain Streams from other sources, allowing you to use lambdas with them. For example, we can stream an array with Arrays.stream(). On the next page, we look at various sources of Streams in Java, showing how to obtain a Stream from a variety of places where we would previously have used a collection, array, iterator etc.
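As a brief sketch of the array case, Arrays.stream() can be followed by the same filtering and collection calls that we would apply to a stream obtained from a List. The class and method names here are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ArrayStreamDemo {

    // Streams over an array instead of a collection, then filters as before.
    static List<String> shortStrings(String[] array) {
        return Arrays.stream(array)
                .filter(s -> s.length() < 5)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] array = { "tree", "forest", "leaf" };
        System.out.println(shortStrings(array)); // prints [tree, leaf]
    }
}
```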

Doing more with streams

On the next page, we consider the principal stream operations used to define sorting, filtering etc.