Introduction to regular expressions in Java

Regular expressions are a special "language" for describing patterns that we want to match against strings. When used appropriately, they are a very powerful feature of Java and other programming languages. Using the Java regular expression API, we can perform tasks such as the following in just one or two lines of code:

As well as avoiding sprawling lines of code, using the regular expression API to perform these tasks can also bring us efficiency optimisations that we might not have considered were we to write the equivalent code from scratch. As we shall see, this succinctness does sometimes come at the price of readability. Many programmers therefore have a love-hate relationship with regular expressions!

To illustrate the advantages and disadvantages of using regular expressions, let's start with a simple example.

Example pattern matching: hand-coded vs regular expressions

Suppose you want to answer the question: does a given string contain a run of 10 consecutive digits? You could hand-code this: step through the characters of the string, counting how many digits you have seen in a row. Each time you hit a non-digit, reset the count to zero; as soon as the count reaches 10, you have your answer. In Java, the code would look something like this:


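public boolean hasTenDigits(String s) {
  // Number of consecutive digits seen so far
  int digitsInARow = 0;
  for (int len = s.length(), i = 0; i < len; i++) {
    char c = s.charAt(i);
    // Note: Character.isDigit() also accepts non-ASCII Unicode digits
    if (Character.isDigit(c)) {
      if (++digitsInARow == 10) {
        return true;
      }
    } else {
      // Run broken by a non-digit: start counting again
      digitsInARow = 0;
    }
  }
  return false;
}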

The strengths and weaknesses of this code are obvious: it is easy to understand, but verbose for such a simple task.

We can perform the same task using a regular expression. The result would look something like this:


public boolean hasTenDigits(String s) {
  return s.matches(".*[0-9]{10}.*");
}
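As an aside, String.matches() only returns true if the pattern matches the whole string, which is why the expression above is wrapped in .* on both sides. The following is a minimal sketch of an alternative, not the approach this tutorial goes on to use: it compiles the bare pattern once and asks a Matcher to find() it anywhere in the string. The class and method names (TenDigitsDemo, hasTenDigitsMatches, hasTenDigitsFind) are invented for illustration. The sketch also shows one difference in behaviour: by default the dot does not match line terminators, so the matches() version fails on a string containing a line break.

```java
import java.util.regex.Pattern;

class TenDigitsDemo {
  // Compiled once and reused; Pattern objects are immutable and thread-safe.
  private static final Pattern TEN_DIGITS = Pattern.compile("[0-9]{10}");

  // The version from the text: matches() must match the entire string.
  static boolean hasTenDigitsMatches(String s) {
    return s.matches(".*[0-9]{10}.*");
  }

  // Alternative sketch: find() looks for the pattern anywhere in the string.
  static boolean hasTenDigitsFind(String s) {
    return TEN_DIGITS.matcher(s).find();
  }

  public static void main(String[] args) {
    String oneLine = "call 0123456789 now";
    String twoLines = "call\n0123456789 now";

    System.out.println(hasTenDigitsMatches(oneLine));   // true
    System.out.println(hasTenDigitsFind(oneLine));      // true
    // '.' does not match '\n' by default, so .* cannot span the line break
    System.out.println(hasTenDigitsMatches(twoLines));  // false
    System.out.println(hasTenDigitsFind(twoLines));     // true
  }
}
```

Note too that [0-9] matches ASCII digits only, whereas Character.isDigit() in the hand-coded version also accepts digits from other scripts, so the two versions are not strictly equivalent on non-ASCII input.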

You'll probably agree that the regular expression version essentially reverses the strengths and weaknesses (verbosity vs understandability) of the hand-coded one. We now have a nice succinct piece of code, but it relies on the programmer understanding a piece of syntax that looks slightly obscure to the uninitiated. Despite this initial obscurity, the advantages of regular expressions include:

On the next page, we'll get going with basic expressions with String.matches().

In case you already know something about regular expressions and want to skip ahead, here are some of the later topics currently covered by this tutorial:

Regular expression examples

Finally, we'll look at a couple of examples of using regular expressions: