

Character classes revisited: named classes

Recall that a character class is a set of 'choices', written in square brackets, that a single character is matched against. For example, to match any digit, we have been using [0-9]. So far, we have always "spelled out" the characters or range of characters in this way.

For certain common character choices (and some uncommon ones) there's actually an easier option. We can write a backslash followed by the name of a predefined character class. For example, to match a single digit, we can write the expression \d. So we can now write our 'contains ten digits' method as follows:

public boolean containsTenDigits(String str) {
  return str.matches(".*\\d{10}.*");
}

Notice that when we want to put a backslash inside a regular expression, we have to write a double backslash in the Java string literal. This is because the backslash already has a meaning inside Java strings (allowing us to write so-called escape sequences such as \n for a newline).
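As a rough illustration of the doubling, the sketch below reuses the method above inside a made-up class called EscapeDemo; the sample strings are also just assumptions for the demo. It shows that the literal "\\d" really is only two characters long once Java has processed the escape.

```java
class EscapeDemo {
    // Same method as above: true if str contains ten digits in a row.
    static boolean containsTenDigits(String str) {
        return str.matches(".*\\d{10}.*");
    }

    public static void main(String[] args) {
        // The literal "\\d" holds two characters: a backslash and the letter d.
        System.out.println("\\d".length());                            // prints 2
        System.out.println(containsTenDigits("Call 0123456789 now"));  // prints true
        System.out.println(containsTenDigits("Call 01234 56789 now")); // prints false
    }
}
```

The second string fails because the space splits the digits into two runs of five.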

Matching whitespace

Another useful character class is \s. This matches so-called whitespace: spaces, tabs and line breaks (strictly speaking, either ASCII character 10, the newline character, or character 13, the carriage return). ASCII characters 11 and 12 (vertical tab and form feed) also count as whitespace, but in practice these are extremely rare nowadays.

Again, to write \s inside a string literal, we need to double the backslash: "\\s".
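To check which characters \s accepts, a small sketch along these lines can help. The class name WhitespaceDemo and the helper isWhitespaceChar are hypothetical names chosen for this example; only String.matches is standard Java.

```java
class WhitespaceDemo {
    // Hypothetical helper: true if str is exactly one whitespace character.
    static boolean isWhitespaceChar(String str) {
        return str.matches("\\s");
    }

    public static void main(String[] args) {
        System.out.println(isWhitespaceChar(" "));                        // prints true
        System.out.println(isWhitespaceChar("\t"));                       // prints true
        System.out.println(isWhitespaceChar("\n"));                       // prints true (ASCII 10)
        System.out.println(isWhitespaceChar("\r"));                       // prints true (ASCII 13)
        System.out.println(isWhitespaceChar(String.valueOf((char) 11)));  // prints true
        System.out.println(isWhitespaceChar(String.valueOf((char) 12)));  // prints true
        System.out.println(isWhitespaceChar("x"));                        // prints false

        // \s can stand in for any one of these characters inside a longer expression.
        System.out.println("hello\tworld".matches("hello\\sworld"));      // prints true
        System.out.println("helloworld".matches("hello\\sworld"));        // prints false
    }
}
```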

Named groups

Various character classes can be formed from named groups, which are written \p{name}, where name is one of a number of possible group names. As with \d and \s, the backslash must be doubled inside a Java string literal. Here are some of the most useful groups:

Group name    Characters
ASCII         Any 7-bit ASCII character (i.e. characters in the range 0-127 inclusive).
Punct         Any punctuation character from the 7-bit ASCII range.
Cntrl         Any ASCII control character in the range 0-127. This effectively means characters 0-31 and 127.
Print         Any printable ASCII character (those in the range 32-126).
L             Any Unicode letter (including those outside the ASCII range).
Lu / Ll       Upper and lower case Unicode letters respectively.
P             Unicode punctuation.
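
A few of these groups can be tried out with a sketch like the one below. The class NamedGroupDemo and its helper methods are invented for illustration, and the test strings are arbitrary; the \p{...} names themselves are the ones listed above. The letter é is written as the escape \u00e9 so that the example does not depend on the source file's encoding.

```java
class NamedGroupDemo {
    // Hypothetical helper: true if str consists only of Unicode letters.
    static boolean isAllLetters(String str) {
        return str.matches("\\p{L}+");
    }

    // Hypothetical helper: true if str consists only of 7-bit ASCII characters.
    static boolean isAllAscii(String str) {
        return str.matches("\\p{ASCII}+");
    }

    // Hypothetical helper: true if str ends with an ASCII punctuation character.
    static boolean endsWithPunct(String str) {
        return str.matches(".*\\p{Punct}");
    }

    // Hypothetical helper: one upper case letter followed by lower case letters.
    static boolean isCapitalised(String str) {
        return str.matches("\\p{Lu}\\p{Ll}+");
    }

    public static void main(String[] args) {
        System.out.println(isAllLetters("caf\u00e9"));  // prints true
        System.out.println(isAllAscii("caf\u00e9"));    // prints false
        System.out.println(isAllLetters("abc1"));       // prints false
        System.out.println(endsWithPunct("Hello!"));    // prints true
        System.out.println(endsWithPunct("Hello"));     // prints false
        System.out.println(isCapitalised("Hello"));     // prints true
        System.out.println(isCapitalised("hello"));     // prints false
    }
}
```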

Written by Neil Coffey. Copyright © Javamex UK 2012. All rights reserved.