Character classes revisited: named classes
Recall that a character class is the set of 'choices' for a single
character, written inside square brackets. For example, to match
any digit, we have been using [0-9]. So far, we have always "spelled out"
the characters, or ranges of characters, in this way.
For certain common character choices (and some uncommon ones) there is
actually an easier option: we can write a backslash followed by a
letter that names the class. For example, to match a single
digit, we can write the expression \d. So we can now write our
'contains ten digits' method as follows:
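public boolean containsTenDigits(String str) {
    // True if str contains ten consecutive digits anywhere
    // (note that '.' does not match line breaks by default)
    return str.matches(".*\\d{10}.*");
}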
Notice that when we want to put a backslash inside a regular expression,
we have to write a double backslash in the Java string literal. This is
because the backslash already has a meaning inside Java strings (allowing
us to write so-called escape sequences such as \n for a newline).
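To make the escaping concrete, here is a small self-contained sketch. It repeats the containsTenDigits expression as a static method inside a hypothetical class DigitClassDemo, and adds an illustrative helper, matchesOne (also hypothetical), that tests a single character against a regex using java.util.regex.Pattern.matches. It assumes Java's default behaviour, in which \d is documented as equivalent to [0-9] (this changes only if the UNICODE_CHARACTER_CLASS flag is used).

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows what the literal "\\d" contains and
// that \d and [0-9] agree on some sample characters.
class DigitClassDemo {

    // Same expression as the containsTenDigits method in the text
    static boolean containsTenDigits(String str) {
        return str.matches(".*\\d{10}.*");
    }

    // Hypothetical helper: does the regex match exactly the one character c?
    static boolean matchesOne(String regex, char c) {
        return Pattern.matches(regex, String.valueOf(c));
    }

    public static void main(String[] args) {
        String literal = "\\d";
        System.out.println(literal.length());          // 2: a backslash and a 'd'
        System.out.println(literal.charAt(0) == '\\'); // true

        for (char c : "07x ".toCharArray()) {
            // Prints the same answer twice for each character
            System.out.println(c + ": " + matchesOne("\\d", c) + " " + matchesOne("[0-9]", c));
        }

        System.out.println(containsTenDigits("Tel: 0123456789")); // true
        System.out.println(containsTenDigits("Tel: 012345678"));  // false: only nine digits
    }
}
```

The string passed to the regex engine is \d (two characters); the Java compiler is what turns the doubled backslash in the source into a single one.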
Matching whitespace
Another useful character class is \s. This matches so-called
whitespace: spaces, tabs and line breaks (strictly speaking,
either ASCII character 10, the newline character,
or character 13, the carriage return). ASCII characters
11 (vertical tab) and 12 (form feed) also count as whitespace, but in
practice these are extremely rare nowadays.
Again, to write \s inside a string literal, we need to double
the backslash: "\\s".
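As an illustration, the following sketch checks which characters \s accepts and shows a common use of "\\s+" (one or more whitespace characters in a row) with String.split. The class name WhitespaceDemo and its two helper methods are hypothetical, written only for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class for the \s character class
class WhitespaceDemo {

    // True if the single character c is matched by \s
    static boolean isRegexWhitespace(char c) {
        return Pattern.matches("\\s", String.valueOf(c));
    }

    // Splits text on runs of one or more whitespace characters
    static String[] words(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        // Space, tab, newline (10), carriage return (13), 11, 12, and a letter
        char[] candidates = {' ', '\t', '\n', '\r', (char) 11, '\f', 'a'};
        for (char c : candidates) {
            System.out.println((int) c + " -> " + isRegexWhitespace(c));
        }

        // Two spaces, a tab and a CR/LF pair all act as single separators
        System.out.println(Arrays.toString(words("one  two\tthree\r\nfour")));
        // [one, two, three, four]
    }
}
```

Using \s+ rather than \s means that several adjacent whitespace characters produce one split, not a series of empty strings.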
Named groups
Various other character classes are available as named groups,
written with the expression \p{name}, where name
is one of a number of possible group names (in a Java string literal,
again with a doubled backslash: "\\p{name}"). Here are some of the most
useful groups:
Group name | Characters |
ASCII | Any 7-bit ASCII character (i.e. characters in the range 0-127 inclusive). |
Punct | Any punctuation character from the 7-bit ASCII range. |
Cntrl | Any ASCII control character in the range 0-127. This effectively means characters 0-31 and 127. |
Print | Any printable ASCII character (those in the range 32-126). |
L | Any Unicode letter (including those outside the ASCII range). |
Lu / Ll | Upper-case and lower-case Unicode letters, respectively. |
P | Unicode punctuation. |
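The difference between the ASCII-only groups and the Unicode ones is easiest to see with accented letters and non-ASCII punctuation. In this sketch, the class NamedClassDemo and its helper inGroup are hypothetical; the helper simply builds the expression \p{name} for a given group name and tests one character against it. It assumes Java's default matching mode, in which groups such as Punct cover only the ASCII range.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for \p{name} groups
class NamedClassDemo {

    // True if the single character c is matched by \p{name}
    static boolean inGroup(String name, char c) {
        return Pattern.matches("\\p{" + name + "}", String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(inGroup("ASCII", 'e'));      // true
        System.out.println(inGroup("ASCII", '\u00E9')); // false: e-acute is outside 0-127
        System.out.println(inGroup("L", '\u00E9'));     // true: it is still a Unicode letter
        System.out.println(inGroup("Lu", '\u00C9'));    // true: upper-case E-acute
        System.out.println(inGroup("Ll", '\u00C9'));    // false
        System.out.println(inGroup("Punct", '!'));      // true
        System.out.println(inGroup("Punct", '\u00BF')); // false: inverted '?' is not ASCII
        System.out.println(inGroup("P", '\u00BF'));     // true: but it is Unicode punctuation
        System.out.println(inGroup("Cntrl", '\t'));     // true: character 9
        System.out.println(inGroup("Print", ' '));      // true: character 32
    }
}
```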
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.