More on character classes

We looked at some basic regular expressions that included character classes: a "choice" of character to match placed inside square brackets. For example, [Tt] will match against either T or t. On this page we'll look at some more possibilities with character classes.

Character ranges

A useful feature is that we can specify a range of characters by placing a hyphen between the start and end characters. For example, to match any lower case letter, we can write:

[a-z]

Similarly, to match a digit, we can write:

[0-9]

We can combine single characters and ranges, and/or combine multiple ranges:

Expression     Meaning
[a-zA-Z]       A lower or upper case letter in the range A-Z.
[0-9A-F]       A hexadecimal digit (0-9 or A-F).
[0-9A-Fa-f]    A hexadecimal digit, either upper or lower case.
[ 0-9]         A space or digit.
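
As a rough illustration of the combined ranges in the table, the following sketch tests single characters against two of the expressions. The class and method names are made up for this example, and String.matches() is used simply because it requires the whole string (here, one character) to match the expression.

```java
class CharacterRangeDemo {
    // True if c is a hexadecimal digit in either case.
    static boolean isHexDigit(char c) {
        return String.valueOf(c).matches("[0-9A-Fa-f]");
    }

    // True if c is a space or a digit.
    static boolean isSpaceOrDigit(char c) {
        return String.valueOf(c).matches("[ 0-9]");
    }

    public static void main(String[] args) {
        System.out.println(isHexDigit('b'));      // true
        System.out.println(isHexDigit('G'));      // false
        System.out.println(isSpaceOrDigit(' '));  // true
        System.out.println(isSpaceOrDigit('x'));  // false
    }
}
```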

Negation

To say "not in the range...", we put a hat symbol ^ at the beginning of the character class expression. So for example, to say "not a digit", we would write the following:

[^0-9]
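
One place a negated class can be handy is in stripping unwanted characters from a string. The sketch below, with illustrative names that are not part of any library, uses String.replaceAll() to remove everything that is not a digit.

```java
class NegationDemo {
    // Removes every character that is not a digit 0-9.
    static String digitsOnly(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    // True if the single-character string is anything other than a digit.
    static boolean isNonDigit(String s) {
        return s.matches("[^0-9]");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: 555-0199")); // 5550199
        System.out.println(isNonDigit("x"));             // true
        System.out.println(isNonDigit("7"));             // false
    }
}
```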

Intersection

An operation called intersection essentially means "in this class AND in this one". It is really useful when we combine an intersection with a negation to say "in this class BUT NOT in this one". The intersection uses two ampersands. Here is the syntax:

[0-9&&[^5]]
[a-z&&[^aeiouy]]

The first of these matches any digit except 5; the second matches any lower case letter except a vowel (with y counted as a vowel here).

Note that a single ampersand on its own (&) simply represents that character.
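
To see the two intersection expressions in action, and to check the behaviour of a single ampersand, here is a small sketch. The class and method names are invented for the example; each method tests a one-character string against the expression.

```java
class IntersectionDemo {
    // A digit, but not 5.
    static boolean isDigitExceptFive(String s) {
        return s.matches("[0-9&&[^5]]");
    }

    // A lower case letter, but not one of a, e, i, o, u, y.
    static boolean isLowerCaseConsonant(String s) {
        return s.matches("[a-z&&[^aeiouy]]");
    }

    // A single & inside a class is just a literal ampersand:
    // this class matches 'a', '&' or 'b'.
    static boolean isAOrAmpersandOrB(String s) {
        return s.matches("[a&b]");
    }

    public static void main(String[] args) {
        System.out.println(isDigitExceptFive("4"));     // true
        System.out.println(isDigitExceptFive("5"));     // false
        System.out.println(isLowerCaseConsonant("c"));  // true
        System.out.println(isLowerCaseConsonant("e"));  // false
        System.out.println(isAOrAmpersandOrB("&"));     // true
    }
}
```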

Named character classes

Some 'shortcuts' exist for common character classes (such as [0-9]) in the form of named character classes.
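
As a brief taste, and assuming Java's default regex settings, the sketch below uses two such shortcuts from the java.util.regex.Pattern documentation: \d, which stands for [0-9], and \p{Lower}, which stands for [a-z]. The class and method names are illustrative only. Note that inside a Java string literal each backslash has to be doubled.

```java
class PredefinedClassDemo {
    // \d is shorthand for [0-9] with the default settings.
    static boolean isDigit(String s) {
        return s.matches("\\d");
    }

    // \p{Lower} is shorthand for [a-z] with the default settings.
    static boolean isLowerCase(String s) {
        return s.matches("\\p{Lower}");
    }

    public static void main(String[] args) {
        System.out.println(isDigit("7"));      // true
        System.out.println(isDigit("x"));      // false
        System.out.println(isLowerCase("q"));  // true
        System.out.println(isLowerCase("Q"));  // false
    }
}
```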

Next...

On the next page, we'll look at a special character class: the dot.

Written by Neil Coffey. Copyright © Javamex UK 2012. All rights reserved.