Repetition operators in regular expressions
On the previous page, we mentioned the dot and asterisk combination, .*, meaning "any character repeated any number of times". The asterisk is what is sometimes called a repetition operator.
How repetition operators work
Repetition operators are placed after the item to be repeated. We have already seen that .* means any character repeated any number of times.
By item, we mean a single character or a character class. So for example,
[0-9]* means any number of digits and a* means
any number of instances of the letter a.
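To see these in action, here is a minimal sketch using the standard java.util.regex API. It relies on Pattern.matches(), which only succeeds if the whole input string matches the expression. The class and method names (ZeroOrMoreDemo, allDigits, allLetterA, anything) are our own, chosen purely for illustration.

```java
import java.util.regex.Pattern;

public class ZeroOrMoreDemo {
    // [0-9]* : the whole string must consist of zero or more digits
    static boolean allDigits(String s) {
        return Pattern.matches("[0-9]*", s);
    }

    // a* : the whole string must consist of zero or more letter a's
    static boolean allLetterA(String s) {
        return Pattern.matches("a*", s);
    }

    // .* : any character repeated any number of times
    // (note: by default the dot does not match line terminators)
    static boolean anything(String s) {
        return Pattern.matches(".*", s);
    }

    public static void main(String[] args) {
        System.out.println(allDigits("2021"));       // true
        System.out.println(allDigits(""));           // true: zero digits is still "any number"
        System.out.println(allDigits("20x1"));       // false: 'x' is not in [0-9]
        System.out.println(allLetterA("aaaa"));      // true
        System.out.println(allLetterA("aab"));       // false: 'b' is not an 'a'
        System.out.println(anything("hello world")); // true
    }
}
```

Notice that "any number" really does include zero: the empty string matches both [0-9]* and a*.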
Available repetition operators
The following operators all behave in a similar way to the asterisk: they are
placed after the repeated item.
Operator | Meaning | Examples |
* | Zero or more... | .* any character sequence; [0-9]* any number of digits; a* any number of instances of the letter a |
? | Zero or one... (i.e. optional element) | [0-9]? an optional digit; X? an optional letter X |
+ | One or more... | [0-9]+ one or more digits |
{x} | Exactly x instances of... | [0-9]{10} ten digits; n{4} four instances of the letter n; .{3} three instances of any character |
{x,y} | Between x and y instances (inclusive) of... | [0-9]{10,14} between 10 and 14 digits |
{x,} | At least x instances of... | .{5,} at least 5 characters |
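The following sketch tries several of the expressions from the table against inputs that should and should not match. Again it uses Pattern.matches(), so each expression must match the entire input; the class name RepetitionOperatorsDemo and the helper check() are hypothetical, for illustration only.

```java
import java.util.regex.Pattern;

public class RepetitionOperatorsDemo {
    // True only if the entire input matches the regular expression
    static boolean check(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // ? : zero or one
        System.out.println(check("X?", ""));                          // true
        System.out.println(check("X?", "XX"));                        // false
        // + : one or more
        System.out.println(check("[0-9]+", ""));                      // false
        System.out.println(check("[0-9]+", "42"));                    // true
        // {x} : exactly x
        System.out.println(check("n{4}", "nnnn"));                    // true
        System.out.println(check("n{4}", "nnn"));                     // false
        // {x,y} : between x and y, both limits inclusive
        System.out.println(check("[0-9]{10,14}", "0123456789"));      // true (10 digits)
        System.out.println(check("[0-9]{10,14}", "012345678901234")); // false (15 digits)
        // {x,} : at least x
        System.out.println(check(".{5,}", "abcd"));                   // false
        System.out.println(check(".{5,}", "abcdefgh"));               // true
    }
}
```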
Now you can understand why the following expression:
.*[0-9]{10}.*
matches a string containing ten consecutive digits, with any characters (or none) before and after them.
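As a quick check, here is a minimal sketch that compiles the ten-digit expression once and tests a few strings. We assume the expression in question is .*[0-9]{10}.* (any characters, then ten digits in a row, then any characters); the class name TenDigitsDemo and the method containsTenDigits() are our own.

```java
import java.util.regex.Pattern;

public class TenDigitsDemo {
    // Assumed expression: anything, then exactly ten digits in a row, then anything
    private static final Pattern TEN_DIGITS = Pattern.compile(".*[0-9]{10}.*");

    static boolean containsTenDigits(String s) {
        // matches() tests the whole string against the pattern
        return TEN_DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsTenDigits("Tel: 0123456789 (office)")); // true
        System.out.println(containsTenDigits("0123456789"));               // true: both .* match nothing
        System.out.println(containsTenDigits("Tel: 01234 56789"));         // false: digits not consecutive
    }
}
```

Compiling the Pattern once and reusing it avoids re-parsing the expression on every call. Bear in mind that the dot does not match line terminators by default, so input spanning several lines would need the Pattern.DOTALL flag.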
Resolving ambiguity: greedy and reluctant operators
In an expression such as that above, there are potential ambiguities. For example, given a string with more than ten digits, how do we decide which .* matches against the extra digits? And what stops the first .* from simply matching against the entire string, digits or not? On the next page we answer these questions by looking at the notion of greedy and reluctant operators.
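Before looking at the explanation, it can help to see what Java actually does. This sketch wraps each part of the assumed expression .*[0-9]{10}.* in parentheses (capturing groups) purely so that we can print which piece of the input each part matched. The class name WhichDotStarDemo and the method split() are hypothetical; split() returns null if the string does not match at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhichDotStarDemo {
    // Group 1 = first .*, group 2 = the ten digits, group 3 = second .*
    private static final Pattern P = Pattern.compile("(.*)([0-9]{10})(.*)");

    static String[] split(String s) {
        Matcher m = P.matcher(s);
        if (!m.matches()) {
            return null; // no run of ten digits anywhere in the string
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Twelve digits in a row: two more than [0-9]{10} needs
        String[] parts = split("abc123456789012xyz");
        System.out.println("first .*  = " + parts[0]); // abc12
        System.out.println("digits    = " + parts[1]); // 3456789012
        System.out.println("second .* = " + parts[2]); // xyz
    }
}
```

In this case the two extra digits end up in the first .*; why that happens is the subject of greedy versus reluctant matching.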
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.