Repetition operators in regular expressions

On the previous page, we mentioned the dot and asterisk combination, meaning "any character repeated any number of times". The asterisk is an example of what is sometimes called a repetition operator.

How repetition operators work

Repetition operators are placed after the item to be repeated. By item, we mean a single character or a character class. We have already seen that .* means any character repeated any number of times. So, for example, [0-9]* means any number of digits and a* means any number of instances of the letter a. Note that "any number" includes zero.
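As a minimal sketch of the rule above, the following uses Python's re module (an assumption: this page does not name a language, but these patterns are written the same way in most regular expression engines). The helper name matches is hypothetical and simply tests whether a whole string fits a pattern.

```python
import re

# Hypothetical helper: True only if the WHOLE string fits the pattern.
def matches(pattern, text):
    return re.fullmatch(pattern, text) is not None

digits = matches(r"[0-9]*", "2024")   # any number of digits
empty = matches(r"[0-9]*", "")        # "any number" includes zero
letters = matches(r"a*", "aaa")       # three instances of the letter a
mixed = matches(r"a*", "aab")         # b is not covered by a*

# The operator applies only to the item immediately before it:
# ab* means "a, then any number of b", not "any number of ab".
binding = matches(r"ab*", "abbb")
not_group = matches(r"ab*", "abab")

print(digits, empty, letters, mixed, binding, not_group)
```

The last two checks illustrate why the position of the operator matters: in ab*, only the b is repeated.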

Available repetition operators

The following operators all behave in a similar way to the asterisk: they are placed after the repeated item.

Operator  Meaning                            Example       What the example matches
*         Zero or more instances             .*            Any character sequence
                                             [0-9]*        Any number of digits
                                             a*            Any number of instances of the letter a
?         Zero or one (an optional element)  [0-9]?        An optional digit
                                             X?            An optional letter X
+         One or more instances              [0-9]+        One or more digits
{x}       Exactly x instances                [0-9]{10}     Ten digits
                                             n{4}          Four instances of the letter n
                                             .{3}          Three instances of any character
{x,y}     Between x and y instances          [0-9]{10,14}  Between 10 and 14 digits
{x,}      At least x instances               .{5,}         At least 5 characters
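The operators listed above can be tried out with a short script. This is only a sketch, again assuming Python's re module. The sample strings are invented for illustration, and the patterns X?, n{4} and .{3} are inferred from the descriptions "an optional letter X", "four instances of the letter n" and "three instances of any character".

```python
import re

# Hypothetical helper: True only if the WHOLE string fits the pattern.
def matches(pattern, text):
    return re.fullmatch(pattern, text) is not None

# Each entry: (pattern, a string that should fit, a string that should not).
cases = [
    (r"[0-9]?", "7", "77"),                           # zero or one digit
    (r"X?", "", "XX"),                                # optional letter X
    (r"[0-9]+", "123", ""),                           # one or more digits
    (r"n{4}", "nnnn", "nnn"),                         # exactly four n
    (r".{3}", "abc", "abcd"),                         # exactly three characters
    (r"[0-9]{10,14}", "0123456789012", "012345678"),  # 13 digits vs 9 digits
    (r".{5,}", "hello!", "hiya"),                     # at least five characters
]

results = [(p, matches(p, good), matches(p, bad)) for p, good, bad in cases]
for pattern, accepted, rejected in results:
    print(pattern, accepted, rejected)
```

For every pattern, the first sample should be accepted and the second rejected.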

Now you can understand why the following expression:

.*[0-9]{10}.*

matches a string containing ten consecutive digits.
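To see this in practice, here is a small sketch in Python's re module. It assumes the expression in question is .*[0-9]{10}.* (any characters, then ten digits in a row, then any characters). The sample strings are invented for illustration.

```python
import re

# Assumed pattern: anything, then ten digits in a row, then anything.
pattern = re.compile(r".*[0-9]{10}.*")

# Ten digits surrounded by other text: both .* parts absorb the rest.
with_ten = pattern.fullmatch("call 0123456789 now") is not None
# Exactly ten digits: both .* parts match nothing at all.
exactly_ten = pattern.fullmatch("0123456789") is not None
# Only six digits: [0-9]{10} cannot be satisfied.
too_few = pattern.fullmatch("call 012345 now") is not None
# Ten digits in total, but broken up by a space.
split_run = pattern.fullmatch("01234 56789") is not None

print(with_ten, exactly_ten, too_few, split_run)
```

The last case shows that the ten digits must be adjacent: {10} repeats the digit class with nothing in between.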

Resolving ambiguity: greedy and reluctant operators

An expression such as the one above is potentially ambiguous. For example, given a string with more than ten digits, how do we decide which .* matches the extra digits? And what stops the first .* from simply matching the entire string, digits or not? On the next page we answer these questions by looking at the notion of greedy and reluctant operators.