Search and replace with regular expressions
It is possible to perform search and replace operations on strings in Java using
regular expressions. The Java String and Matcher classes offer relatively simple methods
for matching and replacing text, and using them brings the benefit of
string matching optimisations that would be
cumbersome to implement from scratch.
The complexity of using these methods depends on how much flexibility you need:
Replacing one "fixed" substring with another
This is the simplest form of search and replace. We want to find exact instances of a specific substring and
replace them with another given substring. To do so, we can call replaceAll() on the String, but we
need to put Pattern.quote() around the substring we are searching for. For example,
this will replace all instances of the substring "1+" with "one plus":
str = str.replaceAll(Pattern.quote("1+"), "one plus");
If you are familiar with regular expressions, then you will know that a plus sign normally has a
special meaning. But provided you remember to put Pattern.quote() around the first string, we can
use replaceAll() as a simple search and replace call. (If the replacement substring contains
a dollar sign or backslash, then we also need to use Matcher.quoteReplacement(): see below.)
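As a minimal sketch of this fully literal case, the two quoting calls can be combined in one small helper. The class name LiteralReplaceDemo and the method replaceLiteral() are hypothetical names chosen for illustration; only Pattern.quote(), Matcher.quoteReplacement() and String.replaceAll() come from the Java library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplaceDemo {

    // Hypothetical helper: replaces every exact occurrence of 'target'
    // with 'replacement', treating both strings as plain text.
    static String replaceLiteral(String input, String target, String replacement) {
        return input.replaceAll(
                Pattern.quote(target),                   // no regex metacharacters
                Matcher.quoteReplacement(replacement));  // no $ or \ interpretation
    }

    public static void main(String[] args) {
        // Quoted: "1+" is matched as the two characters '1' and '+'
        System.out.println(replaceLiteral("1+1=2", "1+", "one plus "));
        // prints: one plus 1=2

        // Unquoted: "1+" is the regex "one or more 1s", so each '1' is replaced
        System.out.println("1+1=2".replaceAll("1+", "one plus "));
        // prints: one plus +one plus =2
    }
}
```

The second call shows what goes wrong if Pattern.quote() is forgotten: the plus sign is left behind and every run of the digit 1 is replaced instead.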
Replacing matches of an expression with a fixed string
If you simply want to replace all instances of a given expression within a Java string
with another fixed string, then things are fairly straightforward. For example,
the following replaces all instances of digits with a letter X:
str = str.replaceAll("[0-9]", "X");
The following replaces all instances of multiple spaces with a single space:
str = str.replaceAll(" {2,}", " ");
We'll see in the next section that we should be careful about passing
"raw" strings as the second parameter, since certain characters in this string
actually have special meanings.
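To see the two replacements above applied to concrete input, here is a small runnable sketch. The class and method names (PatternReplaceDemo, maskDigits, collapseSpaces) are made up for this example; the regular expressions are the ones shown above.

```java
class PatternReplaceDemo {

    // Replaces every individual digit with the letter X.
    static String maskDigits(String s) {
        return s.replaceAll("[0-9]", "X");
    }

    // Replaces each run of two or more spaces with a single space.
    static String collapseSpaces(String s) {
        return s.replaceAll(" {2,}", " ");
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("Call 555-0199"));
        // prints: Call XXX-XXXX

        System.out.println(collapseSpaces("too    many   spaces"));
        // prints: too many spaces
    }
}
```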
Replacing with a sub-part of the matched portion
In the replacement string, we can refer to captured groups
from the regular expression.
For example, the following expression removes
instances of the HTML 'bold' tag from a string, but leaves the text inside
the tag intact:
str = str.replaceAll("<b>([^<]*)</b>", "$1");
In the expression <b>([^<]*)</b>, we capture
the text between the open and close tags as group 1. Then, in the replacement string,
we can refer to the text of group 1 with the expression $1. (The second group
would be $2 etc.)
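The sketch below runs the bold-tag example and adds a second, hypothetical example that uses three groups to reorder the parts of a date. The names GroupReplaceDemo, stripBold and reorderDate, and the date format itself, are assumptions made for illustration.

```java
class GroupReplaceDemo {

    // Removes <b>...</b> tags but keeps the text inside them (group 1).
    static String stripBold(String html) {
        return html.replaceAll("<b>([^<]*)</b>", "$1");
    }

    // Hypothetical example: rewrites yyyy-MM-dd as dd/MM/yyyy by
    // emitting the three captured groups in reverse order.
    static String reorderDate(String s) {
        return s.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    public static void main(String[] args) {
        System.out.println(stripBold("This is <b>bold</b> and <b>this</b> too"));
        // prints: This is bold and this too

        System.out.println(reorderDate("Released 2021-03-15"));
        // prints: Released 15/03/2021
    }
}
```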
Including a dollar sign or backslashes in the replacement string
To include a literal dollar sign or backslash in the replacement string, we need to put a backslash
before it to "escape" it, remembering that within a Java string literal, each backslash also needs to
be doubled up. For example:
str = str.replaceAll("USD", "\\$");
The static method Matcher.quoteReplacement() will replace instances
of dollar signs and backslashes in a given string with the correct form to allow them
to be used as literal replacements:
str = str.replaceAll("USD", Matcher.quoteReplacement("$"));
In general:
- If there is a chance that the replacement string will include a dollar sign
or a backslash character, then you should wrap it in Matcher.quoteReplacement().
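The following sketch illustrates the difference that Matcher.quoteReplacement() makes. The class name QuoteReplacementDemo, the helper replaceWithLiteral() and the sample strings are hypothetical; the point is only to compare a quoted and an unquoted replacement.

```java
import java.util.regex.Matcher;

class QuoteReplacementDemo {

    // Hypothetical helper: 'regex' is still a regular expression, but
    // 'literalReplacement' is inserted exactly as given.
    static String replaceWithLiteral(String input, String regex, String literalReplacement) {
        return input.replaceAll(regex, Matcher.quoteReplacement(literalReplacement));
    }

    public static void main(String[] args) {
        // A dollar sign is inserted literally rather than read as a group reference
        System.out.println(replaceWithLiteral("Price: USD 5", "USD ", "$"));
        // prints: Price: $5

        // Unquoted, the backslash escapes the 't' and is silently dropped
        System.out.println("DIR".replaceAll("DIR", "C:\\temp"));
        // prints: C:temp

        // Quoted, the backslash survives
        System.out.println(replaceWithLiteral("DIR", "DIR", "C:\\temp"));
        // prints: C:\temp
    }
}
```

The unquoted backslash case is worth noting because it does not throw an exception: the output is simply wrong, which makes the mistake easy to miss.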
Further information: more flexible find and replacement operations
For additional flexibility:
- from Java 9 onwards, the Matcher.replaceAll() method can be used with a lambda expression
that computes the replacement for each match: see the accompanying page and example of using replaceAll() with a lambda expression;
- the Matcher.find() method can be used to provide further control over the
operation.
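As a rough sketch of these two options, the example below doubles every number found in a string, first with a lambda and then with an explicit find() loop. It assumes Java 9 or later for the Matcher.replaceAll() overload that accepts a function. The class and method names are hypothetical, and the number-doubling task is just an illustration (it assumes the numbers fit in an int).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlexibleReplaceDemo {

    private static final Pattern NUMBER = Pattern.compile("[0-9]+");

    // Java 9+: the lambda receives each match and returns its replacement.
    static String doubleNumbers(String input) {
        return NUMBER.matcher(input).replaceAll(
                m -> Integer.toString(Integer.parseInt(m.group()) * 2));
    }

    // The same operation with an explicit find() loop, building the
    // result with appendReplacement() and appendTail().
    static String doubleNumbersWithFind(String input) {
        Matcher m = NUMBER.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = Integer.toString(Integer.parseInt(m.group()) * 2);
            // quoteReplacement() is not strictly needed for digits, but is a
            // sensible habit when the computed replacement is arbitrary text
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears"));
        // prints: 6 apples and 24 pears
        System.out.println(doubleNumbersWithFind("3 apples and 12 pears"));
        // prints: 6 apples and 24 pears
    }
}
```

The find() loop is more verbose, but it allows extra logic between matches, such as skipping some matches or stopping early.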
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.