Regular Expressions in Java
Regular Expression Basics
Regular expressions (regex) are an effective way of describing common patterns in strings. For example, all phone numbers in North America have 10 digits; this can be easily described by regular expressions:
[0-9]{10}, which matches 10 consecutive digits such as 3334445555,
[0-9]{3}-[0-9]{3}-[0-9]{4}, which matches hyphenated numbers such as 333-444-5555,
[0-9]{3}-?[0-9]{3}-?[0-9]{4}, which matches either form above, because each hyphen is optional.
In this tutorial we will learn how to construct and use regular expressions in the java.util.regex API.
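As a quick illustration, the three phone-number patterns above can be tried out with a minimal sketch like the following. The class name PhonePatterns and the helper fullMatch are hypothetical names chosen for this example; Pattern.compile and Matcher.matches come from java.util.regex.

```java
import java.util.regex.Pattern;

class PhonePatterns {
    // The three patterns from the text, compiled once for reuse.
    static final Pattern TEN_DIGITS = Pattern.compile("[0-9]{10}");
    static final Pattern HYPHENATED = Pattern.compile("[0-9]{3}-[0-9]{3}-[0-9]{4}");
    static final Pattern EITHER = Pattern.compile("[0-9]{3}-?[0-9]{3}-?[0-9]{4}");

    // Hypothetical helper: true only if the whole input matches the pattern.
    static boolean fullMatch(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch(TEN_DIGITS, "3334445555"));   // true
        System.out.println(fullMatch(TEN_DIGITS, "333-444-5555")); // false
        System.out.println(fullMatch(HYPHENATED, "333-444-5555")); // true
        System.out.println(fullMatch(EITHER, "3334445555"));       // true
        System.out.println(fullMatch(EITHER, "333-444-5555"));     // true
        // Each hyphen is optional on its own, so a mixed form is accepted too.
        System.out.println(fullMatch(EITHER, "333-4445555"));      // true
    }
}
```

Note that the third pattern is slightly more permissive than "either case": it also accepts a number with only one of the two hyphens.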
Patterns.
A pattern string is a sequence of characters following the syntax of regex. The following are some example patterns.
Construct | Description |
---|---|
abc | Exactly three characters abc in sequence |
^abc | abc matches at the beginning |
abc$ | abc matches at the end |
a|c | a or c |
Note that we use some characters with special meanings, which are called metacharacters. The metacharacters in the java.util.regex API include: <([{\^-=$!|]})?*+.>
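The example patterns and the effect of metacharacters can be checked with a short sketch. The class PatternBasics and the helper found are hypothetical names for this illustration. The sketch uses Matcher.find, which searches anywhere in the input, so the anchors ^ and $ make a visible difference. It also shows two standard ways to match a metacharacter literally: a backslash escape (written \\ inside a Java string literal) and Pattern.quote.

```java
import java.util.regex.Pattern;

class PatternBasics {
    // Hypothetical helper: true if the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("abc", "xxabcxx"));  // true: abc occurs in sequence
        System.out.println(found("^abc", "xxabcxx")); // false: abc is not at the beginning
        System.out.println(found("^abc", "abcxx"));   // true
        System.out.println(found("abc$", "xxabc"));   // true: abc is at the end
        System.out.println(found("a|c", "bcd"));      // true: contains c
        System.out.println(found("a|c", "bdd"));      // false: neither a nor c

        // "." is a metacharacter, so the unescaped pattern 1.5 also matches "125".
        System.out.println(found("1.5", "125"));                // true
        System.out.println(found("1\\.5", "125"));              // false: escaped dot is literal
        System.out.println(found(Pattern.quote("1.5"), "1.5")); // true: whole string quoted as a literal
    }
}
```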
Character Classes.
A character class is a set of characters enclosed within square brackets. It matches any single character contained in the set. The following are some examples:
Construct | Description |
---|---|
[abc] | a, b, or c (simple class) |
[^abc] | Any character (including non-alphabet characters) except a, b, or c (negation) |
[a-zA-Z] | a through z, or A through Z, inclusive (range) |
[0-9] | 0 through 9, inclusive (range) |
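The character classes in the table can be exercised with a minimal sketch such as the one below. CharacterClasses and isMatch are hypothetical names; the only library call is String.matches, which tests the entire string against the pattern.

```java
class CharacterClasses {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean isMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(isMatch("[abc]", "b"));    // true: b is in the set
        System.out.println(isMatch("[abc]", "d"));    // false
        System.out.println(isMatch("[^abc]", "d"));   // true: negation
        System.out.println(isMatch("[^abc]", "7"));   // true: non-letters count as well
        System.out.println(isMatch("[abc]", "ab"));   // false: a class matches one character
        System.out.println(isMatch("[a-zA-Z]", "Q")); // true: inside the A-Z range
        System.out.println(isMatch("[0-9]", "5"));    // true
    }
}
```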
The following is a list of predefined character classes as convenient shorthands.
Construct | Description |
---|---|
. | Any character |
\d | A digit: [0-9] |
\D | A non-digit: [^0-9] |
\s | A whitespace character: [ \t\n\x0B\f\r] |
\S | A non-whitespace character: [^\s] |
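When these shorthands appear in Java source code, each backslash has to be doubled inside the string literal, so \d is written as "\\d". The sketch below, with the hypothetical names PredefinedClasses, fullMatch, and fullMatchDotAll, shows this. It also shows one subtlety of the dot: by default it does not match line terminators such as \n unless the Pattern.DOTALL flag is set.

```java
import java.util.regex.Pattern;

class PredefinedClasses {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Same check, but with DOTALL so that "." also matches line terminators.
    static boolean fullMatchDotAll(String regex, String input) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d", "7"));        // true: a digit
        System.out.println(fullMatch("\\D", "x"));        // true: a non-digit
        System.out.println(fullMatch("\\s", "\t"));       // true: tab is whitespace
        System.out.println(fullMatch("\\S", "x"));        // true: non-whitespace
        System.out.println(fullMatch(".", "x"));          // true
        System.out.println(fullMatch(".", "\n"));         // false: default mode skips line terminators
        System.out.println(fullMatchDotAll(".", "\n"));   // true with Pattern.DOTALL
    }
}
```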
Quantifiers.
Quantifiers specify the number of occurrences; without a quantifier, a character class matches one occurrence by default. The following table shows several patterns with quantifiers.
Construct | Description |
---|---|
X? | X, once or not at all |
X* | X, zero or more times |
X+ | X, one or more times |
X{n} | X, exactly n times |
X{n,} | X, at least n times |
X{n,m} | X, at least n but at most m times |
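Each quantifier in the table can be checked against a few sample inputs. The following is a minimal sketch with hypothetical names (Quantifiers, isMatch) and made-up sample strings; it relies only on String.matches.

```java
class Quantifiers {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean isMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(isMatch("colou?r", "color"));  // true: u zero times
        System.out.println(isMatch("colou?r", "colour")); // true: u once
        System.out.println(isMatch("a*", ""));            // true: zero occurrences allowed
        System.out.println(isMatch("a+", ""));            // false: needs at least one a
        System.out.println(isMatch("a{3}", "aaa"));       // true: exactly three
        System.out.println(isMatch("a{3}", "aa"));        // false
        System.out.println(isMatch("a{3,}", "aaaa"));     // true: at least three
        System.out.println(isMatch("a{2,3}", "aaa"));     // true: between two and three
        System.out.println(isMatch("a{2,3}", "aaaa"));    // false: more than three
    }
}
```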
Regular Expressions with the String Class
The String class in Java has several methods supporting regular expressions.
boolean matches(String regex)
Tells whether or not this string matches the given regular expression.
String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular expression with the given replacement.
The following code shows how to use such methods to manipulate strings.
String str = "......";
boolean isTrue = str.matches("[tT]rue|[yY]es"); // check for True, true, Yes, yes
// matches() tests the entire string, so the ^ and $ anchors are redundant but harmless
boolean isValidPhone = str.matches("^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$");
String s = str.replaceAll("<[^>]*>", ""); // remove anything enclosed by "<" and ">"
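To see these calls with concrete inputs, here is a runnable sketch under a few assumptions: the class StringRegexDemo, its method names, and the sample strings are all made up for illustration. The tag-stripping method uses a precompiled Pattern, which behaves like String.replaceAll for this regex but avoids recompiling the expression on every call.

```java
import java.util.regex.Pattern;

class StringRegexDemo {
    // Compiled once; equivalent in effect to str.replaceAll("<[^>]*>", "").
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static boolean isAffirmative(String s) {
        return s.matches("[tT]rue|[yY]es");
    }

    static boolean isValidPhone(String s) {
        // matches() already requires the whole string to match, so no anchors are needed.
        return s.matches("[0-9]{3}-?[0-9]{3}-?[0-9]{4}");
    }

    static String stripTags(String s) {
        return TAG.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(isAffirmative("Yes"));             // true
        System.out.println(isAffirmative("yes please"));      // false: extra text after "yes"
        System.out.println(isValidPhone("333-444-5555"));     // true
        System.out.println(isValidPhone("call 3334445555"));  // false: not the whole string
        System.out.println(stripTags("<b>bold</b> text"));    // bold text
    }
}
```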