Regular Expressions in Java

Regular Expression Basics

Regular expressions (regex) are a compact way to describe patterns in strings. For example, every phone number in North America has 10 digits, and a regular expression can describe each common way of writing one:

  • [0-9]{10}, which matches 10 consecutive digits such as 3334445555,
  • [0-9]{3}-[0-9]{3}-[0-9]{4}, which matches hyphenated numbers such as 333-444-5555,
  • [0-9]{3}-?[0-9]{3}-?[0-9]{4}, which matches either form above because each hyphen is optional.
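The three phone-number patterns can be tried out with a short program. This is a minimal sketch: the class name PhonePatterns and the helper matches are made up for illustration, while Pattern.compile and Matcher.matches are part of java.util.regex.

```java
import java.util.regex.Pattern;

class PhonePatterns {
    // The three patterns from the list above.
    static final Pattern DIGITS_ONLY = Pattern.compile("[0-9]{10}");
    static final Pattern HYPHENATED = Pattern.compile("[0-9]{3}-[0-9]{3}-[0-9]{4}");
    static final Pattern EITHER = Pattern.compile("[0-9]{3}-?[0-9]{3}-?[0-9]{4}");

    // Matcher.matches() succeeds only if the whole input matches the pattern.
    static boolean matches(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"3334445555", "333-444-5555", "333-4445555", "33-444-5555"};
        for (String s : inputs) {
            System.out.println(s
                + " digitsOnly=" + matches(DIGITS_ONLY, s)
                + " hyphenated=" + matches(HYPHENATED, s)
                + " either=" + matches(EITHER, s));
        }
    }
}
```

Because each hyphen in the third pattern is optional on its own, that pattern also accepts a mixed form such as 333-4445555.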

In this tutorial we will learn how to construct and use regular expressions in the java.util.regex API.

Patterns.
A pattern string is a sequence of characters following the syntax of regex. The following are some example patterns.

Construct   Description
abc         The three characters a, b, c in sequence
^abc        abc at the beginning of the input
abc$        abc at the end of the input
a|c         a or c
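The effect of the anchors is easiest to see when searching inside a longer string. The sketch below uses Matcher.find(), which looks for the pattern anywhere in the input; the class BasicPatterns and the helper found are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class BasicPatterns {
    // find() reports whether the pattern occurs anywhere in the input,
    // so the anchors ^ and $ visibly restrict where a match may start or end.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("abc", "xxabcxx"));  // abc occurs in the middle
        System.out.println(found("^abc", "xxabcxx")); // input does not start with abc
        System.out.println(found("^abc", "abcxx"));   // input starts with abc
        System.out.println(found("abc$", "xxabc"));   // input ends with abc
        System.out.println(found("a|c", "b"));        // neither a nor c is present
    }
}
```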

Note that some of the characters used above, such as ^, $, and |, have special meanings; these are called metacharacters. The metacharacters in the java.util.regex API include: <([{\^-=$!|]})?*+.>
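To match a metacharacter literally, it has to be escaped. The following sketch shows two common ways, a backslash (written as \\ inside a Java string literal) and Pattern.quote; the class EscapeDemo and the helper fullMatch are hypothetical names for this example.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Pattern.matches compiles the regex and tests it against the whole input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped, the dot is a metacharacter that matches any character.
        System.out.println(fullMatch("1.5", "1x5"));
        // Escaped with a backslash, the dot matches only a literal dot.
        System.out.println(fullMatch("1\\.5", "1x5"));
        System.out.println(fullMatch("1\\.5", "1.5"));
        // Pattern.quote turns a whole string into a literal pattern.
        System.out.println(fullMatch(Pattern.quote("1.5"), "1.5"));
    }
}
```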

Character Classes.
A character class is a set of characters enclosed within square brackets. It matches any single character contained in the set. The following are some examples:

Construct   Description
[abc]       a, b, or c (simple class)
[^abc]      Any character (including non-alphabet characters) except a, b, or c (negation)
[a-zA-Z]    a through z, or A through Z, inclusive (range)
[0-9]       0 through 9, inclusive (range)
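Each class in the table can be checked against single characters. This is a small sketch in which the class CharClassDemo and the helper inClass are hypothetical names; only String.matches comes from the Java library.

```java
class CharClassDemo {
    // Returns true if the single character c belongs to the given character class.
    static boolean inClass(String charClass, char c) {
        return String.valueOf(c).matches(charClass);
    }

    public static void main(String[] args) {
        System.out.println(inClass("[abc]", 'b'));    // b is in the set
        System.out.println(inClass("[abc]", 'd'));    // d is not in the set
        System.out.println(inClass("[^abc]", '7'));   // negation also accepts non-letters
        System.out.println(inClass("[a-zA-Z]", 'Q')); // upper-case letter in range
        System.out.println(inClass("[0-9]", 'x'));    // not a digit
    }
}
```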

The following is a list of predefined character classes as convenient shorthands.

Construct   Description
.           Any character (may or may not match line terminators)
\d          A digit: [0-9]
\D          A non-digit: [^0-9]
\s          A whitespace character: [ \t\n\x0B\f\r]
\S          A non-whitespace character: [^\s]
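In Java source code the backslash of a predefined class must itself be escaped, so \d is written "\\d" in a string literal. The sketch below illustrates this with String.replaceAll; the class PredefinedClassDemo and its three helper methods are hypothetical names for this example.

```java
class PredefinedClassDemo {
    // Keeps only the digits by deleting every non-digit (\D).
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Deletes every whitespace character (\s).
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s", "");
    }

    // Replaces every digit (\d) with '#'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Call 333-444-5555 now")); // 3334445555
        System.out.println(stripWhitespace("a b\tc\n"));          // abc
        System.out.println(maskDigits("PIN 1234"));               // PIN ####
    }
}
```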

Quantifiers.
Quantifiers specify how many times the preceding element may occur; without a quantifier, a character or character class matches exactly one occurrence. The following table shows several patterns with quantifiers.

Construct   Description
X?          X, once or not at all
X*          X, zero or more times
X+          X, one or more times
X{n}        X, exactly n times
X{n,}       X, at least n times
X{n,m}      X, at least n but at most m times
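Each quantifier in the table can be tried on a concrete X. The following sketch uses the letter a (and the u in colou?r) as X; the class QuantifierDemo and the helper full are hypothetical names for this example.

```java
class QuantifierDemo {
    // True if the entire string s matches the regex.
    static boolean full(String regex, String s) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(full("colou?r", "color")); // u occurs zero times
        System.out.println(full("a*", ""));           // zero occurrences are allowed
        System.out.println(full("a+", ""));           // at least one a is required
        System.out.println(full("a{3}", "aaa"));      // exactly three
        System.out.println(full("a{2,}", "aaaa"));    // two or more
        System.out.println(full("a{2,3}", "aaaa"));   // four exceeds the upper bound
    }
}
```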

Regular Expressions with the String Class

The String class in Java has several methods supporting regular expressions.

  •  boolean matches(String regex)
    Tells whether or not this string matches the given regular expression.
  •  String replaceAll(String regex, String replacement)
    Replaces each substring of this string that matches the given regular expression with the given replacement.

The following code shows how to use such methods to manipulate strings.

String str = "......";
// matches() succeeds only if the entire string matches the pattern
boolean isTrue =
    str.matches("[tT]rue|[yY]es"); // check for True, true, Yes, yes
boolean isValidPhone =
    str.matches("^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$"); // ^ and $ are optional here

String s =
    str.replaceAll("<[^>]*>", ""); // remove anything enclosed by "<" and ">"
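For experimenting with these calls on concrete inputs, the same patterns can be wrapped in small methods. This is only a sketch: the class StringRegexDemo, its method names, and the sample inputs are assumptions made for illustration.

```java
class StringRegexDemo {
    // Accepts exactly True, true, Yes, or yes.
    static boolean isTrue(String s) {
        return s.matches("[tT]rue|[yY]es");
    }

    // Accepts 10 digits with an optional hyphen after the 3rd and 6th digit.
    static boolean isValidPhone(String s) {
        return s.matches("[0-9]{3}-?[0-9]{3}-?[0-9]{4}");
    }

    // Removes every run of text that starts with "<" and ends at the next ">".
    static String stripTags(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(isTrue("Yes"));                  // true
        System.out.println(isTrue("yes please"));           // false: whole string must match
        System.out.println(isValidPhone("333-444-5555"));   // true
        System.out.println(stripTags("<b>bold</b> text"));  // bold text
    }
}
```

The phone pattern here omits ^ and $ because String.matches already requires the whole string to match.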
