Tutorial #13: Regular Expressions

The java.util.regex package consists primarily of the following three classes −
  • Pattern Class − A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile() methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.
  • Matcher Class − A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher() method on a Pattern object.
  • PatternSyntaxException − A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
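The following is a minimal sketch of how the three classes fit together. The class name RegexBasics and the helper findAll are hypothetical and exist only for illustration. Pattern.compile, Pattern.matcher, Matcher.find, Matcher.group and PatternSyntaxException.getDescription are standard java.util.regex calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexBasics {
    // Hypothetical helper: returns every substring of input matched by regex, in order.
    static List<String> findAll(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);   // no public constructor; compile() builds the Pattern
        Matcher matcher = pattern.matcher(input);   // a Matcher binds the pattern to one input
        List<String> results = new ArrayList<>();
        while (matcher.find()) {                    // advance to the next match, if any
            results.add(matcher.group());           // text of the current match
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "order 66, room 101"));  // [66, 101]

        try {
            Pattern.compile("(unclosed");           // missing ')' is a syntax error
        } catch (PatternSyntaxException e) {
            // Unchecked, so this catch is optional; getDescription() describes the error.
            System.out.println("Bad pattern: " + e.getDescription());
        }
    }
}
```

Because PatternSyntaxException is unchecked, code that compiles a fixed, known-good pattern usually does not catch it; catching it is mainly useful when the pattern comes from user input.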

Subexpression   Matches
-------------   -------
^               The beginning of the input, or of each line when MULTILINE mode ((?m)) is enabled.
$               The end of the input, or the end of each line when MULTILINE mode is enabled.
.               Any single character except a line terminator. With the DOTALL flag ((?s)) it matches line terminators as well.
[...]           Any single character in the brackets.
[^...]          Any single character not in the brackets.
\A              The beginning of the entire input.
\z              The end of the entire input.
\Z              The end of the entire input, but before the final line terminator if there is one.
re*             Zero or more occurrences of the preceding expression.
re+             One or more occurrences of the preceding expression.
re?             Zero or one occurrence of the preceding expression.
re{n}           Exactly n occurrences of the preceding expression.
re{n,}          n or more occurrences of the preceding expression.
re{n,m}         At least n and at most m occurrences of the preceding expression.
a|b             Either a or b.
(re)            Groups the expression and captures (remembers) the matched text.
(?:re)          Groups the expression without capturing the matched text.
(?>re)          The expression as an independent group, with no backtracking into it.
\w              A word character: [a-zA-Z_0-9].
\W              A nonword character.
\s              A whitespace character: [ \t\n\x0B\f\r].
\S              A non-whitespace character.
\d              A digit: [0-9].
\D              A nondigit.
\G              The point where the previous match ended.
\1, \2, ...     A back-reference to capturing group number 1, 2, and so on.
\b              A word boundary.
\B              A nonword boundary.
\n, \r, \t      A newline, carriage return, tab, and so on.
\Q              Quotes (escapes) all characters up to \E.
\E              Ends quoting begun with \Q.
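A small sketch exercising a few rows of the table. Note that in a Java string literal every backslash in the regex must be doubled, so the regex \d is written "\\d". The class SubexpressionDemo and its helpers whole and anywhere are hypothetical names for illustration; Matcher.matches() requires the entire input to match, while Matcher.find() looks for a match anywhere in it.

```java
import java.util.regex.Pattern;

public class SubexpressionDemo {
    // Hypothetical helper: true if the WHOLE input matches regex (matches() anchors both ends).
    static boolean whole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Hypothetical helper: true if regex, compiled with the given flags, matches somewhere in input.
    static boolean anywhere(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // re{n,m}: between two and four digits
        System.out.println(whole("\\d{2,4}", "123"));      // true
        System.out.println(whole("\\d{2,4}", "12345"));    // false

        // (re) captures and \1 refers back to it: finds a doubled word
        System.out.println(anywhere("\\b(\\w+) \\1\\b", 0, "it was the the end"));  // true

        // (?:re) groups without capturing, so only (cd) is counted
        System.out.println(Pattern.compile("(?:ab)(cd)").matcher("").groupCount());  // 1

        // \Q...\E quotes metacharacters: "a.b" matches only the literal text
        System.out.println(whole("\\Qa.b\\E", "a.b"));     // true
        System.out.println(whole("\\Qa.b\\E", "axb"));     // false

        // '.' does not match a line terminator unless DOTALL is set
        System.out.println(anywhere("a.b", 0, "a\nb"));                // false
        System.out.println(anywhere("a.b", Pattern.DOTALL, "a\nb"));   // true

        // '^' matches only at the start of input unless MULTILINE is set
        System.out.println(anywhere("^two", 0, "one\ntwo"));                  // false
        System.out.println(anywhere("^two", Pattern.MULTILINE, "one\ntwo"));  // true
    }
}
```

The embedded flags from the table, (?s) and (?m), have the same effect as passing Pattern.DOTALL and Pattern.MULTILINE to compile().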

Methods of the Matcher Class

Here is a list of useful instance methods −

Index Methods

Index methods provide useful index values that show precisely where the match was found in the input string −
Sr.No.  Method & Description
1       public int start()
        Returns the start index of the previous match.
2       public int start(int group)
        Returns the start index of the subsequence captured by the given group during the previous match operation.
3       public int end()
        Returns the offset after the last character matched.
4       public int end(int group)
        Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
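A minimal sketch of the four index methods, assuming a hypothetical key=value pattern in which group 1 captures the key and group 2 the value. The class IndexMethodsDemo and the helper spans are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexMethodsDemo {
    // Hypothetical pattern: group 1 is a key, group 2 a numeric value.
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\d+)");

    // For each match, records {start(), end(), start(1), end(1), start(2), end(2)}.
    static List<int[]> spans(String input) {
        Matcher m = KEY_VALUE.matcher(input);
        List<int[]> result = new ArrayList<>();
        while (m.find()) {
            result.add(new int[] {m.start(), m.end(), m.start(1), m.end(1), m.start(2), m.end(2)});
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a=1, bb=22";
        for (int[] s : spans(input)) {
            // end() is exclusive, so substring(start(), end()) reproduces the matched text.
            System.out.printf("match '%s' at [%d, %d), key [%d, %d), value [%d, %d)%n",
                    input.substring(s[0], s[1]), s[0], s[1], s[2], s[3], s[4], s[5]);
        }
    }
}
```

Running main prints:

match 'a=1' at [0, 3), key [0, 1), value [2, 3)
match 'bb=22' at [5, 10), key [5, 7), value [8, 10)

Calling these methods before a successful match throws IllegalStateException, and start(group) and end(group) return -1 when the match succeeded but the given group did not take part in it.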
