Capturing Groups
In the previous section, we saw how quantifiers attach to one character, character class, or capturing group at a time. But until now, we have not discussed the notion of capturing groups in any detail.
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g". The portion of the input string that matches the capturing group will be saved in memory for later recall via backreferences (as discussed below in the section, Backreferences).
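As a minimal sketch of the idea, the following Java snippet shows that a quantifier applies to the whole parenthesized group, and that the text matched by the group can be read back afterwards. The class name CapturingGroupDemo and the helper firstCapture are hypothetical names chosen for illustration; only the java.util.regex calls are part of the standard API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapturingGroupDemo {
    // Hypothetical helper: returns the text captured by group 1 of the
    // first match, or null when the regex does not match anywhere.
    // Assumes the regex contains at least one capturing group.
    static String firstCapture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // (dog) is one unit, so {2} repeats the whole word.
        System.out.println(Pattern.matches("(dog){2}", "dogdog")); // true
        // Without the parentheses, {2} repeats only the letter "g".
        System.out.println(Pattern.matches("dog{2}", "dogdog"));   // false
        // The text matched by the group is kept and can be recalled.
        System.out.println(firstCapture("(dog)", "hotdog stand")); // dog
    }
}
```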
Numbering
As described in the Pattern API, capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:
1. ((A)(B(C)))
2. (A)
3. (B(C))
4. (C)
To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int showing the number of capturing groups present in the matcher's pattern. In this example, groupCount would return the number 4, showing that the pattern contains 4 capturing groups.
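The numbering rule can be checked directly. The sketch below matches ((A)(B(C))) against the input "ABC" and prints each numbered group; the class and method names (GroupNumberingDemo, countGroups, captures) are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // groupCount depends only on the pattern, so no match is required.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns the text of groups 1..groupCount for a full-input match,
    // or an empty array when the input does not match.
    static String[] captures(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            result[i - 1] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String regex = "((A)(B(C)))";
        System.out.println(countGroups(regex));                      // 4
        System.out.println(Arrays.toString(captures(regex, "ABC"))); // [ABC, A, BC, C]
    }
}
```

The output order follows the opening parentheses from left to right: the outermost group comes first, then (A), then (B(C)), then (C).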
There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount. Groups beginning with (? are either pure, non-capturing groups that do not capture text and do not count towards the group total, or named capturing groups of the form (?<name>...), which do capture and are counted. (You'll see examples of non-capturing groups later in the section Methods of the Pattern Class.)
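To illustrate group 0 and the effect of a non-capturing group on the count, here is a small sketch. The pattern, the sample input, and the names GroupZeroDemo and countGroups are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {
    // Hypothetical helper: number of capturing groups in the pattern.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // (?:Mr|Ms) groups the alternatives without capturing them,
        // so (\w+) is the only capturing group and becomes group 1.
        String regex = "(?:Mr|Ms)\\. (\\w+)";
        Matcher m = Pattern.compile(regex).matcher("Ms. Lovelace");

        if (m.matches()) {
            System.out.println(m.groupCount()); // 1 (group 0 is not counted)
            System.out.println(m.group(0));     // Ms. Lovelace (the whole match)
            System.out.println(m.group());      // Ms. Lovelace (same as group(0))
            System.out.println(m.group(1));     // Lovelace
        }
    }
}
```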
It's important to understand how groups are numbered because some Matcher methods accept an int specifying a particular group number as a parameter:
- public int start(int group): Returns the start index of the subsequence captured by the given group during the previous match operation.
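A short sketch of start(int group) in use follows. The helper startOf, the class name GroupStartDemo, and the sample pattern are assumptions made for illustration. The helper returns -1 when nothing matches; start itself also returns -1 when the match succeeded but the requested group did not take part in it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupStartDemo {
    // Hypothetical helper: start index of the given group in the first
    // match, or -1 when the regex does not match the input at all.
    static int startOf(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start(group) : -1;
    }

    public static void main(String[] args) {
        String regex = "(\\d+)-(\\d+)";
        String input = "pages 12-34";

        System.out.println(startOf(regex, input, 0)); // 6 (whole match "12-34")
        System.out.println(startOf(regex, input, 1)); // 6 (group 1 is "12")
        System.out.println(startOf(regex, input, 2)); // 9 (group 2 is "34")
    }
}
```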