Class RE2
This class also contains various implementation helpers for RE2 regular expressions.
Use the quoteMeta(String) utility function to quote all regular expression
metacharacters in an arbitrary string.
See the Matcher and Pattern classes for the public API, and the package-level documentation for an overview of how to use this API.
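The quoting behaviour of quoteMeta(String) mentioned above can be pictured with a minimal sketch. This is not the library's source: the class name QuoteMetaSketch and the exact metacharacter set are assumptions made for illustration, and the real utility may treat a different set of characters.

```java
// Hypothetical sketch: escape each assumed metacharacter with a backslash
// so that the result matches the input text literally.
final class QuoteMetaSketch {
  // Assumed metacharacter set; the set used by the real quoteMeta may differ.
  private static final String META = "\\.+*?()|[]{}^$";

  static String quoteMeta(String s) {
    StringBuilder b = new StringBuilder(2 * s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (META.indexOf(c) >= 0) {
        b.append('\\'); // prefix the metacharacter with a backslash
      }
      b.append(c);
    }
    return b.toString();
  }

  public static void main(String[] args) {
    String quoted = quoteMeta("1+1=2?");
    System.out.println(quoted); // prints 1\+1=2\?
    // java.util.regex is used here only as a stand-in engine to check the result.
    System.out.println(java.util.regex.Pattern.matches(quoted, "1+1=2?")); // prints true
  }
}
```

The quoted string is itself a regular expression, so it can be concatenated with other pattern fragments.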
-
Nested Class Summary
Nested Classes: Modifier and Type, Class, Description
private static interface RE2.DeliverFunc
(package private) static interface RE2.ReplaceFunc
-
Field Summary
Fields: Modifier and Type, Field, Description
(package private) static final int ANCHOR_BOTH
(package private) static final int ANCHOR_START
(package private) static final int CLASS_NL
(package private) final int cond
(package private) static final int DOT_NL
(package private) final String expr: RE2 instance members.
(package private) static final int FOLD_CASE: Parser flags.
(package private) static final int LITERAL
(package private) boolean longest
(package private) static final int MATCH_NL
(package private) static final int NON_GREEDY
(package private) final int numSubexp
(package private) static final int ONE_LINE
(package private) static final int PERL
(package private) static final int PERL_X
private final AtomicReference<Machine> pooled
(package private) static final int POSIX
(package private) String prefix
(package private) boolean prefixComplete
(package private) int prefixRune
(package private) byte[] prefixUTF8
(package private) final Prog prog
(package private) static final int UNANCHORED: Anchors
(package private) static final int UNICODE_GROUPS
(package private) static final int WAS_DOLLAR
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
private void allMatches(MachineInput input, int n, RE2.DeliverFunc deliver)
(package private) static RE2 compile(String): Parses a regular expression and returns, if successful, an RE2 instance that can be used to match against text.
(package private) static RE2 compileImpl(String expr, int mode, boolean longest)
(package private) static RE2 compilePOSIX(String expr): compilePOSIX is like compile(String) but restricts the regular expression to POSIX ERE (egrep) syntax and changes the match semantics to leftmost-longest.
private int[] doExecute(MachineInput in, int pos, int anchor, int ncap)
(package private) String find(String): Returns a string holding the text of the leftmost match in s of this regular expression.
findAll: findAll is the All version of find(String); it returns a list of up to n successive matches of the expression, as defined by the All description above.
(package private) List<int[]> findAllIndex(String s, int n): findAllIndex is the All version of findIndex(String); it returns a list of up to n successive matches of the expression, as defined by the All description above.
findAllSubmatch(String s, int n): findAllSubmatch is the All version of findSubmatch(String); it returns a list of up to n successive matches of the expression, as defined by the All description above.
(package private) List<int[]> findAllSubmatchIndex(String s, int n): findAllSubmatchIndex is the All version of findSubmatchIndex(String); it returns a list of up to n successive matches of the expression, as defined by the All description above.
(package private) List<byte[]> findAllUTF8(byte[] b, int n): findAllUTF8 is the All version of findUTF8(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
(package private) List<int[]> findAllUTF8Index(byte[] b, int n): findAllUTF8Index is the All version of findUTF8Index(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
(package private) List<byte[][]> findAllUTF8Submatch(byte[] b, int n): findAllUTF8Submatch is the All version of findUTF8Submatch(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
(package private) List<int[]> findAllUTF8SubmatchIndex(byte[] b, int n): findAllUTF8SubmatchIndex is the All version of findUTF8SubmatchIndex(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
(package private) int[] findIndex(String): Returns a two-element array of integers defining the location of the leftmost match in s of this regular expression.
(package private) String[] findSubmatch(String): Returns an array of strings holding the text of the leftmost match of the regular expression in s and the matches, if any, of its subexpressions, as defined by the Submatch description above.
(package private) int[] findSubmatchIndex(String): Returns an array holding the index pairs identifying the leftmost match of this regular expression in s and the matches, if any, of its subexpressions, as defined by the Submatch description above.
(package private) byte[] findUTF8(byte[] b): Returns an array holding the text of the leftmost match in b of this regular expression.
(package private) int[] findUTF8Index(byte[] b): Returns a two-element array of integers defining the location of the leftmost match in b of this regular expression.
(package private) byte[][] findUTF8Submatch(byte[] b): Returns an array of arrays holding the text of the leftmost match of the regular expression in b and the matches, if any, of its subexpressions, as defined by the Submatch description above.
(package private) int[] findUTF8SubmatchIndex(byte[] b)
(package private) Machine get()
(package private) boolean match(MatcherInput input, int start, int end, int anchor, int[] group, int ngroup): Matches the regular expression against input starting at position start and ending at position end, with the given anchoring.
(package private) boolean match(String s): Returns true iff this regexp matches the string s.
(package private) boolean match(CharSequence input, int start, int end, int anchor, int[] group, int ngroup)
(package private) static boolean match(String pattern, CharSequence s): Returns true iff textual regular expression pattern matches string s.
(package private) boolean matchUTF8(byte[] b): Returns true iff this regexp matches the UTF-8 byte array b.
(package private) int numberOfCapturingGroups(): Returns the number of parenthesized subexpressions in this regular expression.
private int[] pad(int[] a)
(package private) void put
(package private) static String quoteMeta(String): Returns a string that quotes all regular expression metacharacters inside the argument text; the returned string is a regular expression matching the literal text.
(package private) String replaceAll(String src, String repl): Returns a copy of src in which all matches for this regexp have been replaced by repl.
(package private) String replaceAllFunc(String src, RE2.ReplaceFunc repl, int maxReplaces): Returns a copy of src in which at most maxReplaces matches for this regexp have been replaced by the return value of function repl (whose first argument is the matched string).
(package private) String replaceFirst(String src, String repl): Returns a copy of src in which only the first match for this regexp has been replaced by repl.
(package private) void reset()
toString()
-
Field Details
-
FOLD_CASE
static final int FOLD_CASE
Parser flags.
-
LITERAL
static final int LITERAL
-
CLASS_NL
static final int CLASS_NL
-
DOT_NL
static final int DOT_NL
-
ONE_LINE
static final int ONE_LINE
-
NON_GREEDY
static final int NON_GREEDY
-
PERL_X
static final int PERL_X
-
UNICODE_GROUPS
static final int UNICODE_GROUPS
-
WAS_DOLLAR
static final int WAS_DOLLAR
-
MATCH_NL
static final int MATCH_NL
-
PERL
static final int PERL
-
POSIX
static final int POSIX
-
UNANCHORED
static final int UNANCHORED
Anchors
-
ANCHOR_START
static final int ANCHOR_START
-
ANCHOR_BOTH
static final int ANCHOR_BOTH
-
expr
RE2 instance members.
-
prog
-
cond
final int cond
-
numSubexp
final int numSubexp
-
longest
boolean longest
-
prefix
String prefix
-
prefixUTF8
byte[] prefixUTF8
-
prefixComplete
boolean prefixComplete
-
prefixRune
int prefixRune
-
pooled
-
namedGroups
-
-
Constructor Details
-
RE2
RE2(String expr)
-
RE2
-
-
Method Details
-
compile
Parses a regular expression and returns, if successful, an RE2 instance that can be used to match against text.
When matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses the one that a backtracking search would have found first. This so-called leftmost-first matching is the same semantics that Perl, Python, and other implementations use, although this package implements it without the expense of backtracking. For POSIX leftmost-longest matching, see compilePOSIX(String).
- Throws:
PatternSyntaxException
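The leftmost-first rule described above can be observed with a small sketch. Because RE2 is package-private, this example uses java.util.regex as a stand-in: it is a backtracking engine, so it returns exactly the match "a backtracking search would have found first". The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in demo of leftmost-first semantics using java.util.regex,
// not the RE2 class documented on this page.
final class LeftmostFirstDemo {
  // Returns the text of the leftmost match, or null if there is none.
  static String firstMatch(String regex, String text) {
    Matcher m = Pattern.compile(regex).matcher(text);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    // Both patterns start matching at index 1; the order of the
    // alternatives decides which one wins.
    System.out.println(firstMatch("a|ab", "xab")); // prints a
    System.out.println(firstMatch("ab|a", "xab")); // prints ab
  }
}
```

Under leftmost-longest semantics both patterns would instead report "ab".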
-
compilePOSIX
compilePOSIX is like compile(String) but restricts the regular expression to POSIX ERE (egrep) syntax and changes the match semantics to leftmost-longest.
That is, when matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses a match that is as long as possible. This so-called leftmost-longest matching is the same semantics that early regular expression implementations used and that POSIX specifies.
However, there can be multiple leftmost-longest matches, with different submatch choices, and here this package diverges from POSIX. Among the possible leftmost-longest matches, this package chooses the one that a backtracking search would have found first, while POSIX specifies that the match be chosen to maximize the length of the first subexpression, then the second, and so on from left to right. The POSIX rule is computationally prohibitive and not even well-defined. See http://swtch.com/~rsc/regexp/regexp2.html#posix
- Throws:
PatternSyntaxException
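To make the leftmost-longest rule concrete, here is a brute-force sketch that emulates it on top of java.util.regex (a stand-in, since RE2 is package-private). It tries each start position from left to right and, for each, the longest candidate end first. This is only an illustration of the overall match choice; it is quadratic, says nothing about submatch selection, and is not how this package implements the semantics. All names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Brute-force emulation of leftmost-longest matching for illustration only.
final class LeftmostLongestSketch {
  // Returns {start, end} of the leftmost-longest match, or null if none.
  static int[] findLongest(Pattern p, String text) {
    // Non-anchoring, transparent bounds keep ^, $ and lookaround tied to
    // the whole input rather than to the region under test.
    Matcher m = p.matcher(text).useAnchoringBounds(false).useTransparentBounds(true);
    for (int start = 0; start <= text.length(); start++) {
      for (int end = text.length(); end >= start; end--) {
        m.region(start, end);
        if (m.matches()) { // the pattern must span the region exactly
          return new int[] {start, end};
        }
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Pattern p = Pattern.compile("a|ab");
    Matcher first = p.matcher("xab");
    first.find();
    System.out.println(first.start() + ".." + first.end()); // leftmost-first: 1..2
    int[] loc = findLongest(p, "xab");
    System.out.println(loc[0] + ".." + loc[1]);             // leftmost-longest: 1..3
  }
}
```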
-
compileImpl
- Throws:
PatternSyntaxException
-
numberOfCapturingGroups
int numberOfCapturingGroups()
Returns the number of parenthesized subexpressions in this regular expression.
-
get
Machine get()
-
reset
void reset()
-
put
-
toString
-
doExecute
-
match
Returns true iff this regexp matches the string s.
-
match
-
match
Matches the regular expression against input starting at position start and ending at position end, with the given anchoring. Records the submatch boundaries in group, which is [start, end) pairs of byte offsets. The number of boundaries needed is inferred from the size of the group array. It is most efficient not to ask for submatch boundaries.
- Parameters:
input - the input byte array
start - the beginning position in the input
end - the end position in the input
anchor - the anchoring flag (UNANCHORED, ANCHOR_START, ANCHOR_BOTH)
group - the array to fill with submatch positions
ngroup - the number of array pairs to fill in
- Returns:
true if a match was found
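The layout of the group array and the effect of the three anchoring flags can be sketched as follows. This is a stand-in built on java.util.regex, not the documented method: the numeric flag values are assumptions (the page names the flags but does not list their values), the offsets are char offsets into a CharSequence rather than byte offsets, and GroupArraySketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch of a match(...) call that fills [start, end) pairs.
final class GroupArraySketch {
  // Assumed values, chosen for illustration only.
  static final int UNANCHORED = 0;
  static final int ANCHOR_START = 1;
  static final int ANCHOR_BOTH = 2;

  static boolean match(Pattern p, CharSequence input, int start, int end,
                       int anchor, int[] group, int ngroup) {
    Matcher m = p.matcher(input).region(start, end);
    boolean ok;
    switch (anchor) {
      case ANCHOR_BOTH:  ok = m.matches();   break; // must span [start, end)
      case ANCHOR_START: ok = m.lookingAt(); break; // must begin at start
      default:           ok = m.find();             // may begin anywhere
    }
    if (!ok) {
      return false;
    }
    // Pair i occupies group[2*i] and group[2*i + 1]; pair 0 is the whole match.
    for (int i = 0; i < ngroup && i <= m.groupCount(); i++) {
      group[2 * i] = m.start(i);   // -1 if group i did not participate
      group[2 * i + 1] = m.end(i);
    }
    return true;
  }

  public static void main(String[] args) {
    int[] group = new int[6];
    Pattern p = Pattern.compile("(\\d+)-(\\d+)");
    boolean found = match(p, "id 10-25 ok", 0, 11, UNANCHORED, group, 3);
    System.out.println(found + " " + Arrays.toString(group)); // true [3, 8, 3, 5, 6, 8]
  }
}
```

Passing ngroup = 0 skips the fill loop entirely, which mirrors the note that it is most efficient not to ask for submatch boundaries.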
-
matchUTF8
boolean matchUTF8(byte[] b)
Returns true iff this regexp matches the UTF-8 byte array b.
-
match
Returns true iff textual regular expression pattern matches string s.
More complicated queries need to use compile(String) and the full RE2 interface.
- Throws:
PatternSyntaxException
-
replaceAll
-
replaceFirst
-
replaceAllFunc
Returns a copy of src in which at most maxReplaces matches for this regexp have been replaced by the return value of function repl (whose first argument is the matched string). No support is provided for expressions (e.g. \1 or $1) in the replacement string.
-
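The replaceAllFunc contract described above (a bounded number of replacements, each computed by a callback, with no expansion of \1 or $1) can be sketched on top of java.util.regex. This is a stand-in under stated assumptions: java.util.function.Function takes the place of RE2.ReplaceFunc, the class name is hypothetical, and the handling of empty matches follows java.util.regex, which may differ from this package.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch of a callback-driven, bounded replace.
final class ReplaceFuncSketch {
  static String replaceAllFunc(Pattern p, String src,
                               Function<String, String> repl, int maxReplaces) {
    Matcher m = p.matcher(src);
    StringBuilder out = new StringBuilder();
    int last = 0;  // end of the previous match
    int count = 0; // replacements performed so far
    while (count < maxReplaces && m.find()) {
      out.append(src, last, m.start());  // text between matches, unchanged
      out.append(repl.apply(m.group())); // appended literally: "$1" stays "$1"
      last = m.end();
      count++;
    }
    out.append(src, last, src.length()); // remaining tail
    return out.toString();
  }

  public static void main(String[] args) {
    Pattern words = Pattern.compile("[a-z]+");
    System.out.println(replaceAllFunc(words, "ab cd ef", String::toUpperCase, 2)); // AB CD ef
  }
}
```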
quoteMeta
-
pad
private int[] pad(int[] a)
-
allMatches
-
findUTF8
byte[] findUTF8(byte[] b)
Returns an array holding the text of the leftmost match in b of this regular expression.
A return value of null indicates no match.
-
findUTF8Index
int[] findUTF8Index(byte[] b)
Returns a two-element array of integers defining the location of the leftmost match in b of this regular expression. The match itself is at b[loc[0]...loc[1]].
A return value of null indicates no match.
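Byte offsets into UTF-8 input differ from char offsets into a String as soon as the text contains non-ASCII characters. The following sketch illustrates this with java.util.regex as a stand-in: it matches on the decoded string and converts the char offsets back to byte offsets. It assumes the input is valid UTF-8 and that loc is a half-open [loc[0], loc[1]) range, by analogy with s.substring(loc[0], loc[1]); the names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch: report a match location as UTF-8 byte offsets.
final class Utf8IndexSketch {
  static int[] findUTF8Index(Pattern p, byte[] b) {
    String s = new String(b, StandardCharsets.UTF_8); // assumes valid UTF-8
    Matcher m = p.matcher(s);
    if (!m.find()) {
      return null; // null indicates no match
    }
    // Byte length of the text before the match gives the byte start offset.
    int start = s.substring(0, m.start()).getBytes(StandardCharsets.UTF_8).length;
    int end = start + m.group().getBytes(StandardCharsets.UTF_8).length;
    return new int[] {start, end};
  }

  // Extracts the matched bytes, treating loc as a half-open range.
  static byte[] slice(byte[] b, int[] loc) {
    return Arrays.copyOfRange(b, loc[0], loc[1]);
  }

  public static void main(String[] args) {
    // "héllo wörld" written with escapes to avoid source-encoding issues.
    byte[] b = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
    int[] loc = findUTF8Index(Pattern.compile("w\\S+"), b);
    System.out.println(loc[0] + ".." + loc[1]); // 7..13 (char offsets would be 6..11)
  }
}
```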
-
find
Returns a string holding the text of the leftmost match in s of this regular expression.
If there is no match, the return value is an empty string, but it will also be empty if the regular expression successfully matches an empty string. Use findIndex(String) or findSubmatch(String) if it is necessary to distinguish these cases.
-
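The ambiguity between "no match" and "matched the empty string" described for find can be reproduced with a short stand-in built on java.util.regex. The two helpers below only mirror the documented return conventions (empty string versus null); they are hypothetical and are not the methods of this class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch of the find / findIndex return conventions.
final class FindAmbiguitySketch {
  // Returns "" both when nothing matches and when the match is empty.
  static String find(Pattern p, String s) {
    Matcher m = p.matcher(s);
    return m.find() ? m.group() : "";
  }

  // Returns null when nothing matches; {i, i} for an empty match at i.
  static int[] findIndex(Pattern p, String s) {
    Matcher m = p.matcher(s);
    return m.find() ? new int[] {m.start(), m.end()} : null;
  }

  public static void main(String[] args) {
    Pattern star = Pattern.compile("x*"); // matches the empty string at 0
    Pattern plus = Pattern.compile("x+"); // does not match "abc" at all
    System.out.println(find(star, "abc").isEmpty());    // true
    System.out.println(find(plus, "abc").isEmpty());    // true, indistinguishable
    System.out.println(findIndex(star, "abc") != null); // true: an empty match exists
    System.out.println(findIndex(plus, "abc") != null); // false: no match
  }
}
```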
findIndex
Returns a two-element array of integers defining the location of the leftmost match in s of this regular expression. The match itself is at s.substring(loc[0], loc[1]).
A return value of null indicates no match.
-
findUTF8Submatch
byte[][] findUTF8Submatch(byte[] b)
Returns an array of arrays holding the text of the leftmost match of the regular expression in b and the matches, if any, of its subexpressions, as defined by the Submatch description above.
A return value of null indicates no match.
-
findUTF8SubmatchIndex
-
findSubmatch
-
findSubmatchIndex
-
findAllUTF8
findAllUTF8 is the All version of findUTF8(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
A return value of null indicates no match.
TODO(adonovan): think about defining a byte slice view class, like a read-only Go slice backed by |b|.
-
findAllUTF8Index
findAllUTF8Index is the All version of findUTF8Index(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
A return value of null indicates no match.
-
findAll
-
findAllIndex
findAllIndex is the All version of findIndex(String); it returns a list of up to n successive matches of the expression, as defined by the All description above.
A return value of null indicates no match.
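The shape of an "All" variant (collect up to n successive matches, return null when there are none) can be sketched with java.util.regex as a stand-in. Treating a negative n as "no limit" is an assumption made for this sketch, since the All description is not reproduced on this page; the treatment of empty matches also follows java.util.regex and may differ from this package. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch of a findAllIndex-style method.
final class FindAllSketch {
  // Assumption: n < 0 means "return every match".
  static List<int[]> findAllIndex(Pattern p, String s, int n) {
    List<int[]> result = new ArrayList<>();
    Matcher m = p.matcher(s);
    while ((n < 0 || result.size() < n) && m.find()) {
      result.add(new int[] {m.start(), m.end()}); // one [start, end) pair per match
    }
    return result.isEmpty() ? null : result; // null indicates no match
  }

  public static void main(String[] args) {
    Pattern digits = Pattern.compile("\\d+");
    for (int[] loc : findAllIndex(digits, "a1 b22 c333", 2)) {
      System.out.println(loc[0] + ".." + loc[1]); // 1..2 then 4..6
    }
  }
}
```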
-
findAllUTF8Submatch
findAllUTF8Submatch is the All version of findUTF8Submatch(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
A return value of null indicates no match.
-
findAllUTF8SubmatchIndex
findAllUTF8SubmatchIndex is the All version of findUTF8SubmatchIndex(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
A return value of null indicates no match.
-
findAllSubmatch
-
findAllSubmatchIndex
findAllSubmatchIndex is the All version of findSubmatchIndex(String); it returns a list of up to n successive matches of the expression, as defined by the All description above.
A return value of null indicates no match.
-