Class RE2


  • class RE2
    extends java.lang.Object
    An RE2 class instance is a compiled representation of an RE2 regular expression, independent of the public Java-like Pattern/Matcher API.

    This class also contains various implementation helpers for RE2 regular expressions.

    Use the quoteMeta(String) utility function to quote all regular expression metacharacters in an arbitrary string.

    See the Matcher and Pattern classes for the public API, and the package-level documentation for an overview of how to use this API.

    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      private static interface  RE2.DeliverFunc  
      (package private) static interface  RE2.ReplaceFunc  
    • Field Summary

      Fields 
      Modifier and Type Field Description
      (package private) static int ANCHOR_BOTH  
      (package private) static int ANCHOR_START  
      (package private) static int CLASS_NL  
      (package private) int cond  
      (package private) static int DOT_NL  
      (package private) java.lang.String expr  
      (package private) static int FOLD_CASE  
      (package private) static int LITERAL  
      (package private) boolean longest  
      (package private) static int MATCH_NL  
      java.util.Map<java.lang.String,​java.lang.Integer> namedGroups  
      (package private) static int NON_GREEDY  
      (package private) int numSubexp  
      (package private) static int ONE_LINE  
      (package private) static int PERL  
      (package private) static int PERL_X  
      private java.util.concurrent.atomic.AtomicReference<Machine> pooled  
      (package private) static int POSIX  
      (package private) java.lang.String prefix  
      (package private) boolean prefixComplete  
      (package private) int prefixRune  
      (package private) byte[] prefixUTF8  
      (package private) Prog prog  
      (package private) static int UNANCHORED  
      (package private) static int UNICODE_GROUPS  
      (package private) static int WAS_DOLLAR  
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      (package private) RE2​(java.lang.String expr)  
      private RE2​(java.lang.String expr, Prog prog, int numSubexp, boolean longest)  
    • Method Summary

      Methods 
      Modifier and Type Method Description
      private void allMatches​(MachineInput input, int n, RE2.DeliverFunc deliver)  
      (package private) static RE2 compile​(java.lang.String expr)
      Parses a regular expression and returns, if successful, an RE2 instance that can be used to match against text.
      (package private) static RE2 compileImpl​(java.lang.String expr, int mode, boolean longest)  
      (package private) static RE2 compilePOSIX​(java.lang.String expr)
      compilePOSIX is like compile(java.lang.String) but restricts the regular expression to POSIX ERE (egrep) syntax and changes the match semantics to leftmost-longest.
      private int[] doExecute​(MachineInput in, int pos, int anchor, int ncap)  
      (package private) java.lang.String find​(java.lang.String s)
      Returns a string holding the text of the leftmost match in s of this regular expression.
      (package private) java.util.List<java.lang.String> findAll​(java.lang.String s, int n)
      findAll is the All version of find(java.lang.String); it returns a list of up to n successive matches of the expression, as defined by the All description above.
      (package private) java.util.List<int[]> findAllIndex​(java.lang.String s, int n)
      findAllIndex is the All version of findIndex(java.lang.String); it returns a list of up to n successive matches of the expression, as defined by the All description above.
      (package private) java.util.List<java.lang.String[]> findAllSubmatch​(java.lang.String s, int n)
      findAllSubmatch is the All version of findSubmatch(java.lang.String); it returns a list of up to n successive matches of the expression, as defined by the All description above.
      (package private) java.util.List<int[]> findAllSubmatchIndex​(java.lang.String s, int n)
      findAllSubmatchIndex is the All version of findSubmatchIndex(java.lang.String); it returns a list of up to n successive matches of the expression, as defined by the All description above.
      (package private) java.util.List<byte[]> findAllUTF8​(byte[] b, int n)
      findAllUTF8 is the All version of findUTF8(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
      (package private) java.util.List<int[]> findAllUTF8Index​(byte[] b, int n)
      findAllUTF8Index is the All version of findUTF8Index(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
      (package private) java.util.List<byte[][]> findAllUTF8Submatch​(byte[] b, int n)
      findAllUTF8Submatch is the All version of findUTF8Submatch(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
      (package private) java.util.List<int[]> findAllUTF8SubmatchIndex​(byte[] b, int n)
      findAllUTF8SubmatchIndex is the All version of findUTF8SubmatchIndex(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
      (package private) int[] findIndex​(java.lang.String s)
      Returns a two-element array of integers defining the location of the leftmost match in s of this regular expression.
      (package private) java.lang.String[] findSubmatch​(java.lang.String s)
      Returns an array of strings holding the text of the leftmost match of the regular expression in s and the matches, if any, of its subexpressions, as defined by the Submatch description above.
      (package private) int[] findSubmatchIndex​(java.lang.String s)
      Returns an array holding the index pairs identifying the leftmost match of this regular expression in s and the matches, if any, of its subexpressions, as defined by the Submatch description above.
      (package private) byte[] findUTF8​(byte[] b)
      Returns an array holding the text of the leftmost match in b of this regular expression.
      (package private) int[] findUTF8Index​(byte[] b)
      Returns a two-element array of integers defining the location of the leftmost match in b of this regular expression.
      (package private) byte[][] findUTF8Submatch​(byte[] b)
      Returns an array of arrays holding the text of the leftmost match of the regular expression in b and the matches, if any, of its subexpressions, as defined by the Submatch description above.
      (package private) int[] findUTF8SubmatchIndex​(byte[] b)
      Returns an array holding the index pairs identifying the leftmost match of this regular expression in b and the matches, if any, of its subexpressions, as defined by the Submatch and Index descriptions above.
      (package private) Machine get()  
      (package private) boolean match​(MatcherInput input, int start, int end, int anchor, int[] group, int ngroup)
      Matches the regular expression against input starting at position start and ending at position end, with the given anchoring.
      (package private) boolean match​(java.lang.CharSequence s)
      Returns true iff this regexp matches the string s.
      (package private) boolean match​(java.lang.CharSequence input, int start, int end, int anchor, int[] group, int ngroup)  
      (package private) static boolean match​(java.lang.String pattern, java.lang.CharSequence s)
      Returns true iff textual regular expression pattern matches string s.
      (package private) boolean matchUTF8​(byte[] b)
      Returns true iff this regexp matches the UTF-8 byte array b.
      (package private) int numberOfCapturingGroups()
      Returns the number of parenthesized subexpressions in this regular expression.
      private int[] pad​(int[] a)  
      (package private) void put​(Machine m, boolean isNew)  
      (package private) static java.lang.String quoteMeta​(java.lang.String s)
      Returns a string that quotes all regular expression metacharacters inside the argument text; the returned string is a regular expression matching the literal text.
      (package private) java.lang.String replaceAll​(java.lang.String src, java.lang.String repl)
      Returns a copy of src in which all matches for this regexp have been replaced by repl.
      (package private) java.lang.String replaceAllFunc​(java.lang.String src, RE2.ReplaceFunc repl, int maxReplaces)
      Returns a copy of src in which at most maxReplaces matches for this regexp have been replaced by the return value of function repl (whose first argument is the matched string).
      (package private) java.lang.String replaceFirst​(java.lang.String src, java.lang.String repl)
      Returns a copy of src in which only the first match for this regexp has been replaced by repl.
      (package private) void reset()  
      java.lang.String toString()  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • RE2

        RE2​(java.lang.String expr)
      • RE2

        private RE2​(java.lang.String expr,
                    Prog prog,
                    int numSubexp,
                    boolean longest)
    • Method Detail

      • compile

        static RE2 compile​(java.lang.String expr)
                    throws PatternSyntaxException
        Parses a regular expression and returns, if successful, an RE2 instance that can be used to match against text.

        When matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses the one that a backtracking search would have found first. This so-called leftmost-first matching is the same semantics that Perl, Python, and other implementations use, although this package implements it without the expense of backtracking. For POSIX leftmost-longest matching, see compilePOSIX(java.lang.String).

        Throws:
        PatternSyntaxException
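
        The leftmost-first rule described above can be illustrated with a small sketch. Because RE2 and compile are package-private, the sketch uses java.util.regex as a stand-in; it is a backtracking engine, so it yields the match that the text says leftmost-first semantics select. The class name LeftmostFirstDemo and the helper firstMatch are hypothetical and are not part of this package.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in demo: java.util.regex is used here instead of the package-private RE2 class.
final class LeftmostFirstDemo {
    // Returns the text of the first match found by a backtracking search, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Both alternatives match at index 0; the earlier alternative wins.
        System.out.println(firstMatch("a|ab", "ab"));     // prints "a"
        // A match that starts earlier beats a longer match that starts later.
        System.out.println(firstMatch("bbb|a", "xabbb")); // prints "a"
    }
}
```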
      • compilePOSIX

        static RE2 compilePOSIX​(java.lang.String expr)
                         throws PatternSyntaxException
        compilePOSIX is like compile(java.lang.String) but restricts the regular expression to POSIX ERE (egrep) syntax and changes the match semantics to leftmost-longest.

        That is, when matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses a match that is as long as possible. This so-called leftmost-longest matching is the same semantics that early regular expression implementations used and that POSIX specifies.

        However, there can be multiple leftmost-longest matches, with different submatch choices, and here this package diverges from POSIX. Among the possible leftmost-longest matches, this package chooses the one that a backtracking search would have found first, while POSIX specifies that the match be chosen to maximize the length of the first subexpression, then the second, and so on from left to right. The POSIX rule is computationally prohibitive and not even well-defined. See http://swtch.com/~rsc/regexp/regexp2.html#posix

        Throws:
        PatternSyntaxException
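
        To make the leftmost-longest rule concrete, the following brute-force sketch tries every start position from left to right and, for each one, the longest candidate end first. It uses java.util.regex only to test whether a candidate substring matches in full; it says nothing about how this package implements the search, and it does not model the submatch choice discussed above. The names LeftmostLongestDemo and leftmostLongest are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Brute-force illustration of leftmost-longest semantics (quadratic, for explanation only).
final class LeftmostLongestDemo {
    // Returns the leftmost-longest match of regex in input, or null if there is none.
    static String leftmostLongest(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        for (int start = 0; start <= input.length(); start++) {
            // Try the longest candidate first so the first hit is the longest at this start.
            for (int end = input.length(); end >= start; end--) {
                m.region(start, end);
                if (m.matches()) {
                    return input.substring(start, end);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Leftmost-first would pick "a"; leftmost-longest picks "ab".
        System.out.println(leftmostLongest("a|ab", "ab")); // prints "ab"
    }
}
```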
      • numberOfCapturingGroups

        int numberOfCapturingGroups()
        Returns the number of parenthesized subexpressions in this regular expression.
      • reset

        void reset()
      • put

        void put​(Machine m,
                 boolean isNew)
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • doExecute

        private int[] doExecute​(MachineInput in,
                                int pos,
                                int anchor,
                                int ncap)
      • match

        boolean match​(java.lang.CharSequence s)
        Returns true iff this regexp matches the string s.
      • match

        boolean match​(java.lang.CharSequence input,
                      int start,
                      int end,
                      int anchor,
                      int[] group,
                      int ngroup)
      • match

        boolean match​(MatcherInput input,
                      int start,
                      int end,
                      int anchor,
                      int[] group,
                      int ngroup)
        Matches the regular expression against input starting at position start and ending at position end, with the given anchoring. Records the submatch boundaries in group as [start, end) pairs of byte offsets. The number of boundaries needed is inferred from the size of the group array. It is most efficient not to ask for submatch boundaries.
        Parameters:
        input - the input byte array
        start - the beginning position in the input
        end - the end position in the input
        anchor - the anchoring flag (UNANCHORED, ANCHOR_START, ANCHOR_BOTH)
        group - the array to fill with submatch positions
        ngroup - the number of array pairs to fill in
        Returns:
        true if a match was found
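
        The three anchoring flags can be pictured with the sketch below. It assumes, from the flag names only, that UNANCHORED allows a match anywhere in [start, end), ANCHOR_START requires the match to begin at start, and ANCHOR_BOTH requires it to span the whole range. The enum Anchor and the class AnchorDemo are hypothetical stand-ins built on java.util.regex; the real flags are package-private int constants whose values are not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in demo of the assumed anchoring semantics, using java.util.regex.
final class AnchorDemo {
    // Hypothetical mirror of the UNANCHORED, ANCHOR_START and ANCHOR_BOTH flags.
    enum Anchor { UNANCHORED, ANCHOR_START, ANCHOR_BOTH }

    static boolean match(String regex, CharSequence input, int start, int end, Anchor anchor) {
        Matcher m = Pattern.compile(regex).matcher(input).region(start, end);
        switch (anchor) {
            case ANCHOR_START:
                return m.lookingAt(); // must begin at start, may end before end
            case ANCHOR_BOTH:
                return m.matches();   // must cover the whole region
            default:
                return m.find();      // may begin anywhere in the region
        }
    }

    public static void main(String[] args) {
        System.out.println(match("b", "abc", 0, 3, Anchor.UNANCHORED));   // prints "true"
        System.out.println(match("b", "abc", 0, 3, Anchor.ANCHOR_START)); // prints "false"
        System.out.println(match("ab", "abc", 0, 2, Anchor.ANCHOR_BOTH)); // prints "true"
    }
}
```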
      • matchUTF8

        boolean matchUTF8​(byte[] b)
        Returns true iff this regexp matches the UTF-8 byte array b.
      • replaceAll

        java.lang.String replaceAll​(java.lang.String src,
                                    java.lang.String repl)
        Returns a copy of src in which all matches for this regexp have been replaced by repl. No support is provided for expressions (e.g. \1 or $1) in the replacement string.
      • replaceFirst

        java.lang.String replaceFirst​(java.lang.String src,
                                      java.lang.String repl)
        Returns a copy of src in which only the first match for this regexp has been replaced by repl. No support is provided for expressions (e.g. \1 or $1) in the replacement string.
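
        The statement that \1 and $1 are not expanded means the replacement text is inserted literally. The sketch below reproduces that behaviour with java.util.regex, which would otherwise expand $1, by passing the replacement through Matcher.quoteReplacement. The class LiteralReplaceDemo and its two helpers are hypothetical stand-ins, not this package's implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in demo: literal replacement, with no expansion of $1 or \1 in repl.
final class LiteralReplaceDemo {
    static String replaceAllLiteral(String regex, String src, String repl) {
        return Pattern.compile(regex).matcher(src).replaceAll(Matcher.quoteReplacement(repl));
    }

    static String replaceFirstLiteral(String regex, String src, String repl) {
        return Pattern.compile(regex).matcher(src).replaceFirst(Matcher.quoteReplacement(repl));
    }

    public static void main(String[] args) {
        // "$1" is copied verbatim rather than replaced by the first group.
        System.out.println(replaceAllLiteral("(\\d+)", "a1b22", "<$1>"));   // prints "a<$1>b<$1>"
        System.out.println(replaceFirstLiteral("(\\d+)", "a1b22", "<$1>")); // prints "a<$1>b22"
    }
}
```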
      • replaceAllFunc

        java.lang.String replaceAllFunc​(java.lang.String src,
                                        RE2.ReplaceFunc repl,
                                        int maxReplaces)
        Returns a copy of src in which at most maxReplaces matches for this regexp have been replaced by the return value of function repl (whose first argument is the matched string). No support is provided for expressions (e.g. \1 or $1) in the replacement string.
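
        A function-driven replacement with an upper bound on the number of replacements might look like the sketch below. It is built on java.util.regex and uses java.util.function.Function in place of the package-private RE2.ReplaceFunc interface, whose exact signature is not shown in this page; the class name ReplaceFuncDemo is hypothetical.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in demo of replacing at most maxReplaces matches with a computed string.
final class ReplaceFuncDemo {
    static String replaceAllFunc(String regex, String src,
                                 Function<String, String> repl, int maxReplaces) {
        Matcher m = Pattern.compile(regex).matcher(src);
        StringBuilder out = new StringBuilder();
        int last = 0;  // end of the previous match in src
        int count = 0; // replacements made so far
        while (count < maxReplaces && m.find()) {
            out.append(src, last, m.start());  // text between matches, unchanged
            out.append(repl.apply(m.group())); // result inserted literally
            last = m.end();
            count++;
        }
        out.append(src, last, src.length());   // remaining tail, unchanged
        return out.toString();
    }

    public static void main(String[] args) {
        // Only the first two words are upper-cased.
        System.out.println(replaceAllFunc("[a-z]+", "ab cd ef", String::toUpperCase, 2)); // prints "AB CD ef"
    }
}
```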
      • quoteMeta

        static java.lang.String quoteMeta​(java.lang.String s)
        Returns a string that quotes all regular expression metacharacters inside the argument text; the returned string is a regular expression matching the literal text. For example, quoteMeta("[foo]").equals("\\[foo\\]").
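
        A minimal sketch of metacharacter quoting follows. The set of characters treated as metacharacters in META is an assumption made for illustration, and the real list used by this package may differ; the class name QuoteMetaDemo is hypothetical. The sketch reproduces the documented example for "[foo]".

```java
// Illustrative quoting of regular expression metacharacters with a backslash.
final class QuoteMetaDemo {
    // Assumed metacharacter set; not taken from the RE2 source.
    private static final String META = "\\.+*?()|[]{}^$";

    static String quoteMeta(String s) {
        StringBuilder b = new StringBuilder(2 * s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (META.indexOf(c) >= 0) {
                b.append('\\'); // escape the metacharacter
            }
            b.append(c);
        }
        return b.toString();
    }

    public static void main(String[] args) {
        System.out.println(quoteMeta("[foo]")); // prints "\[foo\]"
    }
}
```

        The quoted result can be embedded in a larger pattern when part of the pattern comes from untrusted or arbitrary text.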
      • pad

        private int[] pad​(int[] a)
      • findUTF8

        byte[] findUTF8​(byte[] b)
        Returns an array holding the text of the leftmost match in b of this regular expression.

        A return value of null indicates no match.

      • findUTF8Index

        int[] findUTF8Index​(byte[] b)
        Returns a two-element array of integers defining the location of the leftmost match in b of this regular expression. The match itself is at b[loc[0]...loc[1]].

        A return value of null indicates no match.
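
        Locations into a UTF-8 byte array differ from String indices whenever the text contains non-ASCII characters. The sketch below converts a String index pair, such as one that indexes s.substring, into the corresponding UTF-8 byte offsets. It is a hypothetical helper (Utf8OffsetDemo.toUtf8Offsets) for illustration only, and it assumes that neither index splits a surrogate pair.

```java
import java.nio.charset.StandardCharsets;

// Converts [start, end) String indices into [start, end) UTF-8 byte offsets.
final class Utf8OffsetDemo {
    static int[] toUtf8Offsets(String s, int[] loc) {
        // The byte offset of an index is the encoded length of the prefix before it.
        int startByte = s.substring(0, loc[0]).getBytes(StandardCharsets.UTF_8).length;
        int endByte = s.substring(0, loc[1]).getBytes(StandardCharsets.UTF_8).length;
        return new int[] {startByte, endByte};
    }

    public static void main(String[] args) {
        // "h\u00e9llo": the second character takes two bytes in UTF-8,
        // so the String range [2, 4) for "ll" becomes the byte range [3, 5).
        int[] bytes = toUtf8Offsets("h\u00e9llo", new int[] {2, 4});
        System.out.println(bytes[0] + "," + bytes[1]); // prints "3,5"
    }
}
```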

      • find

        java.lang.String find​(java.lang.String s)
        Returns a string holding the text of the leftmost match in s of this regular expression.

        If there is no match, the return value is an empty string, but it will also be empty if the regular expression successfully matches an empty string. Use findIndex(java.lang.String) or findSubmatch(java.lang.String) if it is necessary to distinguish these cases.
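
        The ambiguity described above, and how an index-returning variant resolves it, can be sketched as follows. The helpers find and findIndex are hypothetical stand-ins written on top of java.util.regex that follow the documented return conventions (empty string versus null); they are not this package's implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in demo: an empty string from find is ambiguous, a null from findIndex is not.
final class FindVsFindIndexDemo {
    // Returns the matched text, or "" when there is no match.
    static String find(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group() : "";
    }

    // Returns {start, end} of the leftmost match, or null when there is no match.
    static int[] findIndex(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    public static void main(String[] args) {
        // Both calls return "", although only the first pattern actually matches.
        System.out.println(find("x*", "abc").isEmpty()); // prints "true"
        System.out.println(find("z", "abc").isEmpty());  // prints "true"
        // The index form tells the two cases apart.
        System.out.println(findIndex("x*", "abc") != null); // prints "true"
        System.out.println(findIndex("z", "abc") != null);  // prints "false"
    }
}
```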

      • findIndex

        int[] findIndex​(java.lang.String s)
        Returns a two-element array of integers defining the location of the leftmost match in s of this regular expression. The match itself is at s.substring(loc[0], loc[1]).

        A return value of null indicates no match.

      • findUTF8Submatch

        byte[][] findUTF8Submatch​(byte[] b)
        Returns an array of arrays holding the text of the leftmost match of the regular expression in b and the matches, if any, of its subexpressions, as defined by the Submatch description above.

        A return value of null indicates no match.

      • findUTF8SubmatchIndex

        int[] findUTF8SubmatchIndex​(byte[] b)
        Returns an array holding the index pairs identifying the leftmost match of this regular expression in b and the matches, if any, of its subexpressions, as defined by the Submatch and Index descriptions above.

        A return value of null indicates no match.

      • findSubmatch

        java.lang.String[] findSubmatch​(java.lang.String s)
        Returns an array of strings holding the text of the leftmost match of the regular expression in s and the matches, if any, of its subexpressions, as defined by the Submatch description above.

        A return value of null indicates no match.

      • findSubmatchIndex

        int[] findSubmatchIndex​(java.lang.String s)
        Returns an array holding the index pairs identifying the leftmost match of this regular expression in s and the matches, if any, of its subexpressions, as defined by the Submatch description above.

        A return value of null indicates no match.
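
        The Submatch description referred to above is not reproduced on this page. The sketch below therefore assumes the layout used by Go's regexp package: element 2*i and element 2*i+1 hold the start and end of group i, group 0 is the whole match, and -1 marks a group that did not participate. It is written against java.util.regex, and the class name SubmatchIndexDemo is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in demo of a flat index-pair array for a match and its subexpressions.
final class SubmatchIndexDemo {
    // Returns {start0, end0, start1, end1, ...} for the leftmost match, or null if none.
    static int[] findSubmatchIndex(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        if (!m.find()) {
            return null;
        }
        int[] pairs = new int[2 * (m.groupCount() + 1)];
        for (int g = 0; g <= m.groupCount(); g++) {
            pairs[2 * g] = m.start(g);   // -1 if group g did not participate
            pairs[2 * g + 1] = m.end(g); // -1 if group g did not participate
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Whole match "aa" at [1, 3), group 1 at [1, 3), group 2 unmatched.
        int[] p = findSubmatchIndex("(a+)(b+)?", "xaac");
        System.out.println(java.util.Arrays.toString(p)); // prints "[1, 3, 1, 3, -1, -1]"
    }
}
```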

      • findAllUTF8

        java.util.List<byte[]> findAllUTF8​(byte[] b,
                                           int n)
        findAllUTF8 is the All version of findUTF8(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.

        A return value of null indicates no match. TODO(adonovan): think about defining a byte slice view class, like a read-only Go slice backed by |b|.

      • findAllUTF8Index

        java.util.List<int[]> findAllUTF8Index​(byte[] b,
                                               int n)
        findAllUTF8Index is the All version of findUTF8Index(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.

        A return value of null indicates no match.

      • findAll

        java.util.List<java.lang.String> findAll​(java.lang.String s,
                                                 int n)
        findAll is the All version of find(java.lang.String); it returns a list of up to n successive matches of the expression, as defined by the All description above.

        A return value of null indicates no match.
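
        The role of the limit n in the All variants can be sketched as below. The All description referred to above is not reproduced on this page, so the sketch assumes that a negative n means no limit, as in Go's regexp package; treat that as an assumption. It uses java.util.regex as a stand-in, returns null when nothing matches as documented, and may treat empty matches differently from this package. The class name FindAllDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in demo of collecting up to n successive matches.
final class FindAllDemo {
    // Returns up to n matches (all matches if n < 0, by assumption), or null if none.
    static List<String> findAll(String regex, String s, int n) {
        Matcher m = Pattern.compile(regex).matcher(s);
        List<String> out = new ArrayList<>();
        while ((n < 0 || out.size() < n) && m.find()) {
            out.add(m.group());
        }
        return out.isEmpty() ? null : out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "1 22 333", 2));  // prints "[1, 22]"
        System.out.println(findAll("\\d+", "1 22 333", -1)); // prints "[1, 22, 333]"
        System.out.println(findAll("z", "abc", 5));          // prints "null"
    }
}
```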

      • findAllIndex

        java.util.List<int[]> findAllIndex​(java.lang.String s,
                                           int n)
        findAllIndex is the All version of findIndex(java.lang.String); it returns a list of up to n successive matches of the expression, as defined by the All description above.

        A return value of null indicates no match.

      • findAllUTF8Submatch

        java.util.List<byte[][]> findAllUTF8Submatch​(byte[] b,
                                                     int n)
        findAllUTF8Submatch is the All version of findUTF8Submatch(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.

        A return value of null indicates no match.

      • findAllUTF8SubmatchIndex

        java.util.List<int[]> findAllUTF8SubmatchIndex​(byte[] b,
                                                       int n)
        findAllUTF8SubmatchIndex is the All version of findUTF8SubmatchIndex(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.

        A return value of null indicates no match.

      • findAllSubmatch

        java.util.List<java.lang.String[]> findAllSubmatch​(java.lang.String s,
                                                           int n)
        findAllSubmatch is the All version of findSubmatch(java.lang.String); it returns a list of up to n successive matches of the expression, as defined by the All description above.

        A return value of null indicates no match.

      • findAllSubmatchIndex

        java.util.List<int[]> findAllSubmatchIndex​(java.lang.String s,
                                                   int n)
        findAllSubmatchIndex is the All version of findSubmatchIndex(java.lang.String); it returns a list of up to n successive matches of the expression, as defined by the All description above.

        A return value of null indicates no match.