Class RE2

java.lang.Object
com.google.re2j.RE2

class RE2 extends Object
An RE2 class instance is a compiled representation of an RE2 regular expression, independent of the public Java-like Pattern/Matcher API.

This class also contains various implementation helpers for RE2 regular expressions.

Use the quoteMeta(String) utility function to quote all regular expression metacharacters in an arbitrary string.

See the Matcher and Pattern classes for the public API, and the package-level documentation for an overview of how to use this API.

  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    private static interface 
    RE2.DeliverFunc
     
    (package private) static interface 
    RE2.ReplaceFunc
     
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    (package private) static final int
     
    (package private) static final int
     
    (package private) static final int
     
    (package private) final int
     
    (package private) static final int
     
    (package private) final String
    RE2 instance members.
    (package private) static final int
    Parser flags.
    (package private) static final int
     
    (package private) boolean
     
    (package private) static final int
     
     
    (package private) static final int
     
    (package private) final int
     
    (package private) static final int
     
    (package private) static final int
     
    (package private) static final int
     
    private final AtomicReference<Machine>
     
    (package private) static final int
     
    (package private) String
     
    (package private) boolean
     
    (package private) int
     
    (package private) byte[]
     
    (package private) final Prog
     
    (package private) static final int
    Anchors.
    (package private) static final int
     
    (package private) static final int
     
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    (package private)
    RE2(String expr)
     
    private
    RE2(String expr, Prog prog, int numSubexp, boolean longest)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    private void
    allMatches(MachineInput input, int n, RE2.DeliverFunc deliver)
     
    (package private) static RE2
    compile(String expr)
    Parses a regular expression and returns, if successful, an RE2 instance that can be used to match against text.
    (package private) static RE2
    compileImpl(String expr, int mode, boolean longest)
     
    (package private) static RE2
    compilePOSIX(String expr)
    compilePOSIX is like compile(String) but restricts the regular expression to POSIX ERE (egrep) syntax and changes the match semantics to leftmost-longest.
    private int[]
    doExecute(MachineInput in, int pos, int anchor, int ncap)
     
    (package private) String
    find(String s)
    Returns a string holding the text of the leftmost match in s of this regular expression.
    (package private) List<String>
    findAll(String s, int n)
    findAll is the All version of find(String); it returns a list of up to n successive matches of the expression, as defined by the All description above.
    (package private) List<int[]>
    findAllIndex(String s, int n)
    findAllIndex is the All version of findIndex(String); it returns a list of up to n successive matches of the expression, as defined by the All description above.
    (package private) List<String[]>
    findAllSubmatch(String s, int n)
    findAllSubmatch is the All version of findSubmatch(String); it returns a list of up to n successive matches of the expression, as defined by the All description above.
    (package private) List<int[]>
    findAllSubmatchIndex(String s, int n)
    findAllSubmatchIndex is the All version of findSubmatchIndex(String); it returns a list of up to n successive matches of the expression, as defined by the All description above.
    (package private) List<byte[]>
    findAllUTF8(byte[] b, int n)
    findAllUTF8 is the All version of findUTF8(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
    (package private) List<int[]>
    findAllUTF8Index(byte[] b, int n)
    findAllUTF8Index is the All version of findUTF8Index(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
    (package private) List<byte[][]>
    findAllUTF8Submatch(byte[] b, int n)
    findAllUTF8Submatch is the All version of findUTF8Submatch(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
    (package private) List<int[]>
    findAllUTF8SubmatchIndex(byte[] b, int n)
    findAllUTF8SubmatchIndex is the All version of findUTF8SubmatchIndex(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.
    (package private) int[]
    findIndex(String s)
    Returns a two-element array of integers defining the location of the leftmost match in s of this regular expression.
    (package private) String[]
    findSubmatch(String s)
    Returns an array of strings holding the text of the leftmost match of the regular expression in s and the matches, if any, of its subexpressions, as defined by the Submatch description above.
    (package private) int[]
    findSubmatchIndex(String s)
    Returns an array holding the index pairs identifying the leftmost match of this regular expression in s and the matches, if any, of its subexpressions, as defined by the Submatch description above.
    (package private) byte[]
    findUTF8(byte[] b)
    Returns an array holding the text of the leftmost match in b of this regular expression.
    (package private) int[]
    findUTF8Index(byte[] b)
    Returns a two-element array of integers defining the location of the leftmost match in b of this regular expression.
    (package private) byte[][]
    findUTF8Submatch(byte[] b)
    Returns an array of arrays holding the text of the leftmost match of the regular expression in b and the matches, if any, of its subexpressions, as defined by the Submatch description above.
    (package private) int[]
    findUTF8SubmatchIndex(byte[] b)
    Returns an array holding the index pairs identifying the leftmost match of this regular expression in b and the matches, if any, of its subexpressions, as defined by the Submatch and Index descriptions above.
    (package private) Machine
    get()
     
    (package private) boolean
    match(MatcherInput input, int start, int end, int anchor, int[] group, int ngroup)
    Matches the regular expression against input starting at position start and ending at position end, with the given anchoring.
    (package private) boolean
    match(CharSequence s)
    Returns true iff this regexp matches the string s.
    (package private) boolean
    match(CharSequence input, int start, int end, int anchor, int[] group, int ngroup)
     
    (package private) static boolean
    match(String pattern, CharSequence s)
    Returns true iff textual regular expression pattern matches string s.
    (package private) boolean
    matchUTF8(byte[] b)
    Returns true iff this regexp matches the UTF-8 byte array b.
    (package private) int
    numberOfCapturingGroups()
    Returns the number of parenthesized subexpressions in this regular expression.
    private int[]
    pad(int[] a)
     
    (package private) void
    put(Machine m, boolean isNew)
     
    (package private) static String
    quoteMeta(String s)
    Returns a string that quotes all regular expression metacharacters inside the argument text; the returned string is a regular expression matching the literal text.
    (package private) String
    replaceAll(String src, String repl)
    Returns a copy of src in which all matches for this regexp have been replaced by repl.
    (package private) String
    replaceAllFunc(String src, RE2.ReplaceFunc repl, int maxReplaces)
    Returns a copy of src in which at most maxReplaces matches for this regexp have been replaced by the return value of function repl (whose first argument is the matched string).
    (package private) String
    replaceFirst(String src, String repl)
    Returns a copy of src in which only the first match for this regexp has been replaced by repl.
    (package private) void
    reset()
     
    public String
    toString()
     

    Methods inherited from class Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

  • Constructor Details

    • RE2

      RE2(String expr)
    • RE2

      private RE2(String expr, Prog prog, int numSubexp, boolean longest)
  • Method Details

    • compile

      static RE2 compile(String expr) throws PatternSyntaxException
      Parses a regular expression and returns, if successful, an RE2 instance that can be used to match against text.

      When matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses the one that a backtracking search would have found first. This so-called leftmost-first matching is the same semantics that Perl, Python, and other implementations use, although this package implements it without the expense of backtracking. For POSIX leftmost-longest matching, see compilePOSIX(String).

      Throws:
      PatternSyntaxException
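Leftmost-first selection can be observed with any backtracking engine. The sketch below uses the JDK's java.util.regex, a backtracking matcher and not RE2J, only to show which of several matches at the same start position is chosen. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for "what a backtracking search finds first".
final class LeftmostFirstDemo {
  // Returns the text of the leftmost-first match of regex in text, or null if there is none.
  static String leftmostFirst(String regex, String text) {
    Matcher m = Pattern.compile(regex).matcher(text);
    return m.find() ? m.group() : null;
  }
}
```

For example, leftmostFirst("a|ab", "xab") yields "a", while leftmostFirst("ab|a", "xab") yields "ab". In both cases the match starts at index 1, and the order of the alternatives decides the result rather than the match length.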
    • compilePOSIX

      static RE2 compilePOSIX(String expr) throws PatternSyntaxException
      compilePOSIX is like compile(String) but restricts the regular expression to POSIX ERE (egrep) syntax and changes the match semantics to leftmost-longest.

      That is, when matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses a match that is as long as possible. This so-called leftmost-longest matching is the same semantics that early regular expression implementations used and that POSIX specifies.

      However, there can be multiple leftmost-longest matches, with different submatch choices, and here this package diverges from POSIX. Among the possible leftmost-longest matches, this package chooses the one that a backtracking search would have found first, while POSIX specifies that the match be chosen to maximize the length of the first subexpression, then the second, and so on from left to right. The POSIX rule is computationally prohibitive and not even well-defined. See http://swtch.com/~rsc/regexp/regexp2.html#posix

      Throws:
      PatternSyntaxException
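One brute-force way to make leftmost-longest concrete is to try start positions from the left. At the first start position that has any match, keep the longest end. The sketch below borrows java.util.regex to test each candidate span. It is O(n²) in the input length, ignores submatch selection entirely, and is not how RE2J implements compilePOSIX. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: brute-force leftmost-longest search built on java.util.regex.
final class LeftmostLongestSketch {
  // Returns {start, end} of the leftmost-longest match of p in text, or null if none.
  static int[] leftmostLongest(Pattern p, String text) {
    Matcher m = p.matcher(text);
    m.useTransparentBounds(true);  // boundary constructs may look outside the region
    m.useAnchoringBounds(false);   // ^ and $ keep their whole-input meaning
    for (int start = 0; start <= text.length(); start++) {   // leftmost first...
      for (int end = text.length(); end >= start; end--) {   // ...then longest
        m.region(start, end);      // resets the match state but keeps the bounds settings
        if (m.matches()) {
          return new int[] {start, end};
        }
      }
    }
    return null;
  }
}
```

With the pattern a|ab and the input "xab", this returns {1, 3} (the text "ab"), whereas leftmost-first matching picks "a".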
    • compileImpl

      static RE2 compileImpl(String expr, int mode, boolean longest) throws PatternSyntaxException
      Throws:
      PatternSyntaxException
    • numberOfCapturingGroups

      int numberOfCapturingGroups()
      Returns the number of parenthesized subexpressions in this regular expression.
    • get

      Machine get()
    • reset

      void reset()
    • put

      void put(Machine m, boolean isNew)
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • doExecute

      private int[] doExecute(MachineInput in, int pos, int anchor, int ncap)
    • match

      boolean match(CharSequence s)
      Returns true iff this regexp matches the string s.
    • match

      boolean match(CharSequence input, int start, int end, int anchor, int[] group, int ngroup)
    • match

      boolean match(MatcherInput input, int start, int end, int anchor, int[] group, int ngroup)
      Matches the regular expression against input starting at position start and ending at position end, with the given anchoring. Records the submatch boundaries in group as [start, end) pairs of byte offsets. The number of boundaries needed is inferred from the size of the group array. It is most efficient not to ask for submatch boundaries.
      Parameters:
      input - the input to match against
      start - the beginning position in the input
      end - the end position in the input
      anchor - the anchoring flag (UNANCHORED, ANCHOR_START, ANCHOR_BOTH)
      group - the array to fill with submatch positions
      ngroup - the number of array pairs to fill in
      Returns:
      true if a match was found
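The three anchoring modes and the layout of group can be mimicked with java.util.regex for illustration. In this sketch, UNANCHORED behaves like Matcher.find(), ANCHOR_START like lookingAt(), and ANCHOR_BOTH like matches(), each restricted to the [start, end) region. The Anchor enum and the method are hypothetical stand-ins. They are not RE2J's constants, whose integer values are not documented on this page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchoredMatchSketch {
  // Hypothetical stand-ins for RE2's UNANCHORED, ANCHOR_START and ANCHOR_BOTH flags.
  enum Anchor { UNANCHORED, ANCHOR_START, ANCHOR_BOTH }

  // Fills up to ngroup [start, end) pairs into group; group must hold at least 2 * ngroup ints.
  static boolean match(Pattern p, CharSequence input, int start, int end,
                       Anchor anchor, int[] group, int ngroup) {
    Matcher m = p.matcher(input);
    m.region(start, end);                                    // search only within [start, end)
    boolean found;
    switch (anchor) {
      case ANCHOR_START: found = m.lookingAt(); break;       // match must begin at start
      case ANCHOR_BOTH:  found = m.matches();   break;       // match must span start..end
      default:           found = m.find();      break;       // match may begin anywhere
    }
    if (!found) {
      return false;
    }
    // Pair 0 is the whole match and pair i is capturing group i; -1 marks a non-participating group.
    for (int i = 0; i < ngroup && i <= m.groupCount(); i++) {
      group[2 * i] = m.start(i);
      group[2 * i + 1] = m.end(i);
    }
    return true;
  }
}
```

For example, matching b(c) against "abcd" unanchored over [0, 4) fills group with {1, 3, 2, 3}. The same call with ANCHOR_START fails, because the text at offset 0 is "a".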
    • matchUTF8

      boolean matchUTF8(byte[] b)
      Returns true iff this regexp matches the UTF-8 byte array b.
    • match

      static boolean match(String pattern, CharSequence s) throws PatternSyntaxException
      Returns true iff textual regular expression pattern matches string s.

      More complicated queries need to use compile(String) and the full RE2 interface.

      Throws:
      PatternSyntaxException
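This method is assumed to follow Go's regexp.MatchString, from which RE2J is ported. Under that assumption, "matches" here means the pattern matches somewhere in s, not that it spans all of s. That differs from the JDK's String.matches and Pattern.matches, which require a whole-input match. The hypothetical helpers below contrast the two readings using java.util.regex.

```java
import java.util.regex.Pattern;

final class UnanchoredMatchSketch {
  // Unanchored search: true if regex matches anywhere in s (the assumed RE2.match reading).
  static boolean matchAnywhere(String regex, CharSequence s) {
    return Pattern.compile(regex).matcher(s).find();
  }

  // Whole-input match: the java.util.regex Pattern.matches contract.
  static boolean matchWhole(String regex, CharSequence s) {
    return Pattern.matches(regex, s);
  }
}
```

For the pattern b+ and the input "abbc", matchAnywhere returns true while matchWhole returns false.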
    • replaceAll

      String replaceAll(String src, String repl)
      Returns a copy of src in which all matches for this regexp have been replaced by repl. No support is provided for expressions (e.g. \1 or $1) in the replacement string.
    • replaceFirst

      String replaceFirst(String src, String repl)
      Returns a copy of src in which only the first match for this regexp has been replaced by repl. No support is provided for expressions (e.g. \1 or $1) in the replacement string.
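The JDK's own replaceAll and replaceFirst do expand $1 and backslash escapes in the replacement. To get the literal-replacement behaviour described here with java.util.regex, the replacement can be wrapped in Matcher.quoteReplacement. The helper names below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplaceSketch {
  // quoteReplacement escapes '$' and '\' so that repl is inserted verbatim.
  static String replaceAllLiteral(Pattern p, String src, String repl) {
    return p.matcher(src).replaceAll(Matcher.quoteReplacement(repl));
  }

  static String replaceFirstLiteral(Pattern p, String src, String repl) {
    return p.matcher(src).replaceFirst(Matcher.quoteReplacement(repl));
  }
}
```

For example, replacing o+ in "foo boo" with "$1" gives "f$1 b$1" for replaceAllLiteral and "f$1 boo" for replaceFirstLiteral. Without quoteReplacement, "$1" would be read as a reference to capturing group 1 and fail, because o+ has no such group.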
    • replaceAllFunc

      String replaceAllFunc(String src, RE2.ReplaceFunc repl, int maxReplaces)
      Returns a copy of src in which at most maxReplaces matches for this regexp have been replaced by the return value of function repl (whose first argument is the matched string). No support is provided for expressions (e.g. \1 or $1) in the replacement string.
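The capped, function-driven replacement can be sketched as a find loop that copies the text between matches verbatim and splices in repl's return value. This sketch uses java.util.function.Function as a stand-in for RE2.ReplaceFunc, whose exact method is not shown on this page. Its empty-match handling follows java.util.regex, which may not match RE2J's. A non-positive maxReplaces leaves src unchanged in this sketch.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceFuncSketch {
  // repl receives the matched text and returns its replacement (hypothetical stand-in for RE2.ReplaceFunc).
  static String replaceAllFunc(Pattern p, String src, Function<String, String> repl, int maxReplaces) {
    Matcher m = p.matcher(src);
    StringBuilder out = new StringBuilder();
    int last = 0;   // end of the previous match in src
    int count = 0;  // replacements made so far
    while (count < maxReplaces && m.find()) {
      out.append(src, last, m.start());   // copy the text between matches verbatim
      out.append(repl.apply(m.group()));  // returned text is inserted literally, with no $1 expansion
      last = m.end();
      count++;
    }
    return out.append(src, last, src.length()).toString();
  }
}
```

With the pattern [0-9]+, the input "a1b22c333", a function that wraps its argument in angle brackets, and maxReplaces = 2, the result is "a<1>b<22>c333".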
    • quoteMeta

      static String quoteMeta(String s)
      Returns a string that quotes all regular expression metacharacters inside the argument text; the returned string is a regular expression matching the literal text. For example, quoteMeta("[foo]").equals("\\[foo\\]").
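The quoting step can be sketched as prefixing each metacharacter with a backslash. The metacharacter set below is an assumption taken from Go's regexp.QuoteMeta, and the class name is hypothetical. Scanning UTF-16 chars is safe because every character in the set is ASCII.

```java
// Hypothetical re-implementation for illustration; the metacharacter set is assumed, not taken from RE2J.
final class QuoteMetaSketch {
  private static final String META = "\\.+*?()|[]{}^$";

  static String quoteMeta(String s) {
    StringBuilder b = new StringBuilder(2 * s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (META.indexOf(c) >= 0) {
        b.append('\\');  // escape the metacharacter so it matches literally
      }
      b.append(c);
    }
    return b.toString();
  }
}
```

quoteMeta("[foo]") yields the string \[foo\] (written "\\[foo\\]" as a Java literal), matching the example above. The quoted form of "a.b*c" matches only that exact text.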
    • pad

      private int[] pad(int[] a)
    • allMatches

      private void allMatches(MachineInput input, int n, RE2.DeliverFunc deliver)
    • findUTF8

      byte[] findUTF8(byte[] b)
      Returns an array holding the text of the leftmost match in b of this regular expression.

      A return value of null indicates no match.

    • findUTF8Index

      int[] findUTF8Index(byte[] b)
      Returns a two-element array of integers defining the location of the leftmost match in b of this regular expression. The match itself occupies b[loc[0]] up to, but not including, b[loc[1]].

      A return value of null indicates no match.
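The UTF-8 variants report byte offsets into b. These differ from the UTF-16 char indices used by the String variants whenever the text contains non-ASCII characters. The hypothetical helper below converts the char indices of a java.util.regex match into UTF-8 byte offsets to show the difference. It assumes the match boundaries never split a surrogate pair.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Utf8IndexSketch {
  // Number of UTF-8 bytes needed to encode s[0, charIndex).
  static int utf8Offset(String s, int charIndex) {
    return s.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length;
  }

  // {start, end} byte offsets of the leftmost match within the UTF-8 encoding of s, or null.
  static int[] utf8Index(Pattern p, String s) {
    Matcher m = p.matcher(s);
    if (!m.find()) {
      return null;
    }
    return new int[] {utf8Offset(s, m.start()), utf8Offset(s, m.end())};
  }
}
```

For "h\u00e9llo" ("héllo") and the pattern l+o, the match spans chars 2 to 5 but bytes 3 to 6, because "é" takes two bytes in UTF-8. Arrays.copyOfRange(b, 3, 6) on the encoded bytes yields "llo".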

    • find

      String find(String s)
      Returns a string holding the text of the leftmost match in s of this regular expression.

      If there is no match, the return value is an empty string, but it will also be empty if the regular expression successfully matches an empty string. Use findIndex(String) or findSubmatch(String) if it is necessary to distinguish these cases.
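The ambiguity is easy to reproduce. A pattern that can match the empty string and a pattern that does not match at all both produce "", and only the index form tells them apart. The sketch below mirrors the two documented contracts using java.util.regex; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindSketch {
  // Mirrors the documented find(String) contract: "" both for no match and for an empty match.
  static String find(Pattern p, String s) {
    Matcher m = p.matcher(s);
    return m.find() ? m.group() : "";
  }

  // Mirrors the documented findIndex(String) contract: null means no match.
  static int[] findIndex(Pattern p, String s) {
    Matcher m = p.matcher(s);
    return m.find() ? new int[] {m.start(), m.end()} : null;
  }
}
```

On the input "abc", both x* and z give "" from find. findIndex returns {0, 0} for x* (an empty match at the start) and null for z.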

    • findIndex

      int[] findIndex(String s)
      Returns a two-element array of integers defining the location of the leftmost match in s of this regular expression. The match itself is at s.substring(loc[0], loc[1]).

      A return value of null indicates no match.

    • findUTF8Submatch

      byte[][] findUTF8Submatch(byte[] b)
      Returns an array of arrays holding the text of the leftmost match of the regular expression in b and the matches, if any, of its subexpressions, as defined by the Submatch description above.

      A return value of null indicates no match.

    • findUTF8SubmatchIndex

      int[] findUTF8SubmatchIndex(byte[] b)
      Returns an array holding the index pairs identifying the leftmost match of this regular expression in b and the matches, if any, of its subexpressions, as defined by the Submatch and Index descriptions above.

      A return value of null indicates no match.

    • findSubmatch

      String[] findSubmatch(String s)
      Returns an array of strings holding the text of the leftmost match of the regular expression in s and the matches, if any, of its subexpressions, as defined by the Submatch description above.

      A return value of null indicates no match.

    • findSubmatchIndex

      int[] findSubmatchIndex(String s)
      Returns an array holding the index pairs identifying the leftmost match of this regular expression in s and the matches, if any, of its subexpressions, as defined by the Submatch description above.

      A return value of null indicates no match.
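The Submatch layout puts the whole match at element 0 and subexpression i at element i. The Index form flattens these into [start, end) pairs, so the array has 2 × (groups + 1) entries. The hypothetical sketch below builds both arrays from a java.util.regex match. Here a non-participating group is null in the text form and -1 in the index form, which are java.util.regex's conventions and are assumed rather than confirmed for RE2J.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubmatchSketch {
  // Element 0 is the whole match; element i is group i (null if it did not participate).
  static String[] findSubmatch(Pattern p, String s) {
    Matcher m = p.matcher(s);
    if (!m.find()) {
      return null;
    }
    String[] out = new String[m.groupCount() + 1];
    for (int i = 0; i <= m.groupCount(); i++) {
      out[i] = m.group(i);
    }
    return out;
  }

  // Pairs [2i, 2i+1] hold the [start, end) of group i; -1 marks a non-participating group.
  static int[] findSubmatchIndex(Pattern p, String s) {
    Matcher m = p.matcher(s);
    if (!m.find()) {
      return null;
    }
    int[] out = new int[2 * (m.groupCount() + 1)];
    for (int i = 0; i <= m.groupCount(); i++) {
      out[2 * i] = m.start(i);
      out[2 * i + 1] = m.end(i);
    }
    return out;
  }
}
```

For the pattern (a)|(b) and the input "xb", the text form is {"b", null, "b"} and the index form is {1, 2, -1, -1, 1, 2}.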

    • findAllUTF8

      List<byte[]> findAllUTF8(byte[] b, int n)
      findAllUTF8 is the All version of findUTF8(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.

      A return value of null indicates no match.

    • findAllUTF8Index

      List<int[]> findAllUTF8Index(byte[] b, int n)
      findAllUTF8Index is the All version of findUTF8Index(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.

      A return value of null indicates no match.

    • findAll

      List<String> findAll(String s, int n)
      findAll is the All version of find(String); it returns a list of up to n successive matches of the expression, as defined by the All description above.

      A return value of null indicates no match.

    • findAllIndex

      List<int[]> findAllIndex(String s, int n)
      findAllIndex is the All version of findIndex(String); it returns a list of up to n successive matches of the expression, as defined by the All description above.

      A return value of null indicates no match.
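The All description is not reproduced on this page. The sketch below assumes the rules of Go's regexp package, from which RE2J is ported. Under those rules, matches are successive and non-overlapping, an empty match that directly abuts the previous match is skipped, and a negative n means no limit. It uses java.util.regex to find candidates and returns null when nothing matches, as documented above. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AllSketch {
  // Up to n (or all, if n < 0: assumed Go rule) successive {start, end} matches, or null if none.
  static List<int[]> findAllIndex(Pattern p, String s, int n) {
    List<int[]> out = new ArrayList<>();
    Matcher m = p.matcher(s);
    int prevEnd = -1;  // end of the last accepted match
    while ((n < 0 || out.size() < n) && m.find()) {
      // Assumed Go rule: ignore an empty match starting exactly where the previous match ended.
      if (m.start() == m.end() && m.start() == prevEnd) {
        continue;
      }
      out.add(new int[] {m.start(), m.end()});
      prevEnd = m.end();
    }
    return out.isEmpty() ? null : out;
  }
}
```

For a* on "baaab", this yields {0, 0}, {1, 4}, {5, 5}. The empty match at 4, right after "aaa", is dropped. With n = 2, only the first two are returned.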

    • findAllUTF8Submatch

      List<byte[][]> findAllUTF8Submatch(byte[] b, int n)
      findAllUTF8Submatch is the All version of findUTF8Submatch(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.

      A return value of null indicates no match.

    • findAllUTF8SubmatchIndex

      List<int[]> findAllUTF8SubmatchIndex(byte[] b, int n)
      findAllUTF8SubmatchIndex is the All version of findUTF8SubmatchIndex(byte[]); it returns a list of up to n successive matches of the expression, as defined by the All description above.

      A return value of null indicates no match.

    • findAllSubmatch

      List<String[]> findAllSubmatch(String s, int n)
      findAllSubmatch is the All version of findSubmatch(String); it returns a list of up to n successive matches of the expression, as defined by the All description above.

      A return value of null indicates no match.

    • findAllSubmatchIndex

      List<int[]> findAllSubmatchIndex(String s, int n)
      findAllSubmatchIndex is the All version of findSubmatchIndex(String); it returns a list of up to n successive matches of the expression, as defined by the All description above.

      A return value of null indicates no match.