Class Perl5Compiler
- java.lang.Object
-
- org.apache.oro.text.regex.Perl5Compiler
-
- All Implemented Interfaces:
PatternCompiler
public final class Perl5Compiler extends java.lang.Object implements PatternCompiler
The Perl5Compiler class is used to create compiled regular expressions conforming to the Perl5 regular expression syntax. It generates Perl5Pattern instances upon compilation to be used in conjunction with a Perl5Matcher instance. Please see the user's guide for more information about Perl5 regular expressions.
Perl5Compiler and Perl5Matcher are designed with the intent that you use a separate instance of each per thread to avoid the overhead of both synchronization and concurrent access (e.g., a match that takes a long time in one thread will block the progress of another thread with a shorter match). If you want to use a single instance of each in a concurrent program, you must appropriately protect access to the instances with critical sections. If you want to share Perl5Pattern instances between concurrently executing instances of Perl5Matcher, you must compile the patterns with READ_ONLY_MASK.
- Since:
- 1.0
- See Also:
PatternCompiler, MalformedPatternException, Perl5Pattern, Perl5Matcher
-
-
Field Summary
Fields
Modifier and Type, Field - Description
private static char __CASE_INSENSITIVE
private int __cost
private static char __EXTENDED
private static char __GLOBAL
private static java.util.HashMap<java.lang.String,java.lang.Character> __hashPOSIX - Lookup table for POSIX character class names
private static java.lang.String __HEX_DIGIT
private CharStringPointer __input
private static char __KEEP
private char[] __modifierFlags
private static char __MULTILINE
private static int __NONNULL
private int __numParentheses
private char[] __program
private int __programSize
private static char __READ_ONLY
private boolean __sawBackreference
private static int __SIMPLE
private static char __SINGLELINE
private static int __SPSTART
private static int __TRYAGAIN
private static int __WORSTCASE
static int CASE_INSENSITIVE_MASK - A mask passed as an option to the compile methods to indicate a compiled regular expression should be case insensitive.
static int DEFAULT_MASK - The default mask for the compile methods.
static int EXTENDED_MASK - A mask passed as an option to the compile methods to indicate a compiled regular expression should be treated as a Perl5 extended pattern (i.e., a pattern using the /x modifier).
static int MULTILINE_MASK - A mask passed as an option to the compile methods to indicate a compiled regular expression should treat input as having multiple lines.
static int READ_ONLY_MASK - A mask passed as an option to the compile methods to indicate that the resulting Perl5Pattern should be treated as a read only data structure by Perl5Matcher, making it safe to share a single Perl5Pattern instance among multiple threads without needing synchronization.
static int SINGLELINE_MASK - A mask passed as an option to the compile methods to indicate a compiled regular expression should treat input as being a single line.
-
Constructor Summary
Constructors
Constructor - Description
Perl5Compiler()
-
Method Summary
Modifier and Type, Method - Description
private int __emitArgNode(char operator, char arg)
private void __emitCode(char code)
private int __emitNode(char operator)
private char __getNextChar()
private static boolean __isComplexRepetitionOp(char[] ch, int offset)
private static boolean __isSimpleRepetitionOp(char ch)
private int __parseAlternation(int[] retFlags)
private int __parseAtom(int[] retFlags)
private int __parseBranch(int[] retFlags)
private int __parseExpression(boolean isParenthesized, int[] hintFlags)
private static int __parseHex(char[] str, int off, int ml, int[] scanned)
private static int __parseOctal(char[] str, int off, int ml, int[] scanned)
private char __parsePOSIX(boolean[] negFlag) - Parses a POSIX expression like [:foo:].
private static boolean __parseRepetition(char[] str, int off)
private int __parseUnicodeClass()
private void __programAddOperatorTail(int current, int value)
private void __programAddTail(int current, int value)
private void __programInsertOperator(char operator, int oper)
private static void __setModifierFlag(char[] flags, char ch)
Pattern compile(char[] pattern) - Same as calling compile(pattern, Perl5Compiler.DEFAULT_MASK);
Pattern compile(char[] pattern, int options) - Compiles a Perl5 regular expression into a Perl5Pattern instance that can be used by a Perl5Matcher object to perform pattern matching.
Pattern compile(java.lang.String pattern) - Same as calling compile(pattern, Perl5Compiler.DEFAULT_MASK);
Pattern compile(java.lang.String pattern, int options) - Compiles a Perl5 regular expression into a Perl5Pattern instance that can be used by a Perl5Matcher object to perform pattern matching.
static java.lang.String quotemeta(char[] expression) - Given a character string, returns a Perl5 expression that interprets each character of the original string literally.
static java.lang.String quotemeta(java.lang.String expression) - Given a character string, returns a Perl5 expression that interprets each character of the original string literally.
-
-
-
Field Detail
-
__WORSTCASE
private static final int __WORSTCASE
- See Also:
- Constant Field Values
-
__NONNULL
private static final int __NONNULL
- See Also:
- Constant Field Values
-
__SIMPLE
private static final int __SIMPLE
- See Also:
- Constant Field Values
-
__SPSTART
private static final int __SPSTART
- See Also:
- Constant Field Values
-
__TRYAGAIN
private static final int __TRYAGAIN
- See Also:
- Constant Field Values
-
__CASE_INSENSITIVE
private static final char __CASE_INSENSITIVE
- See Also:
- Constant Field Values
-
__GLOBAL
private static final char __GLOBAL
- See Also:
- Constant Field Values
-
__KEEP
private static final char __KEEP
- See Also:
- Constant Field Values
-
__MULTILINE
private static final char __MULTILINE
- See Also:
- Constant Field Values
-
__SINGLELINE
private static final char __SINGLELINE
- See Also:
- Constant Field Values
-
__EXTENDED
private static final char __EXTENDED
- See Also:
- Constant Field Values
-
__READ_ONLY
private static final char __READ_ONLY
- See Also:
- Constant Field Values
-
__HEX_DIGIT
private static final java.lang.String __HEX_DIGIT
- See Also:
- Constant Field Values
-
__input
private CharStringPointer __input
-
__sawBackreference
private boolean __sawBackreference
-
__modifierFlags
private final char[] __modifierFlags
-
__numParentheses
private int __numParentheses
-
__programSize
private int __programSize
-
__cost
private int __cost
-
__program
private char[] __program
-
__hashPOSIX
private static final java.util.HashMap<java.lang.String,java.lang.Character> __hashPOSIX
Lookup table for POSIX character class names
-
DEFAULT_MASK
public static final int DEFAULT_MASK
The default mask for the compile methods. It is equal to 0. The default behavior is for a regular expression to be case sensitive and to not specify if it is multiline or singleline. When MULTILINE_MASK and SINGLELINE_MASK are not defined, the ^, $, and . metacharacters are interpreted according to the value of isMultiline() in Perl5Matcher. The default behavior of Perl5Matcher is to treat the Perl5Pattern as though MULTILINE_MASK were enabled. If isMultiline() returns false, then the pattern is treated as though SINGLELINE_MASK were set. However, compiling a pattern with the MULTILINE_MASK or SINGLELINE_MASK masks will ALWAYS override whatever behavior is specified by setMultiline() in Perl5Matcher.
- See Also:
- Constant Field Values
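Read literally, the DEFAULT_MASK description implies a precedence order between the compile-time masks and the matcher's isMultiline() setting. The following is a minimal sketch of that decision, not ORO's implementation. The class name, the method name, and the bit values of MULTILINE_MASK and SINGLELINE_MASK are placeholders invented for illustration; only DEFAULT_MASK being 0 comes from the text. The text does not say what happens when both masks are set, so the sketch arbitrarily lets MULTILINE_MASK win in that case.

```java
/** Sketch of the precedence rule described for DEFAULT_MASK (hypothetical names). */
final class LineModeSketch {
    static final int DEFAULT_MASK = 0;          // documented: equal to 0
    static final int MULTILINE_MASK = 1 << 0;   // placeholder value, not the real constant
    static final int SINGLELINE_MASK = 1 << 1;  // placeholder value, not the real constant

    /**
     * Decides whether ^, $ and . are interpreted in multiline mode.
     *
     * @param compileOptions     the mask that was passed to compile
     * @param matcherIsMultiline what Perl5Matcher.isMultiline() would report
     */
    static boolean effectiveMultiline(int compileOptions, boolean matcherIsMultiline) {
        if ((compileOptions & MULTILINE_MASK) != 0) {
            return true;   // a compile-time mask always overrides the matcher
        }
        if ((compileOptions & SINGLELINE_MASK) != 0) {
            return false;  // likewise for the single-line mask
        }
        return matcherIsMultiline; // no mask given: the matcher setting decides
    }
}
```

With DEFAULT_MASK the result simply follows the matcher setting; with either mask present the matcher setting is ignored.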
-
CASE_INSENSITIVE_MASK
public static final int CASE_INSENSITIVE_MASK
A mask passed as an option to the compile methods to indicate a compiled regular expression should be case insensitive.
- See Also:
- Constant Field Values
-
MULTILINE_MASK
public static final int MULTILINE_MASK
A mask passed as an option to the compile methods to indicate a compiled regular expression should treat input as having multiple lines. This option affects the interpretation of the ^ and $ metacharacters. When this mask is used, the ^ metacharacter matches at the beginning of every line, and the $ metacharacter matches at the end of every line. Additionally, the . metacharacter will not match newlines when an expression is compiled with MULTILINE_MASK, which is its default behavior.
- See Also:
- Constant Field Values
-
SINGLELINE_MASK
public static final int SINGLELINE_MASK
A mask passed as an option to the compile methods to indicate a compiled regular expression should treat input as being a single line. This option affects the interpretation of the ^ and $ metacharacters. When this mask is used, the ^ metacharacter matches at the beginning of the input, and the $ metacharacter matches at the end of the input. The ^ and $ metacharacters will not match at the beginning and end of lines occurring between the beginning and end of the input. Additionally, the . metacharacter will match newlines when an expression is compiled with SINGLELINE_MASK, unlike its default behavior.
- See Also:
- Constant Field Values
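The difference between the two line modes is easiest to see on a three-line input. The sketch below is an analogue written with java.util.regex so that it runs without ORO on the classpath; it is not ORO code. It assumes Pattern.MULTILINE behaves like the anchor handling described for MULTILINE_MASK, and that Pattern.DOTALL without Pattern.MULTILINE approximates what is described for SINGLELINE_MASK. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** java.util.regex stand-in for the MULTILINE_MASK / SINGLELINE_MASK semantics. */
final class LineModeDemo {
    static final String INPUT = "first\nsecond\nthird";

    /** Multiline analogue: ^ and $ match at every line boundary. */
    static boolean multilineFindsInnerLine() {
        return Pattern.compile("^second$", Pattern.MULTILINE).matcher(INPUT).find();
    }

    /** Single-line analogue: ^ and $ match only at the ends of the whole input. */
    static boolean singlelineFindsInnerLine() {
        return Pattern.compile("^second$", Pattern.DOTALL).matcher(INPUT).find();
    }

    /** Multiline analogue: . does not match a newline. */
    static boolean multilineDotCrossesNewline() {
        return Pattern.compile("first.second", Pattern.MULTILINE).matcher(INPUT).find();
    }

    /** Single-line analogue: . matches a newline. */
    static boolean singlelineDotCrossesNewline() {
        return Pattern.compile("first.second", Pattern.DOTALL).matcher(INPUT).find();
    }
}
```

In this analogue the anchored pattern is found only in the multiline case, and the dotted pattern is found only in the single-line case.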
-
EXTENDED_MASK
public static final int EXTENDED_MASK
A mask passed as an option to the compile methods to indicate a compiled regular expression should be treated as a Perl5 extended pattern (i.e., a pattern using the /x modifier). This option tells the compiler to ignore whitespace that is not backslashed or within a character class. It also tells the compiler to treat the # character as a metacharacter introducing a comment as in Perl. In other words, the # character will comment out any text in the regular expression between it and the next newline. The intent of this option is to allow you to divide your patterns into more readable parts. It is provided to maintain compatibility with Perl5 regular expressions, although it will not often make sense to use it in Java.
- See Also:
- Constant Field Values
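As an illustration of the /x behavior described above, the sketch below splits a pattern across commented lines. It uses java.util.regex with Pattern.COMMENTS as a stand-in for EXTENDED_MASK, on the assumption that both follow the Perl /x rules for unescaped whitespace and # comments; it is not ORO code, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** java.util.regex stand-in for the EXTENDED_MASK (/x) semantics. */
final class ExtendedDemo {
    // Whitespace and "# ..." comments are meant to be ignored in extended mode.
    static final String COMMENTED =
            "(\\d{4})  # year\n"
          + "-         # separator\n"
          + "(\\d{2})  # month\n";

    /** Extended analogue: the pattern reduces to (\d{4})-(\d{2}). */
    static boolean matchesExtended(String input) {
        return Pattern.compile(COMMENTED, Pattern.COMMENTS).matcher(input).matches();
    }

    /** Without the flag, the spaces, '#' and comment text are literal characters. */
    static boolean matchesPlain(String input) {
        return Pattern.compile(COMMENTED).matcher(input).matches();
    }
}
```

Here matchesExtended("2024-05") succeeds, while matchesPlain("2024-05") fails because the comment text is treated as part of the pattern.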
-
READ_ONLY_MASK
public static final int READ_ONLY_MASK
A mask passed as an option to the compile methods to indicate that the resulting Perl5Pattern should be treated as a read only data structure by Perl5Matcher, making it safe to share a single Perl5Pattern instance among multiple threads without needing synchronization. Without this option, Perl5Matcher reserves the right to store heuristic or other information in Perl5Pattern that might accelerate future matches. When you use this option, Perl5Matcher will not store or modify any information in a Perl5Pattern. Use this option when you want to share a Perl5Pattern instance among multiple threads using different Perl5Matcher instances.
- See Also:
- Constant Field Values
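The sharing model described for READ_ONLY_MASK (one immutable pattern, one matcher per thread) can be sketched without ORO on the classpath. The example below uses java.util.regex as a stand-in: its Pattern is documented as safe for concurrent use while its Matcher is not, which mirrors a Perl5Pattern compiled with READ_ONLY_MASK being used by separate Perl5Matcher instances. It is an analogue only; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One shared, never-modified pattern; a fresh matcher for each task. */
final class SharedPatternDemo {
    // Plays the role of a pattern compiled with READ_ONLY_MASK.
    static final Pattern SHARED = Pattern.compile("\\d+");

    /** Counts runs of digits using a matcher owned by the calling thread. */
    static int countMatches(String input) {
        Matcher matcher = SHARED.matcher(input); // never shared between threads
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    /** Runs countMatches for each input on a small thread pool and sums the results. */
    static int countAcrossThreads(List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String input : inputs) {
                Callable<Integer> task = () -> countMatches(input);
                futures.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

No synchronization is needed around SHARED because no task writes to it; all mutable matching state lives in the per-task matcher.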
-
-
Method Detail
-
quotemeta
public static final java.lang.String quotemeta(char[] expression)
Given a character string, returns a Perl5 expression that interprets each character of the original string literally. In other words, all special metacharacters are quoted/escaped. This method is useful for converting user input meant for literal interpretation into a safe regular expression representing the literal input.
In effect, this method is the analog of the Perl5 quotemeta() builtin method.
- Parameters:
expression - The expression to convert.
- Returns:
- A String containing a Perl5 regular expression corresponding to a literal interpretation of the pattern.
-
quotemeta
public static final java.lang.String quotemeta(java.lang.String expression)
Given a character string, returns a Perl5 expression that interprets each character of the original string literally. In other words, all special metacharacters are quoted/escaped. This method is useful for converting user input meant for literal interpretation into a safe regular expression representing the literal input.
In effect, this method is the analog of the Perl5 quotemeta() builtin method.
- Parameters:
expression - The expression to convert.
- Returns:
- A String containing a Perl5 regular expression corresponding to a literal interpretation of the pattern.
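To make the escaping concrete, here is a minimal sketch of Perl-style quotemeta behavior: every character that is not a word character gets a backslash in front of it. It is an illustration, not the source of Perl5Compiler.quotemeta. The class and method names are hypothetical, and treating "word character" as letter, digit, or underscore is an assumption of this sketch.

```java
/** Sketch of quotemeta-style escaping (hypothetical helper, not ORO source). */
final class QuoteMetaSketch {
    /** Returns a pattern string that matches the given text literally. */
    static String quoteMeta(String expression) {
        StringBuilder out = new StringBuilder(2 * expression.length());
        for (int i = 0; i < expression.length(); i++) {
            char ch = expression.charAt(i);
            // Assumption: letters, digits and '_' are safe; everything else is escaped.
            boolean wordChar = Character.isLetterOrDigit(ch) || ch == '_';
            if (!wordChar) {
                out.append('\\');
            }
            out.append(ch);
        }
        return out.toString();
    }
}
```

For example, quoteMeta("a.b*c") yields the six-character-plus-two-backslash string a\.b\*c, so the dot and star lose their special meaning, while a string made only of word characters is returned unchanged.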
-
__isSimpleRepetitionOp
private static boolean __isSimpleRepetitionOp(char ch)
-
__isComplexRepetitionOp
private static boolean __isComplexRepetitionOp(char[] ch, int offset)
-
__parseRepetition
private static boolean __parseRepetition(char[] str, int off)
-
__parseHex
private static int __parseHex(char[] str, int off, int ml, int[] scanned)
-
__parseOctal
private static int __parseOctal(char[] str, int off, int ml, int[] scanned)
-
__setModifierFlag
private static void __setModifierFlag(char[] flags, char ch)
-
__emitCode
private void __emitCode(char code)
-
__emitNode
private int __emitNode(char operator)
-
__emitArgNode
private int __emitArgNode(char operator, char arg)
-
__programInsertOperator
private void __programInsertOperator(char operator, int oper)
-
__programAddTail
private void __programAddTail(int current, int value)
-
__programAddOperatorTail
private void __programAddOperatorTail(int current, int value)
-
__getNextChar
private char __getNextChar()
-
__parseAlternation
private int __parseAlternation(int[] retFlags) throws MalformedPatternException
- Throws:
MalformedPatternException
-
__parseAtom
private int __parseAtom(int[] retFlags) throws MalformedPatternException
- Throws:
MalformedPatternException
-
__parseUnicodeClass
private int __parseUnicodeClass() throws MalformedPatternException
- Throws:
MalformedPatternException
-
__parsePOSIX
private char __parsePOSIX(boolean[] negFlag)
Parses a POSIX expression like [:foo:].
- Returns:
- The opcode, or 0 if parsing the POSIX expression fails.
-
__parseBranch
private int __parseBranch(int[] retFlags) throws MalformedPatternException
- Throws:
MalformedPatternException
-
__parseExpression
private int __parseExpression(boolean isParenthesized, int[] hintFlags) throws MalformedPatternException
- Throws:
MalformedPatternException
-
compile
public Pattern compile(char[] pattern, int options) throws MalformedPatternException
Compiles a Perl5 regular expression into a Perl5Pattern instance that can be used by a Perl5Matcher object to perform pattern matching. Please see the user's guide for more information about Perl5 regular expressions.
- Specified by:
compile in interface PatternCompiler
- Parameters:
pattern - A Perl5 regular expression to compile.
options - A set of flags giving the compiler instructions on how to treat the regular expression. The flags are a bitwise OR of any number of the five MASK constants. For example:
regex = compiler.compile(pattern, Perl5Compiler.CASE_INSENSITIVE_MASK | Perl5Compiler.MULTILINE_MASK);
This says to compile the pattern so that it treats input as consisting of multiple lines and to perform matches in a case insensitive manner.
- Returns:
- A Pattern instance constituting the compiled regular expression. This instance will always be a Perl5Pattern and can be reliably cast to a Perl5Pattern.
- Throws:
MalformedPatternException - If the expression being compiled is not a valid Perl5 regular expression.
-
compile
public Pattern compile(char[] pattern) throws MalformedPatternException
Same as calling compile(pattern, Perl5Compiler.DEFAULT_MASK);
- Specified by:
compile in interface PatternCompiler
- Parameters:
pattern - A regular expression to compile.
- Returns:
- A Pattern instance constituting the compiled regular expression. This instance will always be a Perl5Pattern and can be reliably cast to a Perl5Pattern.
- Throws:
MalformedPatternException - If the expression being compiled is not a valid Perl5 regular expression.
-
compile
public Pattern compile(java.lang.String pattern) throws MalformedPatternException
Same as calling compile(pattern, Perl5Compiler.DEFAULT_MASK);
- Specified by:
compile in interface PatternCompiler
- Parameters:
pattern - A regular expression to compile.
- Returns:
- A Pattern instance constituting the compiled regular expression. This instance will always be a Perl5Pattern and can be reliably cast to a Perl5Pattern.
- Throws:
MalformedPatternException - If the expression being compiled is not a valid Perl5 regular expression.
-
compile
public Pattern compile(java.lang.String pattern, int options) throws MalformedPatternException
Compiles a Perl5 regular expression into a Perl5Pattern instance that can be used by a Perl5Matcher object to perform pattern matching. Please see the user's guide for more information about Perl5 regular expressions.
- Specified by:
compile in interface PatternCompiler
- Parameters:
pattern - A Perl5 regular expression to compile.
options - A set of flags giving the compiler instructions on how to treat the regular expression. The flags are a bitwise OR of any number of the five MASK constants. For example:
regex = compiler.compile("^\\w+\\d+$", Perl5Compiler.CASE_INSENSITIVE_MASK | Perl5Compiler.MULTILINE_MASK);
This says to compile the pattern so that it treats input as consisting of multiple lines and to perform matches in a case insensitive manner.
- Returns:
- A Pattern instance constituting the compiled regular expression. This instance will always be a Perl5Pattern and can be reliably cast to a Perl5Pattern.
- Throws:
MalformedPatternException - If the expression being compiled is not a valid Perl5 regular expression.
-
-