Class Parser<T>
- Direct Known Subclasses:
BestParser, DelimitedParser, EmptyListParser, NestableBlockCommentScanner, ReluctantBetweenParser, RepeatAtLeastParser, RepeatTimesParser, SkipAtLeastParser, SkipTimesParser
A Parser takes a CharSequence source as input and parses it when the parse(CharSequence) method is called.
A value of type T is returned if parsing succeeds; otherwise a ParserException
is thrown to indicate the parsing error. For example:
Parser<String> scanner = Scanners.IDENTIFIER;
assertEquals("foo", scanner.parse("foo"));
Parsers run either at the character level to scan the source, or at the token level to parse
a list of Token objects returned by another parser. That other parser, which produces the
list of tokens for token level parsing, is hooked up via the from(Parser, Parser)
or from(Parser) method.
The following are important naming conventions used throughout the library:
- A character level parser object that recognizes a single lexical word is called a scanner.
- A scanner that translates the recognized lexical word into a token is called a tokenizer.
- A character level parser object that does lexical analysis and returns a list of Token is called a lexer.
- All index parameters are 0-based indexes in the original source.
To debug a failing parser, pass Parser.Mode.DEBUG to
parse(CharSequence, Mode) and inspect the result in
ParserException.getParseTree(). All labeled parsers will generate a node
in the exception's parse tree, with matched indices in the source.
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
- static enum Parser.Mode: Defines the mode that a parser should be run in.
- static final class Parser.Reference<T>: An atomic mutable reference to Parser used in recursive grammars.
- private static final class Parser.Rhs<T>
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
- (package private) abstract boolean apply(ParseContext ctxt)
- private static <T> T applyInfixOperators(T initialValue, List<? extends Function<? super T, ? extends T>> functions)
- private static <T> T applyInfixrOperators(T first, List<Parser.Rhs<T>> rhss)
- private static <T> T applyPostfixOperators(T a, Iterable<? extends Function<? super T, ? extends T>> ms)
- private static <T> T applyPrefixOperators(List<? extends Function<? super T, ? extends T>> ms, T a)
- asDelimiter: As a delimiter, the parser's error is considered lenient and will only be reported if no other meaningful error is encountered.
- asOptional: p.asOptional() is equivalent to p? in EBNF.
- atLeast(int min)
- atomic(): A Parser that undoes any partial match if this fails.
- final <R> Parser<R> cast()
- fails()
- followedBy(Parser<?> parser)
- from: A Parser that takes as input the tokens returned by tokenizer delimited by delim, and runs this to parse the tokens.
- from(Parser<? extends Collection<Token>> lexer)
- (package private) final T getReturn(ParseContext ctxt)
- final <R> Parser<R> ifelse(Function<? super T, ? extends Parser<? extends R>> consequence, Parser<? extends R> alternative)
- final <R> Parser<R> ifelse
- infixl: A Parser for left-associative infix operator.
- infixn: A Parser that parses non-associative infix operator.
- infixr: A Parser for right-associative infix operator.
- lexer: A Parser that greedily runs this repeatedly, and ignores the pattern recognized by delim before and after each occurrence.
- many(): p.many() is equivalent to p* in EBNF.
- many1(): p.many1() is equivalent to p+ in EBNF.
- final <R> Parser<R> map
- static <T> Parser.Reference<T> newReference: Creates a new instance of Parser.Reference.
- final <To> Parser<To> next: A Parser that executes this, maps the result using map to another Parser object to be executed as the next step.
- final <R> Parser<R> next
- final Parser<?> not(): A Parser that fails if this succeeds.
- final Parser<?> not: A Parser that fails if this succeeds.
- notFollowedBy(Parser<?> parser)
- optional(): Deprecated. Since 3.0.
- or: p1.or(p2) is equivalent to p1 | p2 in EBNF.
- otherwise: a.otherwise(fallback) runs fallback when a matches zero input.
- final T parse(CharSequence source): Parses source.
- final T parse(CharSequence source, String moduleName): Deprecated. Please use parse(CharSequence) instead.
- final T parse(CharSequence source, Parser.Mode mode): Parses source under the given mode.
- final T parse: Parses source read from readable.
- final T parse: Deprecated. Please use parse(Readable) instead.
- final ParseTree parseTree(CharSequence source): Parses source and returns a ParseTree corresponding to the syntactical structure of the input.
- peek(): A Parser that runs this and undoes any input consumption if succeeds.
- (package private) static StringBuilder read: Copies all content from from to to.
- reluctantBetween(Parser<?> before, Parser<?> after): Deprecated. This method probably only works in the simplest cases.
- final <R> Parser<R> retn(R value)
- skipAtLeast(int min)
- skipMany(): p.skipMany() is equivalent to p* in EBNF.
- skipMany1: p.skipMany1() is equivalent to p+ in EBNF.
- skipTimes(int n)
- skipTimes(int min, int max): A Parser that runs this parser for at least min times and up to max times, with all the return values ignored.
- source(): A Parser that returns the matched string in the original source.
- succeeds()
- times(int n)
- times(int min, int max)
- token()
- until: A Parser that matches this parser zero or many times until the given parser succeeds.
- final Parser<WithSource<T>> withSource: A Parser that returns both parsed object and matched string.
-
Constructor Details
-
Parser
Parser()
-
-
Method Details
-
newReference
Creates a new instance of Parser.Reference. Used when your grammar is recursive (many grammars are).
-
retn
-
next
-
next
A Parser that executes this, maps the result using map to another Parser object to be executed as the next step.
-
until
A Parser that matches this parser zero or many times until the given parser succeeds. The input that matches the given parser will not be consumed. The input that matches this parser will be collected in a list that will be returned by this function.
- Since:
- 2.2
-
followedBy
-
notFollowedBy
-
many
-
skipMany
p.skipMany() is equivalent to p* in EBNF. The return values are discarded.
-
many1
-
skipMany1
p.skipMany1() is equivalent to p+ in EBNF. The return values are discarded.
-
atLeast
-
skipAtLeast
-
skipTimes
-
times
-
times
-
skipTimes
A Parser that runs this parser for at least min times and up to max times, with all the return values ignored.
-
map
-
or
p1.or(p2) is equivalent to p1 | p2 in EBNF.
- Parameters:
alternative - the alternative parser to run if this fails.
-
otherwise
a.otherwise(fallback) runs fallback when a matches zero input. This is different from a.or(alternative) where alternative is run whenever a fails to match.
One should usually use or(org.jparsec.Parser<? extends T>).
- Parameters:
fallback - the parser to run if this matches no input.
- Since:
- 3.1
-
optional
Deprecated. Since 3.0. Use optional(null) or asOptional() instead.
p.optional() is equivalent to p? in EBNF. null is the result when this fails with no partial match.
-
asOptional
p.asOptional() is equivalent to p? in EBNF. Optional.empty() is the result when this fails with no partial match. Note that Optional prohibits nulls so make sure this does not result in null.
- Since:
- 3.0
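The note about nulls follows from how java.util.Optional behaves in general. A minimal plain-Java sketch of that behaviour is shown here; it does not call jparsec, and the class and method names are hypothetical:

```java
import java.util.Optional;

public class OptionalNullSketch {
    // Returns true if wrapping 'value' with Optional.of is rejected.
    static boolean rejectedByOptionalOf(Object value) {
        try {
            Optional.of(value);
            return false;
        } catch (NullPointerException e) {
            return true; // Optional.of(null) throws NullPointerException
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectedByOptionalOf("foo")); // false
        System.out.println(rejectedByOptionalOf(null));  // true
        // Optional.empty() is how "no value" is represented.
        System.out.println(Optional.empty().isPresent()); // false
    }
}
```

A parser whose result may legitimately be null is therefore a poor fit for asOptional().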
-
optional
-
not
A Parser that fails if this succeeds. Any input consumption is undone.
-
not
A Parser that fails if this succeeds. Any input consumption is undone.
- Parameters:
unexpected - the name of what we don't expect.
-
peek
A Parser that runs this and undoes any input consumption if it succeeds.
-
atomic
A Parser that undoes any partial match if this fails. In other words, the parser either fully matches, or matches none.
-
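The all-or-nothing behaviour of atomic() can be illustrated with a tiny hand-written matcher in plain Java. This is a sketch of the backtracking idea only, assuming a simple cursor over a string; the Cursor class and the atomic helper are hypothetical and unrelated to jparsec's internals.

```java
import java.util.function.Predicate;

public class AtomicSketch {
    // Hypothetical cursor over the input; 'pos' is a 0-based index into 'source'.
    static final class Cursor {
        final String source;
        int pos = 0;

        Cursor(String source) {
            this.source = source;
        }

        // Consumes 'word' if the input continues with it; otherwise leaves pos unchanged.
        boolean match(String word) {
            if (source.startsWith(word, pos)) {
                pos += word.length();
                return true;
            }
            return false;
        }
    }

    // Runs 'step'; if it fails after consuming input, rewinds to where it started.
    static boolean atomic(Cursor cursor, Predicate<Cursor> step) {
        int start = cursor.pos;
        if (step.test(cursor)) {
            return true;
        }
        cursor.pos = start;
        return false;
    }

    public static void main(String[] args) {
        // A two-step sequence that can fail after a partial match.
        Predicate<Cursor> fooBar = c -> c.match("foo") && c.match("bar");

        Cursor plain = new Cursor("foobaz");
        System.out.println(fooBar.test(plain) + " at " + plain.pos);        // false at 3

        Cursor rewound = new Cursor("foobaz");
        System.out.println(atomic(rewound, fooBar) + " at " + rewound.pos); // false at 0
    }
}
```

Without the rewind, the failed sequence leaves the cursor after "foo", so an alternative tried afterwards would start from the wrong position.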
succeeds
-
fails
-
ifelse
-
ifelse
-
label
-
cast
Casts this to a Parser of type R. Use it only if you know the parser actually returns a value of type R.
-
between
A Parser that runs this between before and after. The return value of this is preserved.
Equivalent to Parsers.between(Parser, Parser, Parser), which preserves the natural order of the parsers in the argument list, but is a bit more verbose.
-
reluctantBetween
Deprecated. This method probably only works in the simplest cases. And it's a character-level parser only. Use it at your own risk. It may be deleted later when we find a better way.
A Parser that first runs before from the input start, then runs after from the input's end, and only then runs this on what's left from the input. In effect, this behaves reluctantly, giving after a chance to grab input that would have been consumed by this otherwise.
-
sepBy1
-
sepBy
-
endBy
-
endBy1
-
sepEndBy1
-
sepEndBy
-
prefix
-
postfix
A Parser that runs this and then runs op for 0 or more times greedily. The Function objects returned from op are applied from left to right to the return value of this.
This is the preferred API to avoid StackOverflowError in left-recursive parsers. For example, to parse array types in the form of "T[]" or "T[][]", the following left recursive grammar will fail:
Terminals terms = Terminals.operators("[", "]");
Parser.Reference<Type> ref = Parser.newReference();
ref.set(Parsers.or(leafTypeParser,
    Parsers.sequence(ref.lazy(), terms.phrase("[", "]"), new Unary<Type>() {...})));
return ref.get();
A correct implementation is:
Terminals terms = Terminals.operators("[", "]");
return leafTypeParser.postfix(terms.phrase("[", "]").retn(new Unary<Type>() {...}));
A not-so-obvious example is to parse the expr ? a : b ternary operator. It too is a left recursive grammar, and, unintuitively, it can also be thought of as a postfix operator. Basically, we can parse "? a : b" as a whole into a unary operator that accepts the condition expression as input and outputs the full ternary expression:
Parser<Expr> ternary(Parser<Expr> expr) {
  return expr.postfix(
      Parsers.sequence(
          terms.token("?"), expr, terms.token(":"), expr,
          (unused1, then, unused2, orelse) -> cond -> new TernaryExpr(cond, then, orelse)));
}
OperatorTable also handles left recursion transparently.
p.postfix(op) is equivalent to p op* in EBNF.
-
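The left-to-right application of postfix operators described for postfix can be sketched in plain Java without jparsec. This illustrates the idea only and is not the library's implementation; applyPostfix and the string-based "types" are hypothetical stand-ins.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class PostfixSketch {
    // Hypothetical helper: applies each postfix operator, in order, to the operand.
    static <T> T applyPostfix(T operand, List<UnaryOperator<T>> ops) {
        T result = operand;
        for (UnaryOperator<T> op : ops) {
            result = op.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        // Each "[]" suffix becomes a unary operator that wraps the type seen so far.
        UnaryOperator<String> arrayOf = type -> type + "[]";
        String type = applyPostfix("int", List.of(arrayOf, arrayOf));
        System.out.println(type); // int[][]
    }
}
```

Because the operand is parsed once and the suffixes are folded in a loop, no recursion into the same rule is needed, which is why this shape avoids the left-recursion problem.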
infixn
A Parser that parses non-associative infix operator. Runs this for the left operand, and then runs op and this for the operator and the right operand optionally. The BiFunction objects returned from op are applied to the return values of the two operands, if any.
p.infixn(op) is equivalent to p (op p)? in EBNF.
-
infixl
public final Parser<T> infixl(Parser<? extends BiFunction<? super T, ? super T, ? extends T>> operator)
A Parser for left-associative infix operator. Runs this for the left operand, and then runs operator and this for the operator and the right operand for 0 or more times greedily. The BiFunction objects returned from operator are applied from left to right to the return values of this, if any. For example: a + b + c + d is evaluated as (((a + b) + c) + d).
p.infixl(op) is equivalent to p (op p)* in EBNF.
-
infixr
A Parser for right-associative infix operator. Runs this for the left operand, and then runs op and this for the operator and the right operand for 0 or more times greedily. The BiFunction objects returned from op are applied from right to left to the return values of this, if any. For example: a + b + c + d is evaluated as a + (b + (c + d)).
p.infixr(op) is equivalent to p (op p)* in EBNF.
-
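The difference between the left-to-right application in infixl and the right-to-left application in infixr can be sketched in plain Java, independent of jparsec. The helpers foldLeft and foldRight are hypothetical and are not the library's internals; subtraction is used because, unlike addition, it makes the grouping visible in the result.

```java
import java.util.List;
import java.util.function.BinaryOperator;

public class InfixAssociativity {
    // Hypothetical helper: applies op from left to right, like ((a op b) op c).
    static <T> T foldLeft(List<T> operands, BinaryOperator<T> op) {
        T result = operands.get(0);
        for (int i = 1; i < operands.size(); i++) {
            result = op.apply(result, operands.get(i));
        }
        return result;
    }

    // Hypothetical helper: applies op from right to left, like a op (b op c).
    static <T> T foldRight(List<T> operands, BinaryOperator<T> op) {
        int last = operands.size() - 1;
        T result = operands.get(last);
        for (int i = last - 1; i >= 0; i--) {
            result = op.apply(operands.get(i), result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> operands = List.of(10, 3, 2);
        BinaryOperator<Integer> minus = (a, b) -> a - b;
        System.out.println(foldLeft(operands, minus));  // (10 - 3) - 2 = 5
        System.out.println(foldRight(operands, minus)); // 10 - (3 - 2) = 9
    }
}
```

Both forms accept the same input, p (op p)*, and differ only in how the collected operators are applied.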
token
A Parser that runs this and wraps the return value in a Token.
It is normally not necessary to call this method explicitly. lexer(Parser) and from(Parser, Parser) both do the conversion automatically.
-
source
A Parser that returns the matched string in the original source.
-
withSource
A Parser that returns both parsed object and matched string.
-
from
A Parser that takes as input the Token collection returned by lexer, and runs this to parse the tokens. Most parsers should use the simpler from(Parser, Parser) instead.
this must be a token level parser.
-
from
A Parser that takes as input the tokens returned by tokenizer delimited by delim, and runs this to parse the tokens. A common misunderstanding is that tokenizer has to be a parser of Token. It doesn't need to be because Terminals already takes care of wrapping your logical token objects into physical Token with correct source location information tacked on for free. Your token object can literally be anything, as long as your token level parser can recognize it later.
The following example uses Terminals.tokenizer():
Terminals terminals = ...;
return parser.from(terminals.tokenizer(), Scanners.WHITESPACES.optional()).parse(str);
And tokens are optionally delimited by whitespaces.
Optionally, you can skip comments using a scanner other than WHITESPACES:
Terminals terminals = ...;
Parser<?> delim = Parsers.or(
    Scanners.WHITESPACES,
    Scanners.JAVA_LINE_COMMENT,
    Scanners.JAVA_BLOCK_COMMENT).skipMany();
return parser.from(terminals.tokenizer(), delim).parse(str);
In both examples, it's important to make sure the delimiter scanner can accept empty string (either through optional() or skipMany()), unless adjacent operator characters shouldn't be parsed as separate operators, e.g. "((" as two left parenthesis operators.
this must be a token level parser.
-
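Why the delimiter should accept the empty string can be seen with a small stand-alone lexer in plain Java. This is a simplified sketch using java.util.regex rather than jparsec; the lex method, its single-parenthesis tokens, and the regex delimiters are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterSketch {
    // Hypothetical lexer: every token is a single parenthesis, and 'delim' is a regex
    // that must match between two tokens. Returns null if the input cannot be lexed.
    static List<String> lex(String input, String delim) {
        Pattern token = Pattern.compile("[()]");
        Pattern separator = Pattern.compile(delim);
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (!tokens.isEmpty()) {
                // A delimiter is required before every token except the first.
                Matcher d = separator.matcher(input).region(pos, input.length());
                if (!d.lookingAt()) {
                    return null;
                }
                pos = d.end();
                if (pos == input.length()) {
                    break; // trailing delimiter only
                }
            }
            Matcher t = token.matcher(input).region(pos, input.length());
            if (!t.lookingAt()) {
                return null;
            }
            tokens.add(t.group());
            pos = t.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("((", "\\s*")); // [(, (]  delimiter may be empty
        System.out.println(lex("((", "\\s+")); // null    delimiter must consume input
        System.out.println(lex("( (", "\\s+")); // [(, (]
    }
}
```

With a delimiter that must consume at least one character, adjacent operators such as "((" cannot be split into two tokens.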
lexer
A Parser that greedily runs this repeatedly, and ignores the pattern recognized by delim before and after each occurrence. The result tokens are wrapped in Token and are collected and returned in a List.
It is normally not necessary to call this method explicitly. from(Parser, Parser) is more convenient for simple uses that just need to connect a token level parser with a lexer that produces the tokens. When more flexible control over the token list is needed, for example, to parse indentation sensitive language, a pre-processor of the token list may be needed.
this must be a tokenizer that returns a token value.
-
asDelimiter
As a delimiter, the parser's error is considered lenient and will only be reported if no other meaningful error is encountered. The delimiter's logical step is also considered 0, which means it won't ever stop repetition combinators such as many().
-
parse
Parses source.
-
parse
Parses source read from readable.
- Throws:
IOException
-
parse
Parses source under the given mode. For example:
try {
  parser.parse(text, Mode.DEBUG);
} catch (ParserException e) {
  ParseTree parseTree = e.getParseTree();
  ...
}
- Since:
- 2.3
-
parseTree
Parses source and returns a ParseTree corresponding to the syntactical structure of the input. Only labeled parser nodes are represented in the parse tree.
If parsing failed, ParserException.getParseTree() can be inspected for the parse tree at the error location.
- Since:
- 2.3
-
parse
Deprecated. Please use parse(CharSequence) instead.
Parses source.
- Parameters:
source - the source string
moduleName - the name of the module; this name appears in error messages
- Returns:
- the result
-
parse
Deprecated. Please use parse(Readable) instead.
Parses source read from readable.
- Parameters:
readable - where the source is read from
moduleName - the name of the module; this name appears in error messages
- Returns:
- the result
- Throws:
IOException
-
apply
-
read
Copies all content from from to to.
- Throws:
IOException
-
getReturn
-
applyPrefixOperators
-
applyPostfixOperators
-
applyInfixOperators
-
applyInfixrOperators
-