Class SimpleTextParser
java.lang.Object
org.apache.commons.geometry.io.core.internal.SimpleTextParser
Class providing basic text parsing capabilities. The goals of this class are to
(1) provide a simple, flexible API for performing common text parsing operations and
(2) provide a mechanism for creating consistent and informative parsing errors.
This class is not intended as a replacement for grammar-based parsers and/or lexers.
-
Constructor Summary
Constructors
Constructor - Description
SimpleTextParser(Reader reader) - Construct a new instance that reads characters from the given reader.
SimpleTextParser(CharReadBuffer buffer) - Construct a new instance that reads characters from the given character buffer.
-
Method Summary
Modifier and Type Method - Description
int choose(String... expected) - Return the index of the argument that exactly matches the current token.
int choose(List<String> expected) - Return the index of the argument that exactly matches the current token.
int chooseIgnoreCase(String... expected) - Return the index of the argument that matches the current token, ignoring case.
int chooseIgnoreCase(List<String> expected) - Return the index of the argument that matches the current token, ignoring case.
SimpleTextParser consume(int len, IntConsumer consumer) - Consume at most len characters from the stream, passing each to the given consumer.
SimpleTextParser consume(IntPredicate pred, IntConsumer consumer) - Consume characters from the stream and pass them to consumer while the given predicate returns true.
SimpleTextParser consumeWithLineContinuation(char lineContinuationChar, int len, IntConsumer consumer) - Consume at most len characters from the stream, passing each to the given consumer.
SimpleTextParser consumeWithLineContinuation(char lineContinuationChar, IntPredicate pred, IntConsumer consumer) - Consume characters from the stream and pass them to consumer while the given predicate returns true.
SimpleTextParser discard(int len) - Discard len characters from the character stream.
SimpleTextParser discard(IntPredicate pred) - Discard characters from the stream while the given predicate returns true.
SimpleTextParser discardLine() - Discard all remaining characters on the current line, including the terminating newline character sequence.
SimpleTextParser discardLineWhitespace() - Discard the next whitespace characters on the current line.
SimpleTextParser discardNewLineSequence() - Discard the newline character sequence at the current reader position.
SimpleTextParser discardWhitespace() - Discard a sequence of whitespace characters from the character stream starting from the current parser position.
SimpleTextParser discardWithLineContinuation(char lineContinuationChar, int len) - Discard len characters from the character stream.
SimpleTextParser discardWithLineContinuation(char lineContinuationChar, IntPredicate pred) - Discard characters from the stream while the given predicate returns true.
int getColumnNumber() - Get the current column number.
String getCurrentToken() - Get the current token.
double getCurrentTokenAsDouble() - Get the current token parsed as a double.
int getCurrentTokenAsInt() - Get the current token parsed as an integer.
int getCurrentTokenColumnNumber() - Get the column position that the current token started on.
int getCurrentTokenLineNumber() - Get the line number that the current token started on.
int getLineNumber() - Get the current line number.
int getMaxStringLength() - Get the maximum length for strings returned by this instance.
boolean hasMoreCharacters() - Return true if there are more characters to read from this instance.
boolean hasMoreCharactersOnLine() - Return true if there are more characters to read on the current line.
boolean hasNonEmptyToken() - Return true if the current token is neither null nor empty.
static boolean isAlphanumeric(int ch) - Return true if the given character (Unicode code point) is alphanumeric.
static boolean isDecimalPart(int ch) - Return true if the given character (Unicode code point) can be used as part of the string representation of a decimal number.
static boolean isIntegerPart(int ch) - Return true if the given character (Unicode code point) can be used as part of the string representation of an integer.
static boolean isLineWhitespace(int ch) - Return true if the given character (Unicode code point) is whitespace that is not used in newline sequences (i.e., not '\r' or '\n').
static boolean isNewLinePart(int ch) - Return true if the given character (Unicode code point) is used as part of newline sequences (i.e., is either '\r' or '\n').
static boolean isNotAlphanumeric(int ch) - Return true if the given character (Unicode code point) is not alphanumeric.
static boolean isNotNewLinePart(int ch) - Return true if the given character (Unicode code point) is not used as part of newline sequences (i.e., not '\r' or '\n').
static boolean isNotWhitespace(int ch) - Return true if the given character (Unicode code point) is not whitespace.
static boolean isWhitespace(int ch) - Return true if the given character (Unicode code point) is whitespace.
SimpleTextParser match(String expected) - Compare the current token with the argument and throw an exception if they are not equal.
SimpleTextParser matchIgnoreCase(String expected) - Compare the current token with the argument and throw an exception if they are not equal.
SimpleTextParser next(int len) - Read a string containing at most len characters from the stream and set it as the current token.
SimpleTextParser next(IntPredicate pred) - Read characters from the stream while the given predicate returns true and set the result as the current token.
SimpleTextParser nextAlphanumeric() - Read a sequence of alphanumeric characters starting from the current parser position and set the result as the current token.
SimpleTextParser nextLine() - Read characters from the current parser position to the next new line sequence and set the result as the current token.
SimpleTextParser nextWithLineContinuation(char lineContinuationChar, int len) - Read a string containing at most len characters from the stream and set it as the current token.
SimpleTextParser nextWithLineContinuation(char lineContinuationChar, IntPredicate pred) - Read characters from the stream while the given predicate returns true and set the result as the current token.
IllegalStateException parseError(int line, int col, String msg) - Return an exception indicating an error during parsing.
IllegalStateException parseError(int line, int col, String msg, Throwable cause) - Return an exception indicating an error during parsing.
IllegalStateException parseError(String msg) - Return an exception indicating an error occurring at the current parser position.
IllegalStateException parseError(String msg, Throwable cause) - Return an exception indicating an error occurring at the current parser position.
String peek(int len) - Return a string containing at most len characters from the stream but without changing the parser position.
String peek(IntPredicate pred) - Read characters from the stream while the given predicate returns true but do not change the current token or advance the parser position.
int peekChar() - Return the next character in the stream but do not advance the parser position.
int readChar() - Read and return the next character in the stream and advance the parser position.
void setColumnNumber(int column) - Set the current column number.
void setLineNumber(int lineNumber) - Set the current line number.
void setMaxStringLength(int maxStringLength) - Set the maximum length for strings returned by this instance.
IllegalStateException tokenError(String msg) - Get an exception indicating an error during parsing at the current token position.
IllegalStateException tokenError(String msg, Throwable cause) - Get an exception indicating an error during parsing at the current token position.
int tryChoose(String... expected) - Return the index of the argument that exactly matches the current token or -1 if no match is found.
int tryChoose(List<String> expected) - Return the index of the argument that exactly matches the current token or -1 if no match is found.
int tryChooseIgnoreCase(String... expected) - Return the index of the argument that matches the current token or -1 if no match is found.
int tryChooseIgnoreCase(List<String> expected) - Return the index of the argument that matches the current token or -1 if no match is found.
boolean tryMatch(String expected) - Return true if the current token is equal to the argument.
boolean tryMatchIgnoreCase(String expected) - Return true if the current token is equal to the argument.
IllegalStateException unexpectedToken(String expected) - Get an exception indicating that the current token was unexpected.
IllegalStateException unexpectedToken(String expected, Throwable cause) - Get an exception indicating that the current token was unexpected.
-
Constructor Details
-
SimpleTextParser
Construct a new instance that reads characters from the given reader. The reader will not be closed.
- Parameters:
reader - reader instance to read characters from
-
SimpleTextParser
Construct a new instance that reads characters from the given character buffer.
- Parameters:
buffer - read buffer to read characters from
-
-
Method Details
-
getLineNumber
Get the current line number. Line numbers start at 1.
- Returns:
- the current line number
-
setLineNumber
Set the current line number. This does not affect the character stream position, only the value returned by getLineNumber().
- Parameters:
lineNumber - line number to set; line numbers start at 1
-
getColumnNumber
Get the current column number. This indicates the column position of the character that will be returned by the next call to readChar(). The first character of each line has a column number of 1.
- Returns:
- the current column number; column numbers start at 1
-
setColumnNumber
Set the current column number. This does not affect the character stream position, only the value returned by getColumnNumber().
- Parameters:
column - the column number to set; column numbers start at 1
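The 1-based line and column bookkeeping described above can be illustrated with a small stand-in. This is only a sketch: `PositionTracker` is a hypothetical class, not part of the library, and it assumes that "\r\n" counts as a single line break.

```java
/** Hypothetical stand-in (not the library class) that tracks 1-based line and column numbers. */
class PositionTracker {
    private int line = 1;      // line numbers start at 1
    private int column = 1;    // the first character of each line is in column 1
    private boolean lastWasCr; // true if the previous character was '\r'

    /** Advance the position past a single character. */
    void advance(int ch) {
        if (ch == '\n' && lastWasCr) {
            // second half of "\r\n": the line break was already counted (assumption)
            lastWasCr = false;
            return;
        }
        lastWasCr = ch == '\r';
        if (ch == '\r' || ch == '\n') {
            line++;
            column = 1;
        } else {
            column++;
        }
    }

    int getLineNumber() {
        return line;
    }

    /** Column of the character that would be read next. */
    int getColumnNumber() {
        return column;
    }

    /** Convenience helper: advance over every character in the text. */
    static PositionTracker scan(String text) {
        PositionTracker tracker = new PositionTracker();
        text.chars().forEach(tracker::advance);
        return tracker;
    }
}
```

After scanning the five visible characters and one line break in "ab\r\ncd", the tracker reports the position of the character that would be read next, mirroring the getColumnNumber() contract.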
-
getMaxStringLength
Get the maximum length for strings returned by this instance. Operations that produce strings longer than this length will throw an exception.
- Returns:
- maximum length for strings returned by this instance
-
setMaxStringLength
Set the maximum length for strings returned by this instance. Operations that produce strings longer than this length will throw an exception.
- Parameters:
maxStringLength - maximum length for strings returned by this instance
- Throws:
IllegalArgumentException - if the argument is less than zero
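A length cap of this kind guards against unbounded memory use on malformed input. The sketch below shows one way such a guard could work; `BoundedStringReader`, its default of 1024, and the exception messages are all assumptions made for illustration, not the library's values.

```java
import java.util.function.IntPredicate;

/** Hypothetical stand-in showing a maximum-string-length guard. */
class BoundedStringReader {
    private int maxStringLength = 1024; // assumed default, for illustration only

    void setMaxStringLength(int maxStringLength) {
        if (maxStringLength < 0) {
            throw new IllegalArgumentException(
                "Maximum string length cannot be less than zero; was " + maxStringLength);
        }
        this.maxStringLength = maxStringLength;
    }

    int getMaxStringLength() {
        return maxStringLength;
    }

    /** Collect leading characters that satisfy the predicate, failing if the cap is exceeded. */
    String readWhile(String input, IntPredicate pred) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length() && pred.test(input.charAt(i)); i++) {
            if (sb.length() >= maxStringLength) {
                throw new IllegalStateException(
                    "String length exceeds maximum value of " + maxStringLength);
            }
            sb.append(input.charAt(i));
        }
        return sb.toString();
    }
}
```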
-
getCurrentToken
Get the current token. This is the most recent string read by one of the nextXXX() methods. This value will be null if no token has yet been read or if the end of content has been reached.
- Returns:
- the current token
-
hasNonEmptyToken
Return true if the current token is neither null nor empty.
- Returns:
- true if the current token is neither null nor empty
-
getCurrentTokenLineNumber
Get the line number that the current token started on. This value will be -1 if no token has been read yet.
- Returns:
- current token starting line number or -1 if no token has been read yet
-
getCurrentTokenColumnNumber
Get the column position that the current token started on. This value will be -1 if no token has been read yet.
- Returns:
- current token column number or -1 if no token has been read yet
-
getCurrentTokenAsInt
Get the current token parsed as an integer.
- Returns:
- the current token parsed as an integer
- Throws:
IllegalStateException - if no token has been read or the current token cannot be parsed as an integer
-
getCurrentTokenAsDouble
Get the current token parsed as a double.
- Returns:
- the current token parsed as a double
- Throws:
IllegalStateException - if no token has been read or the current token cannot be parsed as a double
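The contract for the two numeric accessors (a missing or unparseable token becomes an IllegalStateException) can be sketched as follows. `TokenNumbers` is a hypothetical helper, and the use of `Integer.parseInt` and `Double.parseDouble` is an assumption about how such parsing might be done, not a statement about the library's internals.

```java
/** Hypothetical helper that converts a token to a number or fails with IllegalStateException. */
class TokenNumbers {

    static int asInt(String token) {
        requireToken(token);
        try {
            return Integer.parseInt(token);
        } catch (NumberFormatException exc) {
            // keep the original failure as the cause so it is not lost
            throw new IllegalStateException("Cannot parse [" + token + "] as an integer", exc);
        }
    }

    static double asDouble(String token) {
        requireToken(token);
        try {
            return Double.parseDouble(token);
        } catch (NumberFormatException exc) {
            throw new IllegalStateException("Cannot parse [" + token + "] as a double", exc);
        }
    }

    private static void requireToken(String token) {
        if (token == null) {
            throw new IllegalStateException("No token has been read");
        }
    }
}
```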
-
hasMoreCharacters
Return true if there are more characters to read from this instance.
- Returns:
- true if there are more characters to read from this instance
- Throws:
UncheckedIOException - if an I/O error occurs
-
hasMoreCharactersOnLine
Return true if there are more characters to read on the current line.
- Returns:
- true if there are more characters to read on the current line
- Throws:
UncheckedIOException - if an I/O error occurs
-
readChar
Read and return the next character in the stream and advance the parser position. This method updates the current line number and column number but does not set the current token.
- Returns:
- the next character in the stream or -1 if the end of the stream has been reached
- Throws:
UncheckedIOException - if an I/O error occurs
-
next
Read a string containing at most len characters from the stream and set it as the current token. Characters are added to the string until the string has the specified length or the end of the stream is reached. The characters are consumed from the stream. The token is set to null if no more characters are available from the character stream when this method is called.
- Parameters:
len - the maximum length of the extracted string
- Returns:
- this instance
- Throws:
IllegalArgumentException - if len is less than 0 or greater than the configured maximum string length
UncheckedIOException - if an I/O error occurs
-
nextWithLineContinuation
Read a string containing at most len characters from the stream and set it as the current token. This is similar to next(int), except that new line sequences beginning with lineContinuationChar are skipped.
- Parameters:
lineContinuationChar - character used to indicate skipped new line sequences
len - the maximum length of the extracted string
- Returns:
- this instance
- Throws:
IllegalArgumentException - if len is less than 0 or greater than the configured maximum string length
UncheckedIOException - if an I/O error occurs
-
next
Read characters from the stream while the given predicate returns true and set the result as the current token. The next call to readChar() will return either a character that fails the predicate test or -1 if the end of the stream has been reached. The token will be null if the end of the stream has been reached prior to the method call.
- Parameters:
pred - predicate function passed characters read from the input; reading continues until the predicate returns false
- Returns:
- this instance
- Throws:
IllegalStateException - if the length of the produced string exceeds the configured maximum string length
UncheckedIOException - if an I/O error occurs
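The predicate-driven read described above has three observable outcomes: a non-empty token, an empty token (the first character fails the predicate), and null (the stream was already exhausted). The following sketch reproduces that behavior over an in-memory string. `PredicateReadSketch` is a hypothetical stand-in, and it works on `char` values rather than full Unicode code points for brevity.

```java
import java.util.function.IntPredicate;

/** Hypothetical stand-in that mimics read-while-predicate semantics over a String. */
class PredicateReadSketch {
    private final String input;
    private int pos;

    PredicateReadSketch(String input) {
        this.input = input;
    }

    /** Return the next character and advance, or -1 at the end of the input. */
    int readChar() {
        return pos < input.length() ? input.charAt(pos++) : -1;
    }

    /**
     * Read characters while the predicate returns true.
     * Returns null if the input was already exhausted before the call.
     */
    String next(IntPredicate pred) {
        if (pos >= input.length()) {
            return null;
        }
        int start = pos;
        while (pos < input.length() && pred.test(input.charAt(pos))) {
            pos++;
        }
        // the character at pos (if any) failed the predicate and is left unread
        return input.substring(start, pos);
    }
}
```

For the input "abc 12", reading letters yields "abc", the next readChar() returns the space that stopped the read, and reading digits then yields "12".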
-
nextWithLineContinuation
Read characters from the stream while the given predicate returns true and set the result as the current token. This is similar to next(IntPredicate), except that new line sequences prefixed with lineContinuationChar are skipped.
- Parameters:
lineContinuationChar - character used to indicate skipped new line sequences
pred - predicate function passed characters read from the input; reading continues until the predicate returns false
- Returns:
- this instance
- Throws:
IllegalStateException - if the length of the produced string exceeds the configured maximum string length
UncheckedIOException - if an I/O error occurs
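Line continuation lets a logical value span several physical lines, as in formats where a trailing backslash joins the next line. A minimal sketch of the skipping rule follows. `LineContinuationSketch` is hypothetical, and it assumes that a continuation character not followed by a newline sequence is treated like any other character.

```java
import java.util.function.IntPredicate;

/** Hypothetical stand-in showing how continuation-prefixed newline sequences can be skipped. */
class LineContinuationSketch {

    /** Read leading characters matching pred, skipping "cont + newline sequence" pairs. */
    static String readWhile(String input, char cont, IntPredicate pred) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char ch = input.charAt(i);
            int newLineLen = ch == cont ? newLineLength(input, i + 1) : 0;
            if (newLineLen > 0) {
                // skip the continuation character and the newline sequence after it
                i += 1 + newLineLen;
            } else if (pred.test(ch)) {
                sb.append(ch);
                i++;
            } else {
                break;
            }
        }
        return sb.toString();
    }

    /** Length of the newline sequence at idx: 2 for "\r\n", 1 for "\r" or "\n", else 0. */
    static int newLineLength(String s, int idx) {
        if (idx >= s.length()) {
            return 0;
        }
        char ch = s.charAt(idx);
        if (ch == '\r') {
            return idx + 1 < s.length() && s.charAt(idx + 1) == '\n' ? 2 : 1;
        }
        return ch == '\n' ? 1 : 0;
    }
}
```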
-
nextLine
Read characters from the current parser position to the next new line sequence and set the result as the current token. The newline character sequence ('\r', '\n', or '\r\n') at the end of the line is consumed but is not included in the token. The token will be null if the end of the stream has been reached prior to the method call.
- Returns:
- this instance
- Throws:
IllegalStateException - if the length of the produced string exceeds the configured maximum string length
UncheckedIOException - if an I/O error occurs
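The handling of the three newline forms can be sketched with a string-backed stand-in. `NextLineSketch` is a hypothetical class written only to illustrate the documented behavior: the terminator is consumed, it is excluded from the result, and null signals the end of the input.

```java
/** Hypothetical stand-in that mimics line-by-line reading with '\r', '\n', and "\r\n" terminators. */
class NextLineSketch {
    private final String input;
    private int pos;

    NextLineSketch(String input) {
        this.input = input;
    }

    /** Return the rest of the current line without its terminator, or null at end of input. */
    String nextLine() {
        if (pos >= input.length()) {
            return null;
        }
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != '\r' && input.charAt(pos) != '\n') {
            pos++;
        }
        String line = input.substring(start, pos);
        if (pos < input.length()) {
            // consume the terminator; "\r\n" counts as one sequence
            char ch = input.charAt(pos++);
            if (ch == '\r' && pos < input.length() && input.charAt(pos) == '\n') {
                pos++;
            }
        }
        return line;
    }
}
```

An empty line yields the empty string rather than null, so callers can tell a blank line apart from the end of the input.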
-
nextAlphanumeric
Read a sequence of alphanumeric characters starting from the current parser position and set the result as the current token. The token will be the empty string if the next character in the stream is not alphanumeric and will be null if the end of the stream has been reached prior to the method call.
- Returns:
- this instance
- Throws:
IllegalStateException - if the length of the produced string exceeds the configured maximum string length
UncheckedIOException - if an I/O error occurs
-
discard
Discard len characters from the character stream. The parser position is updated but the current token is not changed.
- Parameters:
len - number of characters to discard
- Returns:
- this instance
- Throws:
UncheckedIOException - if an I/O error occurs
-
discardWithLineContinuation
Discard len characters from the character stream. The parser position is updated but the current token is not changed. Lines beginning with lineContinuationChar are skipped.
- Parameters:
lineContinuationChar - character used to indicate skipped new line sequences
len - number of characters to discard
- Returns:
- this instance
- Throws:
UncheckedIOException - if an I/O error occurs
-
discard
Discard characters from the stream while the given predicate returns true. The next call to readChar() will return either a character that fails the predicate test or -1 if the end of the stream has been reached. The parser position is updated but the current token is not changed.
- Parameters:
pred - predicate test for characters to discard
- Returns:
- this instance
- Throws:
UncheckedIOException - if an I/O error occurs
-
discardWithLineContinuation
Discard characters from the stream while the given predicate returns true. New line sequences beginning with lineContinuationChar are skipped. The next call to readChar() will return either a character that fails the predicate test or -1 if the end of the stream has been reached. The parser position is updated but the current token is not changed.
- Parameters:
lineContinuationChar - character used to indicate skipped new line sequences
pred - predicate test for characters to discard
- Returns:
- this instance
- Throws:
UncheckedIOException - if an I/O error occurs
-
discardWhitespace
Discard a sequence of whitespace characters from the character stream starting from the current parser position. The next call to readChar() will return either a non-whitespace character or -1 if the end of the stream has been reached. The parser position is updated but the current token is not changed.
- Returns:
- this instance
- Throws:
UncheckedIOException - if an I/O error occurs
-
discardLineWhitespace
Discard the next whitespace characters on the current line. The next call to readChar() will return either a non-whitespace character on the current line, the newline character sequence (indicating the end of the line), or -1 (indicating the end of the stream). The parser position is updated but the current token is not changed.
- Returns:
- this instance
- Throws:
UncheckedIOException - if an I/O error occurs
-
discardNewLineSequence
Discard the newline character sequence at the current reader position. The sequence is defined as one of "\r", "\n", or "\r\n". Does nothing if the reader is not positioned at a newline sequence. The parser position is updated but the current token is not changed.
- Returns:
- this instance
- Throws:
UncheckedIOException - if an I/O error occurs
-
discardLine
Discard all remaining characters on the current line, including the terminating newline character sequence. The next call to readChar() will return either the first character on the next line or -1 if the end of the stream has been reached. The parser position is updated but the current token is not changed.
- Returns:
- this instance
- Throws:
UncheckedIOException - if an I/O error occurs
-
consume
Consume characters from the stream and pass them to consumer while the given predicate returns true. The operation ends when the predicate returns false or the end of the stream is reached.
- Parameters:
pred - predicate test for characters to consume
consumer - object to be passed each consumed character
- Returns:
- this instance
- Throws:
UncheckedIOException - if an I/O error occurs
-
consumeWithLineContinuation
public SimpleTextParser consumeWithLineContinuation(char lineContinuationChar, int len, IntConsumer consumer)
Consume at most len characters from the stream, passing each to the given consumer. This method is similar to consume(int, IntConsumer), except that new line sequences prefixed with lineContinuationChar are skipped.
- Parameters:
lineContinuationChar - character used to indicate skipped new line sequences
len - number of characters to consume
consumer - function to be passed each consumed character
- Returns:
- this instance
- Throws:
UncheckedIOException - if an I/O error occurs
-
consume
Consume at most len characters from the stream, passing each to the given consumer. The operation continues until len characters have been read or the end of the stream has been reached.
- Parameters:
len - number of characters to consume
consumer - object to be passed each consumed character
- Returns:
- this instance
- Throws:
UncheckedIOException - if an I/O error occurs
-
consumeWithLineContinuation
public SimpleTextParser consumeWithLineContinuation(char lineContinuationChar, IntPredicate pred, IntConsumer consumer)
Consume characters from the stream and pass them to consumer while the given predicate returns true. This method is similar to consume(IntPredicate, IntConsumer), except that new line sequences beginning with lineContinuationChar are skipped.
- Parameters:
lineContinuationChar - character used to indicate skipped new line sequences
pred - predicate test for characters to consume
consumer - object to be passed each consumed character
- Returns:
- this instance
- Throws:
UncheckedIOException - if an I/O error occurs
-
peekChar
Return the next character in the stream but do not advance the parser position.
- Returns:
- the next character in the stream or -1 if the end of the stream has been reached
- Throws:
UncheckedIOException - if an I/O error occurs
-
peek
Return a string containing at most len characters from the stream but without changing the parser position. Characters are added to the string until the string has the specified length or the end of the stream is reached.
- Parameters:
len - the maximum length of the returned string
- Returns:
- a string containing at most len characters from the stream or null if the parser has already reached the end of the stream
- Throws:
IllegalArgumentException - if len is less than 0 or greater than the configured maximum string length
UncheckedIOException - if an I/O error occurs
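Look-ahead without consuming input can be approximated on a plain java.io.Reader with mark and reset. The sketch below is an assumption-laden illustration, not a description of how this class is implemented: `PeekSketch` is hypothetical, and it requires a reader for which markSupported() returns true.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical helper that peeks at upcoming characters using Reader.mark/reset. */
class PeekSketch {

    /** Return up to len upcoming characters without consuming them, or null at end of stream. */
    static String peek(Reader in, int len) {
        if (len < 0) {
            throw new IllegalArgumentException("Length cannot be negative; was " + len);
        }
        try {
            // at most len + 1 characters are read before the reset
            in.mark(len + 1);
            StringBuilder sb = new StringBuilder();
            int ch = in.read();
            if (ch < 0) {
                in.reset();
                return null;
            }
            while (ch >= 0 && sb.length() < len) {
                sb.append((char) ch);
                ch = in.read();
            }
            in.reset(); // restore the position so the caller sees no change
            return sb.toString();
        } catch (IOException exc) {
            throw new UncheckedIOException(exc);
        }
    }
}
```

Calling the helper twice in a row returns the same string, which is the property that distinguishes peeking from reading.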
-
peek
Read characters from the stream while the given predicate returns true but do not change the current token or advance the parser position.
- Parameters:
pred - predicate function passed characters read from the input; reading continues until the predicate returns false
- Returns:
- string containing characters matching pred or null if the parser has already reached the end of the stream
- Throws:
IllegalStateException - if the length of the produced string exceeds the configured maximum string length
UncheckedIOException - if an I/O error occurs
-
match
Compare the current token with the argument and throw an exception if they are not equal. The comparison is case-sensitive.
- Parameters:
expected - expected token
- Returns:
- this instance
- Throws:
IllegalStateException - if no token has been read or expected does not exactly equal the current token
-
matchIgnoreCase
Compare the current token with the argument and throw an exception if they are not equal. The comparison is not case-sensitive.
- Parameters:
expected - expected token
- Returns:
- this instance
- Throws:
IllegalStateException - if no token has been read or expected does not equal the current token (ignoring case)
-
tryMatch
Return true if the current token is equal to the argument. The comparison is case-sensitive.
- Parameters:
expected - expected token
- Returns:
- true if the argument exactly equals the current token
- Throws:
IllegalStateException - if no token has been read
UncheckedIOException - if an I/O error occurs
-
tryMatchIgnoreCase
Return true if the current token is equal to the argument. The comparison is not case-sensitive.
- Parameters:
expected - expected token
- Returns:
- true if the argument equals the current token (ignoring case)
- Throws:
IllegalStateException - if no token has been read
-
choose
Return the index of the argument that exactly matches the current token. An exception is thrown if no match is found. String comparisons are case-sensitive.
- Parameters:
expected - strings to compare with the current token
- Returns:
- index of the argument that exactly matches the current token
- Throws:
IllegalStateException - if no token has been read or no match is found among the arguments
-
choose
Return the index of the argument that exactly matches the current token. An exception is thrown if no match is found. String comparisons are case-sensitive.
- Parameters:
expected - strings to compare with the current token
- Returns:
- index of the argument that exactly matches the current token
- Throws:
IllegalStateException - if no token has been read or no match is found among the arguments
-
chooseIgnoreCase
Return the index of the argument that matches the current token, ignoring case. An exception is thrown if no match is found. String comparisons are not case-sensitive.
- Parameters:
expected - strings to compare with the current token
- Returns:
- index of the argument that matches the current token (ignoring case)
- Throws:
IllegalStateException - if no token has been read or no match is found among the arguments
-
chooseIgnoreCase
Return the index of the argument that matches the current token, ignoring case. An exception is thrown if no match is found. String comparisons are not case-sensitive.
- Parameters:
expected - strings to compare with the current token
- Returns:
- index of the argument that matches the current token (ignoring case)
- Throws:
IllegalStateException - if no token has been read or no match is found among the arguments
-
tryChoose
Return the index of the argument that exactly matches the current token or -1 if no match is found. String comparisons are case-sensitive.
- Parameters:
expected - strings to compare with the current token
- Returns:
- index of the argument that exactly matches the current token or -1 if no match is found
- Throws:
IllegalStateException - if no token has been read
-
tryChoose
Return the index of the argument that exactly matches the current token or -1 if no match is found. String comparisons are case-sensitive.
- Parameters:
expected - strings to compare with the current token
- Returns:
- index of the argument that exactly matches the current token or -1 if no match is found
- Throws:
IllegalStateException - if no token has been read
-
tryChooseIgnoreCase
Return the index of the argument that matches the current token or -1 if no match is found. String comparisons are not case-sensitive.
- Parameters:
expected - strings to compare with the current token
- Returns:
- index of the argument that matches the current token (ignoring case) or -1 if no match is found
- Throws:
IllegalStateException - if no token has been read
-
tryChooseIgnoreCase
Return the index of the argument that matches the current token or -1 if no match is found. String comparisons are not case-sensitive.
- Parameters:
expected - strings to compare with the current token
- Returns:
- index of the argument that matches the current token (ignoring case) or -1 if no match is found
- Throws:
IllegalStateException - if no token has been read
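The difference between the choose and tryChoose families is only in how a miss is reported: an exception versus -1. A compact sketch of that split is shown below. `ChooseSketch` and its error messages are hypothetical; the token is passed in explicitly because the stand-in has no parser state.

```java
import java.util.List;

/** Hypothetical stand-in contrasting "try" lookups (-1 on a miss) with strict lookups (exception). */
class ChooseSketch {

    /** Return the index of the first matching candidate, or -1 if none matches. */
    static int tryChoose(String token, List<String> expected, boolean ignoreCase) {
        if (token == null) {
            throw new IllegalStateException("No token has been read");
        }
        for (int i = 0; i < expected.size(); i++) {
            String candidate = expected.get(i);
            boolean matches = ignoreCase
                ? token.equalsIgnoreCase(candidate)
                : token.equals(candidate);
            if (matches) {
                return i;
            }
        }
        return -1;
    }

    /** Same lookup, but a miss is reported as an exception. */
    static int choose(String token, List<String> expected, boolean ignoreCase) {
        int idx = tryChoose(token, expected, ignoreCase);
        if (idx < 0) {
            throw new IllegalStateException(
                "Expected one of " + expected + " but found [" + token + "]");
        }
        return idx;
    }
}
```

The returned index is convenient as the selector of a switch statement when a format has a fixed set of keywords.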
-
unexpectedToken
Get an exception indicating that the current token was unexpected. The returned exception contains a message with the line number and column of the current token and a description of its value.
- Parameters:
expected - string describing what was expected
- Returns:
- exception indicating that the current token was unexpected
-
unexpectedToken
Get an exception indicating that the current token was unexpected. The returned exception contains a message with the line number and column of the current token and a description of its value.
- Parameters:
expected - string describing what was expected
cause - cause of the error
- Returns:
- exception indicating that the current token was unexpected
-
tokenError
Get an exception indicating an error during parsing at the current token position.
- Parameters:
msg - error message
- Returns:
- an exception indicating an error during parsing at the current token position
-
tokenError
Get an exception indicating an error during parsing at the current token position.
- Parameters:
msg - error message
cause - the cause of the error; may be null
- Returns:
- an exception indicating an error during parsing at the current token position
-
parseError
Return an exception indicating an error occurring at the current parser position.
- Parameters:
msg - error message
- Returns:
- an exception indicating an error during parsing
-
parseError
Return an exception indicating an error occurring at the current parser position.
- Parameters:
msg - error message
cause - the cause of the error; may be null
- Returns:
- an exception indicating an error during parsing
-
parseError
Return an exception indicating an error during parsing.
- Parameters:
line - line number of the error
col - column number of the error
msg - error message
- Returns:
- an exception indicating an error during parsing
-
parseError
Return an exception indicating an error during parsing.
- Parameters:
line - line number of the error
col - column number of the error
msg - error message
cause - the cause of the error
- Returns:
- an exception indicating an error during parsing
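Because these methods return the exception instead of throwing it, callers can write `throw parser.parseError(...)`, which keeps the compiler's flow analysis aware that the branch ends there. The sketch below illustrates the pattern. `ParseErrorSketch`, the `parseCount` helper, and the exact message layout are assumptions made for illustration.

```java
/** Hypothetical stand-in showing position-aware error construction. */
class ParseErrorSketch {

    /** Build (but do not throw) an exception that carries the error position in its message. */
    static IllegalStateException parseError(int line, int col, String msg, Throwable cause) {
        // assumed message layout, for illustration only
        return new IllegalStateException(
            "Parsing failed at line " + line + ", column " + col + ": " + msg, cause);
    }

    /** Example caller: the explicit throw tells the compiler this branch never returns. */
    static int parseCount(String token, int line, int col) {
        try {
            return Integer.parseInt(token);
        } catch (NumberFormatException exc) {
            throw parseError(line, col, "expected an integer but found [" + token + "]", exc);
        }
    }
}
```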
-
isWhitespace
Return true if the given character (Unicode code point) is whitespace.
- Parameters:
ch - character (Unicode code point) to test
- Returns:
- true if the given character is whitespace
-
isNotWhitespace
Return true if the given character (Unicode code point) is not whitespace.
- Parameters:
ch - character (Unicode code point) to test
- Returns:
- true if the given character is not whitespace
-
isLineWhitespace
Return true if the given character (Unicode code point) is whitespace that is not used in newline sequences (i.e., not '\r' or '\n').
- Parameters:
ch - character (Unicode code point) to test
- Returns:
- true if the given character is a whitespace character not used in newline sequences
-
isNewLinePart
Return true if the given character (Unicode code point) is used as part of newline sequences (i.e., is either '\r' or '\n').
- Parameters:
ch - character (Unicode code point) to test
- Returns:
- true if the given character is used as part of newline sequences
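The distinction between whitespace in general and "line whitespace" matters when a format is line-oriented: skipping line whitespace must stop at the end of the line. The sketch below shows the two predicates and a small skipping helper. `WhitespacePredicates` and `skipLineWhitespace` are hypothetical, and defining whitespace through `Character.isWhitespace(int)` is an assumption.

```java
/** Hypothetical stand-in for the newline and line-whitespace character tests. */
class WhitespacePredicates {

    static boolean isNewLinePart(int ch) {
        return ch == '\r' || ch == '\n';
    }

    /** Whitespace that does not take part in newline sequences. */
    static boolean isLineWhitespace(int ch) {
        // assumption: "whitespace" means Character.isWhitespace(int)
        return Character.isWhitespace(ch) && !isNewLinePart(ch);
    }

    /** Return the index of the first character at or after from that is not line whitespace. */
    static int skipLineWhitespace(String s, int from) {
        int idx = from;
        while (idx < s.length() && isLineWhitespace(s.charAt(idx))) {
            idx++;
        }
        return idx;
    }
}
```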
-
isNotNewLinePart
Return true if the given character (Unicode code point) is not used as part of newline sequences (i.e., not '\r' or '\n').
- Parameters:
ch - character (Unicode code point) to test
- Returns:
- true if the given character is not used as part of newline sequences
-
isAlphanumeric
Return true if the given character (Unicode code point) is alphanumeric.
- Parameters:
ch - character (Unicode code point) to test
- Returns:
- true if the argument is alphanumeric
-
isNotAlphanumeric
Return true if the given character (Unicode code point) is not alphanumeric.
- Parameters:
ch - character (Unicode code point) to test
- Returns:
- true if the argument is not alphanumeric
-
isIntegerPart
Return true if the given character (Unicode code point) can be used as part of the string representation of an integer. This will be true for the following types of characters:
- digits
- the '-' (minus) character
- the '+' (plus) character
- Parameters:
ch - character (Unicode code point) to test
- Returns:
- true if the given character can be used as part of an integer string
-
isDecimalPart
Return true if the given character (Unicode code point) can be used as part of the string representation of a decimal number. This will be true for the following types of characters:
- digits
- the '-' (minus) character
- the '+' (plus) character
- the '.' (period) character
- the 'e' character
- the 'E' character
- Parameters:
ch - character (Unicode code point) to test
- Returns:
- true if the given character can be used as part of a decimal number string
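These two predicates test single characters only, so they delimit a candidate number but do not validate it. The sketch below makes that point concrete. `NumberPartPredicates` and `leadingRun` are hypothetical names, and treating "digits" as `Character.isDigit(int)` is an assumption.

```java
import java.util.function.IntPredicate;

/** Hypothetical stand-in for the integer-part and decimal-part character tests. */
class NumberPartPredicates {

    static boolean isIntegerPart(int ch) {
        // assumption: "digits" means Character.isDigit(int)
        return Character.isDigit(ch) || ch == '-' || ch == '+';
    }

    static boolean isDecimalPart(int ch) {
        return isIntegerPart(ch) || ch == '.' || ch == 'e' || ch == 'E';
    }

    /** Return the longest prefix of s whose characters all satisfy the predicate. */
    static String leadingRun(String s, IntPredicate pred) {
        int end = 0;
        while (end < s.length() && pred.test(s.charAt(end))) {
            end++;
        }
        return s.substring(0, end);
    }
}
```

A run such as "+-+" passes the integer test character by character even though it is not a valid number, so the extracted text still needs to be parsed and checked afterwards.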
-