Class SimpleTextParser

java.lang.Object
org.apache.commons.geometry.io.core.internal.SimpleTextParser

public class SimpleTextParser extends Object
Class providing basic text parsing capabilities. The goals of this class are to (1) provide a simple, flexible API for performing common text parsing operations and (2) provide a mechanism for creating consistent and informative parsing errors. This class is not intended as a replacement for grammar-based parsers and/or lexers.
  • Constructor Details

    • SimpleTextParser

      public SimpleTextParser(Reader reader)
      Construct a new instance that reads characters from the given reader. The reader will not be closed.
      Parameters:
      reader - reader instance to read characters from
    • SimpleTextParser

      Construct a new instance that reads characters from the given character buffer.
      Parameters:
      buffer - read buffer to read characters from
  • Method Details

    • getLineNumber

      public int getLineNumber()
      Get the current line number. Line numbers start at 1.
      Returns:
      the current line number
    • setLineNumber

      public void setLineNumber(int lineNumber)
      Set the current line number. This does not affect the character stream position, only the value returned by getLineNumber().
      Parameters:
      lineNumber - line number to set; line numbers start at 1
    • getColumnNumber

      public int getColumnNumber()
      Get the current column number. This indicates the column position of the character that will be returned by the next call to readChar(). The first character of each line has a column number of 1.
      Returns:
      the current column number; column numbers start at 1
    • setColumnNumber

      public void setColumnNumber(int column)
      Set the current column number. This does not affect the character stream position, only the value returned by getColumnNumber().
      Parameters:
      column - the column number to set; column numbers start at 1
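The relationship between the character stream and the reported line and column numbers can be illustrated with a small standalone sketch. It does not use SimpleTextParser; PositionSketch and its methods are hypothetical names, and counting "\r\n" as a single line break is an assumption based on the newline sequences this class documents ('\r', '\n', and '\r\n').

```java
final class PositionSketch {
    private int line = 1;      // line numbers start at 1
    private int column = 1;    // column of the next character to be read
    private boolean lastWasCr; // true if the previous character was '\r'

    /** Update the position for a character that has just been read. */
    void advance(int ch) {
        if (ch == '\n' && lastWasCr) {
            // second half of "\r\n": the line was already advanced on '\r'
            lastWasCr = false;
            return;
        }
        lastWasCr = ch == '\r';
        if (ch == '\r' || ch == '\n') {
            ++line;
            column = 1; // the first character of each line is column 1
        } else {
            ++column;
        }
    }

    int getLine() {
        return line;
    }

    int getColumn() {
        return column;
    }

    /** Feed a whole string through the tracker and return {line, column}. */
    static int[] positionAfter(String text) {
        PositionSketch p = new PositionSketch();
        text.chars().forEach(p::advance);
        return new int[] {p.getLine(), p.getColumn()};
    }
}
```

In this sketch the position always describes the next unread character, which mirrors the description of getColumnNumber() above.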
    • getMaxStringLength

      public int getMaxStringLength()
      Get the maximum length for strings returned by this instance. Operations that produce strings longer than this length will throw an exception.
      Returns:
      maximum length for strings returned by this instance
    • setMaxStringLength

      public void setMaxStringLength(int maxStringLength)
      Set the maximum length for strings returned by this instance. Operations that produce strings longer than this length will throw an exception.
      Parameters:
      maxStringLength - maximum length for strings returned by this instance
      Throws:
      IllegalArgumentException - if the argument is less than zero
    • getCurrentToken

      Get the current token. This is the most recent string read by one of the nextXXX() methods. This value will be null if no token has yet been read or if the end of content has been reached.
      Returns:
      the current token
    • hasNonEmptyToken

      public boolean hasNonEmptyToken()
      Return true if the current token is neither null nor empty.
      Returns:
      true if the current token is neither null nor empty
    • getCurrentTokenLineNumber

      Get the line number that the current token started on. This value will be -1 if no token has been read yet.
      Returns:
      current token starting line number or -1 if no token has been read yet
    • getCurrentTokenColumnNumber

      Get the column position that the current token started on. This value will be -1 if no token has been read yet.
      Returns:
      current token column number or -1 if no token has been read yet
    • getCurrentTokenAsInt

      public int getCurrentTokenAsInt()
      Get the current token parsed as an integer.
      Returns:
      the current token parsed as an integer
      Throws:
      IllegalStateException - if no token has been read or the current token cannot be parsed as an integer
    • getCurrentTokenAsDouble

      public double getCurrentTokenAsDouble()
      Get the current token parsed as a double.
      Returns:
      the current token parsed as a double
      Throws:
      IllegalStateException - if no token has been read or the current token cannot be parsed as a double
    • hasMoreCharacters

      public boolean hasMoreCharacters()
      Return true if there are more characters to read from this instance.
      Returns:
      true if there are more characters to read from this instance
      Throws:
      UncheckedIOException - if an I/O error occurs
    • hasMoreCharactersOnLine

      public boolean hasMoreCharactersOnLine()
      Return true if there are more characters to read on the current line.
      Returns:
      true if there are more characters to read on the current line
      Throws:
      UncheckedIOException - if an I/O error occurs
    • readChar

      public int readChar()
      Read and return the next character in the stream and advance the parser position. This method updates the current line number and column number but does not set the current token.
      Returns:
      the next character in the stream or -1 if the end of the stream has been reached
      Throws:
      UncheckedIOException - if an I/O error occurs
    • next

      public SimpleTextParser next(int len)
      Read a string containing at most len characters from the stream and set it as the current token. Characters are added to the string until the string has the specified length or the end of the stream is reached. The characters are consumed from the stream. The token is set to null if no more characters are available from the character stream when this method is called.
      Parameters:
      len - the maximum length of the extracted string
      Returns:
      this instance
      Throws:
      IllegalArgumentException - if len is less than 0 or greater than the configured maximum string length
      UncheckedIOException - if an I/O error occurs
    • nextWithLineContinuation

      public SimpleTextParser nextWithLineContinuation(char lineContinuationChar, int len)
      Read a string containing at most len characters from the stream and set it as the current token. This is similar to next(int) but with the exception that new line sequences beginning with lineContinuationChar are skipped.
      Parameters:
      lineContinuationChar - character used to indicate skipped new line sequences
      len - the maximum length of the extracted string
      Returns:
      this instance
      Throws:
      IllegalArgumentException - if len is less than 0 or greater than the configured maximum string length
      UncheckedIOException - if an I/O error occurs
    • next

      public SimpleTextParser next(IntPredicate pred)
      Read characters from the stream while the given predicate returns true and set the result as the current token. The next call to readChar() will return either a character that fails the predicate test or -1 if the end of the stream has been reached. The token will be null if the end of the stream has been reached prior to the method call.
      Parameters:
      pred - predicate function passed characters read from the input; reading continues until the predicate returns false
      Returns:
      this instance
      Throws:
      IllegalStateException - if the length of the produced string exceeds the configured maximum string length
      UncheckedIOException - if an I/O error occurs
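The predicate-based read described above can be sketched without the library. The class below is a hypothetical stand-in that reads from a String rather than a Reader and omits the maximum string length check; it only illustrates the three documented outcomes: a non-empty token, an empty token when the next character fails the predicate, and null at the end of the input.

```java
import java.util.function.IntPredicate;

final class ReadWhileSketch {
    private final String text;
    private int pos;

    ReadWhileSketch(String text) {
        this.text = text;
    }

    /** Return the next character without consuming it, or -1 at end of input. */
    int peekChar() {
        return pos < text.length() ? text.charAt(pos) : -1;
    }

    /** Consume and return the next character, or -1 at end of input. */
    int readChar() {
        return pos < text.length() ? text.charAt(pos++) : -1;
    }

    /** Read characters while pred accepts them; null if already at end of input. */
    String next(IntPredicate pred) {
        if (peekChar() < 0) {
            return null; // end of input reached before the call
        }
        StringBuilder sb = new StringBuilder();
        while (peekChar() >= 0 && pred.test(peekChar())) {
            sb.append((char) readChar());
        }
        // the first rejected character is left unread for the next call
        return sb.toString();
    }
}
```

For the input "abc 123", calling next(Character::isLetter) in this sketch yields "abc" and leaves the space as the next character to be read.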
    • nextWithLineContinuation

      public SimpleTextParser nextWithLineContinuation(char lineContinuationChar, IntPredicate pred)
      Read characters from the stream while the given predicate returns true and set the result as the current token. This is similar to next(IntPredicate) but with the exception that new line sequences prefixed with lineContinuationChar are skipped.
      Parameters:
      lineContinuationChar - character used to indicate skipped new line sequences
      pred - predicate function passed characters read from the input; reading continues until the predicate returns false
      Returns:
      this instance
      Throws:
      IllegalStateException - if the length of the produced string exceeds the configured maximum string length
      UncheckedIOException - if an I/O error occurs
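Line continuation handling can be pictured as removing every occurrence of lineContinuationChar that is immediately followed by a newline sequence, together with that newline sequence. The following is a minimal sketch of that idea on a plain String; LineContinuationSketch and its method names are hypothetical, and the assumption that a continuation character not followed by a newline is kept as ordinary content is mine, not something the documentation above states.

```java
final class LineContinuationSketch {

    /** Return text with each continuation char + newline sequence removed. */
    static String joinContinuedLines(String text, char lineContinuationChar) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char ch = text.charAt(i);
            int newlineLen = ch == lineContinuationChar
                    ? newlineLengthAt(text, i + 1)
                    : 0;
            if (newlineLen > 0) {
                i += 1 + newlineLen; // skip the marker and the newline sequence
            } else {
                sb.append(ch);       // ordinary character (assumed to be kept)
                ++i;
            }
        }
        return sb.toString();
    }

    /** Length of the newline sequence at index: 2 for "\r\n", 1 for '\r' or '\n', else 0. */
    static int newlineLengthAt(String text, int index) {
        if (index >= text.length()) {
            return 0;
        }
        char ch = text.charAt(index);
        if (ch == '\r') {
            boolean crlf = index + 1 < text.length() && text.charAt(index + 1) == '\n';
            return crlf ? 2 : 1;
        }
        return ch == '\n' ? 1 : 0;
    }
}
```

With a backslash as the continuation character, a value split across two physical lines is read back as a single logical string.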
    • nextLine

      Read characters from the current parser position to the next new line sequence and set the result as the current token. The newline character sequence ('\r', '\n', or '\r\n') at the end of the line is consumed but is not included in the token. The token will be null if the end of the stream has been reached prior to the method call.
      Returns:
      this instance
      Throws:
      IllegalStateException - if the length of the produced string exceeds the configured maximum string length
      UncheckedIOException - if an I/O error occurs
    • nextAlphanumeric

      Read a sequence of alphanumeric characters starting from the current parser position and set the result as the current token. The token will be the empty string if the next character in the stream is not alphanumeric and will be null if the end of the stream has been reached prior to the method call.
      Returns:
      this instance
      Throws:
      IllegalStateException - if the length of the produced string exceeds the configured maximum string length
      UncheckedIOException - if an I/O error occurs
    • discard

      public SimpleTextParser discard(int len)
      Discard len number of characters from the character stream. The parser position is updated but the current token is not changed.
      Parameters:
      len - number of characters to discard
      Returns:
      this instance
      Throws:
      UncheckedIOException - if an I/O error occurs
    • discardWithLineContinuation

      public SimpleTextParser discardWithLineContinuation(char lineContinuationChar, int len)
      Discard len number of characters from the character stream. The parser position is updated but the current token is not changed. New line sequences beginning with lineContinuationChar are skipped.
      Parameters:
      lineContinuationChar - character used to indicate skipped new line sequences
      len - number of characters to discard
      Returns:
      this instance
      Throws:
      UncheckedIOException - if an I/O error occurs
    • discard

      Discard characters from the stream while the given predicate returns true. The next call to readChar() will return either a character that fails the predicate test or -1 if the end of the stream has been reached. The parser position is updated but the current token is not changed.
      Parameters:
      pred - predicate test for characters to discard
      Returns:
      this instance
      Throws:
      UncheckedIOException - if an I/O error occurs
    • discardWithLineContinuation

      public SimpleTextParser discardWithLineContinuation(char lineContinuationChar, IntPredicate pred)
      Discard characters from the stream while the given predicate returns true. New line sequences beginning with lineContinuationChar are skipped. The next call to readChar() will return either a character that fails the predicate test or -1 if the end of the stream has been reached. The parser position is updated but the current token is not changed.
      Parameters:
      lineContinuationChar - character used to indicate skipped new line sequences
      pred - predicate test for characters to discard
      Returns:
      this instance
      Throws:
      UncheckedIOException - if an I/O error occurs
    • discardWhitespace

      Discard a sequence of whitespace characters from the character stream starting from the current parser position. The next call to readChar() will return either a non-whitespace character or -1 if the end of the stream has been reached. The parser position is updated but the current token is not changed.
      Returns:
      this instance
      Throws:
      UncheckedIOException - if an I/O error occurs
    • discardLineWhitespace

      Discard the next whitespace characters on the current line. The next call to readChar() will return either a non-whitespace character on the current line, the newline character sequence (indicating the end of the line), or -1 (indicating the end of the stream). The parser position is updated but the current token is not changed.
      Returns:
      this instance
      Throws:
      UncheckedIOException - if an I/O error occurs
    • discardNewLineSequence

      Discard the newline character sequence at the current reader position. The sequence is defined as one of "\r", "\n", or "\r\n". Does nothing if the reader is not positioned at a newline sequence. The parser position is updated but the current token is not changed.
      Returns:
      this instance
      Throws:
      UncheckedIOException - if an I/O error occurs
    • discardLine

      Discard all remaining characters on the current line, including the terminating newline character sequence. The next call to readChar() will return either the first character on the next line or -1 if the end of the stream has been reached. The parser position is updated but the current token is not changed.
      Returns:
      this instance
      Throws:
      UncheckedIOException - if an I/O error occurs
    • consume

      public SimpleTextParser consume(IntPredicate pred, IntConsumer consumer)
      Consume characters from the stream and pass them to consumer while the given predicate returns true. The operation ends when the predicate returns false or the end of the stream is reached.
      Parameters:
      pred - predicate test for characters to consume
      consumer - object to be passed each consumed character
      Returns:
      this instance
      Throws:
      UncheckedIOException - if an I/O error occurs
    • consumeWithLineContinuation

      public SimpleTextParser consumeWithLineContinuation(char lineContinuationChar, int len, IntConsumer consumer)
      Consume at most len characters from the stream, passing each to the given consumer. This method is similar to consume(int, IntConsumer) with the exception that new line sequences prefixed with lineContinuationChar are skipped.
      Parameters:
      lineContinuationChar - character used to indicate skipped new line sequences
      len - number of characters to consume
      consumer - function to be passed each consumed character
      Returns:
      this instance
      Throws:
      UncheckedIOException - if an I/O error occurs
    • consume

      public SimpleTextParser consume(int len, IntConsumer consumer)
      Consume at most len characters from the stream, passing each to the given consumer. The operation continues until len number of characters have been read or the end of the stream has been reached.
      Parameters:
      len - number of characters to consume
      consumer - object to be passed each consumed character
      Returns:
      this instance
      Throws:
      UncheckedIOException - if an I/O error occurs
    • consumeWithLineContinuation

      public SimpleTextParser consumeWithLineContinuation(char lineContinuationChar, IntPredicate pred, IntConsumer consumer)
      Consume characters from the stream and pass them to consumer while the given predicate returns true. This method is similar to consume(IntPredicate, IntConsumer) with the exception that new line sequences beginning with lineContinuationChar are skipped.
      Parameters:
      lineContinuationChar - character used to indicate skipped new line sequences
      pred - predicate test for characters to consume
      consumer - object to be passed each consumed character
      Returns:
      this instance
      Throws:
      UncheckedIOException - if an I/O error occurs
    • peekChar

      public int peekChar()
      Return the next character in the stream but do not advance the parser position.
      Returns:
      the next character in the stream or -1 if the end of the stream has been reached
      Throws:
      UncheckedIOException - if an I/O error occurs
    • peek

      public String peek(int len)
      Return a string containing at most len characters from the stream but without changing the parser position. Characters are added to the string until the string has the specified length or the end of the stream is reached.
      Parameters:
      len - the maximum length of the returned string
      Returns:
      a string containing at most len characters from the stream or null if the parser has already reached the end of the stream
      Throws:
      IllegalArgumentException - if len is less than 0 or greater than the configured maximum string length
      UncheckedIOException - if an I/O error occurs
    • peek

      public String peek(IntPredicate pred)
      Read characters from the stream while the given predicate returns true but do not change the current token or advance the parser position.
      Parameters:
      pred - predicate function passed characters read from the input; reading continues until the predicate returns false
      Returns:
      string containing characters matching pred or null if the parser has already reached the end of the stream
      Throws:
      IllegalStateException - if the length of the produced string exceeds the configured maximum string length
      UncheckedIOException - if an I/O error occurs
    • match

      public SimpleTextParser match(String expected)
      Compare the current token with the argument and throw an exception if they are not equal. The comparison is case-sensitive.
      Parameters:
      expected - expected token
      Returns:
      this instance
      Throws:
      IllegalStateException - if no token has been read or expected does not exactly equal the current token
    • matchIgnoreCase

      Compare the current token with the argument and throw an exception if they are not equal. The comparison is not case-sensitive.
      Parameters:
      expected - expected token
      Returns:
      this instance
      Throws:
      IllegalStateException - if no token has been read or expected does not equal the current token (ignoring case)
    • tryMatch

      public boolean tryMatch(String expected)
      Return true if the current token is equal to the argument. The comparison is case-sensitive.
      Parameters:
      expected - expected token
      Returns:
      true if the argument exactly equals the current token
      Throws:
      IllegalStateException - if no token has been read
      UncheckedIOException - if an I/O error occurs
    • tryMatchIgnoreCase

      public boolean tryMatchIgnoreCase(String expected)
      Return true if the current token is equal to the argument. The comparison is not case-sensitive.
      Parameters:
      expected - expected token
      Returns:
      true if the argument equals the current token (ignoring case)
      Throws:
      IllegalStateException - if no token has been read
    • choose

      public int choose(String... expected)
      Return the index of the argument that exactly matches the current token. An exception is thrown if no match is found. String comparisons are case-sensitive.
      Parameters:
      expected - strings to compare with the current token
      Returns:
      index of the argument that exactly matches the current token
      Throws:
      IllegalStateException - if no token has been read or no match is found among the arguments
    • choose

      public int choose(List<String> expected)
      Return the index of the argument that exactly matches the current token. An exception is thrown if no match is found. String comparisons are case-sensitive.
      Parameters:
      expected - strings to compare with the current token
      Returns:
      index of the argument that exactly matches the current token
      Throws:
      IllegalStateException - if no token has been read or no match is found among the arguments
    • chooseIgnoreCase

      public int chooseIgnoreCase(String... expected)
      Return the index of the argument that matches the current token, ignoring case. An exception is thrown if no match is found. String comparisons are not case-sensitive.
      Parameters:
      expected - strings to compare with the current token
      Returns:
      index of the argument that matches the current token (ignoring case)
      Throws:
      IllegalStateException - if no token has been read or no match is found among the arguments
    • chooseIgnoreCase

      public int chooseIgnoreCase(List<String> expected)
      Return the index of the argument that matches the current token, ignoring case. An exception is thrown if no match is found. String comparisons are not case-sensitive.
      Parameters:
      expected - strings to compare with the current token
      Returns:
      index of the argument that matches the current token (ignoring case)
      Throws:
      IllegalStateException - if no token has been read or no match is found among the arguments
    • tryChoose

      public int tryChoose(String... expected)
      Return the index of the argument that exactly matches the current token or -1 if no match is found. String comparisons are case-sensitive.
      Parameters:
      expected - strings to compare with the current token
      Returns:
      index of the argument that exactly matches the current token or -1 if no match is found
      Throws:
      IllegalStateException - if no token has been read
    • tryChoose

      public int tryChoose(List<String> expected)
      Return the index of the argument that exactly matches the current token or -1 if no match is found. String comparisons are case-sensitive.
      Parameters:
      expected - strings to compare with the current token
      Returns:
      index of the argument that exactly matches the current token or -1 if no match is found
      Throws:
      IllegalStateException - if no token has been read
    • tryChooseIgnoreCase

      public int tryChooseIgnoreCase(String... expected)
      Return the index of the argument that matches the current token or -1 if no match is found. String comparisons are not case-sensitive.
      Parameters:
      expected - strings to compare with the current token
      Returns:
      index of the argument that matches the current token (ignoring case) or -1 if no match is found
      Throws:
      IllegalStateException - if no token has been read
    • tryChooseIgnoreCase

      public int tryChooseIgnoreCase(List<String> expected)
      Return the index of the argument that matches the current token or -1 if no match is found. String comparisons are not case-sensitive.
      Parameters:
      expected - strings to compare with the current token
      Returns:
      index of the argument that matches the current token (ignoring case) or -1 if no match is found
      Throws:
      IllegalStateException - if no token has been read
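The difference between the choose and tryChoose families is only in how a missing match is reported: an exception versus a -1 return value. The standalone sketch below illustrates that contract. ChooseSketch is a hypothetical class that takes the token as an explicit argument instead of reading it from a parser, the exception messages are invented for illustration, and returning the first matching index when the list contains duplicates is an assumption.

```java
import java.util.List;

final class ChooseSketch {

    /** Index of the first exact match, or -1 if none. */
    static int tryChoose(String token, List<String> expected) {
        requireToken(token);
        return expected.indexOf(token);
    }

    /** Index of the first match ignoring case, or -1 if none. */
    static int tryChooseIgnoreCase(String token, List<String> expected) {
        requireToken(token);
        for (int i = 0; i < expected.size(); ++i) {
            if (token.equalsIgnoreCase(expected.get(i))) {
                return i;
            }
        }
        return -1;
    }

    /** Index of the first exact match; throws if there is no match. */
    static int choose(String token, List<String> expected) {
        int idx = tryChoose(token, expected);
        if (idx < 0) {
            // message format is illustrative only
            throw new IllegalStateException(
                    "expected one of " + expected + " but found [" + token + "]");
        }
        return idx;
    }

    private static void requireToken(String token) {
        if (token == null) {
            throw new IllegalStateException("no token has been read");
        }
    }
}
```

The returned index lends itself to a switch statement over keywords, with the throwing variant used when an unknown keyword is a hard error and the try variant used when the caller wants to fall back to other handling.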
    • unexpectedToken

      Get an exception indicating that the current token was unexpected. The returned exception contains a message with the line number and column of the current token and a description of its value.
      Parameters:
      expected - string describing what was expected
      Returns:
      exception indicating that the current token was unexpected
    • unexpectedToken

      Get an exception indicating that the current token was unexpected. The returned exception contains a message with the line number and column of the current token and a description of its value.
      Parameters:
      expected - string describing what was expected
      cause - cause of the error
      Returns:
      exception indicating that the current token was unexpected
    • tokenError

      Get an exception indicating an error during parsing at the current token position.
      Parameters:
      msg - error message
      Returns:
      an exception indicating an error during parsing at the current token position
    • tokenError

      Get an exception indicating an error during parsing at the current token position.
      Parameters:
      msg - error message
      cause - the cause of the error; may be null
      Returns:
      an exception indicating an error during parsing at the current token position
    • parseError

      Return an exception indicating an error occurring at the current parser position.
      Parameters:
      msg - error message
      Returns:
      an exception indicating an error during parsing
    • parseError

      Return an exception indicating an error occurring at the current parser position.
      Parameters:
      msg - error message
      cause - the cause of the error; may be null
      Returns:
      an exception indicating an error during parsing
    • parseError

      public IllegalStateException parseError(int line, int col, String msg)
      Return an exception indicating an error during parsing.
      Parameters:
      line - line number of the error
      col - column number of the error
      msg - error message
      Returns:
      an exception indicating an error during parsing
    • parseError

      public IllegalStateException parseError(int line, int col, String msg, Throwable cause)
      Return an exception indicating an error during parsing.
      Parameters:
      line - line number of the error
      col - column number of the error
      msg - error message
      cause - the cause of the error
      Returns:
      an exception indicating an error during parsing
    • isWhitespace

      public static boolean isWhitespace(int ch)
      Return true if the given character (Unicode code point) is whitespace.
      Parameters:
      ch - character (Unicode code point) to test
      Returns:
      true if the given character is whitespace
    • isNotWhitespace

      public static boolean isNotWhitespace(int ch)
      Return true if the given character (Unicode code point) is not whitespace.
      Parameters:
      ch - character (Unicode code point) to test
      Returns:
      true if the given character is not whitespace
    • isLineWhitespace

      public static boolean isLineWhitespace(int ch)
      Return true if the given character (Unicode code point) is whitespace that is not used in newline sequences (i.e., not '\r' or '\n').
      Parameters:
      ch - character (Unicode code point) to test
      Returns:
      true if the given character is a whitespace character not used in newline sequences
    • isNewLinePart

      public static boolean isNewLinePart(int ch)
      Return true if the given character (Unicode code point) is used as part of newline sequences (i.e., is either '\r' or '\n').
      Parameters:
      ch - character (Unicode code point) to test
      Returns:
      true if the given character is used as part of newline sequences
    • isNotNewLinePart

      public static boolean isNotNewLinePart(int ch)
      Return true if the given character (Unicode code point) is not used as part of newline sequences (i.e., not '\r' or '\n').
      Parameters:
      ch - character (Unicode code point) to test
      Returns:
      true if the given character is not used as part of newline sequences
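The distinction between general whitespace, line whitespace, and newline parts can be sketched as follows. WhitespaceSketch is a hypothetical class, and using Character.isWhitespace(int) as the definition of whitespace is an assumption; the documentation above does not say which definition the library uses.

```java
final class WhitespaceSketch {

    /** True for the two characters that make up newline sequences. */
    static boolean isNewLinePart(int ch) {
        return ch == '\r' || ch == '\n';
    }

    /** True for whitespace that does not end a line (assumed definition). */
    static boolean isLineWhitespace(int ch) {
        return Character.isWhitespace(ch) && !isNewLinePart(ch);
    }

    /** Index of the first character at or after start that is not line whitespace. */
    static int skipLineWhitespace(String text, int start) {
        int i = start;
        while (i < text.length() && isLineWhitespace(text.charAt(i))) {
            ++i;
        }
        return i;
    }
}
```

Skipping with the line-whitespace predicate stops at the end of the line, which is useful for line-oriented formats where a newline is itself significant.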
    • isAlphanumeric

      public static boolean isAlphanumeric(int ch)
      Return true if the given character (Unicode code point) is alphanumeric.
      Parameters:
      ch - character (Unicode code point) to test
      Returns:
      true if the argument is alphanumeric
    • isNotAlphanumeric

      public static boolean isNotAlphanumeric(int ch)
      Return true if the given character (Unicode code point) is not alphanumeric.
      Parameters:
      ch - character (Unicode code point) to test
      Returns:
      true if the argument is not alphanumeric
    • isIntegerPart

      public static boolean isIntegerPart(int ch)
      Return true if the given character (Unicode code point) can be used as part of the string representation of an integer. This will be true for the following types of characters:
      • digits
      • the '-' (minus) character
      • the '+' (plus) character
      Parameters:
      ch - character (Unicode code point) to test
      Returns:
      true if the given character can be used as part of an integer string
    • isDecimalPart

      public static boolean isDecimalPart(int ch)
      Return true if the given character (Unicode code point) can be used as part of the string representation of a decimal number. This will be true for the following types of characters:
      • digits
      • the '-' (minus) character
      • the '+' (plus) character
      • the '.' (period) character
      • the 'e' character
      • the 'E' character
      Parameters:
      ch - character (Unicode code point) to test
      Returns:
      true if the given character can be used as part of a decimal number string
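The two character lists above translate directly into per-character predicates. The sketch below is a standalone illustration, not the library's code: NumberCharSketch is a hypothetical class, and using Character.isDigit(int) for "digits" is an assumption.

```java
final class NumberCharSketch {

    /** Digits plus the '-' and '+' sign characters. */
    static boolean isIntegerPart(int ch) {
        return Character.isDigit(ch) || ch == '-' || ch == '+';
    }

    /** Integer characters plus '.', 'e', and 'E'. */
    static boolean isDecimalPart(int ch) {
        return isIntegerPart(ch) || ch == '.' || ch == 'e' || ch == 'E';
    }

    /** Longest prefix of text made up of characters accepted by isDecimalPart. */
    static String decimalPrefix(String text) {
        int end = 0;
        while (end < text.length() && isDecimalPart(text.charAt(end))) {
            ++end;
        }
        return text.substring(0, end);
    }
}
```

Because the test is applied one character at a time, a string such as "e+." passes the predicate without being a valid number. The predicate only delimits a candidate token; actual validation happens when the token is parsed, as with getCurrentTokenAsDouble().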