Package org.apache.commons.csv
Class Lexer
java.lang.Object
org.apache.commons.csv.Lexer
- All Implemented Interfaces:
Closeable, AutoCloseable
Lexical analyzer.
-
Field Summary
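Fields
Modifier and Type, Field, Description
- private final char commentStart
- private static final String CR_STRING
- private final char[] delimiter
- private final char[] delimiterBuf
- private static final char DISABLED: Constant char to use for disabling comments, escapes, and encapsulation.
- private final char escape
- private final char[] escapeDelimiterBuf
- private String firstEol
- private final boolean ignoreEmptyLines
- private final boolean ignoreSurroundingSpaces
- private boolean isLastTokenDelimiter
- private final boolean lenientEof
- private static final String LF_STRING
- private final char quoteChar
- private final ExtendedBufferedReader reader: The input stream
- private final boolean trailingData
-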
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
- void close(): Closes resources.
- (package private) long getCharacterPosition(): Returns the current character position
- (package private) long getCurrentLineNumber(): Returns the current line number
- (package private) String getFirstEol()
- (package private) boolean isClosed()
- (package private) boolean isCommentStart(int ch)
- (package private) boolean isDelimiter(int ch): Determines whether the next characters constitute a delimiter through ExtendedBufferedReader.lookAhead(char[]).
- (package private) boolean isEndOfFile(int ch): Tests if the given character indicates the end of the file.
- (package private) boolean isEscape(int ch): Tests if the given character is the escape character.
- (package private) boolean isEscapeDelimiter(): Tests if the next characters constitute an escape delimiter through ExtendedBufferedReader.lookAhead(char[]).
- private boolean isMetaChar(int ch)
- (package private) boolean isQuoteChar(int ch)
- (package private) boolean isStartOfLine(int ch): Tests if the current character represents the start of a line: a CR, LF, or is at the start of the file.
- private char mapNullToDisabled
- (package private) Token nextToken(Token token): Returns the next token.
- private Token parseEncapsulatedToken(Token token): Parses an encapsulated token.
- private Token parseSimpleToken(Token token, int ch): Parses a simple token.
- (package private) boolean readEndOfLine(int ch): Greedily accepts \n, \r and \r\n. This checker consumes silently the second control-character...
- (package private) int readEscape(): Handle an escape sequence.
- (package private) void trimTrailingSpaces(StringBuilder buffer)
-
Field Details
-
CR_STRING
-
LF_STRING
-
DISABLED
private static final char DISABLED
Constant char to use for disabling comments, escapes, and encapsulation. The value -2 is used because it won't be confused with an EOF signal (-1), and because the corresponding char value, U+FFFE, is a Unicode noncharacter that should not occur in well-formed text, so there should never be a collision with a real text char.
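The sentinel reasoning can be checked with a small sketch. It assumes the constant is the char obtained by narrowing the int -2 (which is U+FFFE); the class name SentinelDemo, the constants, and the helper isDisabled are illustrative and are not the library's declarations.

```java
final class SentinelDemo {
    // Assumption: the sentinel is the char produced by narrowing the int -2.
    static final char DISABLED = (char) -2;

    // Conventional end-of-stream value returned by java.io.Reader.read().
    static final int EOF = -1;

    /** Returns true if the int code read from a stream equals the sentinel. */
    static boolean isDisabled(int ch) {
        // DISABLED is promoted to the int 0xFFFE, so it can never equal -1.
        return ch == DISABLED;
    }
}
```

Because a char is unsigned, the sentinel compares as 0xFFFE once promoted to int, so a lexer that reads characters as int values can tell it apart from the EOF value.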
-
delimiter
private final char[] delimiter -
delimiterBuf
private final char[] delimiterBuf -
escapeDelimiterBuf
private final char[] escapeDelimiterBuf -
escape
private final char escape -
quoteChar
private final char quoteChar -
commentStart
private final char commentStart -
ignoreSurroundingSpaces
private final boolean ignoreSurroundingSpaces -
ignoreEmptyLines
private final boolean ignoreEmptyLines -
lenientEof
private final boolean lenientEof -
trailingData
private final boolean trailingData -
reader
The input stream -
firstEol
-
isLastTokenDelimiter
private boolean isLastTokenDelimiter
-
-
Constructor Details
-
Lexer
Lexer(CSVFormat format, ExtendedBufferedReader reader)
-
-
Method Details
-
close
Closes resources.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException - If an I/O error occurs
-
getCharacterPosition
long getCharacterPosition()
Returns the current character position
- Returns:
- the current character position
-
getCurrentLineNumber
long getCurrentLineNumber()
Returns the current line number
- Returns:
- the current line number
-
getFirstEol
String getFirstEol() -
isClosed
boolean isClosed() -
isCommentStart
boolean isCommentStart(int ch) -
isDelimiter
Determines whether the next characters constitute a delimiter through ExtendedBufferedReader.lookAhead(char[]).
- Parameters:
ch - the current character.
- Returns:
- true if the next characters constitute a delimiter.
- Throws:
IOException - If an I/O error occurs.
-
isEndOfFile
boolean isEndOfFile(int ch)
Tests if the given character indicates the end of the file.
- Returns:
- true if the given character indicates the end of the file.
-
isEscape
boolean isEscape(int ch)
Tests if the given character is the escape character.
- Returns:
- true if the given character is the escape character.
-
isEscapeDelimiter
Tests if the next characters constitute an escape delimiter through ExtendedBufferedReader.lookAhead(char[]). For example, for delimiter "[|]" and escape '!', returns true if the next characters constitute "![!|!]".
- Returns:
- true if the next characters constitute an escape delimiter.
- Throws:
IOException - If an I/O error occurs.
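The "[|]" and '!' example suggests that an escape delimiter is the delimiter with the escape character placed before each of its characters. The following is a minimal sketch of that reading, working on plain strings rather than on ExtendedBufferedReader. The class EscapedDelimiterSketch and both method names are hypothetical and do not belong to the library.

```java
final class EscapedDelimiterSketch {
    /** Builds the escaped form of a delimiter: the escape char before every delimiter char. */
    static String escapeDelimiter(String delimiter, char escape) {
        StringBuilder sb = new StringBuilder(2 * delimiter.length());
        for (int i = 0; i < delimiter.length(); i++) {
            sb.append(escape).append(delimiter.charAt(i));
        }
        return sb.toString();
    }

    /** Tests whether input, starting at offset, begins with the escaped delimiter. */
    static boolean isEscapeDelimiterAt(String input, int offset, String delimiter, char escape) {
        return input.startsWith(escapeDelimiter(delimiter, escape), offset);
    }
}
```

With delimiter "[|]" and escape '!', escapeDelimiter yields "![!|!]", matching the example in the method description.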
-
isMetaChar
private boolean isMetaChar(int ch) -
isQuoteChar
boolean isQuoteChar(int ch) -
isStartOfLine
boolean isStartOfLine(int ch)
Tests if the current character represents the start of a line: a CR, an LF, or the start of the file.
- Parameters:
ch - the character to check
- Returns:
- true if the character is at the start of a line.
-
mapNullToDisabled
-
nextToken
Returns the next token.
A token corresponds to a term, a record change or an end-of-file indicator.
- Parameters:
token - an existing Token object to reuse. The caller is responsible for initializing the Token.
- Returns:
- the next token found.
- Throws:
IOException - on stream access error.
-
parseEncapsulatedToken
Parses an encapsulated token.
Encapsulated tokens are surrounded by the given encapsulating string. The encapsulator itself might be included in the token using a doubling syntax (as "", '') or using escaping (as in \", \'). Whitespace before and after an encapsulated token is ignored. The token is finished when one of the following conditions becomes true:
- An unescaped encapsulator has been reached and is followed by optional whitespace then:
  - delimiter (TOKEN)
  - end of line (EORECORD)
- end of stream has been reached (EOF)
- Parameters:
token - the current token
- Returns:
- a valid token object
- Throws:
IOException - Thrown when in an invalid state: EOF before closing encapsulator or invalid character before delimiter or EOL.
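The doubling syntax can be illustrated with a minimal sketch that reads one quoted field from a string. It is a simplification under stated assumptions: it handles only doubling (not escaping or surrounding whitespace), and it reports failures with UncheckedIOException to keep the example compact, whereas the documented method throws IOException. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

final class EncapsulatedTokenSketch {
    /**
     * Reads one quoted field that starts at input[0] and returns its unquoted content.
     * A doubled quote inside the field stands for one literal quote.
     */
    static String readQuoted(String input, char quote) {
        if (input.isEmpty() || input.charAt(0) != quote) {
            throw new UncheckedIOException(new IOException("field does not start with the encapsulator"));
        }
        StringBuilder content = new StringBuilder();
        int i = 1;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c != quote) {
                content.append(c);
                i++;
            } else if (i + 1 < input.length() && input.charAt(i + 1) == quote) {
                content.append(quote); // doubled encapsulator: one literal quote
                i += 2;
            } else {
                return content.toString(); // unescaped encapsulator closes the token
            }
        }
        throw new UncheckedIOException(new IOException("EOF reached before the encapsulator was closed"));
    }
}
```

For the input "a ""b"" c",x (with the outer quotes included), the sketch returns a "b" c and stops at the closing quote, leaving the delimiter for the caller.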
-
parseSimpleToken
Parses a simple token.
Simple tokens are tokens that are not surrounded by encapsulators. A simple token might contain escaped delimiters (as \, or \;). The token is finished when one of the following conditions becomes true:
- The end of line has been reached (EORECORD)
- The end of stream has been reached (EOF)
- An unescaped delimiter has been reached (TOKEN)
- Parameters:
token - the current token
ch - the current character
- Returns:
- the filled token
- Throws:
IOException - on stream access error
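The three terminating conditions can be sketched with a small scanner over a string. This is an illustration only, assuming a single-character delimiter and treating only \n as end of line. The class SimpleTokenSketch, its Type enum, and the "TYPE:content" output format are hypothetical and unrelated to the library's Token class.

```java
import java.util.ArrayList;
import java.util.List;

final class SimpleTokenSketch {
    enum Type { TOKEN, EORECORD, EOF }

    /** Scans unquoted fields and labels each with the condition that ended it. */
    static List<String> scan(String input, char delimiter, char escape) {
        List<String> out = new ArrayList<>();
        StringBuilder content = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length()) {
                content.append(input.charAt(++i)); // escaped char is taken literally
            } else if (c == delimiter) {
                out.add(Type.TOKEN + ":" + content); // unescaped delimiter ends the token
                content.setLength(0);
            } else if (c == '\n') {
                out.add(Type.EORECORD + ":" + content); // end of line ends the record
                content.setLength(0);
            } else {
                content.append(c);
            }
        }
        out.add(Type.EOF + ":" + content); // end of input ends the last token
        return out;
    }
}
```

Scanning the characters a\,b,c followed by a newline and d, with ',' as delimiter and '\' as escape, produces TOKEN:a,b then EORECORD:c then EOF:d.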
-
readEndOfLine
Greedily accepts \n, \r and \r\n. This checker silently consumes the second control character...
- Returns:
- true if the given or next character is a line-terminator
- Throws:
IOException
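Greedy acceptance means that \r\n is consumed as a single terminator rather than as two. The following is a minimal sketch of that rule over a CharSequence with an explicit position, instead of the reader-based approach the class uses. The class and method names are hypothetical.

```java
final class EndOfLineSketch {
    /**
     * Returns the index just past the line terminator starting at pos,
     * or pos itself if no terminator starts there. "\r\n" counts as one terminator.
     */
    static int skipEndOfLine(CharSequence input, int pos) {
        if (pos >= input.length()) {
            return pos;
        }
        char c = input.charAt(pos);
        if (c == '\n') {
            return pos + 1;
        }
        if (c == '\r') {
            // Greedy step: also swallow an LF that directly follows the CR.
            boolean lfFollows = pos + 1 < input.length() && input.charAt(pos + 1) == '\n';
            return lfFollows ? pos + 2 : pos + 1;
        }
        return pos;
    }
}
```

A caller can compare the returned index with pos to learn whether a terminator was found and how many characters it covered.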
-
readEscape
Handle an escape sequence. The current character must be the escape character. On return, the next character is available by calling ExtendedBufferedReader.getLastChar() on the input stream.
- Returns:
- the unescaped character (as an int) or Constants#EOF if char following the escape is invalid.
- Throws:
IOException - if there is a problem reading the stream or the end of stream is detected: the escape character is not allowed at end of stream
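The return contract (an unescaped character, or a marker value when the escaped character is invalid) can be sketched as a pure mapping function. The mapping below is an assumption based on conventional escapes (n, r, t, b, f) plus the meta characters. The documentation above does not list the supported sequences, and the class, method, and INVALID constant are hypothetical.

```java
final class EscapeSketch {
    // Hypothetical marker for an invalid escape; stands in for the EOF constant mentioned above.
    static final int INVALID = -1;

    /** Maps the char that follows the escape character to the char it stands for. */
    static int unescape(int next, char delimiter, char quote, char escape) {
        switch (next) {
            case 'n': return '\n';
            case 'r': return '\r';
            case 't': return '\t';
            case 'b': return '\b';
            case 'f': return '\f';
            default:
                // Assumption: meta characters escape to themselves; anything else is invalid.
                boolean meta = next == delimiter || next == quote || next == escape;
                return meta ? next : INVALID;
        }
    }
}
```

Returning an int rather than a char lets the marker value sit outside the range of real characters, which mirrors the "as an int" wording of the return description.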
-
trimTrailingSpaces
-