Class AbstractCharInputReader
- java.lang.Object
  - com.univocity.parsers.common.input.AbstractCharInputReader

- All Implemented Interfaces:
CharInput, CharInputReader
- Direct Known Subclasses:
ConcurrentCharInputReader, DefaultCharInputReader
public abstract class AbstractCharInputReader extends java.lang.Object implements CharInputReader
The base class for implementing different flavours of CharInputReader. It provides the essential conversion of sequences of newline characters defined by Format.getLineSeparator() into the normalized newline character provided in Format.getNormalizedNewline(). It also provides a default implementation for most of the methods specified by the CharInputReader interface.
Extending classes must essentially read characters from a given Reader and assign them to the public buffer when requested (in the reloadBuffer() method).
- Author:
- Univocity Software Pty Ltd - parsers@univocity.com
- See Also:
Format, DefaultCharInputReader, ConcurrentCharInputReader

Field Summary
- char[] buffer - The buffer itself
- protected boolean closeOnStop
- int i - Current position in the buffer
- int length - Number of characters available in the buffer.

Constructor Summary
- AbstractCharInputReader(char[] lineSeparator, char normalizedLineSeparator, int whitespaceRangeStart, boolean closeOnStop) - Creates a new instance with the mandatory characters for handling newlines transparently.
- AbstractCharInputReader(char normalizedLineSeparator, int whitespaceRangeStart, boolean closeOnStop) - Creates a new instance that attempts to detect the newlines used in the input automatically.

Method Summary
- void addInputAnalysisProcess(InputAnalysisProcess inputAnalysisProcess) - Submits a custom InputAnalysisProcess to analyze the input buffer and potentially discover configuration options such as column separators in CSV, data formats, etc.
- long charCount() - Returns the number of characters returned by CharInputReader.nextChar() at any given time.
- java.lang.String currentParsedContent() - Returns a String with the input character sequence parsed to produce the current record.
- int currentParsedContentLength() - Returns the length of the character sequence parsed to produce the current record.
- void enableNormalizeLineEndings(boolean normalizeLineEndings) - Indicates to the input reader that the parser is running in "escape" mode and new lines should be returned as-is to prevent modifying the content of the parsed value.
- char getChar() - Returns the last character returned by the CharInputReader.nextChar() method.
- char[] getLineSeparator() - Returns the line separator used by this character input reader.
- java.lang.String getQuotedString(char quote, char escape, char escapeEscape, int maxLength, char stop1, char stop2, boolean keepQuotes, boolean keepEscape, boolean trimLeading, boolean trimTrailing) - Attempts to collect a quoted String from the current position until a closing quote or stop character is found on the input, or a line ending is reached.
- java.lang.String getString(char ch, char stop, boolean trim, java.lang.String nullValue, int maxLength) - Attempts to collect a String from the current position until a stop character is found on the input, or a line ending is reached.
- int lastIndexOf(char ch) - Returns the last index of a given character in the current parsed content.
- long lineCount() - Returns the number of newlines read so far.
- void markRecordStart() - Marks the start of a new record in the input, used internally to calculate the result of CharInputReader.currentParsedContent().
- char nextChar() - Returns the next character in the input provided by the active Reader.
- java.lang.String readComment() - Collects the comment line found on the input.
- protected abstract void reloadBuffer() - Informs the extending class that the buffer has been read entirely and requests another batch of characters.
- protected abstract void setReader(java.io.Reader reader) - Passes the Reader provided in the start(Reader) method to the extending class so it can begin loading characters from it.
- void skipLines(long lines) - Skips characters in the input until the given number of lines is discarded.
- boolean skipQuotedString(char quote, char escape, char stop1, char stop2) - Attempts to skip a quoted String from the current position until a stop character is found on the input, or a line ending is reached.
- boolean skipString(char ch, char stop) - Attempts to skip a String from the current position until a stop character is found on the input, or a line ending is reached.
- char skipWhitespace(char ch, char stopChar1, char stopChar2) - Skips characters from the current input position until a non-whitespace character or a stop character is found.
- void start(java.io.Reader reader) - Initializes the CharInputReader implementation with a Reader which provides access to the input.
- protected void unwrapInputStream(BomInput.BytesProcessedNotification notification)

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface com.univocity.parsers.common.input.CharInputReader
stop

Constructor Detail

AbstractCharInputReader
public AbstractCharInputReader(char normalizedLineSeparator, int whitespaceRangeStart, boolean closeOnStop)
Creates a new instance that attempts to detect the newlines used in the input automatically.
- Parameters:
normalizedLineSeparator - the normalized newline character (as defined in Format.getNormalizedNewline()) that is used to replace any lineSeparator sequence found in the input.
whitespaceRangeStart - starting range of characters considered to be whitespace.
closeOnStop - indicates whether to automatically close the input when CharInputReader.stop() is called.

AbstractCharInputReader
public AbstractCharInputReader(char[] lineSeparator, char normalizedLineSeparator, int whitespaceRangeStart, boolean closeOnStop)
Creates a new instance with the mandatory characters for handling newlines transparently.
- Parameters:
lineSeparator - the sequence of characters that represent a newline, as defined in Format.getLineSeparator()
normalizedLineSeparator - the normalized newline character (as defined in Format.getNormalizedNewline()) that is used to replace any lineSeparator sequence found in the input.
whitespaceRangeStart - starting range of characters considered to be whitespace.
closeOnStop - indicates whether to automatically close the input when CharInputReader.stop() is called.
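With the first constructor, the reader detects the line separator from the input. This page does not describe how the library performs that detection. The Java sketch below assumes, for illustration only, that the first \r\n, \n or \r found in a sample of the input decides the separator. LineSeparatorDetection and detect are hypothetical names, not univocity-parsers API.

```java
// Hypothetical helper, not part of univocity-parsers: guesses the line separator
// from the first newline sequence found in a sample of the input.
class LineSeparatorDetection {

    /** Returns {'\r', '\n'}, {'\n'} or {'\r'}; null if the sample contains no newline. */
    static char[] detect(char[] sample, int length) {
        for (int i = 0; i < length; i++) {
            char ch = sample[i];
            if (ch == '\n') {
                return new char[]{'\n'};
            }
            if (ch == '\r') {
                // "\r\n" is treated as one separator. A '\r' that ends the sample is
                // reported as "\r" here; a real reader would need to look further ahead.
                if (i + 1 < length && sample[i + 1] == '\n') {
                    return new char[]{'\r', '\n'};
                }
                return new char[]{'\r'};
            }
        }
        return null;
    }
}
```

The detected array could then play the role of the lineSeparator argument taken by the second constructor.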

Method Detail

setReader
protected abstract void setReader(java.io.Reader reader)
Passes the Reader provided in the start(Reader) method to the extending class so it can begin loading characters from it.
- Parameters:
reader - the Reader provided in start(Reader)

reloadBuffer
protected abstract void reloadBuffer()
Informs the extending class that the buffer has been read entirely and requests another batch of characters. Implementors must assign the new character buffer to the public buffer attribute, and the number of characters available to the public length attribute. To signal that the input has no more characters, length must be set to -1.
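Together, reloadBuffer(), setReader(Reader) and start(Reader) form a template-method design. The base class drives the reading, and the subclass only refills buffer and length. The Java sketch below is a simplified, hypothetical model of that contract, not the univocity source. MiniCharInputReader and SimpleCharInputReader are illustrative names, and the sketch omits newline normalization.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Simplified stand-in for the reloadBuffer() contract; not the univocity source.
abstract class MiniCharInputReader {
    public char[] buffer; // filled by the subclass in reloadBuffer()
    public int length;    // characters available in buffer, or -1 at the end of the input
    int i;                // current position in buffer

    protected abstract void setReader(Reader reader);

    protected abstract void reloadBuffer();

    public final void start(Reader reader) {
        setReader(reader); // hand the Reader to the subclass first
        i = 0;
        reloadBuffer();    // then request the first batch of characters
    }

    /** Returns the next character, or '\0' once reloadBuffer() has reported -1. */
    public final char nextChar() {
        while (length != -1 && i >= length) { // buffer consumed: ask for another batch
            i = 0;
            reloadBuffer();
        }
        if (length == -1) {
            return '\0';
        }
        return buffer[i++];
    }
}

// Hypothetical subclass that reads fixed-size batches from a Reader.
class SimpleCharInputReader extends MiniCharInputReader {
    private Reader reader;
    private final char[] batch;

    SimpleCharInputReader(int bufferSize) { // bufferSize must be positive
        batch = new char[bufferSize];
    }

    @Override
    protected void setReader(Reader reader) {
        this.reader = reader;
    }

    @Override
    protected void reloadBuffer() {
        try {
            buffer = batch;
            length = reader.read(batch, 0, batch.length); // Reader.read returns -1 at end of stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A deliberately tiny buffer, such as new SimpleCharInputReader(2), forces several reloads even on short input. That makes it a convenient way to exercise the reload path.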

unwrapInputStream
protected final void unwrapInputStream(BomInput.BytesProcessedNotification notification)

start
public final void start(java.io.Reader reader)
Description copied from interface: CharInputReader
Initializes the CharInputReader implementation with a Reader which provides access to the input.
- Specified by:
start in interface CharInputReader
- Parameters:
reader - A Reader that provides access to the input.

addInputAnalysisProcess
public final void addInputAnalysisProcess(InputAnalysisProcess inputAnalysisProcess)
Submits a custom InputAnalysisProcess to analyze the input buffer and potentially discover configuration options such as column separators in CSV, data formats, etc. The process will be executed only once.
- Parameters:
inputAnalysisProcess - a custom process to analyze the contents of the input buffer.
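The description only says that a submitted process analyzes the input buffer and runs once. The following is a minimal sketch of that bookkeeping, assuming the processes run on the first non-empty buffer that is loaded. AnalysisStep is a hypothetical stand-in for InputAnalysisProcess, since this page does not show that interface's methods. DelimiterGuess is an invented example process that picks the most frequent of ',', ';' and tab.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an input analysis process: inspects a loaded buffer.
interface AnalysisStep {
    void execute(char[] characters, int length);
}

// Sketch: run every submitted step once, on the first non-empty buffer loaded.
class AnalysisRunner {
    private final List<AnalysisStep> pending = new ArrayList<>();

    void add(AnalysisStep step) {
        pending.add(step);
    }

    /** Called after each buffer reload; only the first non-empty buffer triggers the steps. */
    void onBufferLoaded(char[] buffer, int length) {
        if (pending.isEmpty() || length <= 0) {
            return;
        }
        for (AnalysisStep step : pending) {
            step.execute(buffer, length);
        }
        pending.clear(); // each step is executed only once
    }
}

// Invented example step: picks the most frequent candidate delimiter.
class DelimiterGuess implements AnalysisStep {
    char delimiter = ',';

    @Override
    public void execute(char[] characters, int length) {
        char[] candidates = {',', ';', '\t'};
        int best = -1;
        for (char candidate : candidates) {
            int count = 0;
            for (int i = 0; i < length; i++) {
                if (characters[i] == candidate) {
                    count++;
                }
            }
            if (count > best) {
                best = count;
                delimiter = candidate;
            }
        }
    }
}
```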

nextChar
public final char nextChar()
Description copied from interface: CharInputReader
Returns the next character in the input provided by the active Reader.
If the input contains a sequence of newline characters (defined by Format.getLineSeparator()), this method will automatically convert each such sequence into the newline character specified in Format.getNormalizedNewline().
A subsequent call to this method will return the character after the newline sequence.
- Specified by:
nextChar in interface CharInput
- Specified by:
nextChar in interface CharInputReader
- Returns:
- the next character in the input. '\0' if there are no more characters in the input or if the CharInputReader was stopped.
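An in-memory sketch can illustrate the newline conversion performed by nextChar(). It assumes a one- or two-character separator such as \r\n, and it also keeps the counters reported by lineCount() and charCount(). NormalizingInput is a hypothetical class. Unlike a real reader, it does not handle a separator split across two buffer reloads.

```java
// Illustrative normalizer over an in-memory array; not the univocity implementation.
// Assumes a one- or two-character line separator such as "\n" or "\r\n".
class NormalizingInput {
    private final char[] input;
    private final char sep1;
    private final char sep2;       // '\0' when the separator has a single character
    private final char normalized;
    private int pos;
    private long lineCount;
    private long charCount;        // characters returned by nextChar() in this sketch

    NormalizingInput(String input, char[] lineSeparator, char normalizedNewline) {
        this.input = input.toCharArray();
        this.sep1 = lineSeparator[0];
        this.sep2 = lineSeparator.length > 1 ? lineSeparator[1] : '\0';
        this.normalized = normalizedNewline;
    }

    /** Returns the next character, replacing each separator sequence with one normalized char. */
    char nextChar() {
        if (pos >= input.length) {
            return '\0'; // no more input
        }
        char ch = input[pos++];
        charCount++;
        if (ch == sep1 && (sep2 == '\0' || (pos < input.length && input[pos] == sep2))) {
            if (sep2 != '\0') {
                pos++; // consume the second separator character as well
            }
            lineCount++;
            return normalized;
        }
        return ch; // a lone '\r' with a "\r\n" separator is returned unchanged
    }

    long lineCount() {
        return lineCount;
    }

    long charCount() {
        return charCount;
    }
}
```

With the separator \r\n and the normalized newline \n, the input "a\r\nb\rc" is read back as "a\nb\rc". The lone \r is not a complete separator, so it passes through untouched.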

getChar
public final char getChar()
Description copied from interface: CharInputReader
Returns the last character returned by the CharInputReader.nextChar() method.
- Specified by:
getChar in interface CharInput
- Specified by:
getChar in interface CharInputReader
- Returns:
- the last character returned by the CharInputReader.nextChar() method. '\0' if there are no more characters in the input or if the CharInputReader was stopped.

lineCount
public final long lineCount()
Description copied from interface: CharInputReader
Returns the number of newlines read so far.
- Specified by:
lineCount in interface CharInputReader
- Returns:
- the number of newlines read so far.

skipLines
public final void skipLines(long lines)
Description copied from interface: CharInputReader
Skips characters in the input until the given number of lines is discarded.
- Specified by:
skipLines in interface CharInputReader
- Parameters:
lines - the number of lines to skip from the current location in the input
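Once newlines are normalized, skipping lines amounts to counting normalized newline characters. The sketch below assumes input already normalized to '\n' and works on a String instead of a buffer. LineSkipper is a hypothetical name. Returning 0 for a non-positive count, and stopping at the end of the input when too few lines exist, are choices made for the sketch, not documented library behaviour.

```java
// Hypothetical helper: returns the index just after the given number of '\n'-terminated
// lines, mirroring skipLines on input whose newlines were already normalized to '\n'.
class LineSkipper {
    static int skipLines(String normalizedInput, long lines) {
        if (lines < 1) {
            return 0; // nothing to skip in this sketch
        }
        long skipped = 0;
        for (int i = 0; i < normalizedInput.length(); i++) {
            if (normalizedInput.charAt(i) == '\n' && ++skipped == lines) {
                return i + 1; // first character of the next line
            }
        }
        return normalizedInput.length(); // fewer lines than requested: input exhausted
    }
}
```

For example, skipping 2 lines of "header1\nheader2\ndata" leaves the position at the start of "data".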

readComment
public java.lang.String readComment()
Description copied from interface: CharInputReader
Collects the comment line found on the input.
- Specified by:
readComment in interface CharInputReader
- Returns:
- the text found in the comment from the current position.

charCount
public final long charCount()
Description copied from interface: CharInputReader
Returns the number of characters returned by CharInputReader.nextChar() at any given time.
- Specified by:
charCount in interface CharInputReader
- Returns:
- the number of characters returned by CharInputReader.nextChar()

enableNormalizeLineEndings
public final void enableNormalizeLineEndings(boolean normalizeLineEndings)
Description copied from interface: CharInputReader
Indicates to the input reader that the parser is running in "escape" mode and new lines should be returned as-is to prevent modifying the content of the parsed value.
- Specified by:
enableNormalizeLineEndings in interface CharInputReader
- Parameters:
normalizeLineEndings - flag indicating that the parser is escaping values and line separators are to be returned as-is.

getLineSeparator
public char[] getLineSeparator()
Description copied from interface: CharInputReader
Returns the line separator used by this character input reader. This could be the line separator defined in the Format.getLineSeparator() configuration, or the line separator sequence identified automatically when CommonParserSettings.isLineSeparatorDetectionEnabled() evaluates to true.
- Specified by:
getLineSeparator in interface CharInputReader
- Returns:
- the line separator in use.

skipWhitespace
public final char skipWhitespace(char ch, char stopChar1, char stopChar2)
Description copied from interface: CharInputReader
Skips characters from the current input position until a non-whitespace character or a stop character is found.
- Specified by:
skipWhitespace in interface CharInputReader
- Parameters:
ch - the current character of the input
stopChar1 - the first stop character (which can be a whitespace)
stopChar2 - the second stop character (which can be a whitespace)
- Returns:
- the first non-whitespace character (or delimiter) found in the input.
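The stop characters matter because a delimiter such as tab is itself whitespace and must not be skipped. The Java sketch below illustrates this over an in-memory string. It assumes, hypothetically, that a character counts as whitespace when whitespaceRangeStart < ch <= ' '. The exact rule the class uses for whitespaceRangeStart is not stated on this page, and WhitespaceSkipper is an illustrative name.

```java
// Sketch of skipWhitespace over an in-memory string. It assumes, hypothetically, that a
// character is whitespace when whitespaceRangeStart < ch <= ' ', and that a stop
// character ends the scan even when it is itself whitespace.
class WhitespaceSkipper {
    private final String input;
    private final int whitespaceRangeStart;
    private int pos; // index of the next unread character

    WhitespaceSkipper(String input, int whitespaceRangeStart) {
        this.input = input;
        this.whitespaceRangeStart = whitespaceRangeStart;
    }

    /** Returns the next character, or '\0' at the end of the input. */
    char nextChar() {
        return pos < input.length() ? input.charAt(pos++) : '\0';
    }

    /** ch is the current character; returns the first non-whitespace or stop character. */
    char skipWhitespace(char ch, char stopChar1, char stopChar2) {
        while (ch > whitespaceRangeStart && ch <= ' ' && ch != stopChar1 && ch != stopChar2) {
            ch = nextChar();
        }
        return ch;
    }
}
```

With a tab-delimited input, passing '\t' as a stop character makes the scan stop at the delimiter instead of treating it as padding.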

currentParsedContentLength
public final int currentParsedContentLength()
Description copied from interface: CharInputReader
Returns the length of the character sequence parsed to produce the current record.
- Specified by:
currentParsedContentLength in interface CharInputReader
- Returns:
- the length of the text content parsed for the current input record

currentParsedContent
public final java.lang.String currentParsedContent()
Description copied from interface: CharInputReader
Returns a String with the input character sequence parsed to produce the current record.
- Specified by:
currentParsedContent in interface CharInputReader
- Returns:
- the text content parsed for the current input record.

lastIndexOf
public final int lastIndexOf(char ch)
Description copied from interface: CharInputReader
Returns the last index of a given character in the current parsed content.
- Specified by:
lastIndexOf in interface CharInputReader
- Parameters:
ch - the character to look for
- Returns:
- the last position of the given character in the current parsed content, or -1 if not found.

markRecordStart
public final void markRecordStart()
Description copied from interface: CharInputReader
Marks the start of a new record in the input, used internally to calculate the result of CharInputReader.currentParsedContent().
- Specified by:
markRecordStart in interface CharInputReader
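markRecordStart(), currentParsedContent(), currentParsedContentLength() and lastIndexOf(char) all work relative to a remembered record start. The minimal sketch below models them over an in-memory array. It assumes markRecordStart() is called before the record's first character is read, an assumption not confirmed here. RecordTracker is a hypothetical class.

```java
// Sketch over an in-memory array: markRecordStart() remembers where the record begins and
// the other methods measure from there. Hypothetical class, not univocity-parsers code.
class RecordTracker {
    private final char[] input;
    private int pos;         // index of the next character to read
    private int recordStart; // index where the current record began

    RecordTracker(String input) {
        this.input = input.toCharArray();
    }

    char nextChar() {
        return pos < input.length ? input[pos++] : '\0';
    }

    void markRecordStart() {
        recordStart = pos;
    }

    int currentParsedContentLength() {
        return pos - recordStart;
    }

    String currentParsedContent() {
        return new String(input, recordStart, pos - recordStart);
    }

    /** Last index of ch within the current parsed content, or -1 if absent. */
    int lastIndexOf(char ch) {
        for (int i = pos - 1; i >= recordStart; i--) {
            if (input[i] == ch) {
                return i - recordStart; // index relative to the record start
            }
        }
        return -1;
    }
}
```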

skipString
public final boolean skipString(char ch, char stop)
Description copied from interface: CharInputReader
Attempts to skip a String from the current position until a stop character is found on the input, or a line ending is reached. If the String can be skipped, the current position of the parser will be updated to the last consumed character. If the internal buffer needs to be reloaded, this method will return false and the current position of the buffer will remain unchanged.
- Specified by:
skipString in interface CharInputReader
- Parameters:
ch - the current character to be considered. If equal to the stop character, false will be returned.
stop - the stop character that identifies the end of the content to be collected
- Returns:
true if an entire String value was found on the input and skipped, or false if the buffer needs to be reloaded.
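skipString is a fast path: it scans only the characters already in the buffer and reports false rather than triggering a reload. The sketch below shows that contract over a single buffer. It assumes newlines were already normalized to '\n' and that the stop character itself is consumed. SkipStringSketch is a hypothetical class.

```java
// Sketch of skipString over a single buffer, assuming newlines were already normalized
// to '\n'. Names are illustrative, not univocity-parsers API.
class SkipStringSketch {
    private final char[] buffer;
    private final int length;
    private int i;  // index of the next unread character
    char current;   // last consumed character

    SkipStringSketch(String content, int start) {
        buffer = content.toCharArray();
        length = buffer.length;
        i = start;
    }

    int position() {
        return i;
    }

    boolean skipString(char ch, char stop) {
        if (ch == stop) {
            return false; // empty value: nothing to skip
        }
        for (int pos = i; pos < length; pos++) {
            char c = buffer[pos];
            if (c == stop || c == '\n') {
                i = pos + 1; // consume the stop (or newline) character
                current = c;
                return true;
            }
        }
        return false; // buffer exhausted: position left unchanged
    }
}
```

On false, a caller would fall back to reading the value character by character through nextChar(), which can reload the buffer.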

getString
public final java.lang.String getString(char ch, char stop, boolean trim, java.lang.String nullValue, int maxLength)
Description copied from interface: CharInputReader
Attempts to collect a String from the current position until a stop character is found on the input, or a line ending is reached. If the String can be obtained, the current position of the parser will be updated to the last consumed character. If the internal buffer needs to be reloaded, this method will return null and the current position of the buffer will remain unchanged.
- Specified by:
getString in interface CharInputReader
- Parameters:
ch - the current character to be considered. If equal to the stop character, the nullValue will be returned.
stop - the stop character that identifies the end of the content to be collected
trim - flag indicating whether or not trailing whitespaces should be discarded
nullValue - value to return when the length of the content to be returned is 0.
maxLength - the maximum length of the String to be returned. If the length exceeds this limit, null will be returned.
- Returns:
- the String found on the input, or null if the buffer needs to be reloaded or the maximum length has been exceeded.
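The same single-buffer idea applies to getString, which also has to honour trim, nullValue and maxLength. In this sketch, ch is assumed to have been read from buffer[start - 1], newlines are assumed to be normalized to '\n', and the position stays unchanged whenever null is returned. GetStringSketch is a hypothetical class.

```java
// Sketch of getString over a single buffer, with '\n' as the normalized newline.
// Hypothetical class; ch is assumed to have been read from buffer[start - 1].
class GetStringSketch {
    private final char[] buffer;
    private final int length;
    private int i;  // index of the next unread character
    char current;   // last consumed character

    GetStringSketch(String content, int start) {
        buffer = content.toCharArray();
        length = buffer.length;
        i = start;
    }

    String getString(char ch, char stop, boolean trim, String nullValue, int maxLength) {
        if (ch == stop) {
            return nullValue; // empty value
        }
        int from = i - 1; // the value starts with ch itself
        for (int pos = i; pos < length; pos++) {
            char c = buffer[pos];
            if (c == stop || c == '\n') {
                int end = pos;
                while (trim && end > from && buffer[end - 1] <= ' ') {
                    end--; // discard trailing whitespace
                }
                int len = end - from;
                if (len > maxLength) {
                    return null; // too long: position left unchanged
                }
                i = pos + 1; // consume the stop (or newline) character
                current = c;
                return len == 0 ? nullValue : new String(buffer, from, len);
            }
        }
        return null; // no stop character in this buffer: position left unchanged
    }
}
```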

getQuotedString
public final java.lang.String getQuotedString(char quote, char escape, char escapeEscape, int maxLength, char stop1, char stop2, boolean keepQuotes, boolean keepEscape, boolean trimLeading, boolean trimTrailing)
Description copied from interface: CharInputReader
Attempts to collect a quoted String from the current position until a closing quote or stop character is found on the input, or a line ending is reached. If the String can be obtained, the current position of the parser will be updated to the last consumed character. If the internal buffer needs to be reloaded, this method will return null and the current position of the buffer will remain unchanged.
- Specified by:
getQuotedString in interface CharInputReader
- Parameters:
quote - the quote character
escape - the quote escape character
escapeEscape - the escape of the quote escape character
maxLength - the maximum length of the String to be returned. If the length exceeds this limit, null will be returned.
stop1 - the first stop character that identifies the end of the content to be collected
stop2 - the second stop character that identifies the end of the content to be collected
keepQuotes - flag to indicate that the quotes that wrap the resulting String should be kept.
keepEscape - flag to indicate that escape sequences should be kept
trimLeading - flag to indicate that leading whitespaces should be trimmed
trimTrailing - flag to indicate that trailing whitespaces should be trimmed
- Returns:
- the String found on the input, or null if the buffer needs to be reloaded or the maximum length has been exceeded.
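A simplified sketch of quoted-value collection follows. It models only quote, escape, maxLength, stop1, stop2 and keepQuotes. It assumes the position starts just after the opening quote and that a closing quote must be followed by a stop character within the same buffer. QuotedStringSketch is a hypothetical class, and escapeEscape, keepEscape and the trimming flags are deliberately left out.

```java
// Simplified sketch of collecting a quoted value from one buffer. Only quote, escape,
// maxLength, stop1, stop2 and keepQuotes are modelled; escapeEscape, keepEscape and the
// trimming flags of getQuotedString are left out. The class name is hypothetical.
class QuotedStringSketch {
    private final char[] buffer;
    private int i; // index of the first character after the opening quote

    QuotedStringSketch(String content, int afterOpeningQuote) {
        buffer = content.toCharArray();
        i = afterOpeningQuote;
    }

    int position() {
        return i;
    }

    String getQuotedString(char quote, char escape, int maxLength,
                           char stop1, char stop2, boolean keepQuotes) {
        StringBuilder out = new StringBuilder();
        for (int pos = i; pos < buffer.length; pos++) {
            char c = buffer[pos];
            boolean hasNext = pos + 1 < buffer.length;
            if (c == escape && hasNext && buffer[pos + 1] == quote) {
                out.append(quote); // escaped quote, e.g. "" when escape == quote
                pos++;             // skip the escaped quote character
            } else if (c == quote) {
                // Closing quote: it must be followed by a stop character in this buffer.
                char after = hasNext ? buffer[pos + 1] : '\0';
                if (after != stop1 && after != stop2) {
                    return null; // unexpected content or end of buffer: position unchanged
                }
                if (out.length() > maxLength) {
                    return null; // value too long: position unchanged
                }
                i = pos + 1; // consume the closing quote; the stop character is read next
                return keepQuotes ? quote + out.toString() + quote : out.toString();
            } else {
                out.append(c); // regular character; stop characters inside quotes are kept
            }
        }
        return null; // no closing quote in this buffer: position unchanged
    }
}
```

Given the CSV fragment "say ""hi""",next with quote and escape both set to '"', the sketch returns say "hi" and leaves the comma as the next character to read.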

skipQuotedString
public final boolean skipQuotedString(char quote, char escape, char stop1, char stop2)
Description copied from interface: CharInputReader
Attempts to skip a quoted String from the current position until a stop character is found on the input, or a line ending is reached. If the String can be skipped, the current position of the parser will be updated to the last consumed character. If the internal buffer needs to be reloaded, this method will return false and the current position of the buffer will remain unchanged.
- Specified by:
skipQuotedString in interface CharInputReader
- Parameters:
quote - the quote character
escape - the quote escape character
stop1 - the first stop character that identifies the end of the content to be collected
stop2 - the second stop character that identifies the end of the content to be collected
- Returns:
true if an entire String value was found on the input and skipped, or false if the buffer needs to be reloaded.