Package com.fasterxml.aalto.in
Class StreamScanner
java.lang.Object
com.fasterxml.aalto.in.XmlScanner
com.fasterxml.aalto.in.ByteBasedScanner
com.fasterxml.aalto.in.StreamScanner
- All Implemented Interfaces:
XmlConsts, NamespaceContext, XMLStreamConstants
- Direct Known Subclasses:
Utf8Scanner
Base class for various byte stream based scanners (generally one
for each type of encoding supported).
-
Field Summary
Fields
Modifier and Type | Field | Description
protected final XmlCharTypes | _charTypes | This is a simple container object that is used to access the decoding tables for characters.
protected InputStream | _in | Underlying InputStream to use for reading content.
protected byte[] | _inputBuffer |
protected int[] | _quadBuffer | This buffer is used for name parsing.
protected final ByteBasedPNameTable | _symbols | For now, symbol table contains prefixed names.
Fields inherited from class com.fasterxml.aalto.in.ByteBasedScanner
_inputEnd, _inputPtr, _tmpChar, BYTE_a, BYTE_A, BYTE_AMP, BYTE_APOS, BYTE_C, BYTE_CR, BYTE_D, BYTE_EQ, BYTE_EXCL, BYTE_g, BYTE_GT, BYTE_HASH, BYTE_HYPHEN, BYTE_l, BYTE_LBRACKET, BYTE_LF, BYTE_LT, BYTE_m, BYTE_NULL, BYTE_o, BYTE_p, BYTE_P, BYTE_q, BYTE_QMARK, BYTE_QUOT, BYTE_RBRACKET, BYTE_s, BYTE_S, BYTE_SEMICOLON, BYTE_SLASH, BYTE_SPACE, BYTE_t, BYTE_T, BYTE_TAB, BYTE_u, BYTE_x
Fields inherited from class com.fasterxml.aalto.in.XmlScanner
_attrCollector, _attrCount, _cfgCoalescing, _cfgLazyParsing, _config, _currElem, _currNsCount, _currRow, _currToken, _defaultNs, _depth, _entityPending, _isEmptyTag, _lastNsContext, _lastNsDecl, _nameBuffer, _nsBindingCache, _nsBindingCount, _nsBindings, _nsBindMisses, _pastBytesOrChars, _publicId, _rowStartOffset, _startColumn, _startRawOffset, _startRow, _systemId, _textBuilder, _tokenIncomplete, _tokenName, _xml11, CDATA_STR, INT_0, INT_9, INT_a, INT_A, INT_AMP, INT_APOS, INT_COLON, INT_CR, INT_EQ, INT_EXCL, INT_f, INT_F, INT_GT, INT_HYPHEN, INT_LBRACKET, INT_LF, INT_LT, INT_NULL, INT_QMARK, INT_QUOTE, INT_RBRACKET, INT_SLASH, INT_SPACE, INT_TAB, INT_z, MAX_UNICODE_CHAR, TOKEN_EOI
Fields inherited from interface com.fasterxml.aalto.util.XmlConsts
CHAR_CR, CHAR_LF, CHAR_NULL, CHAR_SPACE, STAX_DEFAULT_OUTPUT_ENCODING, STAX_DEFAULT_OUTPUT_VERSION, XML_DECL_KW_ENCODING, XML_DECL_KW_STANDALONE, XML_DECL_KW_VERSION, XML_SA_NO, XML_SA_YES, XML_V_10, XML_V_10_STR, XML_V_11, XML_V_11_STR, XML_V_UNKNOWN
Fields inherited from interface javax.xml.stream.XMLStreamConstants
ATTRIBUTE, CDATA, CHARACTERS, COMMENT, DTD, END_DOCUMENT, END_ELEMENT, ENTITY_DECLARATION, ENTITY_REFERENCE, NAMESPACE, NOTATION_DECLARATION, PROCESSING_INSTRUCTION, SPACE, START_DOCUMENT, START_ELEMENT
-
Constructor Summary
Constructors
Constructor | Description
StreamScanner(ReaderConfig cfg, InputStream in, byte[] buffer, int ptr, int last) |
-
Method Summary
Modifier and Type | Method | Description
protected void | _closeSource() |
protected int | _nextEntity() | Helper method used to isolate things that need to be (re)set in cases where
protected void | _releaseBuffers() |
protected final PName | addPName(int hash, int[] quads, int qlen, int lastQuadBytes) |
protected final int | checkInTreeIndentation(int c) | Note: consecutive white space is only considered indentation if the following token seems like a tag (start/end).
protected final int | checkPrologIndentation(int c) |
private final PName | findPName(int onlyQuad, int lastByteCount) | Method called to process a sequence of bytes that is likely to be a PName.
private final PName | findPName(int lastQuad, int[] quads, int qlen, int lastByteCount) | Method called to process a sequence of bytes that is likely to be a PName.
private final PName | findPName(int firstQuad, int secondQuad, int lastByteCount) | Method called to process a sequence of bytes that is likely to be a PName.
private final PName | findPName(int lastQuad, int lastByteCount, int firstQuad, int qlen, int[] quads) | Method called to process a sequence of bytes that is likely to be a PName.
protected final int | handleCharEntity() |
private final int | handleCommentOrCdataStart() |
private final int | handleDtdStart() |
protected final int | handleEndElement() | Note that this method is currently also shareable for all ASCII-based encodings, and at least between UTF-8 and ISO-Latin1.
private final int | handleEndElementSlow(int size) |
protected abstract int | handleEntityInText(boolean inAttr) |
private final int | handlePIStart() | Method called after leading '<?' has been parsed; needs to parse target.
private final int | handlePrologDeclStart(boolean isProlog) |
protected abstract int | handleStartElement(byte b) | Parsing of start element requires parsing of the element name (and attribute names), and is thus encoding-specific.
protected final boolean | loadAndRetain(int nrOfChars) |
protected final boolean | loadMore() |
protected final byte | loadOne() |
protected final byte | loadOne(int type) |
private final void | matchAsciiKeyword(String keyw) |
protected final byte | nextByte() |
protected final byte | nextByte(int tt) |
final int | nextFromProlog(boolean isProlog) |
final int | nextFromTree() |
protected final PName | parsePName(byte b) | This method can (for now?) be shared between all ASCII-based encodings, since it only does coarse validity checking -- real checks are done in a different method.
protected final PName | parsePNameLong(int q, int[] quads) |
protected PName | parsePNameMedium(int i2, int q1) |
protected final PName | parsePNameSlow(byte b) |
protected abstract String | parsePublicId(byte quoteChar) |
protected abstract String | parseSystemId(byte quoteChar) |
protected byte | skipInternalWs(boolean reqd, String msg) |
Methods inherited from class com.fasterxml.aalto.in.ByteBasedScanner
addUTFPName, decodeCharForError, getCurrentColumnNr, getCurrentLocation, getEndingByteOffset, getEndingCharOffset, getStartingByteOffset, getStartingCharOffset, markLF, markLF, reportInvalidInitial, reportInvalidOther, setStartLocation
Methods inherited from class com.fasterxml.aalto.in.XmlScanner
bindName, bindNs, checkImmutableBinding, close, decodeAttrBinaryValue, decodeAttrValue, decodeAttrValues, decodeElements, findAttrIndex, findOrCreateBinding, finishCData, finishCharacters, finishComment, finishDTD, finishPI, finishSpace, finishToken, fireSaxCharacterEvents, fireSaxCommentEvent, fireSaxEndElement, fireSaxPIEvent, fireSaxSpaceEvents, fireSaxStartElement, getAttrCollector, getAttrCount, getAttrLocalName, getAttrNsURI, getAttrPrefix, getAttrPrefixedName, getAttrQName, getAttrType, getAttrValue, getAttrValue, getConfig, getCurrentLineNr, getDepth, getDTDPublicId, getDTDSystemId, getEndLocation, getInputPublicId, getInputSystemId, getName, getNamespacePrefix, getNamespaceURI, getNamespaceURI, getNamespaceURI, getNonTransientNamespaceContext, getNsCount, getPrefix, getPrefixes, getQName, getStartLocation, getText, getText, getTextCharacters, getTextCharacters, getTextLength, handleInvalidXmlChar, hasEmptyStack, isAttrSpecified, isEmptyTag, isTextWhitespace, loadMoreGuaranteed, loadMoreGuaranteed, reportDoubleHyphenInComments, reportDuplicateNsDecl, reportEntityOverflow, reportEofInName, reportIllegalCDataEnd, reportIllegalNsDecl, reportIllegalNsDecl, reportInputProblem, reportInvalidNameChar, reportInvalidNsIndex, reportInvalidXmlChar, reportMissingPISpace, reportMultipleColonsInName, reportPrologProblem, reportPrologUnexpChar, reportPrologUnexpElement, reportTreeUnexpChar, reportUnboundPrefix, reportUnexpandedEntityInAttr, reportUnexpectedEndTag, resetForDecoding, skipCData, skipCharacters, skipCoalescedText, skipComment, skipPI, skipSpace, skipToken, throwInvalidSpace, throwNullChar, throwUnexpectedChar, verifyXmlChar
-
Field Details
-
_in
Underlying InputStream to use for reading content.
-
_inputBuffer
protected byte[] _inputBuffer
-
_charTypes
This is a simple container object that is used to access the decoding tables for characters. Indirection is needed since we actually support multiple UTF-8 compatible encodings, not just UTF-8 itself.
-
_symbols
For now, symbol table contains prefixed names. In the future, they may be split into prefixes and local names.
-
_quadBuffer
protected int[] _quadBuffer
This buffer is used for name parsing. Will be expanded if/as needed; 32 ints can hold names 128 ASCII chars long.
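The sizing above follows from storing four bytes in each int. A minimal sketch of that packing, assuming big-endian byte order within each quad; the class and method names are illustrative and are not part of Aalto:

```java
import java.nio.charset.StandardCharsets;

final class QuadPackingSketch {
    /** Packs name bytes into ints, four bytes per int (hypothetical helper). */
    static int[] packQuads(byte[] name) {
        int[] quads = new int[(name.length + 3) / 4];
        for (int i = 0; i < name.length; i++) {
            // Shift the bytes already in this quad up and append the next byte.
            quads[i / 4] = (quads[i / 4] << 8) | (name[i] & 0xFF);
        }
        return quads;
    }

    public static void main(String[] args) {
        byte[] name = "xsl:template".getBytes(StandardCharsets.US_ASCII);
        int[] quads = packQuads(name);
        System.out.println(quads.length + " quads for " + name.length + " bytes");
    }
}
```

With this layout a final quad that is only partly filled keeps its bytes in the low-order bits, which is why a separate byte count for the last quad is needed to tell names apart.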
-
-
Constructor Details
-
StreamScanner
-
-
Method Details
-
_releaseBuffers
protected void _releaseBuffers()
- Overrides:
_releaseBuffers in class XmlScanner
-
_closeSource
- Specified by:
_closeSource in class ByteBasedScanner
- Throws:
IOException
-
handleEntityInText
- Throws:
XMLStreamException
-
parsePublicId
- Throws:
XMLStreamException
-
parseSystemId
- Throws:
XMLStreamException
-
nextFromProlog
- Specified by:
nextFromProlog in class XmlScanner
- Throws:
XMLStreamException
-
nextFromTree
- Specified by:
nextFromTree in class XmlScanner
- Throws:
XMLStreamException
-
_nextEntity
protected int _nextEntity()
Helper method used to isolate things that need to be (re)set in cases where
-
handlePrologDeclStart
- Throws:
XMLStreamException
-
handleDtdStart
- Throws:
XMLStreamException
-
handleCommentOrCdataStart
- Throws:
XMLStreamException
-
handlePIStart
Method called after leading '<?' has been parsed; needs to parse target.
- Throws:
XMLStreamException
-
handleCharEntity
- Returns:
- Code point for the entity that expands to a valid XML content character.
- Throws:
XMLStreamException
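As an illustration of what "a valid XML content character" means for the returned code point, the sketch below decodes a numeric character reference from a String and checks it against the XML 1.0 Char production. It is a simplified stand-in: the class and method are hypothetical, and the real method works on the scanner's byte buffer.

```java
final class CharEntitySketch {
    /** Decodes a reference such as "&#65;" or "&#x41;" to its code point. */
    static int decodeCharRef(String ref) {
        if (!ref.startsWith("&#") || !ref.endsWith(";")) {
            throw new IllegalArgumentException("Not a numeric character reference: " + ref);
        }
        boolean hex = ref.startsWith("&#x");
        int radix = hex ? 16 : 10;
        String digits = ref.substring(hex ? 3 : 2, ref.length() - 1);
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("Missing digits: " + ref);
        }
        int cp = 0;
        for (int i = 0; i < digits.length(); i++) {
            int d = digitValue(digits.charAt(i));
            if (d < 0 || d >= radix) {
                throw new IllegalArgumentException("Bad digit in: " + ref);
            }
            cp = cp * radix + d;
            if (cp > 0x10FFFF) {
                throw new IllegalArgumentException("Beyond Unicode range: " + ref);
            }
        }
        if (!isXmlChar(cp)) {
            throw new IllegalArgumentException("Not a valid XML 1.0 character: " + ref);
        }
        return cp;
    }

    // ASCII digits only; returns -1 for anything else.
    private static int digitValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // XML 1.0 Char production.
    private static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }
}
```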
-
handleStartElement
Parsing of start element requires parsing of the element name (and attribute names), and is thus encoding-specific.
- Throws:
XMLStreamException
-
handleEndElement
Note that this method is currently also shareable for all ASCII-based encodings, and at least between UTF-8 and ISO-Latin1. The reason is that since we already know the exact bytes that need to be matched, there is no danger of getting invalid encodings or such. So, for now, let's leave this method here in the base class.
- Throws:
XMLStreamException
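The reasoning for handleEndElement can be illustrated with a plain byte comparison: if the bytes of the open element's name are known, the end tag can be checked without decoding any characters. This is a sketch under that assumption; the class and method are hypothetical and do not mirror Aalto's internals.

```java
import java.nio.charset.StandardCharsets;

final class EndTagMatchSketch {
    /**
     * Returns the index just past the name if input, starting at ptr,
     * begins with the expected name bytes; returns -1 otherwise.
     */
    static int matchName(byte[] input, int ptr, byte[] expected) {
        if (input.length - ptr < expected.length) {
            return -1; // not enough input left to hold the name
        }
        for (int i = 0; i < expected.length; i++) {
            if (input[ptr + i] != expected[i]) {
                return -1; // byte mismatch, so this is not the expected end tag
            }
        }
        return ptr + expected.length;
    }

    public static void main(String[] args) {
        byte[] input = "</root>".getBytes(StandardCharsets.UTF_8);
        byte[] name = "root".getBytes(StandardCharsets.UTF_8);
        // Start matching after the leading "</".
        System.out.println(matchName(input, 2, name));
    }
}
```

Because the comparison is byte for byte, the same code works for any encoding in which the start tag and end tag are written with identical bytes.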
-
handleEndElementSlow
- Throws:
XMLStreamException
-
parsePName
This method can (for now?) be shared between all ASCII-based encodings, since it only does coarse validity checking -- real checks are done in a different method.
Some notes about the assumptions the implementation makes:
- Well-formed XML content cannot end with a name, so end-of-input is an error and we can throw an exception
- Throws:
XMLStreamException
-
parsePNameMedium
- Throws:
XMLStreamException
-
parsePNameLong
- Throws:
XMLStreamException
-
parsePNameSlow
- Throws:
XMLStreamException
-
findPName
Method called to process a sequence of bytes that is likely to be a PName. At this point we encountered an end marker, and may either hit a formerly seen well-formed PName; an as-of-yet unseen well-formed PName; or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
- Parameters:
onlyQuad - Word with 1 to 4 bytes that make up PName
lastByteCount - Number of actual bytes contained in onlyQuad; 0 to 3.
- Throws:
XMLStreamException
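The three outcomes that findPName describes (a known name, a new well-formed name, or an invalid sequence) can be sketched with an ordinary map. Everything here is an assumption made for illustration: the class, the List<Integer> key, the String result and the ASCII-only name check are not Aalto's ByteBasedPNameTable or PName, and in this sketch lastByteCount is simply the number of bytes held in the final quad.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NameTableSketch {
    // Hypothetical cache keyed by the quads plus the byte count of the last quad.
    private final Map<List<Integer>, String> names = new HashMap<>();

    /** Returns a previously seen name, or validates, stores and returns a new one. */
    String findOrAdd(int[] quads, int lastByteCount) {
        List<Integer> key = new ArrayList<>();
        for (int q : quads) {
            key.add(q);
        }
        key.add(lastByteCount);
        String known = names.get(key);
        if (known != null) {
            return known; // formerly seen name
        }
        String decoded = decode(quads, lastByteCount);
        for (int i = 0; i < decoded.length(); i++) {
            if (!isNameChar(decoded.charAt(i))) {
                // not a well-formed name
                throw new IllegalArgumentException("Invalid name character at index " + i);
            }
        }
        names.put(key, decoded); // new well-formed name
        return decoded;
    }

    int size() {
        return names.size();
    }

    // Simplified ASCII-only check; real XML name rules are broader.
    private static boolean isNameChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == ':' || c == '_' || c == '-' || c == '.';
    }

    // Unpacks big-endian quads; the last quad holds lastByteCount bytes in its low bits.
    private static String decode(int[] quads, int lastByteCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < quads.length; i++) {
            int n = (i == quads.length - 1) ? lastByteCount : 4;
            for (int shift = (n - 1) * 8; shift >= 0; shift -= 8) {
                sb.append((char) ((quads[i] >> shift) & 0xFF));
            }
        }
        return sb.toString();
    }
}
```

Returning the cached instance on a repeat lookup is the point of such a table: repeated element and attribute names do not need to be decoded or allocated again.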
-
findPName
private final PName findPName(int firstQuad, int secondQuad, int lastByteCount) throws XMLStreamException
Method called to process a sequence of bytes that is likely to be a PName. At this point we encountered an end marker, and may either hit a formerly seen well-formed PName; an as-of-yet unseen well-formed PName; or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
- Parameters:
firstQuad - First 1 to 4 bytes of the PName
secondQuad - Word with last 1 to 4 bytes of the PName
lastByteCount - Number of bytes contained in secondQuad; 0 to 3.
- Throws:
XMLStreamException
-
findPName
private final PName findPName(int lastQuad, int[] quads, int qlen, int lastByteCount) throws XMLStreamException
Method called to process a sequence of bytes that is likely to be a PName. At this point we encountered an end marker, and may either hit a formerly seen well-formed PName; an as-of-yet unseen well-formed PName; or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
- Parameters:
lastQuad - Word with last 0 to 3 bytes of the PName; not included in the quad array
quads - Array that contains all the quads, except for the last one, for names with more than 8 bytes (i.e. more than 2 quads)
qlen - Number of quads in the array, except if less than 2 (in which case only firstQuad and lastQuad are used)
lastByteCount - Number of bytes contained in lastQuad; 0 to 3.
- Throws:
XMLStreamException
-
findPName
private final PName findPName(int lastQuad, int lastByteCount, int firstQuad, int qlen, int[] quads) throws XMLStreamException
Method called to process a sequence of bytes that is likely to be a PName. At this point we encountered an end marker, and may either hit a formerly seen well-formed PName; an as-of-yet unseen well-formed PName; or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
- Parameters:
lastQuad - Word with last 0 to 3 bytes of the PName; not included in the quad array
lastByteCount - Number of bytes contained in lastQuad; 0 to 3.
firstQuad - First 1 to 4 bytes of the PName (4 if length at least 4 bytes; less only if not).
qlen - Number of quads in the array, except if less than 2 (in which case only firstQuad and lastQuad are used)
quads - Array that contains all the quads, except for the last one, for names with more than 8 bytes (i.e. more than 2 quads)
- Throws:
XMLStreamException
-
addPName
protected final PName addPName(int hash, int[] quads, int qlen, int lastQuadBytes) throws XMLStreamException
- Throws:
XMLStreamException
-
skipInternalWs
- Returns:
- First byte following skipped white space
- Throws:
XMLStreamException
-
matchAsciiKeyword
- Throws:
XMLStreamException
-
checkInTreeIndentation
Note: consecutive white space is only considered indentation if the following token seems like a tag (start/end). This is so that if a CDATA section follows, it can be coalesced in coalescing mode. Although we could check whether coalescing mode is enabled, this should seldom have a significant effect either way, so it removes one possible source of problems in coalescing mode.
- Returns:
- -1, if indentation was handled; offset in the output buffer, if not
- Throws:
XMLStreamException
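The rule described for checkInTreeIndentation can be sketched as a lookahead over the input bytes. This is a simplified illustration with assumed details: it returns a boolean rather than -1 or a buffer offset, it treats a linefeed followed by spaces or tabs as the candidate indentation, and it takes "seems like a tag" to mean '<' followed by anything other than '!' or '?'. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class IndentationSketch {
    /**
     * Returns true if the white space starting at ptr (a linefeed followed by
     * spaces or tabs) is directly followed by something that looks like a
     * start or end tag.
     */
    static boolean looksLikeIndentation(byte[] buf, int ptr) {
        if (ptr >= buf.length || buf[ptr] != '\n') {
            return false;
        }
        int i = ptr + 1;
        while (i < buf.length && (buf[i] == ' ' || buf[i] == '\t')) {
            i++;
        }
        if (i + 1 >= buf.length || buf[i] != '<') {
            return false; // text or end of buffer follows, not a tag
        }
        byte next = buf[i + 1];
        // "<!" starts a comment or CDATA section and "<?" a processing
        // instruction; neither counts as a tag in this sketch.
        return next != '!' && next != '?';
    }

    public static void main(String[] args) {
        byte[] doc = "\n  <child/>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeIndentation(doc, 0));
    }
}
```

Excluding "<!" is what keeps white space in front of a CDATA section as ordinary text, so that it can still be merged with the CDATA content when coalescing is on.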
-
checkPrologIndentation
- Returns:
- -1, if indentation was handled; offset in the output buffer, if not
- Throws:
XMLStreamException
-
loadMore
- Specified by:
loadMore in class XmlScanner
- Throws:
XMLStreamException
-
nextByte
- Throws:
XMLStreamException
-
nextByte
- Throws:
XMLStreamException
-
loadOne
- Throws:
XMLStreamException
-
loadOne
- Throws:
XMLStreamException
-
loadAndRetain
- Throws:
XMLStreamException
-