Package nu.validator.htmlparser.impl
Class Tokenizer
java.lang.Object
nu.validator.htmlparser.impl.Tokenizer
- All Implemented Interfaces:
Locator
- Direct Known Subclasses:
ErrorReportingTokenizer
An implementation of
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html
This class implements the
Locator interface. This is not an
incidental implementation detail: Users of this class are encouraged to make
use of the Locator nature.
By default, the tokenizer may report data that XML 1.0 bans. The tokenizer
can be configured to treat these conditions as fatal or to coerce the infoset
to something that XML 1.0 allows.
- Version:
- $Id$
-
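The three behaviours described above (report as-is, treat as fatal, coerce the infoset) can be sketched in isolation. This is a minimal illustration and not the tokenizer's own code: the `Policy` enum is a hypothetical stand-in for `XmlViolationPolicy`, its constant names are assumptions, and replacing the character with a space is an assumed coercion.

```java
// Hypothetical stand-in for XmlViolationPolicy; the constant names are an assumption.
enum Policy { ALLOW, FATAL, ALTER_INFOSET }

final class ContentSpaceSketch {
    /** Applies a policy to one content character; VT and FF are not XML 1.0 characters. */
    static char apply(char c, Policy policy) {
        boolean bannedInXml = (c == 0x0B || c == 0x0C); // vertical tab, form feed
        if (!bannedInXml) {
            return c;
        }
        switch (policy) {
            case FATAL:
                // An unchecked exception keeps this sketch self-contained; the real
                // class reports through its own error-handling path.
                throw new IllegalStateException("Not allowed in XML 1.0: " + (int) c);
            case ALTER_INFOSET:
                return ' '; // assumed coercion to something XML 1.0 allows
            default:
                return c; // ALLOW: report the data unchanged
        }
    }
}
```

With `Policy.ALLOW` the form feed passes through untouched, which corresponds to the default behaviour the class description mentions.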
Field Summary
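Fields (Modifier and Type, Field, Description)
private char additional
static final int AFTER_ATTRIBUTE_NAME
static final int AFTER_ATTRIBUTE_VALUE_QUOTED
static final int AFTER_DOCTYPE_NAME
static final int AFTER_DOCTYPE_PUBLIC_IDENTIFIER
static final int AFTER_DOCTYPE_PUBLIC_KEYWORD
static final int AFTER_DOCTYPE_SYSTEM_IDENTIFIER
static final int AFTER_DOCTYPE_SYSTEM_KEYWORD
protected LocatorImpl ampersandLocation
private final char[] astralChar - Buffer for expanding astral NCRs.
static final int ATTRIBUTE_NAME
static final int ATTRIBUTE_VALUE_DOUBLE_QUOTED
static final int ATTRIBUTE_VALUE_SINGLE_QUOTED
static final int ATTRIBUTE_VALUE_UNQUOTED
protected AttributeName attributeName - The current attribute name.
private HtmlAttributes attributes - The attribute holder.
static final int BEFORE_ATTRIBUTE_NAME
static final int BEFORE_ATTRIBUTE_VALUE
static final int BEFORE_DOCTYPE_NAME
static final int BEFORE_DOCTYPE_PUBLIC_IDENTIFIER
static final int BEFORE_DOCTYPE_SYSTEM_IDENTIFIER
static final int BETWEEN_DOCTYPE_PUBLIC_AND_SYSTEM_IDENTIFIERS
private final char[] bmpChar - Buffer for expanding NCRs falling into the Basic Multilingual Plane.
static final int BOGUS_COMMENT
static final int BOGUS_COMMENT_HYPHEN
static final int BOGUS_DOCTYPE
private static final int BUFFER_GROW_BY - Buffer growth parameter.
private int candidate
private static final char[] CDATA_LSQB - "CDATA[" as char[]
static final int CDATA_RSQB
static final int CDATA_RSQB_RSQB
static final int CDATA_SECTION
static final int CDATA_START
static final int CHARACTER_REFERENCE_HILO_LOOKUP
static final int CHARACTER_REFERENCE_TAIL
static final int CLOSE_TAG_OPEN
static final int COMMENT
static final int COMMENT_END
static final int COMMENT_END_BANG
static final int COMMENT_END_DASH
static final int COMMENT_START
static final int COMMENT_START_DASH
private XmlViolationPolicy commentPolicy - The policy for comments.
protected boolean confident
static final int CONSUME_CHARACTER_REFERENCE
static final int CONSUME_NCR
private XmlViolationPolicy contentSpacePolicy - The policy for vertical tab and form feed.
protected int cstart
static final int DATA
private static final int DATA_AND_RCDATA_MASK
static final int DECIMAL_NRC_LOOP
static final int DOCTYPE
static final int DOCTYPE_NAME
static final int DOCTYPE_PUBLIC_IDENTIFIER_DOUBLE_QUOTED
static final int DOCTYPE_PUBLIC_IDENTIFIER_SINGLE_QUOTED
static final int DOCTYPE_SYSTEM_IDENTIFIER_DOUBLE_QUOTED
static final int DOCTYPE_SYSTEM_IDENTIFIER_SINGLE_QUOTED
static final int DOCTYPE_UBLIC
static final int DOCTYPE_YSTEM
private String doctypeName - The name of the current doctype token.
protected EncodingDeclarationHandler encodingDeclarationHandler
protected boolean endTag - true if tokenizing an end tag
protected ElementName endTagExpectation - The element whose end tag closes the current CDATA or RCDATA element.
private char[] endTagExpectationAsArray
private int entCol
protected ErrorHandler errorHandler - The error handler.
private int firstCharKey
private boolean forceQuirks
static final int HANDLE_NCR_VALUE
static final int HANDLE_NCR_VALUE_RECONSUME
static final int HEX_NCR_LOOP
private int hi
protected boolean html4 - true when HTML4-specific additional errors are requested.
private boolean html4ModeCompatibleWithXhtml1Schemata
private static final char[] IFRAME_ARR
protected int index
private Interner interner
protected boolean lastCR - Whether the previous char read was CR.
private static final int LEAD_OFFSET - Magic value for UTF-16 operations.
private static final char[] LF - Array version of line feed.
private int line
private int lo
private char[] longStrBuf - Buffer for long strings.
private int longStrBufLen - Number of significant chars in longStrBuf.
private static final char[] LT_GT - UTF-16 code unit array containing less than and greater than for emitting those characters on certain parse errors.
private static final char[] LT_SOLIDUS - UTF-16 code unit array containing less than and solidus for emitting those characters on certain parse errors.
private int mappingLangToXmlLang
static final int MARKUP_DECLARATION_HYPHEN
static final int MARKUP_DECLARATION_OCTYPE
static final int MARKUP_DECLARATION_OPEN
private boolean metaBoundaryPassed - Whether the stream is past the first 512 bytes.
private XmlViolationPolicy namePolicy
private final boolean newAttributesEachTime
private static final char[] NOEMBED_ARR
private static final char[] NOFRAMES_ARR
static final int NON_DATA_END_TAG_NAME
private static final char[] NOSCRIPT_ARR
private static final char[] OCTYPE - "octype" as char[]
static final int PLAINTEXT
private static final char[] PLAINTEXT_ARR
private int prevValue
static final int PROCESSING_INSTRUCTION
static final int PROCESSING_INSTRUCTION_QUESTION_MARK
private String publicId - The SAX public id for the resource being tokenized.
private String publicIdentifier - The public id of the current doctype token.
static final int RAWTEXT
static final int RAWTEXT_RCDATA_LESS_THAN_SIGN
static final int RCDATA
private static final char[] REPLACEMENT_CHARACTER - Array version of U+FFFD.
private int returnStateSave
private static final char[] RSQB_RSQB - UTF-16 code unit array containing ]] for emitting those characters on state transitions.
private static final char[] SCRIPT_ARR
static final int SCRIPT_DATA
static final int SCRIPT_DATA_DOUBLE_ESCAPE_END
static final int SCRIPT_DATA_DOUBLE_ESCAPE_START
static final int SCRIPT_DATA_DOUBLE_ESCAPED
static final int SCRIPT_DATA_DOUBLE_ESCAPED_DASH
static final int SCRIPT_DATA_DOUBLE_ESCAPED_DASH_DASH
static final int SCRIPT_DATA_DOUBLE_ESCAPED_LESS_THAN_SIGN
static final int SCRIPT_DATA_ESCAPE_START
static final int SCRIPT_DATA_ESCAPE_START_DASH
static final int SCRIPT_DATA_ESCAPED
static final int SCRIPT_DATA_ESCAPED_DASH
static final int SCRIPT_DATA_ESCAPED_DASH_DASH
static final int SCRIPT_DATA_ESCAPED_LESS_THAN_SIGN
static final int SCRIPT_DATA_LESS_THAN_SIGN
private boolean seenDigits
static final int SELF_CLOSING_START_TAG
private boolean shouldSuspend
private static final char[] SPACE - Array version of space.
protected int stateSave
private char[] strBuf - Buffer for short identifiers.
private int strBufLen - Number of significant chars in strBuf.
private int strBufMark
private static final char[] STYLE_ARR
private String systemId - The SAX system id for the resource being tokenized.
private String systemIdentifier - The system id of the current doctype token.
static final int TAG_NAME
static final int TAG_OPEN
private ElementName tagName - The current tag token name.
private static final char[] TEXTAREA_ARR
private static final char[] TITLE_ARR
protected final TokenHandler tokenHandler - The token handler.
private static final char[] UBLIC - "ublic" as char[]
protected int value
private boolean wantsComments - Whether comment tokens are emitted.
private XmlViolationPolicy xmlnsPolicy
private static final char[] XMP_ARR
private static final char[] YSTEM - "ystem" as char[]
-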
Constructor Summary
Constructors (Constructor, Description)
Tokenizer(TokenHandler tokenHandler) - The constructor.
Tokenizer(TokenHandler tokenHandler, boolean newAttributesEachTime)
-
Method Summary
Modifier and TypeMethodDescriptionprivate voidprivate voidprivate voidprivate voidprivate voidprivate voidappendLongStrBuf(char c) Appends to the larger buffer.private voidappendLongStrBuf(char[] buffer, int offset, int length) private voidprivate voidprivate voidprivate voidappendStrBuf(char c) Appends to the smaller buffer.private voidAppend the contents of the smaller buffer to the larger one.private voidvoidprivate voidprivate voidprotected charcheckChar(char[] buf, int pos) private voidprivate voidclearLongStrBufAndAppend(char c) private voidprivate voidclearStrBufAndAppend(char c) (package private) voidprivate voidemitCarriageReturn(char[] buf, int pos) private voidemitComment(int provisionalHyphens, int pos) Emits the current comment token.private intemitCurrentTagToken(boolean selfClosing, int pos) private voidemitDoctypeToken(int pos) private voidemitOrAppendOne(char[] val, int returnState) private voidemitOrAppendStrBuf(int returnState) private voidemitOrAppendTwo(char[] val, int returnState) private voidemitPlaintextReplacementCharacter(char[] buf, int pos) private voidemitReplacementCharacter(char[] buf, int pos) private voidEmits the smaller buffer as character tokens.(package private) HtmlAttributesvoidend()private voidvoideof()voidReports a Parse Error.protected voiderrAstralNonCharacter(int ch) protected voidprotected voiderrBadCharAfterLt(char c) protected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voiderrHtml4LtSlashInRcdata(char folded) protected voidprotected voidprotected voidprotected voiderrLtGt()protected voidprotected voidprotected voidprotected voidprotected voidprotected charerrNcrControlChar(char ch) protected voiderrNcrCr()protected voidprotected charerrNcrNonCharacter(char 
ch) protected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voidprotected voiderrQuoteBeforeAttributeName(char c) protected voidprotected voidvoiderrTreeBuilder(String message) protected voidprotected voiderrUnquotedAttributeValOrNull(char c) protected voidvoidReports an condition that would make the infoset incompatible with XML 1.0 as fatal.protected voidflushChars(char[] buf, int pos) Flushes coalesced character tokens.intgetCol()Returns the col.intintgetLine()Returns the line.intprivate voidhandleNcrValue(int returnState) private voidvoidvoidinitLocation(String newPublicId, String newSystemId) booleaninternalEncodingDeclaration(String internalCharset) booleanReturns the alreadyComplainedAboutNonAscii.booleanbooleanReturns the mappingLangToXmlLang.booleanReturns the nextCharOnNewLine.booleanisPrevCR()voidprivate StringThe larger buffer as a string.private voidprotected voidprotected voidmaybeErrSlashInEndTag(boolean selfClosing) protected voidmaybeWarnPrivateUse(char ch) protected voidprivate static Stringprotected voidprotected voidvoidvoidprivate voidvoidprivate voidvoidsetCommentPolicy(XmlViolationPolicy commentPolicy) Sets the commentPolicy.voidsetContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy) Sets the contentNonXmlCharPolicy.voidsetContentSpacePolicy(XmlViolationPolicy contentSpacePolicy) Sets the contentSpacePolicy.voidsetEncodingDeclarationHandler(EncodingDeclarationHandler encodingDeclarationHandler) Sets the encodingDeclarationHandler.voidSets the error handler.voidsetHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata) Sets the html4ModeCompatibleWithXhtml1Schemata.voidsetInterner(Interner interner) voidsetLineNumber(int line) For C++ use only.voidsetMappingLangToXmlLang(boolean mappingLangToXmlLang) Sets the mappingLangToXmlLang.voidsetNamePolicy(XmlViolationPolicy namePolicy) 
voidsetStateAndEndTagExpectation(int specialTokenizerState, String endTagExpectation) Sets the tokenizer state and the associated element name.voidsetStateAndEndTagExpectation(int specialTokenizerState, ElementName endTagExpectation) Sets the tokenizer state and the associated element name.voidsetTransitionBaseOffset(int offset) Sets an offset to be added to the position reported to TransitionHandler.voidsetXmlnsPolicy(XmlViolationPolicy xmlnsPolicy) Sets the xmlnsPolicy.protected voidprotected voidvoidstart()protected voidprivate intstateLoop(int state, char c, int pos, char[] buf, boolean reconsume, int returnState, int endPos) private voidReturns the short buffer as a local name.private voidprotected StringThe smaller buffer as a String.booleantokenizeBuffer(UTF16Buffer buffer) protected inttransition(int from, int to, boolean reconsume, int pos) (package private) voidvoidReports a warningprivate longworkAroundHotSpotHugeMethodLimit(int state, char c, int pos, char[] buf, boolean reconsume, int returnState, int endPos) compressed returnValue: int returnState = returnValue >> 33 boolean breakOuterState = ((returnValue >> 32) & 0x1) != 0 int pos = returnValue & 0xFFFFFFFF // same as (int)returnValue
-
Field Details
-
DATA_AND_RCDATA_MASK
private static final int DATA_AND_RCDATA_MASK
-
DATA
public static final int DATA
-
RCDATA
public static final int RCDATA
-
SCRIPT_DATA
public static final int SCRIPT_DATA
-
RAWTEXT
public static final int RAWTEXT
-
SCRIPT_DATA_ESCAPED
public static final int SCRIPT_DATA_ESCAPED
-
ATTRIBUTE_VALUE_DOUBLE_QUOTED
public static final int ATTRIBUTE_VALUE_DOUBLE_QUOTED
-
ATTRIBUTE_VALUE_SINGLE_QUOTED
public static final int ATTRIBUTE_VALUE_SINGLE_QUOTED
-
ATTRIBUTE_VALUE_UNQUOTED
public static final int ATTRIBUTE_VALUE_UNQUOTED
-
PLAINTEXT
public static final int PLAINTEXT
-
TAG_OPEN
public static final int TAG_OPEN
-
CLOSE_TAG_OPEN
public static final int CLOSE_TAG_OPEN
-
TAG_NAME
public static final int TAG_NAME
-
BEFORE_ATTRIBUTE_NAME
public static final int BEFORE_ATTRIBUTE_NAME
-
ATTRIBUTE_NAME
public static final int ATTRIBUTE_NAME
-
AFTER_ATTRIBUTE_NAME
public static final int AFTER_ATTRIBUTE_NAME
-
BEFORE_ATTRIBUTE_VALUE
public static final int BEFORE_ATTRIBUTE_VALUE
-
AFTER_ATTRIBUTE_VALUE_QUOTED
public static final int AFTER_ATTRIBUTE_VALUE_QUOTED
-
BOGUS_COMMENT
public static final int BOGUS_COMMENT
-
MARKUP_DECLARATION_OPEN
public static final int MARKUP_DECLARATION_OPEN
-
DOCTYPE
public static final int DOCTYPE
-
BEFORE_DOCTYPE_NAME
public static final int BEFORE_DOCTYPE_NAME
-
DOCTYPE_NAME
public static final int DOCTYPE_NAME
-
AFTER_DOCTYPE_NAME
public static final int AFTER_DOCTYPE_NAME
-
BEFORE_DOCTYPE_PUBLIC_IDENTIFIER
public static final int BEFORE_DOCTYPE_PUBLIC_IDENTIFIER
-
DOCTYPE_PUBLIC_IDENTIFIER_DOUBLE_QUOTED
public static final int DOCTYPE_PUBLIC_IDENTIFIER_DOUBLE_QUOTED
-
DOCTYPE_PUBLIC_IDENTIFIER_SINGLE_QUOTED
public static final int DOCTYPE_PUBLIC_IDENTIFIER_SINGLE_QUOTED
-
AFTER_DOCTYPE_PUBLIC_IDENTIFIER
public static final int AFTER_DOCTYPE_PUBLIC_IDENTIFIER
-
BEFORE_DOCTYPE_SYSTEM_IDENTIFIER
public static final int BEFORE_DOCTYPE_SYSTEM_IDENTIFIER
-
DOCTYPE_SYSTEM_IDENTIFIER_DOUBLE_QUOTED
public static final int DOCTYPE_SYSTEM_IDENTIFIER_DOUBLE_QUOTED
-
DOCTYPE_SYSTEM_IDENTIFIER_SINGLE_QUOTED
public static final int DOCTYPE_SYSTEM_IDENTIFIER_SINGLE_QUOTED
-
AFTER_DOCTYPE_SYSTEM_IDENTIFIER
public static final int AFTER_DOCTYPE_SYSTEM_IDENTIFIER
-
BOGUS_DOCTYPE
public static final int BOGUS_DOCTYPE
-
COMMENT_START
public static final int COMMENT_START
-
COMMENT_START_DASH
public static final int COMMENT_START_DASH
-
COMMENT
public static final int COMMENT
-
COMMENT_END_DASH
public static final int COMMENT_END_DASH
-
COMMENT_END
public static final int COMMENT_END
-
COMMENT_END_BANG
public static final int COMMENT_END_BANG
-
NON_DATA_END_TAG_NAME
public static final int NON_DATA_END_TAG_NAME
-
MARKUP_DECLARATION_HYPHEN
public static final int MARKUP_DECLARATION_HYPHEN
-
MARKUP_DECLARATION_OCTYPE
public static final int MARKUP_DECLARATION_OCTYPE
-
DOCTYPE_UBLIC
public static final int DOCTYPE_UBLIC
-
DOCTYPE_YSTEM
public static final int DOCTYPE_YSTEM
-
AFTER_DOCTYPE_PUBLIC_KEYWORD
public static final int AFTER_DOCTYPE_PUBLIC_KEYWORD
-
BETWEEN_DOCTYPE_PUBLIC_AND_SYSTEM_IDENTIFIERS
public static final int BETWEEN_DOCTYPE_PUBLIC_AND_SYSTEM_IDENTIFIERS
-
AFTER_DOCTYPE_SYSTEM_KEYWORD
public static final int AFTER_DOCTYPE_SYSTEM_KEYWORD
-
CONSUME_CHARACTER_REFERENCE
public static final int CONSUME_CHARACTER_REFERENCE
-
CONSUME_NCR
public static final int CONSUME_NCR
-
CHARACTER_REFERENCE_TAIL
public static final int CHARACTER_REFERENCE_TAIL
-
HEX_NCR_LOOP
public static final int HEX_NCR_LOOP
-
DECIMAL_NRC_LOOP
public static final int DECIMAL_NRC_LOOP
-
HANDLE_NCR_VALUE
public static final int HANDLE_NCR_VALUE
-
HANDLE_NCR_VALUE_RECONSUME
public static final int HANDLE_NCR_VALUE_RECONSUME
-
CHARACTER_REFERENCE_HILO_LOOKUP
public static final int CHARACTER_REFERENCE_HILO_LOOKUP
-
SELF_CLOSING_START_TAG
public static final int SELF_CLOSING_START_TAG
-
CDATA_START
public static final int CDATA_START
-
CDATA_SECTION
public static final int CDATA_SECTION
-
CDATA_RSQB
public static final int CDATA_RSQB
-
CDATA_RSQB_RSQB
public static final int CDATA_RSQB_RSQB
-
SCRIPT_DATA_LESS_THAN_SIGN
public static final int SCRIPT_DATA_LESS_THAN_SIGN
-
SCRIPT_DATA_ESCAPE_START
public static final int SCRIPT_DATA_ESCAPE_START
-
SCRIPT_DATA_ESCAPE_START_DASH
public static final int SCRIPT_DATA_ESCAPE_START_DASH
-
SCRIPT_DATA_ESCAPED_DASH
public static final int SCRIPT_DATA_ESCAPED_DASH
-
SCRIPT_DATA_ESCAPED_DASH_DASH
public static final int SCRIPT_DATA_ESCAPED_DASH_DASH
-
BOGUS_COMMENT_HYPHEN
public static final int BOGUS_COMMENT_HYPHEN
-
RAWTEXT_RCDATA_LESS_THAN_SIGN
public static final int RAWTEXT_RCDATA_LESS_THAN_SIGN
-
SCRIPT_DATA_ESCAPED_LESS_THAN_SIGN
public static final int SCRIPT_DATA_ESCAPED_LESS_THAN_SIGN
-
SCRIPT_DATA_DOUBLE_ESCAPE_START
public static final int SCRIPT_DATA_DOUBLE_ESCAPE_START
-
SCRIPT_DATA_DOUBLE_ESCAPED
public static final int SCRIPT_DATA_DOUBLE_ESCAPED
-
SCRIPT_DATA_DOUBLE_ESCAPED_LESS_THAN_SIGN
public static final int SCRIPT_DATA_DOUBLE_ESCAPED_LESS_THAN_SIGN
-
SCRIPT_DATA_DOUBLE_ESCAPED_DASH
public static final int SCRIPT_DATA_DOUBLE_ESCAPED_DASH
-
SCRIPT_DATA_DOUBLE_ESCAPED_DASH_DASH
public static final int SCRIPT_DATA_DOUBLE_ESCAPED_DASH_DASH
-
SCRIPT_DATA_DOUBLE_ESCAPE_END
public static final int SCRIPT_DATA_DOUBLE_ESCAPE_END
-
PROCESSING_INSTRUCTION
public static final int PROCESSING_INSTRUCTION
-
PROCESSING_INSTRUCTION_QUESTION_MARK
public static final int PROCESSING_INSTRUCTION_QUESTION_MARK
-
LEAD_OFFSET
private static final int LEAD_OFFSET
Magic value for UTF-16 operations.
-
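A plausible reading of the "magic value" is the usual constant for computing a UTF-16 lead surrogate from a code point. The sketch below assumes that reading. The value `0xD800 - (0x10000 >> 10)` is the standard one from Unicode's surrogate arithmetic, not something this page states, and the class and method names are hypothetical.

```java
final class AstralNcrSketch {
    // Assumed value: folds the 0x10000 subtraction into the lead-surrogate base.
    static final int LEAD_OFFSET = 0xD800 - (0x10000 >> 10);

    /** Expands a numeric character reference value into UTF-16 code units. */
    static char[] expand(int codePoint) {
        if (codePoint <= 0xFFFF) {
            // Basic Multilingual Plane: a single code unit, as a one-element buffer.
            return new char[] { (char) codePoint };
        }
        // Astral plane: lead and trail surrogate, as a two-element buffer.
        char[] astral = new char[2];
        astral[0] = (char) (LEAD_OFFSET + (codePoint >> 10));
        astral[1] = (char) (0xDC00 + (codePoint & 0x3FF));
        return astral;
    }
}
```

For any code point above U+FFFF the result should agree with `Character.toChars`, which is a convenient cross-check for the arithmetic.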
LT_GT
private static final char[] LT_GT
UTF-16 code unit array containing less than and greater than for emitting those characters on certain parse errors.
-
LT_SOLIDUS
private static final char[] LT_SOLIDUS
UTF-16 code unit array containing less than and solidus for emitting those characters on certain parse errors.
-
RSQB_RSQB
private static final char[] RSQB_RSQB
UTF-16 code unit array containing ]] for emitting those characters on state transitions.
-
REPLACEMENT_CHARACTER
private static final char[] REPLACEMENT_CHARACTER
Array version of U+FFFD.
-
SPACE
private static final char[] SPACE
Array version of space.
-
LF
private static final char[] LF
Array version of line feed.
-
BUFFER_GROW_BY
private static final int BUFFER_GROW_BY
Buffer growth parameter.
-
CDATA_LSQB
private static final char[] CDATA_LSQB
"CDATA[" as char[]
-
OCTYPE
private static final char[] OCTYPE
"octype" as char[]
-
UBLIC
private static final char[] UBLIC
"ublic" as char[]
-
YSTEM
private static final char[] YSTEM
"ystem" as char[]
-
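Keyword tails such as "octype" suggest that the first letter is consumed by an earlier state and the rest is compared one code unit at a time, ignoring ASCII case. The following is a simplified sketch of that idea under that assumption. The real tokenizer is a state machine that keeps its position across buffers, and the class and method names here are hypothetical.

```java
final class KeywordTailSketch {
    // Mirrors the documented constant: "octype" as char[] (the tail of "DOCTYPE").
    static final char[] OCTYPE = "octype".toCharArray();

    /** Returns true if buf continues with "octype" at start, ignoring ASCII case. */
    static boolean matchesOctype(char[] buf, int start) {
        for (int index = 0; index < OCTYPE.length; index++) {
            int pos = start + index;
            if (pos >= buf.length) {
                return false; // ran out of input before the keyword ended
            }
            char folded = buf[pos];
            if (folded >= 'A' && folded <= 'Z') {
                folded += 0x20; // fold ASCII upper case to lower case
            }
            if (folded != OCTYPE[index]) {
                return false;
            }
        }
        return true;
    }
}
```

Comparing against a lower-case array with per-character folding avoids allocating a lower-cased copy of the input.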
TITLE_ARR
private static final char[] TITLE_ARR -
SCRIPT_ARR
private static final char[] SCRIPT_ARR -
STYLE_ARR
private static final char[] STYLE_ARR -
PLAINTEXT_ARR
private static final char[] PLAINTEXT_ARR -
XMP_ARR
private static final char[] XMP_ARR -
TEXTAREA_ARR
private static final char[] TEXTAREA_ARR -
IFRAME_ARR
private static final char[] IFRAME_ARR -
NOEMBED_ARR
private static final char[] NOEMBED_ARR -
NOSCRIPT_ARR
private static final char[] NOSCRIPT_ARR -
NOFRAMES_ARR
private static final char[] NOFRAMES_ARR -
tokenHandler
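protected final TokenHandler tokenHandler
The token handler.
-
encodingDeclarationHandler
protected EncodingDeclarationHandler encodingDeclarationHandler
-
errorHandler
protected ErrorHandler errorHandler
The error handler.
-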
lastCR
protected boolean lastCR
Whether the previous char read was CR.
-
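A flag like this matters when a CR LF pair is split across two input buffers: the LF arriving at the start of the next buffer must not count as a second line break. The sketch below shows that bookkeeping in isolation. It is an illustration of the idea only; the class name and `feed` method are hypothetical and do not reproduce the tokenizer's own loop.

```java
final class LineCounterSketch {
    private boolean lastCR = false; // whether the previous char read was CR
    private int line = 1;

    /** Consumes buf[start..end) and updates the line count. */
    void feed(char[] buf, int start, int end) {
        for (int i = start; i < end; i++) {
            char c = buf[i];
            if (c == '\r') {
                line++;
                lastCR = true;
            } else if (c == '\n') {
                if (!lastCR) {
                    line++; // a lone LF is its own line break
                }
                lastCR = false; // an LF right after CR belongs to the same break
            } else {
                lastCR = false;
            }
        }
    }

    int getLine() {
        return line;
    }
}
```

Because the flag is a field and not a local variable, its value survives between calls to `feed`, which is what makes the split CR LF case work.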
stateSave
protected int stateSave -
returnStateSave
private int returnStateSave -
index
protected int index -
forceQuirks
private boolean forceQuirks -
additional
private char additional -
entCol
private int entCol -
firstCharKey
private int firstCharKey -
lo
private int lo -
hi
private int hi -
candidate
private int candidate -
strBufMark
private int strBufMark -
prevValue
private int prevValue -
value
protected int value -
seenDigits
private boolean seenDigits -
cstart
protected int cstart -
publicId
private String publicId
The SAX public id for the resource being tokenized. (Only passed back as part of locator data.)
-
systemId
private String systemId
The SAX system id for the resource being tokenized. (Only passed back as part of locator data.)
-
strBuf
private char[] strBuf
Buffer for short identifiers.
-
strBufLen
private int strBufLen
Number of significant chars in strBuf.
-
longStrBuf
private char[] longStrBuf
Buffer for long strings.
-
longStrBufLen
private int longStrBufLen
Number of significant chars in longStrBuf.
-
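The pairing of a `char[]` with a separate count of significant chars, together with a fixed growth parameter, is a common pattern for reusable buffers. Here is a minimal sketch of it. The class name, the initial capacity and the `GROW_BY` value are assumptions, since this page does not state the value of `BUFFER_GROW_BY` or how the real buffers are sized.

```java
final class GrowableCharBuffer {
    static final int GROW_BY = 1024; // assumed growth step

    private char[] buf = new char[64]; // assumed initial capacity
    private int len = 0; // number of significant chars in buf

    /** Appends one UTF-16 code unit, growing the array by a fixed step when full. */
    void append(char c) {
        if (len == buf.length) {
            char[] newBuf = new char[buf.length + GROW_BY];
            System.arraycopy(buf, 0, newBuf, 0, buf.length);
            buf = newBuf;
        }
        buf[len++] = c;
    }

    /** Resets the buffer to hold exactly one code unit, reusing the array. */
    void clearAndAppend(char c) {
        buf[0] = c;
        len = 1;
    }

    /** Returns only the significant part of the buffer as a String. */
    String asString() {
        return new String(buf, 0, len);
    }
}
```

Clearing only resets the length, so the array is reused from token to token and stale chars past `len` are never read.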
bmpChar
private final char[] bmpChar
Buffer for expanding NCRs falling into the Basic Multilingual Plane.
-
astralChar
private final char[] astralChar
Buffer for expanding astral NCRs.
-
endTagExpectation
protected ElementName endTagExpectation
The element whose end tag closes the current CDATA or RCDATA element.
-
endTagExpectationAsArray
private char[] endTagExpectationAsArray
-
endTag
protected boolean endTag
true if tokenizing an end tag
-
tagName
private ElementName tagName
The current tag token name.
-
attributeName
protected AttributeName attributeName
The current attribute name.
-
wantsComments
private boolean wantsComments
Whether comment tokens are emitted.
-
html4
protected boolean html4
true when HTML4-specific additional errors are requested.
-
metaBoundaryPassed
private boolean metaBoundaryPassed
Whether the stream is past the first 512 bytes.
-
doctypeName
private String doctypeName
The name of the current doctype token.
-
publicIdentifier
private String publicIdentifier
The public id of the current doctype token.
-
systemIdentifier
private String systemIdentifier
The system id of the current doctype token.
-
attributes
private HtmlAttributes attributes
The attribute holder.
-
contentSpacePolicy
private XmlViolationPolicy contentSpacePolicy
The policy for vertical tab and form feed.
-
commentPolicy
private XmlViolationPolicy commentPolicy
The policy for comments.
-
xmlnsPolicy
private XmlViolationPolicy xmlnsPolicy
-
namePolicy
private XmlViolationPolicy namePolicy
-
html4ModeCompatibleWithXhtml1Schemata
private boolean html4ModeCompatibleWithXhtml1Schemata -
newAttributesEachTime
private final boolean newAttributesEachTime -
mappingLangToXmlLang
private int mappingLangToXmlLang -
shouldSuspend
private boolean shouldSuspend -
confident
protected boolean confident -
line
private int line -
interner
private Interner interner
-
ampersandLocation
protected LocatorImpl ampersandLocation
-
-
Constructor Details
-
Tokenizer
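Tokenizer(TokenHandler tokenHandler, boolean newAttributesEachTime)
-
Tokenizer
Tokenizer(TokenHandler tokenHandler)
The constructor.
- Parameters:
tokenHandler - the handler for receiving tokens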
-
-
Method Details
-
setInterner
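public void setInterner(Interner interner)
-
initLocation
public void initLocation(String newPublicId, String newSystemId)
-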
isMappingLangToXmlLang
public boolean isMappingLangToXmlLang()
Returns the mappingLangToXmlLang.
- Returns:
- the mappingLangToXmlLang
-
setMappingLangToXmlLang
public void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
Sets the mappingLangToXmlLang.
- Parameters:
mappingLangToXmlLang - the mappingLangToXmlLang to set
-
setErrorHandler
Sets the error handler.
-
getErrorHandler
-
setCommentPolicy
public void setCommentPolicy(XmlViolationPolicy commentPolicy)
Sets the commentPolicy.
- Parameters:
commentPolicy - the commentPolicy to set
-
setContentNonXmlCharPolicy
public void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
Sets the contentNonXmlCharPolicy.
- Parameters:
contentNonXmlCharPolicy - the contentNonXmlCharPolicy to set
-
setContentSpacePolicy
public void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
Sets the contentSpacePolicy.
- Parameters:
contentSpacePolicy - the contentSpacePolicy to set
-
setXmlnsPolicy
public void setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy)
Sets the xmlnsPolicy.
- Parameters:
xmlnsPolicy - the xmlnsPolicy to set
-
setNamePolicy
public void setNamePolicy(XmlViolationPolicy namePolicy)
-
setHtml4ModeCompatibleWithXhtml1Schemata
public void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
Sets the html4ModeCompatibleWithXhtml1Schemata.
- Parameters:
html4ModeCompatibleWithXhtml1Schemata - the html4ModeCompatibleWithXhtml1Schemata to set
-
setStateAndEndTagExpectation
Sets the tokenizer state and the associated element name. This should only ever be used to put the tokenizer into one of the states that have a special end tag expectation.
- Parameters:
specialTokenizerState - the tokenizer state to set
endTagExpectation - the expected end tag for transitioning back to normal
-
setStateAndEndTagExpectation
Sets the tokenizer state and the associated element name. This should only ever be used to put the tokenizer into one of the states that have a special end tag expectation.
- Parameters:
specialTokenizerState - the tokenizer state to set
endTagExpectation - the expected end tag for transitioning back to normal
-
endTagExpectationToArray
private void endTagExpectationToArray() -
setLineNumber
public void setLineNumber(int line)
For C++ use only.
-
getLineNumber
public int getLineNumber()
- Specified by:
getLineNumber in interface Locator
-
getColumnNumber
public int getColumnNumber()
- Specified by:
getColumnNumber in interface Locator
-
getPublicId
- Specified by:
getPublicId in interface Locator
-
getSystemId
- Specified by:
getSystemId in interface Locator
-
notifyAboutMetaBoundary
public void notifyAboutMetaBoundary() -
turnOnAdditionalHtml4Errors
void turnOnAdditionalHtml4Errors() -
emptyAttributes
HtmlAttributes emptyAttributes() -
clearStrBufAndAppend
private void clearStrBufAndAppend(char c) -
clearStrBuf
private void clearStrBuf() -
appendStrBuf
private void appendStrBuf(char c)
Appends to the smaller buffer.
- Parameters:
c - the UTF-16 code unit to append
-
strBufToString
The smaller buffer as a String. Currently only used for error reporting.
C++ memory note: The return value must be released.
- Returns:
- the smaller buffer as a string
-
strBufToDoctypeName
private void strBufToDoctypeName()
Returns the short buffer as a local name. The return value is released in emitDoctypeToken().
-
emitStrBuf
Emits the smaller buffer as character tokens.
- Throws:
SAXException - if the token handler threw
-
clearLongStrBuf
private void clearLongStrBuf() -
clearLongStrBufAndAppend
private void clearLongStrBufAndAppend(char c) -
appendLongStrBuf
private void appendLongStrBuf(char c)
Appends to the larger buffer.
- Parameters:
c - the UTF-16 code unit to append
-
appendSecondHyphenToBogusComment
- Throws:
SAXException
-
maybeAppendSpaceToBogusComment
- Throws:
SAXException
-
adjustDoubleHyphenAndAppendToLongStrBufAndErr
- Throws:
SAXException
-
appendLongStrBuf
private void appendLongStrBuf(char[] buffer, int offset, int length) -
appendStrBufToLongStrBuf
private void appendStrBufToLongStrBuf()
Append the contents of the smaller buffer to the larger one.
-
longStrBufToString
The larger buffer as a string.
C++ memory note: The return value must be released.
- Returns:
- the larger buffer as a string
-
emitComment
private void emitComment(int provisionalHyphens, int pos) throws SAXException
Emits the current comment token.
- Parameters:
pos - TODO
- Throws:
SAXException
-
flushChars
protected void flushChars(char[] buf, int pos) throws SAXException
Flushes coalesced character tokens.
- Parameters:
buf - TODO
pos - TODO
- Throws:
SAXException
-
fatal
Reports a condition that would make the infoset incompatible with XML 1.0 as fatal.
- Parameters:
message - the message
- Throws:
SAXException
SAXParseException
-
err
Reports a Parse Error.
- Parameters:
message - the message
- Throws:
SAXException
-
errTreeBuilder
public void errTreeBuilder(String message) throws SAXException
- Throws:
SAXException
-
warn
Reports a warning.
- Parameters:
message - the message
- Throws:
SAXException
-
resetAttributes
private void resetAttributes() -
strBufToElementNameString
private void strBufToElementNameString() -
emitCurrentTagToken
- Throws:
SAXException
-
attributeNameComplete
- Throws:
SAXException
-
addAttributeWithoutValue
- Throws:
SAXException
-
addAttributeWithValue
- Throws:
SAXException
-
newAsciiLowerCaseStringFromString
-
startErrorReporting
- Throws:
SAXException
-
start
- Throws:
SAXException
-
tokenizeBuffer
- Throws:
SAXException
-
stateLoop
private int stateLoop(int state, char c, int pos, char[] buf, boolean reconsume, int returnState, int endPos) throws SAXException
- Throws:
SAXException
-
workAroundHotSpotHugeMethodLimit
private long workAroundHotSpotHugeMethodLimit(int state, char c, int pos, char[] buf, boolean reconsume, int returnState, int endPos) throws SAXException
compressed returnValue:
int returnState = returnValue >> 33
boolean breakOuterState = ((returnValue >> 32) & 0x1) != 0
int pos = returnValue & 0xFFFFFFFF // same as (int)returnValue
- Throws:
SAXException
-
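The compressed return value described for `workAroundHotSpotHugeMethodLimit` packs three results into one `long`: the return state in the bits from 33 upward, a one-bit flag at bit 32, and the position in the low 32 bits. The decoding expressions below follow the description above. The `pack` helper is an assumption about how such a value could be built, and the class name is hypothetical.

```java
final class PackedReturnSketch {
    /** Assumed encoder: state in bits 33 and up, flag in bit 32, pos in bits 0-31. */
    static long pack(int returnState, boolean breakOuterState, int pos) {
        return ((long) returnState << 33)
                | (breakOuterState ? (1L << 32) : 0L)
                | (pos & 0xFFFFFFFFL); // mask keeps pos from sign-extending into the upper bits
    }

    static int returnState(long returnValue) {
        return (int) (returnValue >> 33);
    }

    static boolean breakOuterState(long returnValue) {
        return ((returnValue >> 32) & 0x1) != 0;
    }

    static int pos(long returnValue) {
        return (int) returnValue; // same as returnValue & 0xFFFFFFFF, narrowed to int
    }
}
```

Returning one primitive `long` lets a helper method hand back several values without allocating a result object, which fits the purpose of splitting an oversized method.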
transition
- Throws:
SAXException
-
initDoctypeFields
private void initDoctypeFields() -
adjustDoubleHyphenAndAppendToLongStrBufCarriageReturn
- Throws:
SAXException
-
adjustDoubleHyphenAndAppendToLongStrBufLineFeed
- Throws:
SAXException
-
appendLongStrBufLineFeed
private void appendLongStrBufLineFeed() -
appendLongStrBufCarriageReturn
private void appendLongStrBufCarriageReturn() -
silentCarriageReturn
protected void silentCarriageReturn() -
silentLineFeed
protected void silentLineFeed() -
emitCarriageReturn
- Throws:
SAXException
-
emitReplacementCharacter
- Throws:
SAXException
-
emitPlaintextReplacementCharacter
- Throws:
SAXException
-
setAdditionalAndRememberAmpersandLocation
private void setAdditionalAndRememberAmpersandLocation(char add) -
bogusDoctype
- Throws:
SAXException
-
bogusDoctypeWithoutQuirks
- Throws:
SAXException
-
emitOrAppendStrBuf
- Throws:
SAXException
-
handleNcrValue
- Throws:
SAXException
-
eof
- Throws:
SAXException
-
emitDoctypeToken
- Throws:
SAXException
-
checkChar
- Throws:
SAXException
-
isAlreadyComplainedAboutNonAscii
public boolean isAlreadyComplainedAboutNonAscii()
Returns the alreadyComplainedAboutNonAscii.
- Returns:
- the alreadyComplainedAboutNonAscii
-
internalEncodingDeclaration
- Throws:
SAXException
-
emitOrAppendTwo
- Parameters:
val -
- Throws:
SAXException
-
emitOrAppendOne
- Throws:
SAXException
-
end
- Throws:
SAXException
-
requestSuspension
public void requestSuspension() -
becomeConfident
public void becomeConfident() -
isNextCharOnNewLine
public boolean isNextCharOnNewLine()
Returns the nextCharOnNewLine.
- Returns:
- the nextCharOnNewLine
-
isPrevCR
public boolean isPrevCR()
-
getLine
public int getLine()
Returns the line.
- Returns:
- the line
-
getCol
public int getCol()
Returns the col.
- Returns:
- the col
-
isInDataState
public boolean isInDataState() -
resetToDataState
public void resetToDataState() -
loadState
- Throws:
SAXException
-
initializeWithoutStarting
- Throws:
SAXException
-
errGarbageAfterLtSlash
- Throws:
SAXException
-
errLtSlashGt
- Throws:
SAXException
-
errWarnLtSlashInRcdata
- Throws:
SAXException
-
errHtml4LtSlashInRcdata
- Throws:
SAXException
-
errCharRefLacksSemicolon
- Throws:
SAXException
-
errNoDigitsInNCR
- Throws:
SAXException
-
errGtInSystemId
- Throws:
SAXException
-
errGtInPublicId
- Throws:
SAXException
-
errNamelessDoctype
- Throws:
SAXException
-
errConsecutiveHyphens
- Throws:
SAXException
-
errPrematureEndOfComment
- Throws:
SAXException
-
errBogusComment
- Throws:
SAXException
-
errUnquotedAttributeValOrNull
- Throws:
SAXException
-
errSlashNotFollowedByGt
- Throws:
SAXException
-
errHtml4XmlVoidSyntax
- Throws:
SAXException
-
errNoSpaceBetweenAttributes
- Throws:
SAXException
-
errHtml4NonNameInUnquotedAttribute
- Throws:
SAXException
-
errLtOrEqualsOrGraveInUnquotedAttributeOrNull
- Throws:
SAXException
-
errAttributeValueMissing
- Throws:
SAXException
-
errBadCharBeforeAttributeNameOrNull
- Throws:
SAXException
-
errEqualsSignBeforeAttributeName
- Throws:
SAXException
-
errBadCharAfterLt
- Throws:
SAXException
-
errLtGt
- Throws:
SAXException
-
errProcessingInstruction
- Throws:
SAXException
-
errUnescapedAmpersandInterpretedAsCharacterReference
- Throws:
SAXException
-
errNotSemicolonTerminated
- Throws:
SAXException
-
errNoNamedCharacterMatch
- Throws:
SAXException
-
errQuoteBeforeAttributeName
- Throws:
SAXException
-
errQuoteOrLtInAttributeNameOrNull
- Throws:
SAXException
-
errExpectedPublicId
- Throws:
SAXException
-
errBogusDoctype
- Throws:
SAXException
-
maybeWarnPrivateUseAstral
- Throws:
SAXException
-
maybeWarnPrivateUse
- Throws:
SAXException
-
maybeErrAttributesOnEndTag
- Throws:
SAXException
-
maybeErrSlashInEndTag
- Throws:
SAXException
-
errNcrNonCharacter
- Throws:
SAXException
-
errAstralNonCharacter
- Throws:
SAXException
-
errNcrSurrogate
- Throws:
SAXException
-
errNcrControlChar
- Throws:
SAXException
-
errNcrCr
- Throws:
SAXException
-
errNcrInC1Range
- Throws:
SAXException
-
errEofInPublicId
- Throws:
SAXException
-
errEofInComment
- Throws:
SAXException
-
errEofInDoctype
- Throws:
SAXException
-
errEofInAttributeValue
- Throws:
SAXException
-
errEofInAttributeName
- Throws:
SAXException
-
errEofWithoutGt
- Throws:
SAXException
-
errEofInTagName
- Throws:
SAXException
-
errEofInEndTag
- Throws:
SAXException
-
errEofAfterLt
- Throws:
SAXException
-
errNcrOutOfRange
- Throws:
SAXException
-
errNcrUnassigned
- Throws:
SAXException
-
errDuplicateAttribute
- Throws:
SAXException
-
errEofInSystemId
- Throws:
SAXException
-
errExpectedSystemId
- Throws:
SAXException
-
errMissingSpaceBeforeDoctypeName
- Throws:
SAXException
-
errHyphenHyphenBang
- Throws:
SAXException
-
errNcrControlChar
- Throws:
SAXException
-
errNcrZero
- Throws:
SAXException
-
errNoSpaceBetweenDoctypeSystemKeywordAndQuote
- Throws:
SAXException
-
errNoSpaceBetweenPublicAndSystemIds
- Throws:
SAXException
-
errNoSpaceBetweenDoctypePublicKeywordAndQuote
- Throws:
SAXException
-
noteAttributeWithoutValue
- Throws:
SAXException
-
noteUnquotedAttributeValue
- Throws:
SAXException
-
setEncodingDeclarationHandler
public void setEncodingDeclarationHandler(EncodingDeclarationHandler encodingDeclarationHandler)
Sets the encodingDeclarationHandler.
- Parameters:
encodingDeclarationHandler - the encodingDeclarationHandler to set
-
destructor
void destructor()
-
setTransitionBaseOffset
public void setTransitionBaseOffset(int offset)
Sets an offset to be added to the position reported to TransitionHandler.
- Parameters:
offset - the offset
-