Package nu.validator.htmlparser.impl
Class ErrorReportingTokenizer
java.lang.Object
nu.validator.htmlparser.impl.Tokenizer
nu.validator.htmlparser.impl.ErrorReportingTokenizer
- All Implemented Interfaces:
Locator
-
Field Summary
Fields
Modifier and Type | Field | Description
private boolean | alreadyComplainedAboutNonAscii | Used together with nonAsciiProhibited.
private boolean | alreadyWarnedAboutPrivateUseCharacters | Keeps track of PUA warnings.
private int | col | The current column number in the current resource being tokenized.
private int | colPrev |
private XmlViolationPolicy | contentNonXmlCharPolicy | The policy for non-space non-XML characters.
private int | line | The current line number in the current resource being parsed.
private int | linePrev |
private boolean | nextCharOnNewLine |
private char | prev |
private static final int | SURROGATE_OFFSET | Magic value for UTF-16 operations.
private int | transitionBaseOffset |
private TransitionHandler | transitionHandler |
Fields inherited from class nu.validator.htmlparser.impl.Tokenizer
AFTER_ATTRIBUTE_NAME, AFTER_ATTRIBUTE_VALUE_QUOTED, AFTER_DOCTYPE_NAME, AFTER_DOCTYPE_PUBLIC_IDENTIFIER, AFTER_DOCTYPE_PUBLIC_KEYWORD, AFTER_DOCTYPE_SYSTEM_IDENTIFIER, AFTER_DOCTYPE_SYSTEM_KEYWORD, ampersandLocation, ATTRIBUTE_NAME, ATTRIBUTE_VALUE_DOUBLE_QUOTED, ATTRIBUTE_VALUE_SINGLE_QUOTED, ATTRIBUTE_VALUE_UNQUOTED, attributeName, BEFORE_ATTRIBUTE_NAME, BEFORE_ATTRIBUTE_VALUE, BEFORE_DOCTYPE_NAME, BEFORE_DOCTYPE_PUBLIC_IDENTIFIER, BEFORE_DOCTYPE_SYSTEM_IDENTIFIER, BETWEEN_DOCTYPE_PUBLIC_AND_SYSTEM_IDENTIFIERS, BOGUS_COMMENT, BOGUS_COMMENT_HYPHEN, BOGUS_DOCTYPE, CDATA_RSQB, CDATA_RSQB_RSQB, CDATA_SECTION, CDATA_START, CHARACTER_REFERENCE_HILO_LOOKUP, CHARACTER_REFERENCE_TAIL, CLOSE_TAG_OPEN, COMMENT, COMMENT_END, COMMENT_END_BANG, COMMENT_END_DASH, COMMENT_START, COMMENT_START_DASH, confident, CONSUME_CHARACTER_REFERENCE, CONSUME_NCR, cstart, DATA, DECIMAL_NRC_LOOP, DOCTYPE, DOCTYPE_NAME, DOCTYPE_PUBLIC_IDENTIFIER_DOUBLE_QUOTED, DOCTYPE_PUBLIC_IDENTIFIER_SINGLE_QUOTED, DOCTYPE_SYSTEM_IDENTIFIER_DOUBLE_QUOTED, DOCTYPE_SYSTEM_IDENTIFIER_SINGLE_QUOTED, DOCTYPE_UBLIC, DOCTYPE_YSTEM, encodingDeclarationHandler, endTag, endTagExpectation, errorHandler, HANDLE_NCR_VALUE, HANDLE_NCR_VALUE_RECONSUME, HEX_NCR_LOOP, html4, index, lastCR, MARKUP_DECLARATION_HYPHEN, MARKUP_DECLARATION_OCTYPE, MARKUP_DECLARATION_OPEN, NON_DATA_END_TAG_NAME, PLAINTEXT, PROCESSING_INSTRUCTION, PROCESSING_INSTRUCTION_QUESTION_MARK, RAWTEXT, RAWTEXT_RCDATA_LESS_THAN_SIGN, RCDATA, SCRIPT_DATA, SCRIPT_DATA_DOUBLE_ESCAPE_END, SCRIPT_DATA_DOUBLE_ESCAPE_START, SCRIPT_DATA_DOUBLE_ESCAPED, SCRIPT_DATA_DOUBLE_ESCAPED_DASH, SCRIPT_DATA_DOUBLE_ESCAPED_DASH_DASH, SCRIPT_DATA_DOUBLE_ESCAPED_LESS_THAN_SIGN, SCRIPT_DATA_ESCAPE_START, SCRIPT_DATA_ESCAPE_START_DASH, SCRIPT_DATA_ESCAPED, SCRIPT_DATA_ESCAPED_DASH, SCRIPT_DATA_ESCAPED_DASH_DASH, SCRIPT_DATA_ESCAPED_LESS_THAN_SIGN, SCRIPT_DATA_LESS_THAN_SIGN, SELF_CLOSING_START_TAG, stateSave, TAG_NAME, TAG_OPEN, tokenHandler, value -
Constructor Summary
Constructors
Constructor | Description
ErrorReportingTokenizer(TokenHandler tokenHandler) |
ErrorReportingTokenizer(TokenHandler tokenHandler, boolean newAttributesEachTime) |
-
Method Summary
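Modifier and Type | Method | Description
protected char | checkChar(char[] buf, int pos) |
private void | complainAboutNonAscii |
protected void | errAstralNonCharacter(int ch) |
protected void | errAttributeValueMissing |
protected void | errBadCharAfterLt(char c) |
protected void | errBadCharBeforeAttributeNameOrNull |
protected void | errBogusComment |
protected void | errBogusDoctype |
protected void | errCharRefLacksSemicolon |
protected void | errConsecutiveHyphens |
protected void | errDuplicateAttribute |
protected void | errEofAfterLt |
protected void | errEofInAttributeName |
protected void | errEofInAttributeValue |
protected void | errEofInComment |
protected void | errEofInDoctype |
protected void | errEofInEndTag |
protected void | errEofInPublicId |
protected void | errEofInSystemId |
protected void | errEofInTagName |
protected void | errEofWithoutGt |
protected void | errEqualsSignBeforeAttributeName |
protected void | errExpectedPublicId |
protected void | errExpectedSystemId |
protected void | errGarbageAfterLtSlash |
protected void | errGtInPublicId |
protected void | errGtInSystemId |
protected void | errHtml4LtSlashInRcdata(char folded) |
protected void | errHtml4NonNameInUnquotedAttribute |
protected void | errHtml4XmlVoidSyntax |
protected void | errHyphenHyphenBang |
protected void | errLtGt() |
protected void | errLtOrEqualsOrGraveInUnquotedAttributeOrNull |
protected void | errLtSlashGt |
protected void | errMissingSpaceBeforeDoctypeName |
protected void | errNamelessDoctype |
protected void | errNcrControlChar |
protected char | errNcrControlChar(char ch) |
protected void | errNcrCr() |
protected void | errNcrInC1Range |
protected char | errNcrNonCharacter(char ch) |
protected void | errNcrOutOfRange |
protected void | errNcrSurrogate |
protected void | errNcrUnassigned |
protected void | errNcrZero |
protected void | errNoDigitsInNCR |
protected void | errNoNamedCharacterMatch |
protected void | errNoSpaceBetweenAttributes |
protected void | errNoSpaceBetweenDoctypePublicKeywordAndQuote |
protected void | errNoSpaceBetweenDoctypeSystemKeywordAndQuote |
protected void | errNoSpaceBetweenPublicAndSystemIds |
protected void | errNotSemicolonTerminated |
protected void | errPrematureEndOfComment |
protected void | errProcessingInstruction |
protected void | errQuoteBeforeAttributeName(char c) |
protected void | errQuoteOrLtInAttributeNameOrNull |
protected void | errSlashNotFollowedByGt |
protected void | errUnescapedAmpersandInterpretedAsCharacterReference |
protected void | errUnquotedAttributeValOrNull(char c) |
protected void | errWarnLtSlashInRcdata |
protected void | flushChars(char[] buf, int pos) | Flushes coalesced character tokens.
int | getCol() | Returns the col.
int | getColumnNumber() |
int | getLine() | Returns the line.
int | getLineNumber() |
boolean | isAlreadyComplainedAboutNonAscii() | Returns the alreadyComplainedAboutNonAscii.
private boolean | isAstralPrivateUse(int c) | Tells if the argument is an astral PUA character.
boolean | isNextCharOnNewLine() | Returns the nextCharOnNewLine.
private boolean | isPrivateUse(char c) | Tells if the argument is a BMP PUA character.
protected void | maybeErrAttributesOnEndTag |
protected void | maybeErrSlashInEndTag(boolean selfClosing) |
protected void | maybeWarnPrivateUse(char ch) |
protected void | maybeWarnPrivateUseAstral |
void | note | Reports on an event based on profile selected.
protected void | noteAttributeWithoutValue |
protected void | noteUnquotedAttributeValue |
void | setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy) | Sets the contentNonXmlCharPolicy.
void | setErrorProfile(HashMap<String, String> errorProfileMap) | Sets the errorProfile.
void | setTransitionBaseOffset(int offset) | Sets an offset to be added to the position reported to TransitionHandler.
void | setTransitionHandler(TransitionHandler transitionHandler) | Sets the transitionHandler.
protected void | silentCarriageReturn() |
protected void | silentLineFeed() |
protected void | startErrorReporting |
private String | toUPlusString(int c) |
protected int | transition(int from, int to, boolean reconsume, int pos) |
private void | warnAboutPrivateUseChar | Emits a warning about private use characters if the warning has not been emitted yet.
Methods inherited from class nu.validator.htmlparser.impl.Tokenizer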
becomeConfident, destructor, emptyAttributes, end, eof, err, errTreeBuilder, fatal, getErrorHandler, getPublicId, getSystemId, initializeWithoutStarting, initLocation, internalEncodingDeclaration, isInDataState, isMappingLangToXmlLang, isPrevCR, loadState, notifyAboutMetaBoundary, requestSuspension, resetToDataState, setCommentPolicy, setContentSpacePolicy, setEncodingDeclarationHandler, setErrorHandler, setHtml4ModeCompatibleWithXhtml1Schemata, setInterner, setLineNumber, setMappingLangToXmlLang, setNamePolicy, setStateAndEndTagExpectation, setStateAndEndTagExpectation, setXmlnsPolicy, start, strBufToString, tokenizeBuffer, turnOnAdditionalHtml4Errors, warn
-
Field Details
-
SURROGATE_OFFSET
private static final int SURROGATE_OFFSET
Magic value for UTF-16 operations.
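This page does not state the constant's value or how it is used. As a rough illustration, the sketch below shows the conventional "surrogate offset" idiom for combining a UTF-16 surrogate pair into a code point. The value and the helper `toCodePoint` are assumptions for illustration, not taken from this class.

```java
final class SurrogateOffsetSketch {
    // Conventional value for this idiom; assumed, not confirmed by this page.
    static final int SURROGATE_OFFSET = 0x10000 - (0xD800 << 10) - 0xDC00;

    /** Combines a high and a low surrogate into a supplementary code point. */
    static int toCodePoint(char high, char low) {
        return (high << 10) + low + SURROGATE_OFFSET;
    }

    public static void main(String[] args) {
        // U+1F600 is encoded in UTF-16 as the pair D83D DE00.
        int cp = toCodePoint('\uD83D', '\uDE00');
        System.out.println(Integer.toHexString(cp)); // prints 1f600
    }
}
```

The result agrees with the standard library's `Character.toCodePoint(high, low)` for any valid surrogate pair.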
-
contentNonXmlCharPolicy
The policy for non-space non-XML characters. -
alreadyComplainedAboutNonAscii
private boolean alreadyComplainedAboutNonAscii
Used together with nonAsciiProhibited.
-
alreadyWarnedAboutPrivateUseCharacters
private boolean alreadyWarnedAboutPrivateUseCharacters
Keeps track of PUA warnings.
-
line
private int line
The current line number in the current resource being parsed. (First line is 1.) Passed on as locator data.
-
linePrev
private int linePrev
-
col
private int col
The current column number in the current resource being tokenized. (First column is 1, counted by UTF-16 code units.) Passed on as locator data.
-
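To make the counting convention concrete, here is a minimal sketch of 1-based line and column tracking over UTF-16 code units. It is not the tokenizer's actual bookkeeping: the class `PositionSketch`, its methods, and the choice to treat CR, LF, and CRLF each as one line break are assumptions for illustration.

```java
final class PositionSketch {
    int line = 1;                      // first line is 1
    int col = 0;                       // 0 until a code unit is consumed on this line
    private boolean prevWasCr = false;

    /** Consumes one UTF-16 code unit and updates line and col. */
    void advance(char c) {
        if (c == '\n' && prevWasCr) {
            prevWasCr = false;         // LF of a CRLF pair: break already counted at the CR
            return;
        }
        prevWasCr = (c == '\r');
        if (c == '\r' || c == '\n') {
            line++;
            col = 0;
        } else {
            col++;                     // a surrogate pair advances col twice
        }
    }

    /** Runs the tracker over a whole string and returns the final position. */
    static PositionSketch scan(String s) {
        PositionSketch p = new PositionSketch();
        for (int i = 0; i < s.length(); i++) {
            p.advance(s.charAt(i));
        }
        return p;
    }
}
```

Because columns are counted in code units rather than code points, a character outside the BMP occupies two columns in this scheme.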
colPrev
private int colPrev -
nextCharOnNewLine
private boolean nextCharOnNewLine -
prev
private char prev -
errorProfileMap
-
transitionHandler
-
transitionBaseOffset
private int transitionBaseOffset
-
-
Constructor Details
-
ErrorReportingTokenizer
- Parameters:
tokenHandler -
newAttributesEachTime -
-
ErrorReportingTokenizer
- Parameters:
tokenHandler -
-
-
Method Details
-
getLineNumber
public int getLineNumber()
- Specified by:
getLineNumber in interface Locator
- Overrides:
getLineNumber in class Tokenizer
-
getColumnNumber
public int getColumnNumber()
- Specified by:
getColumnNumber in interface Locator
- Overrides:
getColumnNumber in class Tokenizer
-
setContentNonXmlCharPolicy
Sets the contentNonXmlCharPolicy.
- Overrides:
setContentNonXmlCharPolicy in class Tokenizer
- Parameters:
contentNonXmlCharPolicy - the contentNonXmlCharPolicy to set
-
setErrorProfile
Sets the errorProfile.
- Parameters:
errorProfile -
-
note
Reports on an event based on profile selected.
- Parameters:
profile - the profile this message belongs to
message - the message itself
- Throws:
SAXException
-
startErrorReporting
- Overrides:
startErrorReporting in class Tokenizer
- Throws:
SAXException
-
silentCarriageReturn
protected void silentCarriageReturn()
- Overrides:
silentCarriageReturn in class Tokenizer
-
silentLineFeed
protected void silentLineFeed()
- Overrides:
silentLineFeed in class Tokenizer
-
getLine
public int getLine()
Returns the line.
-
getCol
public int getCol()
Returns the col.
-
isNextCharOnNewLine
public boolean isNextCharOnNewLine()
Returns the nextCharOnNewLine.
- Overrides:
isNextCharOnNewLine in class Tokenizer
- Returns:
the nextCharOnNewLine
-
complainAboutNonAscii
- Throws:
SAXException
-
isAlreadyComplainedAboutNonAscii
public boolean isAlreadyComplainedAboutNonAscii()
Returns the alreadyComplainedAboutNonAscii.
- Overrides:
isAlreadyComplainedAboutNonAscii in class Tokenizer
- Returns:
the alreadyComplainedAboutNonAscii
-
flushChars
Flushes coalesced character tokens.
- Overrides:
flushChars in class Tokenizer
- Parameters:
buf - TODO
pos - TODO
- Throws:
SAXException
-
checkChar
- Overrides:
checkChar in class Tokenizer
- Throws:
SAXException
-
transition
- Overrides:
transition in class Tokenizer
- Throws:
SAXException
-
toUPlusString
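The page gives only the signature `private String toUPlusString(int c)`. Judging by the name alone, it presumably renders a code point in the usual `U+XXXX` notation. The sketch below is one way to do that. The uppercase hex digits and the minimum width of four are assumptions, not confirmed behaviour of this method.

```java
final class UPlusSketch {
    /** Formats a code point as "U+" followed by at least four uppercase hex digits. */
    static String toUPlusString(int c) {
        return String.format("U+%04X", c);
    }

    public static void main(String[] args) {
        System.out.println(toUPlusString(0x41));    // prints U+0041
        System.out.println(toUPlusString(0x1F600)); // prints U+1F600
    }
}
```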
-
warnAboutPrivateUseChar
Emits a warning about private use characters if the warning has not been emitted yet.
- Throws:
SAXException
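The "warn only once" behaviour described here is typically implemented with a boolean guard such as the `alreadyWarnedAboutPrivateUseCharacters` field. The following is a minimal sketch of that pattern, not the class's actual code. The class `WarnOnceSketch`, the `warnings` list, and the message text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class WarnOnceSketch {
    private boolean alreadyWarned = false;            // plays the role of the guard flag
    final List<String> warnings = new ArrayList<>();  // stand-in for an error handler

    /** Records the warning on the first call only; later calls do nothing. */
    void warnAboutPrivateUseChar() {
        if (!alreadyWarned) {
            alreadyWarned = true;
            warnings.add("Private use characters found (hypothetical message).");
        }
    }
}
```

The guard keeps a document that contains many private use characters from producing one warning per character.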
-
isPrivateUse
private boolean isPrivateUse(char c)
Tells if the argument is a BMP PUA character.
- Parameters:
c - the UTF-16 code unit to check
- Returns:
true if PUA character
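For reference, the Private Use Area in the Basic Multilingual Plane is the range U+E000 to U+F8FF. A check along these lines would satisfy the description above. This is a sketch of the range test only, and the class name `BmpPuaSketch` is hypothetical.

```java
final class BmpPuaSketch {
    /** True if the code unit lies in the BMP Private Use Area, U+E000..U+F8FF. */
    static boolean isPrivateUse(char c) {
        return c >= '\uE000' && c <= '\uF8FF';
    }
}
```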
-
isAstralPrivateUse
private boolean isAstralPrivateUse(int c)
Tells if the argument is an astral PUA character.
- Parameters:
c - the code point to check
- Returns:
true if astral private use
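Outside the BMP, Unicode reserves planes 15 and 16 for private use: U+F0000 to U+FFFFD and U+100000 to U+10FFFD. The last two code points of each plane are noncharacters and are excluded. The sketch below tests those two ranges. It illustrates the definition and is not the method's actual body, and the class name `AstralPuaSketch` is hypothetical.

```java
final class AstralPuaSketch {
    /** True if the code point lies in Supplementary Private Use Area A or B. */
    static boolean isAstralPrivateUse(int c) {
        return (c >= 0xF0000 && c <= 0xFFFFD)
            || (c >= 0x100000 && c <= 0x10FFFD);
    }
}
```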
-
errGarbageAfterLtSlash
- Overrides:
errGarbageAfterLtSlash in class Tokenizer
- Throws:
SAXException
-
errLtSlashGt
- Overrides:
errLtSlashGt in class Tokenizer
- Throws:
SAXException
-
errWarnLtSlashInRcdata
- Overrides:
errWarnLtSlashInRcdata in class Tokenizer
- Throws:
SAXException
-
errHtml4LtSlashInRcdata
- Overrides:
errHtml4LtSlashInRcdata in class Tokenizer
- Throws:
SAXException
-
errCharRefLacksSemicolon
- Overrides:
errCharRefLacksSemicolon in class Tokenizer
- Throws:
SAXException
-
errNoDigitsInNCR
- Overrides:
errNoDigitsInNCR in class Tokenizer
- Throws:
SAXException
-
errGtInSystemId
- Overrides:
errGtInSystemId in class Tokenizer
- Throws:
SAXException
-
errGtInPublicId
- Overrides:
errGtInPublicId in class Tokenizer
- Throws:
SAXException
-
errNamelessDoctype
- Overrides:
errNamelessDoctype in class Tokenizer
- Throws:
SAXException
-
errConsecutiveHyphens
- Overrides:
errConsecutiveHyphens in class Tokenizer
- Throws:
SAXException
-
errPrematureEndOfComment
- Overrides:
errPrematureEndOfComment in class Tokenizer
- Throws:
SAXException
-
errBogusComment
- Overrides:
errBogusComment in class Tokenizer
- Throws:
SAXException
-
errUnquotedAttributeValOrNull
- Overrides:
errUnquotedAttributeValOrNull in class Tokenizer
- Throws:
SAXException
-
errSlashNotFollowedByGt
- Overrides:
errSlashNotFollowedByGt in class Tokenizer
- Throws:
SAXException
-
errHtml4XmlVoidSyntax
- Overrides:
errHtml4XmlVoidSyntax in class Tokenizer
- Throws:
SAXException
-
errNoSpaceBetweenAttributes
- Overrides:
errNoSpaceBetweenAttributes in class Tokenizer
- Throws:
SAXException
-
errHtml4NonNameInUnquotedAttribute
- Overrides:
errHtml4NonNameInUnquotedAttribute in class Tokenizer
- Throws:
SAXException
-
errLtOrEqualsOrGraveInUnquotedAttributeOrNull
- Overrides:
errLtOrEqualsOrGraveInUnquotedAttributeOrNull in class Tokenizer
- Throws:
SAXException
-
errAttributeValueMissing
- Overrides:
errAttributeValueMissing in class Tokenizer
- Throws:
SAXException
-
errBadCharBeforeAttributeNameOrNull
- Overrides:
errBadCharBeforeAttributeNameOrNull in class Tokenizer
- Throws:
SAXException
-
errEqualsSignBeforeAttributeName
- Overrides:
errEqualsSignBeforeAttributeName in class Tokenizer
- Throws:
SAXException
-
errBadCharAfterLt
- Overrides:
errBadCharAfterLt in class Tokenizer
- Throws:
SAXException
-
errLtGt
- Overrides:
errLtGt in class Tokenizer
- Throws:
SAXException
-
errProcessingInstruction
- Overrides:
errProcessingInstruction in class Tokenizer
- Throws:
SAXException
-
errUnescapedAmpersandInterpretedAsCharacterReference
- Overrides:
errUnescapedAmpersandInterpretedAsCharacterReference in class Tokenizer
- Throws:
SAXException
-
errNotSemicolonTerminated
- Overrides:
errNotSemicolonTerminated in class Tokenizer
- Throws:
SAXException
-
errNoNamedCharacterMatch
- Overrides:
errNoNamedCharacterMatch in class Tokenizer
- Throws:
SAXException
-
errQuoteBeforeAttributeName
- Overrides:
errQuoteBeforeAttributeName in class Tokenizer
- Throws:
SAXException
-
errQuoteOrLtInAttributeNameOrNull
- Overrides:
errQuoteOrLtInAttributeNameOrNull in class Tokenizer
- Throws:
SAXException
-
errExpectedPublicId
- Overrides:
errExpectedPublicId in class Tokenizer
- Throws:
SAXException
-
errBogusDoctype
- Overrides:
errBogusDoctype in class Tokenizer
- Throws:
SAXException
-
maybeWarnPrivateUseAstral
- Overrides:
maybeWarnPrivateUseAstral in class Tokenizer
- Throws:
SAXException
-
maybeWarnPrivateUse
- Overrides:
maybeWarnPrivateUse in class Tokenizer
- Throws:
SAXException
-
maybeErrAttributesOnEndTag
- Overrides:
maybeErrAttributesOnEndTag in class Tokenizer
- Throws:
SAXException
-
maybeErrSlashInEndTag
- Overrides:
maybeErrSlashInEndTag in class Tokenizer
- Throws:
SAXException
-
errNcrNonCharacter
- Overrides:
errNcrNonCharacter in class Tokenizer
- Throws:
SAXException
-
errAstralNonCharacter
- Overrides:
errAstralNonCharacter in class Tokenizer
- Throws:
SAXException
-
errNcrSurrogate
- Overrides:
errNcrSurrogate in class Tokenizer
- Throws:
SAXException
-
errNcrControlChar
- Overrides:
errNcrControlChar in class Tokenizer
- Throws:
SAXException
-
errNcrCr
- Overrides:
errNcrCr in class Tokenizer
- Throws:
SAXException
-
errNcrInC1Range
- Overrides:
errNcrInC1Range in class Tokenizer
- Throws:
SAXException
-
errEofInPublicId
- Overrides:
errEofInPublicId in class Tokenizer
- Throws:
SAXException
-
errEofInComment
- Overrides:
errEofInComment in class Tokenizer
- Throws:
SAXException
-
errEofInDoctype
- Overrides:
errEofInDoctype in class Tokenizer
- Throws:
SAXException
-
errEofInAttributeValue
- Overrides:
errEofInAttributeValue in class Tokenizer
- Throws:
SAXException
-
errEofInAttributeName
- Overrides:
errEofInAttributeName in class Tokenizer
- Throws:
SAXException
-
errEofWithoutGt
- Overrides:
errEofWithoutGt in class Tokenizer
- Throws:
SAXException
-
errEofInTagName
- Overrides:
errEofInTagName in class Tokenizer
- Throws:
SAXException
-
errEofInEndTag
- Overrides:
errEofInEndTag in class Tokenizer
- Throws:
SAXException
-
errEofAfterLt
- Overrides:
errEofAfterLt in class Tokenizer
- Throws:
SAXException
-
errNcrOutOfRange
- Overrides:
errNcrOutOfRange in class Tokenizer
- Throws:
SAXException
-
errNcrUnassigned
- Overrides:
errNcrUnassigned in class Tokenizer
- Throws:
SAXException
-
errDuplicateAttribute
- Overrides:
errDuplicateAttribute in class Tokenizer
- Throws:
SAXException
-
errEofInSystemId
- Overrides:
errEofInSystemId in class Tokenizer
- Throws:
SAXException
-
errExpectedSystemId
- Overrides:
errExpectedSystemId in class Tokenizer
- Throws:
SAXException
-
errMissingSpaceBeforeDoctypeName
- Overrides:
errMissingSpaceBeforeDoctypeName in class Tokenizer
- Throws:
SAXException
-
errHyphenHyphenBang
- Overrides:
errHyphenHyphenBang in class Tokenizer
- Throws:
SAXException
-
errNcrControlChar
- Overrides:
errNcrControlChar in class Tokenizer
- Throws:
SAXException
-
errNcrZero
- Overrides:
errNcrZero in class Tokenizer
- Throws:
SAXException
-
errNoSpaceBetweenDoctypeSystemKeywordAndQuote
- Overrides:
errNoSpaceBetweenDoctypeSystemKeywordAndQuote in class Tokenizer
- Throws:
SAXException
-
errNoSpaceBetweenPublicAndSystemIds
- Overrides:
errNoSpaceBetweenPublicAndSystemIds in class Tokenizer
- Throws:
SAXException
-
errNoSpaceBetweenDoctypePublicKeywordAndQuote
- Overrides:
errNoSpaceBetweenDoctypePublicKeywordAndQuote in class Tokenizer
- Throws:
SAXException
-
noteAttributeWithoutValue
- Overrides:
noteAttributeWithoutValue in class Tokenizer
- Throws:
SAXException
-
noteUnquotedAttributeValue
- Overrides:
noteUnquotedAttributeValue in class Tokenizer
- Throws:
SAXException
-
setTransitionHandler
Sets the transitionHandler.
- Parameters:
transitionHandler - the transitionHandler to set
-
setTransitionBaseOffset
public void setTransitionBaseOffset(int offset)
Sets an offset to be added to the position reported to TransitionHandler.
- Overrides:
setTransitionBaseOffset in class Tokenizer
- Parameters:
offset - the offset
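The effect of a base offset can be pictured as follows: a position that is relative to the current buffer is shifted by the configured offset before it is reported. This is a minimal sketch under that reading. The `PositionSink` interface and the class `TransitionOffsetSketch` are hypothetical stand-ins and do not reproduce the library's `TransitionHandler` interface.

```java
final class TransitionOffsetSketch {
    /** Hypothetical listener; not the library's TransitionHandler. */
    interface PositionSink {
        void report(int from, int to, int pos);
    }

    private int transitionBaseOffset = 0;
    private final PositionSink sink;

    TransitionOffsetSketch(PositionSink sink) {
        this.sink = sink;
    }

    void setTransitionBaseOffset(int offset) {
        this.transitionBaseOffset = offset;
    }

    /** pos is buffer-relative; the sink receives it shifted by the base offset. */
    void transition(int from, int to, int pos) {
        sink.report(from, to, transitionBaseOffset + pos);
    }
}
```

One plausible use, again an assumption, is feeding a document in several buffers: setting the offset to the number of code units already consumed keeps the reported positions relative to the whole document.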
-