Class HTMLScanner.ContentScanner
java.lang.Object
org.htmlunit.cyberneko.HTMLScanner.ContentScanner
- All Implemented Interfaces:
HTMLScanner.Scanner
- Enclosing class:
HTMLScanner
The primary HTML document scanner.
-
Field Summary
Fields
Modifier and Type | Field | Description
private final XMLAttributesImpl | attributes_ | Attributes.
private final QName | qName_ | A qualified name.
-
Constructor Summary
Constructors
ContentScanner()
-
Method Summary
Modifier and Type | Method | Description
private boolean | changeEncoding(String charset) | Tries to change the encoding used to read the input stream to the specified one.
protected String | nextContent(int len) | Reads the next characters WITHOUT impacting the buffer content up to current offset.
private String | removeSpaces(String content) | Removes all spaces from the string (remember: JDK 1.3!).
boolean | scan(boolean complete) | Scan.
protected boolean | scanAttribute(XMLAttributesImpl attributes, boolean[] empty) | Scans a real attribute.
protected void | scanAttributeQuotedValue(int currentQuote, HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue, boolean normalizeAttributes) |
protected void | scanAttributeUnquotedValue(HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue) |
protected void | scanCDATA |
protected boolean | scanCDataContent(XMLString xmlString) |
protected void | scanCharacters |
protected void | scanComment |
protected boolean | scanCommentContent(XMLString buffer) |
protected void | scanEndElement |
protected void | scanPI() |
private void | scanScriptContent |
protected String | scanStartElement(boolean[] empty) | Scans a start element.
private void | scanUntilEndTag(String tagName) | Scans the content of <noscript>: it is not parsed but is treated as plain text when the feature HTMLScanner.PARSE_NOSCRIPT_CONTENT is set to false.
-
Field Details
-
qName_
A qualified name.
-
attributes_
Attributes.
-
-
Constructor Details
-
ContentScanner
public ContentScanner()
-
-
Method Details
-
scan
Scan.
- Specified by:
scan in interface HTMLScanner.Scanner
- Parameters:
complete - True if the scanner should not return until scanning is complete.
- Returns:
True if additional scanning is required.
- Throws:
IOException - Thrown if an I/O error occurs.
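The meaning of the complete parameter and of the return value is easiest to see from the caller's side. The following is a minimal sketch under stated assumptions: StepScanner, CountingScanner and drive are hypothetical stand-ins written for illustration and are not part of HTMLScanner; only the scan(boolean) contract is taken from the description above.

```java
import java.io.IOException;

final class ScanLoopSketch {

    /** Hypothetical stand-in mirroring the scan(boolean) contract. */
    interface StepScanner {
        boolean scan(boolean complete) throws IOException;
    }

    /** Toy scanner that "scans" a fixed number of tokens, one per step. */
    static final class CountingScanner implements StepScanner {
        private int remaining;
        int steps;

        CountingScanner(int tokens) {
            remaining = tokens;
        }

        @Override
        public boolean scan(boolean complete) {
            do {
                if (remaining == 0) {
                    return false; // nothing left to scan
                }
                remaining--;
                steps++;
            } while (complete); // keep going only when asked to complete
            return remaining > 0; // true: additional scanning is required
        }
    }

    /** Drives a scanner step by step and returns the number of calls made. */
    static int drive(StepScanner scanner) throws IOException {
        int calls = 0;
        do {
            calls++;
        } while (scanner.scan(false));
        return calls;
    }
}
```

With complete set to false the caller regains control after every step and decides when to continue; with complete set to true a single call runs until nothing is left.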
-
scanUntilEndTag
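Scans the content of <noscript>: it is not parsed but is treated as plain text when the feature HTMLScanner.PARSE_NOSCRIPT_CONTENT is set to false.
- Parameters:
tagName - the tag for which content is scanned (one of "noscript", "noframes", "iframe")
- Throws:
IOException - on error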
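To illustrate what treating element content as plain text up to the end tag can look like, here is a minimal sketch that works on an in-memory string. RawTextSketch and readUntilEndTag are hypothetical names; the real method reads from the scanner's buffer and its matching rules may differ.

```java
final class RawTextSketch {

    /**
     * Returns the text between offset and the next closing tag for tagName,
     * matched case-insensitively. If no closing tag is found, the rest of
     * the input is returned. Simplification: only the prefix "</tagName"
     * is matched, not the full closing tag.
     */
    static String readUntilEndTag(String input, int offset, String tagName) {
        String endTag = "</" + tagName;
        for (int i = offset; i + endTag.length() <= input.length(); i++) {
            if (input.regionMatches(true, i, endTag, 0, endTag.length())) {
                return input.substring(offset, i);
            }
        }
        return input.substring(offset);
    }

    public static void main(String[] args) {
        String html = "<noscript><p>Enable JS</p></NOSCRIPT><div>";
        // Markup inside the element is returned verbatim, not parsed.
        System.out.println(readUntilEndTag(html, "<noscript>".length(), "noscript"));
    }
}
```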
-
scanScriptContent
- Throws:
IOException
-
nextContent
Reads the next characters WITHOUT impacting the buffer content up to current offset.
- Parameters:
len - the number of characters to read
- Returns:
the read string (length may be smaller if EOF is encountered)
- Throws:
IOException - in case of I/O problems
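The look-ahead behaviour described here (read some characters, then leave the read position where it was) can be sketched with the standard mark/reset facility of java.io.Reader. This is a substitute technique chosen for illustration, not the scanner's own buffer handling; PeekSketch and peek are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class PeekSketch {

    /**
     * Reads up to len characters and then rewinds the reader, so that the
     * next regular read starts at the same position as before the call.
     * The result is shorter than len if EOF is reached first.
     */
    static String peek(Reader reader, int len) throws IOException {
        if (!reader.markSupported()) {
            throw new IllegalArgumentException("reader must support mark/reset");
        }
        reader.mark(len);
        char[] buf = new char[len];
        int total = 0;
        while (total < len) {
            int n = reader.read(buf, total, len - total);
            if (n < 0) {
                break; // EOF
            }
            total += n;
        }
        reader.reset();
        return new String(buf, 0, total);
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("<!DOCTYPE html>");
        System.out.println(peek(in, 9));
        System.out.println((char) in.read()); // still the first character
    }
}
```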
-
scanCharacters
- Throws:
IOException
-
scanCDATA
- Throws:
IOException
-
scanComment
- Throws:
IOException
-
scanCommentContent
- Throws:
IOException
-
scanCDataContent
- Throws:
IOException
-
scanPI
- Throws:
IOException
-
scanStartElement
Scans a start element.
- Parameters:
empty - Is used for a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Returns:
the element name (ename)
- Throws:
IOException - in case of I/O problems
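The boolean[] parameter is the usual Java idiom for returning a second value: the caller passes a one-element array and the method writes its result into index 0. A minimal sketch of the idiom, assuming a complete tag held in a string; StartTagSketch and parseStartTag are hypothetical and much simpler than the real scanner.

```java
final class StartTagSketch {

    /**
     * Returns the element name of a start tag such as "<br/>" or "<p>".
     * empty[0] is set to true when the tag ends with "/>", false otherwise.
     */
    static String parseStartTag(String tag, boolean[] empty) {
        if (!tag.startsWith("<") || !tag.endsWith(">")) {
            throw new IllegalArgumentException("not a tag: " + tag);
        }
        empty[0] = tag.endsWith("/>"); // the "second return value"
        int end = 1;
        while (end < tag.length() && isNameChar(tag.charAt(end))) {
            end++;
        }
        return tag.substring(1, end);
    }

    /** Simplified notion of a name character, chosen for this sketch only. */
    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == ':';
    }

    public static void main(String[] args) {
        boolean[] empty = {false};
        String name = parseStartTag("<br/>", empty);
        System.out.println(name + " empty=" + empty[0]);
    }
}
```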
-
removeSpaces
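The method summary describes removeSpaces as removing all spaces from the string, with a reminder about JDK 1.3 (which predates String.replaceAll). A minimal sketch of a character loop that fits that description; whether the real method strips only the space character or all whitespace is not stated, so treating every whitespace character as a space is an assumption here, and SpaceSketch is a hypothetical class.

```java
final class SpaceSketch {

    /** Returns content with every whitespace character removed (assumption). */
    static String removeSpaces(String content) {
        StringBuilder sb = new StringBuilder(content.length());
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (!Character.isWhitespace(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```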
-
changeEncoding
Tries to change the encoding used to read the input stream to the specified one.
- Parameters:
charset - the charset that should be used
- Returns:
true when the encoding has been changed
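The try-and-report shape of this method can be sketched with java.nio.charset.Charset: check whether the requested name is usable, switch only if it is, and tell the caller what happened. This is an illustration under assumptions, not the scanner's logic; EncodingSketch is hypothetical, its UTF-8 starting value is arbitrary, and the real method also has to deal with the input already read.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingSketch {

    // Arbitrary starting value for the sketch.
    private Charset current = StandardCharsets.UTF_8;

    /** Returns true when the encoding has been changed, false otherwise. */
    boolean changeEncoding(String charsetName) {
        try {
            if (!Charset.isSupported(charsetName)) {
                return false; // legal name, but not available on this JVM
            }
        } catch (IllegalArgumentException e) {
            return false; // null or syntactically illegal charset name
        }
        current = Charset.forName(charsetName);
        return true;
    }

    Charset current() {
        return current;
    }
}
```

Returning a boolean instead of throwing lets the caller keep the previous encoding when the requested one is unusable.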
-
scanAttribute
Scans a real attribute.
- Parameters:
attributes - The list of attributes.
empty - Is used for a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Returns:
success
- Throws:
IOException - in case of I/O problems
-
scanAttributeUnquotedValue
protected void scanAttributeUnquotedValue(HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue) throws IOException
- Throws:
IOException
-
scanAttributeQuotedValue
protected void scanAttributeQuotedValue(int currentQuote, HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue, boolean normalizeAttributes) throws IOException
- Throws:
IOException
-
scanEndElement
- Throws:
IOException
-