Package org.htmlunit.cyberneko
Class HTMLScanner.ContentScanner
java.lang.Object
    org.htmlunit.cyberneko.HTMLScanner.ContentScanner
All Implemented Interfaces:
    HTMLScanner.Scanner
Enclosing class:
    HTMLScanner
public class HTMLScanner.ContentScanner extends java.lang.Object implements HTMLScanner.Scanner
The primary HTML document scanner.
-
Field Summary
Fields
Modifier and Type | Field | Description
private XMLAttributesImpl | attributes_ | Attributes.
private QName | qName_ | A qualified name.
-
Constructor Summary
Constructors
Constructor | Description
ContentScanner()
-
Method Summary
Modifier and Type | Method | Description
private boolean | changeEncoding(java.lang.String charset) | Tries to change the encoding used to read the input stream to the specified one.
protected java.lang.String | nextContent(int len) | Reads the next characters WITHOUT impacting the buffer content up to the current offset.
private java.lang.String | removeSpaces(java.lang.String content) | Removes all spaces from the string (remember: JDK 1.3!).
boolean | scan(boolean complete) | Scan.
protected boolean | scanAttribute(XMLAttributesImpl attributes, boolean[] empty) | Scans a real attribute.
protected void | scanAttributeQuotedValue(int currentQuote, HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue, boolean normalizeAttributes)
protected void | scanAttributeUnquotedValue(HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue)
protected void | scanCDATA()
protected boolean | scanCDataContent(XMLString xmlString)
protected void | scanCharacters()
protected void | scanComment()
protected boolean | scanCommentContent(XMLString buffer)
protected void | scanEndElement()
protected void | scanPI()
private void | scanScriptContent()
protected java.lang.String | scanStartElement(boolean[] empty) | Scans a start element.
private void | scanUntilEndTag(java.lang.String tagName) | Scans the content of <noscript>: it doesn't get parsed but is considered as plain text when feature HTMLScanner.PARSE_NOSCRIPT_CONTENT is set to false.
-
Field Detail
-
qName_
private final QName qName_
A qualified name.
-
attributes_
private final XMLAttributesImpl attributes_
Attributes.
-
Method Detail
-
scan
public boolean scan(boolean complete) throws java.io.IOException
Scan.
Specified by:
    scan in interface HTMLScanner.Scanner
Parameters:
    complete - True if the scanner should not return until scanning is complete.
Returns:
    True if additional scanning is required.
Throws:
    java.io.IOException - Thrown if an I/O error occurs.
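The scan(boolean) contract (a "complete" flag in, a "more scanning required" flag out) suggests an incremental driver loop. The sketch below is a minimal illustration under stated assumptions: ChunkScanner, CountingScanner and ScanDriver are hypothetical names, not part of the library, and counting down "chunks" merely stands in for real parsing.

```java
import java.io.IOException;

/** Hypothetical stand-in for HTMLScanner.Scanner; only the scan(boolean) contract is modeled. */
interface ChunkScanner {
    /** Returns true if additional scanning is required. */
    boolean scan(boolean complete) throws IOException;
}

/** Toy scanner that consumes one "chunk" per call, or everything at once when complete is true. */
final class CountingScanner implements ChunkScanner {
    private int remaining;

    CountingScanner(int chunks) {
        this.remaining = chunks;
    }

    @Override
    public boolean scan(boolean complete) throws IOException {
        if (complete) {
            // complete == true: do not return until everything has been scanned
            remaining = 0;
        } else if (remaining > 0) {
            // incremental mode: consume one chunk, then hand control back to the caller
            remaining--;
        }
        return remaining > 0; // true means "call scan again"
    }
}

final class ScanDriver {
    /** Calls scan(false) until the scanner reports that no more scanning is required. */
    static int drive(ChunkScanner scanner) throws IOException {
        int calls = 0;
        boolean more;
        do {
            more = scanner.scan(false);
            calls++;
        } while (more);
        return calls; // number of scan calls that were needed
    }
}
```

Passing false lets a caller interleave other work between calls; passing true trades that flexibility for a single blocking call.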
-
scanUntilEndTag
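private void scanUntilEndTag(java.lang.String tagName) throws java.io.IOException
Scans the content of <noscript>: it doesn't get parsed but is considered as plain text when feature HTMLScanner.PARSE_NOSCRIPT_CONTENT is set to false.
Parameters:
    tagName - the tag for which content is scanned (one of "noscript", "noframes", "iframe")
Throws:
    java.io.IOException - on error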
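As a rough illustration of the idea behind scanUntilEndTag, the sketch below collects everything up to the first matching end tag as plain text. It is a simplified, string-based stand-in under stated assumptions: RawTextScanner and contentUntilEndTag are hypothetical names, matching is case-insensitive, and the real method works on the scanner's input buffer and reports characters rather than returning a String.

```java
final class RawTextScanner {
    /**
     * Returns the text from start up to the first end tag "</tagName" (case-insensitive),
     * or the whole remaining input if no such end tag exists.
     */
    static String contentUntilEndTag(String input, int start, String tagName) {
        String endTag = "</" + tagName;
        for (int i = start; i + endTag.length() <= input.length(); i++) {
            // regionMatches(true, ...) compares ignoring case without changing string lengths
            if (input.regionMatches(true, i, endTag, 0, endTag.length())) {
                int after = i + endTag.length();
                // "</noscriptx" must not count as "</noscript": the tag name has to end here
                if (after == input.length() || isTagNameEnd(input.charAt(after))) {
                    return input.substring(start, i);
                }
            }
        }
        return input.substring(start); // no end tag: everything is content
    }

    private static boolean isTagNameEnd(char c) {
        return c == '>' || c == '/' || Character.isWhitespace(c);
    }
}
```

For example, contentUntilEndTag("<b>not parsed</b></NOSCRIPT>", 0, "noscript") yields "<b>not parsed</b>": the markup inside is kept verbatim instead of being tokenized.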
-
scanScriptContent
private void scanScriptContent() throws java.io.IOException
Throws:
    java.io.IOException
-
nextContent
protected java.lang.String nextContent(int len) throws java.io.IOException
Reads the next characters WITHOUT impacting the buffer content up to the current offset.
Parameters:
    len - the number of characters to read
Returns:
    the read string (its length may be smaller if EOF is encountered)
Throws:
    java.io.IOException - in case of I/O problems
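nextContent is essentially a lookahead: read ahead, then leave the read position where it was. One way to get the same effect with the standard library is java.io.PushbackReader, as in this sketch. Lookahead.peek is a hypothetical helper, not the library's implementation, and the PushbackReader must be created with a pushback buffer of at least len characters, otherwise unread throws an IOException.

```java
import java.io.IOException;
import java.io.PushbackReader;

final class Lookahead {
    /**
     * Reads up to len characters and pushes them back, so the reader's position is unchanged.
     * The result is shorter than len if EOF is reached first.
     */
    static String peek(PushbackReader reader, int len) throws IOException {
        char[] buf = new char[len];
        int count = 0;
        while (count < len) {
            int n = reader.read(buf, count, len - count);
            if (n < 0) {
                break; // EOF: return what we have
            }
            count += n;
        }
        reader.unread(buf, 0, count); // restore exactly what was consumed
        return new String(buf, 0, count);
    }
}
```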
-
scanCharacters
protected void scanCharacters() throws java.io.IOException
Throws:
    java.io.IOException
-
scanCDATA
protected void scanCDATA() throws java.io.IOException
Throws:
    java.io.IOException
-
scanComment
protected void scanComment() throws java.io.IOException
Throws:
    java.io.IOException
-
scanCommentContent
protected boolean scanCommentContent(XMLString buffer) throws java.io.IOException
Throws:
    java.io.IOException
-
scanCDataContent
protected boolean scanCDataContent(XMLString xmlString) throws java.io.IOException
Throws:
    java.io.IOException
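scanCommentContent and scanCDataContent are not documented here, but comment and CDATA content both run until a fixed terminator ("-->" and "]]>" respectively). A minimal sketch of that scanning step, assuming plain String input and a StringBuilder in place of XMLString; DelimitedContent.scanUntil is hypothetical, and its return value (the index past the terminator, or -1) is an illustration choice, not the meaning of the real methods' boolean result.

```java
final class DelimitedContent {
    /**
     * Appends characters of input, starting at start, to buffer until terminator is found.
     * Returns the index just past the terminator, or -1 if the input ends first
     * (in which case the whole remainder has been appended).
     */
    static int scanUntil(String input, int start, String terminator, StringBuilder buffer) {
        int idx = input.indexOf(terminator, start);
        if (idx < 0) {
            buffer.append(input, start, input.length()); // unterminated: keep everything
            return -1;
        }
        buffer.append(input, start, idx); // content only, terminator excluded
        return idx + terminator.length();
    }
}
```

The same helper serves both constructs: call it with "-->" after "<!--" for a comment, or with "]]>" after "<![CDATA[" for a CDATA section.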
-
scanPI
protected void scanPI() throws java.io.IOException
Throws:
    java.io.IOException
-
scanStartElement
protected java.lang.String scanStartElement(boolean[] empty) throws java.io.IOException
Scans a start element.
Parameters:
    empty - Used as a second return value to indicate whether the start element tag is empty (e.g. "/>").
Returns:
    the element name (ename)
Throws:
    java.io.IOException - in case of I/O problems
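The empty parameter uses a one-element boolean array as an output parameter, so the method can hand back the element name and the "/>" flag together. The sketch below shows that idiom on a plain String. StartTagScanner is a hypothetical name, and the sketch deliberately ignores complications the real scanner must handle, such as '>' inside quoted attribute values.

```java
final class StartTagScanner {
    /**
     * Scans a start tag such as "<br/>" or "<p class=x>" at the beginning of input.
     * Returns the element name, or null if input does not start with a start tag.
     * empty[0] is set to true when the tag ends with "/>".
     */
    static String scanStartElement(String input, boolean[] empty) {
        empty[0] = false; // reset the second "return value"
        if (input.length() < 2 || input.charAt(0) != '<' || !Character.isLetter(input.charAt(1))) {
            return null; // not a start tag (e.g. text, "</p>", "<!--")
        }
        int i = 1;
        while (i < input.length() && isNameChar(input.charAt(i))) {
            i++;
        }
        String name = input.substring(1, i);
        int close = input.indexOf('>', i);
        if (close >= 0 && input.charAt(close - 1) == '/') {
            empty[0] = true; // empty-element tag
        }
        return name;
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == ':';
    }
}
```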
-
removeSpaces
private java.lang.String removeSpaces(java.lang.String content)
Removes all spaces from the string (remember: JDK 1.3!).
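The "JDK 1.3" remark presumably explains why removeSpaces avoids regular expressions (java.util.regex and String.replaceAll only arrived in JDK 1.4). A loop-based sketch follows; SpaceRemover is a hypothetical name, and it assumes "spaces" means any character for which Character.isWhitespace returns true, which may differ from the real method.

```java
final class SpaceRemover {
    /** Returns content with every whitespace character removed, using only a plain loop. */
    static String removeSpaces(String content) {
        StringBuilder sb = new StringBuilder(content.length());
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (!Character.isWhitespace(c)) {
                sb.append(c); // keep every non-whitespace character in order
            }
        }
        return sb.toString();
    }
}
```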
-
changeEncoding
private boolean changeEncoding(java.lang.String charset)
Tries to change the encoding used to read the input stream to the specified one.
Parameters:
    charset - the charset that should be used
Returns:
    true when the encoding has been changed
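The boolean result of changeEncoding can be modeled as "the name is valid, supported, and different from the current charset". The sketch below covers only that decision, using java.nio.charset.Charset. EncodingSwitcher is a hypothetical name, and re-creating the reader over the input stream, which the real method has to do, is not shown.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

final class EncodingSwitcher {
    private Charset current;

    EncodingSwitcher(Charset initial) {
        this.current = initial;
    }

    /** Returns true only if charset names a supported charset that differs from the current one. */
    boolean changeEncoding(String charset) {
        if (charset == null) {
            return false;
        }
        Charset requested;
        try {
            if (!Charset.isSupported(charset)) {
                return false; // legal name, but this JVM has no such charset
            }
            requested = Charset.forName(charset);
        } catch (IllegalCharsetNameException e) {
            return false; // e.g. a name containing spaces
        }
        if (requested.equals(current)) {
            return false; // aliases such as "utf-8" and "UTF-8" resolve to the same charset
        }
        current = requested;
        return true;
    }

    Charset current() {
        return current;
    }
}
```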
-
scanAttribute
protected boolean scanAttribute(XMLAttributesImpl attributes, boolean[] empty) throws java.io.IOException
Scans a real attribute.
Parameters:
    attributes - The list of attributes.
    empty - Used as a second return value to indicate whether the start element tag is empty (e.g. "/>").
Returns:
    success
Throws:
    java.io.IOException - in case of I/O problems
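Attribute scanning follows the usual HTML pattern: a name, optionally followed by '=' and a quoted or unquoted value, with "/>" ending an empty tag. The sketch below illustrates that on a String, with a Map in place of XMLAttributesImpl. AttributeScanner is a hypothetical name, and returning the next index (or -1 when no attribute remains) is an assumption standing in for the real method's "success" result.

```java
import java.util.Map;

final class AttributeScanner {
    /**
     * Scans one attribute at or after pos and stores it in attrs.
     * Returns the index after the attribute, or -1 if none was found (at '>', "/>" or end of input).
     * empty[0] is set to true when scanning stops at "/>".
     */
    static int scanAttribute(String s, int pos, Map<String, String> attrs, boolean[] empty) {
        int i = skipWhitespace(s, pos);
        if (i >= s.length() || s.charAt(i) == '>') {
            return -1; // end of tag: no more attributes
        }
        if (s.startsWith("/>", i)) {
            empty[0] = true; // empty-element tag such as <br/>
            return -1;
        }
        int nameStart = i;
        while (i < s.length() && isNameChar(s.charAt(i))) {
            i++;
        }
        if (i == nameStart) {
            return -1; // not an attribute name; this sketch simply stops
        }
        String name = s.substring(nameStart, i);
        int j = skipWhitespace(s, i);
        if (j >= s.length() || s.charAt(j) != '=') {
            attrs.put(name, ""); // value-less attribute such as "checked"
            return i;
        }
        j = skipWhitespace(s, j + 1);
        if (j < s.length() && (s.charAt(j) == '"' || s.charAt(j) == '\'')) {
            char quote = s.charAt(j);
            int end = s.indexOf(quote, j + 1);
            if (end < 0) {
                end = s.length(); // unterminated quote: take the rest of the input
            }
            attrs.put(name, s.substring(j + 1, end));
            return Math.min(end + 1, s.length());
        }
        int valueStart = j; // unquoted value runs until whitespace or '>'
        while (j < s.length() && !Character.isWhitespace(s.charAt(j)) && s.charAt(j) != '>') {
            j++;
        }
        attrs.put(name, s.substring(valueStart, j));
        return j;
    }

    private static int skipWhitespace(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == ':' || c == '.';
    }
}
```

A caller would loop until the method returns -1, then check empty[0] to learn whether the tag was self-closing.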
-
scanAttributeUnquotedValue
protected void scanAttributeUnquotedValue(HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue) throws java.io.IOException
Throws:
    java.io.IOException
-
scanAttributeQuotedValue
protected void scanAttributeQuotedValue(int currentQuote, HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue, boolean normalizeAttributes) throws java.io.IOException
Throws:
    java.io.IOException
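The normalizeAttributes flag of scanAttributeQuotedValue is not documented here. Assuming it refers to XML-style attribute-value normalization, where tab, line feed and carriage return each become a space (a CR LF pair counting as one line break), a minimal sketch could look like this; AttributeValueNormalizer is a hypothetical name, and the real scanner may normalize differently.

```java
final class AttributeValueNormalizer {
    /**
     * Replaces each tab, line feed and carriage return with a single space.
     * A CR LF pair is treated as one line break and therefore yields one space.
     */
    static String normalize(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\r' && i + 1 < raw.length() && raw.charAt(i + 1) == '\n') {
                continue; // skip the CR; the following LF becomes the space
            }
            if (c == '\t' || c == '\n' || c == '\r') {
                sb.append(' ');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```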
-
scanEndElement
protected void scanEndElement() throws java.io.IOException
Throws:
    java.io.IOException
-