Class SimpleXMLParser
- java.lang.Object
  - com.itextpdf.text.xml.simpleparser.SimpleXMLParser
-
public final class SimpleXMLParser extends java.lang.Object

A simple XML parser. Like the SAX parser, it is event-based, but it offers much less functionality. The parser can:
- recognize the encoding used
- recognize the start tags and end tags of all elements
- list attributes, where attribute values can be enclosed in single or double quotes
- recognize the <![CDATA[ ... ]]> construct
- recognize the standard entities &amp;, &lt;, &gt;, &quot; and &apos;, as well as numeric entities
- map the line endings \r\n and \r to \n on input, in accordance with the XML Specification, Section 2.11
-
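The end-of-line rule in the last item can be illustrated with a standalone filter. This is a minimal sketch, not iText's code: the class name LineEndNormalizer is hypothetical, and it works on a whole string, whereas the parser presumably applies the rule character by character while reading (compare its eol and previousCharacter fields).

```java
/**
 * Minimal sketch of XML 1.0 end-of-line handling (Section 2.11):
 * "\r\n" and a lone "\r" are both delivered to the application as "\n".
 * Hypothetical helper, not part of iText.
 */
final class LineEndNormalizer {
    private LineEndNormalizer() {}

    static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                // A carriage return always becomes a line feed...
                out.append('\n');
                // ...and swallows an immediately following line feed, so "\r\n" yields one "\n".
                if (i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Note that "\r\r\n" becomes two line feeds: the first \r stands alone, the second pairs with the \n.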
Field Summary
Fields (Modifier and Type, Field: Description)
- private static int ATTRIBUTE_EQUAL
- private static int ATTRIBUTE_KEY
- private static int ATTRIBUTE_VALUE
- private java.lang.String attributekey: The attribute key.
- private java.util.HashMap<java.lang.String,java.lang.String> attributes: The current attributes.
- private java.lang.String attributevalue: The attribute value.
- private static int CDATA
- private int character: The current character.
- private int columns: The column where the current character occurs.
- private SimpleXMLDocHandlerComment comment: The handler to which we are going to forward comments.
- private static int COMMENT
- private SimpleXMLDocHandler doc: The handler to which we are going to forward document content.
- private java.lang.StringBuffer entity: The current entity (whatever is encountered between & and ;).
- private static int ENTITY
- private boolean eol: Was the last character equivalent to a newline?
- private static int EXAMIN_TAG
- private boolean html: Are we parsing HTML?
- private static int IN_CLOSETAG
- private int lines: The line we are currently reading.
- private int nested: Keeps track of the number of tags that are open.
- private NewLineHandler newLineHandler
- private boolean nowhite: A boolean indicating whether the next character should be taken into account if it is a space character.
- private static int PI
- private int previousCharacter: The previous character.
- private static int QUOTE
- private int quoteCharacter: The quote character that was used to open the quote.
- private static int SINGLE_TAG
- private java.util.Stack<java.lang.Integer> stack: The state stack.
- private int state: The current state.
- private java.lang.String tag: The current tag name.
- private static int TAG_ENCOUNTERED
- private static int TAG_EXAMINED
- private java.lang.StringBuffer text: The current text (whatever is encountered between tags).
- private static int TEXT
- private static int UNKNOWN: Possible states.
-
Constructor Summary
Constructors (Modifier, Constructor: Description)
- private SimpleXMLParser(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, boolean html): Creates a simple XML parser object.
-
Method Summary
Methods (Modifier and Type, Method: Description)
- private void doTag(): Sets the name of the tag.
- static java.lang.String escapeXML(java.lang.String s, boolean onlyASCII): Deprecated. Moved to XMLUtil.escapeXML(String, boolean); left here for the sake of backwards compatibility.
- private void flush(): Flushes the text that is currently in the buffer.
- private static java.lang.String getDeclaredEncoding(java.lang.String decl)
- private void go(java.io.Reader r): Does the actual parsing.
- private void initTag(): Initializes the tag name and attributes.
- static void parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, java.io.Reader r, boolean html): Parses the XML document, firing the events to the handler.
- static void parse(SimpleXMLDocHandler doc, java.io.InputStream in): Parses the XML document, firing the events to the handler.
- static void parse(SimpleXMLDocHandler doc, java.io.Reader r)
- private void processTag(boolean start): Processes the tag.
- private int restoreState(): Gets a state from the stack.
- private void saveState(int s): Adds a state to the stack.
- private void throwException(java.lang.String s): Throws an exception.
-
Field Detail
-
UNKNOWN
private static final int UNKNOWN
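Possible states. UNKNOWN and the constants that follow it (TEXT through ATTRIBUTE_VALUE) enumerate the states of the parser.
- See Also: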
- Constant Field Values
-
TEXT
private static final int TEXT
- See Also:
- Constant Field Values
-
TAG_ENCOUNTERED
private static final int TAG_ENCOUNTERED
- See Also:
- Constant Field Values
-
EXAMIN_TAG
private static final int EXAMIN_TAG
- See Also:
- Constant Field Values
-
TAG_EXAMINED
private static final int TAG_EXAMINED
- See Also:
- Constant Field Values
-
IN_CLOSETAG
private static final int IN_CLOSETAG
- See Also:
- Constant Field Values
-
SINGLE_TAG
private static final int SINGLE_TAG
- See Also:
- Constant Field Values
-
CDATA
private static final int CDATA
- See Also:
- Constant Field Values
-
COMMENT
private static final int COMMENT
- See Also:
- Constant Field Values
-
PI
private static final int PI
- See Also:
- Constant Field Values
-
ENTITY
private static final int ENTITY
- See Also:
- Constant Field Values
-
QUOTE
private static final int QUOTE
- See Also:
- Constant Field Values
-
ATTRIBUTE_KEY
private static final int ATTRIBUTE_KEY
- See Also:
- Constant Field Values
-
ATTRIBUTE_EQUAL
private static final int ATTRIBUTE_EQUAL
- See Also:
- Constant Field Values
-
ATTRIBUTE_VALUE
private static final int ATTRIBUTE_VALUE
- See Also:
- Constant Field Values
-
stack
private final java.util.Stack<java.lang.Integer> stack
the state stack
-
character
private int character
The current character.
-
previousCharacter
private int previousCharacter
The previous character.
-
lines
private int lines
the line we are currently reading
-
columns
private int columns
the column where the current character occurs
-
eol
private boolean eol
was the last character equivalent to a newline?
-
nowhite
private boolean nowhite
A boolean indicating whether the next character should be taken into account if it is a space character. When nowhite is false, the previous character wasn't whitespace.
- Since:
- 2.1.5
-
state
private int state
the current state
-
html
private final boolean html
Are we parsing HTML?
-
text
private final java.lang.StringBuffer text
current text (whatever is encountered between tags)
-
entity
private final java.lang.StringBuffer entity
current entity (whatever is encountered between & and ;)
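Only the name between & and ; ends up in the entity buffer, so resolving it is a separate lookup. The sketch below resolves the standard entities and the numeric references listed at the top of this page. EntityDecoder is a hypothetical helper, and returning null for an unknown name is an assumption, not a description of how iText handles unknown entities (in html mode in particular).

```java
import java.util.Map;

/**
 * Sketch of resolving the text collected between '&' and ';'.
 * Covers the five predefined XML entities plus decimal (#65) and
 * hexadecimal (#x41) character references. Hypothetical, not iText's code.
 */
final class EntityDecoder {
    private static final Map<String, String> PREDEFINED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");

    private EntityDecoder() {}

    /** Returns the replacement text, or null if the name is not recognized. */
    static String decode(String name) {
        String predefined = PREDEFINED.get(name);
        if (predefined != null) {
            return predefined;
        }
        if (name.length() > 1 && name.charAt(0) == '#') {
            try {
                // "#x41" is hexadecimal, "#65" is decimal; both denote 'A'.
                int codePoint = name.charAt(1) == 'x'
                        ? Integer.parseInt(name.substring(2), 16)
                        : Integer.parseInt(name.substring(1), 10);
                // Character.toChars also handles code points outside the BMP.
                return new String(Character.toChars(codePoint));
            } catch (IllegalArgumentException e) {
                // NumberFormatException (bad digits) or an invalid code point.
                return null;
            }
        }
        return null;
    }
}
```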
-
tag
private java.lang.String tag
current tagname
-
attributes
private java.util.HashMap<java.lang.String,java.lang.String> attributes
current attributes
-
doc
private final SimpleXMLDocHandler doc
The handler to which we are going to forward document content
-
comment
private final SimpleXMLDocHandlerComment comment
The handler to which we are going to forward comments.
-
nested
private int nested
Keeps track of the number of tags that are open.
-
quoteCharacter
private int quoteCharacter
the quote character that was used to open the quote.
-
attributekey
private java.lang.String attributekey
the attribute key.
-
attributevalue
private java.lang.String attributevalue
the attribute value.
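The quoteCharacter, attributekey and attributevalue fields, together with the ATTRIBUTE_KEY, ATTRIBUTE_EQUAL, ATTRIBUTE_VALUE and QUOTE states, suggest a small state machine for attribute lists. The following is a sketch of that idea under stated assumptions: AttributeScanner is hypothetical, its transitions are not taken from iText, and it accepts only quoted values (strict XML), whereas the real parser may be more lenient in html mode.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of a character-driven attribute scanner. State names mirror the
 * constants of SimpleXMLParser; the transitions are an assumption.
 */
final class AttributeScanner {
    private enum State { ATTRIBUTE_KEY, ATTRIBUTE_EQUAL, ATTRIBUTE_VALUE, QUOTE }

    private AttributeScanner() {}

    /** Parses text such as  a="1" b='x "y"'  and throws on malformed input. */
    static Map<String, String> scan(String s) {
        Map<String, String> attributes = new LinkedHashMap<>();
        State state = State.ATTRIBUTE_KEY;
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        int quoteCharacter = 0; // the quote that opened the current value
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (state) {
                case ATTRIBUTE_KEY:
                    if (c == '=') {
                        if (key.length() == 0) throw new IllegalArgumentException("'=' without a key at " + i);
                        state = State.ATTRIBUTE_VALUE;
                    } else if (Character.isWhitespace(c)) {
                        if (key.length() > 0) state = State.ATTRIBUTE_EQUAL; // key ended, '=' must follow
                    } else {
                        key.append(c);
                    }
                    break;
                case ATTRIBUTE_EQUAL:
                    if (c == '=') state = State.ATTRIBUTE_VALUE;
                    else if (!Character.isWhitespace(c)) throw new IllegalArgumentException("expected '=' at " + i);
                    break;
                case ATTRIBUTE_VALUE:
                    if (c == '"' || c == '\'') {
                        quoteCharacter = c;
                        state = State.QUOTE;
                    } else if (!Character.isWhitespace(c)) {
                        throw new IllegalArgumentException("expected a quote at " + i);
                    }
                    break;
                case QUOTE:
                    if (c == quoteCharacter) {
                        // Only the quote character that opened the value can close it.
                        attributes.put(key.toString(), value.toString());
                        key.setLength(0);
                        value.setLength(0);
                        state = State.ATTRIBUTE_KEY;
                    } else {
                        value.append(c);
                    }
                    break;
            }
        }
        if (state != State.ATTRIBUTE_KEY || key.length() > 0) {
            throw new IllegalArgumentException("unterminated attribute");
        }
        return attributes;
    }
}
```

Because only the opening quote closes a value, b='x "y"' keeps its inner double quotes.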
-
newLineHandler
private NewLineHandler newLineHandler
-
Constructor Detail
-
SimpleXMLParser
private SimpleXMLParser(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, boolean html)
Creates a simple XML parser object. Call go(Reader) immediately after creation.
-
Method Detail
-
go
private void go(java.io.Reader r) throws java.io.IOException
Does the actual parsing. Perform this immediately after creating the parser object.
- Throws:
java.io.IOException
-
restoreState
private int restoreState()
Gets a state from the stack.
- Returns:
- the previous state
-
saveState
private void saveState(int s)
Adds a state to the stack.
- Parameters:
s - a state to add to the stack
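saveState(int) and restoreState() form a pushdown pattern: before entering a sub-state that can occur in several contexts (an entity inside text or inside a quoted attribute value, for instance), the current state is pushed, and when the sub-state ends it is popped so parsing resumes in the right context. The sketch below is hypothetical (StateStack is not an iText class); it assumes an empty stack yields UNKNOWN, and it uses ArrayDeque, the usual modern replacement for the java.util.Stack held by the stack field.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of a parser state stack; names and the empty-stack behaviour are assumptions. */
final class StateStack {
    // Value chosen for illustration; iText's constant values are not listed on this page.
    static final int UNKNOWN = 0;

    private final Deque<Integer> stack = new ArrayDeque<>();

    /** Remembers the state to return to later. */
    void saveState(int s) {
        stack.push(s);
    }

    /** Returns the most recently saved state, or UNKNOWN if nothing was saved. */
    int restoreState() {
        return stack.isEmpty() ? UNKNOWN : stack.pop();
    }
}
```

A parser built this way would, on reading & inside text, call saveState with its text state and switch to its entity state; on reading ; it would continue with state = restoreState().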
-
flush
private void flush()
Flushes the text that is currently in the buffer. Depending on the current state, the text is ignored, added to the document as content, added as a comment, and so on.
-
initTag
private void initTag()
Initializes the tag name and attributes.
-
doTag
private void doTag()
Sets the name of the tag.
-
processTag
private void processTag(boolean start)
Processes the tag.
- Parameters:
start - if true, we are dealing with a tag that has just been opened; if false, we are closing a tag.
-
throwException
private void throwException(java.lang.String s) throws java.io.IOException
Throws an exception.
- Throws:
java.io.IOException
-
parse
public static void parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, java.io.Reader r, boolean html) throws java.io.IOException
Parses the XML document, firing the events to the handler.
- Parameters:
doc - the document handler
comment - the comment handler
r - the document. The encoding is already resolved. The reader is not closed.
html - whether the input is parsed as HTML
- Throws:
java.io.IOException - on error
-
parse
public static void parse(SimpleXMLDocHandler doc, java.io.InputStream in) throws java.io.IOException
Parses the XML document, firing the events to the handler.
- Parameters:
doc - the document handler
in - the document. The encoding is deduced from the stream. The stream is not closed.
- Throws:
java.io.IOException - on error
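Deducing the encoding from a stream usually starts by peeking at the first bytes, as outlined in Appendix F of the XML 1.0 specification. The sketch below covers only UTF-8 and UTF-16 byte order marks and UTF-16 without a BOM; it is not iText's detection routine, the class name EncodingSniffer is hypothetical, and a real implementation would go on to read the encoding declaration when no BOM is found.

```java
import java.io.IOException;
import java.io.InputStream;

/** Sketch of byte-level encoding detection; not iText's implementation. */
final class EncodingSniffer {
    private EncodingSniffer() {}

    /** Peeks at up to four bytes and resets; the stream must support mark/reset. */
    static String sniff(InputStream in) throws IOException {
        in.mark(4);
        byte[] b = new byte[4];
        int n = in.readNBytes(b, 0, 4);
        in.reset();
        // Byte order marks.
        if (n >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (n >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (n >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE"; // note: a UTF-32LE BOM would also match here
        }
        // "<?" encoded as UTF-16 without a BOM.
        if (n == 4 && b[0] == 0x00 && b[1] == 0x3C && b[2] == 0x00 && b[3] == 0x3F) {
            return "UTF-16BE";
        }
        if (n == 4 && b[0] == 0x3C && b[1] == 0x00 && b[2] == 0x3F && b[3] == 0x00) {
            return "UTF-16LE";
        }
        // ASCII-compatible input: UTF-8 is the default until a declaration says otherwise.
        return "UTF-8";
    }
}
```

Wrap a plain stream in a java.io.BufferedInputStream before calling sniff so that mark/reset is available.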
-
getDeclaredEncoding
private static java.lang.String getDeclaredEncoding(java.lang.String decl)
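getDeclaredEncoding(String) is undocumented. Judging by its name and parameter, it extracts the encoding from an XML declaration; the sketch below shows one way to do that with a regular expression. Both the class name DeclaredEncoding and the choice to return null when no encoding pseudo-attribute is present are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of reading encoding="..." from an XML declaration; hypothetical, not iText's code. */
final class DeclaredEncoding {
    // encoding = "name" or encoding = 'name'; the backreference \1 requires matching quotes.
    // The name pattern follows EncName in the XML 1.0 grammar.
    private static final Pattern ENCODING =
            Pattern.compile("encoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    private DeclaredEncoding() {}

    static String getDeclaredEncoding(String decl) {
        if (decl == null) {
            return null;
        }
        Matcher m = ENCODING.matcher(decl);
        return m.find() ? m.group(2) : null;
    }
}
```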
-
parse
public static void parse(SimpleXMLDocHandler doc, java.io.Reader r) throws java.io.IOException
Parses the XML document, firing the events to the handler.
- Parameters:
doc - the document handler
r - the document
- Throws:
java.io.IOException
-
escapeXML
@Deprecated public static java.lang.String escapeXML(java.lang.String s, boolean onlyASCII)
Deprecated. Moved to XMLUtil.escapeXML(String, boolean); left here for the sake of backwards compatibility.
Escapes a string with the appropriate XML codes.
- Parameters:
s - the string to be escaped
onlyASCII - if true, codes above 127 will always be escaped with &#nn;
- Returns:
- the escaped string
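The behaviour described above can be sketched as follows: markup characters become entity references and, when onlyASCII is true, every code point above 127 becomes a decimal reference &#nn;. XmlEscaper is a hypothetical stand-in; in real code call XMLUtil.escapeXML(String, boolean), whose handling of details such as characters that are invalid in XML may differ from this sketch.

```java
/** Sketch of escapeXML(String, boolean) semantics; hypothetical, not iText's code. */
final class XmlEscaper {
    private XmlEscaper() {}

    static String escapeXML(String s, boolean onlyASCII) {
        StringBuilder buf = new StringBuilder(s.length());
        // Iterate by code point so a character outside the BMP yields one &#nn;, not two.
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '<': buf.append("&lt;"); break;
                case '>': buf.append("&gt;"); break;
                case '&': buf.append("&amp;"); break;
                case '"': buf.append("&quot;"); break;
                case '\'': buf.append("&apos;"); break;
                default:
                    if (onlyASCII && cp > 127) {
                        buf.append("&#").append(cp).append(';');
                    } else {
                        buf.appendCodePoint(cp);
                    }
            }
        });
        return buf.toString();
    }
}
```

With onlyASCII set to true, the sketch turns "caf\u00e9" into "caf&#233;"; with false, the é is kept as is.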