Class SimpleXMLParser
- java.lang.Object
  - com.gitlab.pdftk_java.com.lowagie.text.pdf.SimpleXMLParser
public class SimpleXMLParser extends Object
A simple XML and HTML parser. Like the SAX parser, it is event-based, but it offers much less functionality. The parser:
- recognizes the encoding used
- recognizes all the elements' start tags and end tags
- lists attributes, where attribute values can be enclosed in single or double quotes
- recognizes the <![CDATA[ ... ]]> construct
- recognizes the standard entities &amp;, &lt;, &gt;, &quot;, and &apos;, as well as numeric entities
- maps the line endings \r\n and \r to \n on input, in accordance with the XML Specification, Section 2.11
The code is based on http://www.javaworld.com/javaworld/javatips/javatip128/ with some extra code from XERCES to recognize the encoding.
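The end-of-line rule in the feature list can be illustrated with a small sketch. This is not SimpleXMLParser's code: the class LineEndings and its normalize method are hypothetical, and the sketch works on a whole String, whereas the parser applies the rule to characters as it reads them.

```java
/**
 * Hypothetical sketch of the XML 1.0 Section 2.11 end-of-line rule:
 * both "\r\n" and a lone "\r" become a single "\n".
 * Not taken from SimpleXMLParser.
 */
final class LineEndings {
    private LineEndings() {}

    static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                // Emit one '\n' for the carriage return...
                out.append('\n');
                // ...and swallow an immediately following '\n' so "\r\n" yields one newline.
                if (i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, normalize("a\r\nb\rc") should give "a\nb\nc": the CR LF pair collapses to one newline and the lone CR becomes one.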
Method Summary
- static char decodeEntity(String s)
- static String escapeXML(String s, boolean onlyASCII): Escapes a string with the appropriate XML codes.
- static String getJavaEncoding(String iana): Gets the Java encoding from the IANA encoding.
- static void parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, Reader r, boolean html): Parses the XML document, firing the events to the handler.
- static void parse(SimpleXMLDocHandler doc, InputStream in): Parses the XML document, firing the events to the handler.
- static void parse(SimpleXMLDocHandler doc, Reader r)
Method Detail
-
parse
public static void parse(SimpleXMLDocHandler doc, InputStream in) throws IOException
Parses the XML document, firing the events to the handler.
- Parameters:
  - doc - the document handler
  - in - the document. The encoding is deduced from the stream. The stream is not closed.
- Throws:
  - IOException - on error
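This overload deduces the encoding from the stream itself. As a rough illustration of how such detection commonly works (byte-order marks first, then the encoding pseudo-attribute of the XML declaration), here is a simplified sketch. The class EncodingSniffer is hypothetical; it is not the Xerces-derived logic SimpleXMLParser uses, and it ignores UTF-32 and EBCDIC. The UTF-8 fallback follows the XML 1.0 default for documents with neither a byte-order mark nor an encoding declaration.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified encoding detection (not the code SimpleXMLParser uses). */
final class EncodingSniffer {
    private EncodingSniffer() {}

    // encoding="..." or encoding='...' inside a leading <?xml ... ?> declaration.
    private static final Pattern DECLARED =
            Pattern.compile("^<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Guesses an IANA encoding name from the first bytes of a document. */
    static String detect(byte[] head) {
        // 1. Byte-order marks are unambiguous.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        // 2. No BOM, but "<?" written as 16-bit code units.
        if (startsWith(head, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(head, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        // 3. ASCII-compatible bytes: ISO-8859-1 maps each byte to one char, so the
        //    declaration can be read without knowing the real encoding yet.
        String prefix = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = DECLARED.matcher(prefix);
        // 4. No declared encoding: fall back to XML's default, UTF-8.
        return m.find() ? m.group(1) : "UTF-8";
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller could then map the detected name to a Java charset and wrap the stream in an InputStreamReader; whether the library does exactly this is not stated here.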
-
getJavaEncoding
public static String getJavaEncoding(String iana)
Gets the Java encoding from the IANA encoding. If the encoding cannot be found, it returns the input.
- Parameters:
  - iana - the IANA encoding
- Returns:
  - the Java encoding
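A lookup with the documented fallback (an unknown name is returned unchanged) might look like the sketch below. The class IanaToJava and its table are hypothetical: the entries are a small illustrative subset whose values are legacy java.io charset names (such as ISO8859_1 and UTF8) that the JDK still accepts as aliases, and the case-insensitive lookup is an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of an IANA-to-Java encoding lookup with a pass-through fallback. */
final class IanaToJava {
    private IanaToJava() {}

    // Illustrative subset only: upper-cased IANA names mapped to legacy java.io names.
    private static final Map<String, String> TABLE = new HashMap<>();
    static {
        TABLE.put("UTF-8", "UTF8");
        TABLE.put("ISO-8859-1", "ISO8859_1");
        TABLE.put("US-ASCII", "ASCII");
        TABLE.put("UTF-16BE", "UnicodeBigUnmarked");
        TABLE.put("UTF-16LE", "UnicodeLittleUnmarked");
        TABLE.put("WINDOWS-1252", "Cp1252");
    }

    static String getJavaEncoding(String iana) {
        // Case-insensitive matching is an assumption of this sketch.
        String javaName = TABLE.get(iana.toUpperCase(Locale.ROOT));
        // As documented: if the encoding cannot be found, return the input unchanged.
        return javaName != null ? javaName : iana;
    }
}
```

Returning the input unchanged lets callers pass names the table does not know straight to the JDK, which may still recognize them.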
-
parse
public static void parse(SimpleXMLDocHandler doc, Reader r) throws IOException
- Throws:
  - IOException
-
parse
public static void parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, Reader r, boolean html) throws IOException
Parses the XML document, firing the events to the handler.
- Parameters:
  - doc - the document handler
  - r - the document. The encoding is already resolved. The reader is not closed.
- Throws:
  - IOException - on error
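The event-driven style shared by these parse methods (the parser walks the input once and calls back into a handler instead of building a tree) can be sketched in a few dozen lines. MiniHandler and MiniEventParser are hypothetical names, not the SimpleXMLDocHandler or SimpleXMLDocHandlerComment interfaces. The sketch accepts single- or double-quoted attribute values and comments, but it skips entities, CDATA, processing instructions, the html flag, and any error recovery.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical callback interface, loosely modelled on SAX-style handlers. */
interface MiniHandler {
    void startElement(String tag, Map<String, String> attributes);
    void endElement(String tag);
    void text(String text);
    void comment(String text);
}

/** Toy event-based scanner: no entities, CDATA, processing instructions or error recovery. */
final class MiniEventParser {
    private MiniEventParser() {}

    static void parse(String doc, MiniHandler handler) {
        int i = 0;
        while (i < doc.length()) {
            int lt = doc.indexOf('<', i);
            if (lt < 0) {
                handler.text(doc.substring(i));           // trailing text after the last tag
                return;
            }
            if (lt > i) handler.text(doc.substring(i, lt)); // text run before the next tag
            if (doc.startsWith("<!--", lt)) {
                int end = doc.indexOf("-->", lt + 4);
                if (end < 0) throw new IllegalArgumentException("unterminated comment");
                handler.comment(doc.substring(lt + 4, end));
                i = end + 3;
                continue;
            }
            int gt = findTagEnd(doc, lt + 1);
            String body = doc.substring(lt + 1, gt);
            i = gt + 1;
            if (body.startsWith("/")) {                   // end tag: </name>
                handler.endElement(body.substring(1).trim());
                continue;
            }
            boolean empty = body.endsWith("/");           // empty-element tag: <name/>
            if (empty) body = body.substring(0, body.length() - 1);
            int nameEnd = 0;
            while (nameEnd < body.length() && !Character.isWhitespace(body.charAt(nameEnd))) nameEnd++;
            String tag = body.substring(0, nameEnd);
            handler.startElement(tag, parseAttributes(body.substring(nameEnd)));
            if (empty) handler.endElement(tag);
        }
    }

    // Finds the closing '>' while skipping quoted attribute values, which may contain '>'.
    private static int findTagEnd(String doc, int from) {
        char quote = 0;                                   // 0 = not inside a quoted value
        for (int k = from; k < doc.length(); k++) {
            char c = doc.charAt(k);
            if (quote != 0) {
                if (c == quote) quote = 0;                // closing quote
            } else if (c == '"' || c == '\'') {
                quote = c;                                // opening quote
            } else if (c == '>') {
                return k;
            }
        }
        throw new IllegalArgumentException("unterminated tag");
    }

    // Parses name="value" / name='value' pairs in document order.
    private static Map<String, String> parseAttributes(String s) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int k = 0;
        while (true) {
            while (k < s.length() && Character.isWhitespace(s.charAt(k))) k++;
            if (k >= s.length()) return attrs;
            int eq = s.indexOf('=', k);
            if (eq < 0) throw new IllegalArgumentException("attribute without value: " + s);
            String name = s.substring(k, eq).trim();
            int open = eq + 1;
            while (open < s.length() && Character.isWhitespace(s.charAt(open))) open++;
            if (open >= s.length() || (s.charAt(open) != '"' && s.charAt(open) != '\'')) {
                throw new IllegalArgumentException("unquoted value for " + name);
            }
            int close = s.indexOf(s.charAt(open), open + 1);
            if (close < 0) throw new IllegalArgumentException("unterminated value for " + name);
            attrs.put(name, s.substring(open + 1, close));
            k = close + 1;
        }
    }
}
```

A handler that records each callback would see, for <a x="1"><b/></a>, the sequence startElement a, startElement b, endElement b, endElement a.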
-
escapeXML
public static String escapeXML(String s, boolean onlyASCII)
Escapes a string with the appropriate XML codes.
- Parameters:
  - s - the string to be escaped
  - onlyASCII - if true, codes above 127 will always be escaped with &#nn;
- Returns:
  - the escaped string
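A sketch of the escaping described above: the five XML special characters become their predefined entities and, when onlyASCII is true, every code point above 127 becomes a decimal reference of the form &#nn;. XmlEscaper is a hypothetical name, and unlike a production escaper this sketch does not filter out characters that are illegal in XML.

```java
/** Hypothetical sketch of XML escaping with an optional ASCII-only mode. */
final class XmlEscaper {
    private XmlEscaper() {}

    static String escapeXML(String s, boolean onlyASCII) {
        StringBuilder sb = new StringBuilder(s.length());
        // Iterate by code point so a surrogate pair becomes one &#n; reference.
        for (int i = 0; i < s.length(); ) {
            int c = s.codePointAt(i);
            i += Character.charCount(c);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (onlyASCII && c > 127) {
                        sb.append("&#").append(c).append(';'); // decimal character reference
                    } else {
                        sb.appendCodePoint(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

For instance, escapeXML("café <ok>", true) yields "caf&#233; &lt;ok&gt;", while passing false leaves the é as is.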
-
decodeEntity
public static char decodeEntity(String s)
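No description is given for decodeEntity. Purely as an assumption about its contract, the sketch below treats the argument as an entity name without the surrounding & and ; (for example "amp" or "#x41"), covers only the standard and numeric entities listed in the class description, and returns '\0' when it does not recognize the name. EntityDecoder is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical entity decoder; the input format and '\0' result are assumptions. */
final class EntityDecoder {
    private EntityDecoder() {}

    // The five predefined XML entities.
    private static final Map<String, Character> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", '&');
        NAMED.put("lt", '<');
        NAMED.put("gt", '>');
        NAMED.put("quot", '"');
        NAMED.put("apos", '\'');
    }

    /** Decodes "amp", "#65" or "#x41" style names; returns '\0' if unrecognized. */
    static char decodeEntity(String s) {
        if (s.startsWith("#")) {
            try {
                boolean hex = s.length() > 1 && (s.charAt(1) == 'x' || s.charAt(1) == 'X');
                int code = hex ? Integer.parseInt(s.substring(2), 16) : Integer.parseInt(s.substring(1));
                // A char holds only BMP code points; larger values count as unrecognized here.
                return (code > 0 && code <= 0xFFFF) ? (char) code : '\0';
            } catch (NumberFormatException e) {
                return '\0';
            }
        }
        Character c = NAMED.get(s);
        return c != null ? c : '\0';
    }
}
```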