Class SimpleXMLParser


  • public class SimpleXMLParser
    extends Object
    A simple XML and HTML parser. Like a SAX parser, it is event-based, but it offers much less functionality.

    The parser:

    • Recognizes the encoding used
    • Recognizes the start and end tags of all elements
    • Lists attributes, whose values can be enclosed in single or double quotes
    • Recognizes the <![CDATA[ ... ]]> construct
    • Recognizes the standard entities &amp;, &lt;, &gt;, &quot;, and &apos;, as well as numeric entities
    • Maps line endings \r\n and \r to \n on input, in accordance with the XML Specification, Section 2.11
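
    The entity and line-ending handling listed above can be illustrated with a minimal, self-contained sketch. It is not the parser's own code: the class XmlTextSketch, its method names, and the choice to return '\0' for an unrecognized entity are assumptions made for this example.

```java
import java.util.Map;

// Hypothetical helper class, written for illustration only.
final class XmlTextSketch {
    // The five predefined XML entities, keyed by name without '&' and ';'.
    private static final Map<String, Character> PREDEFINED = Map.of(
            "amp", '&', "lt", '<', "gt", '>', "quot", '"', "apos", '\'');

    // Maps \r\n and a lone \r to \n (XML 1.0, section 2.11).
    static String normalizeLineEndings(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Decodes an entity name such as "amp", "#65" or "#x41".
    // Returns '\0' when the name is not recognized (an assumption of this sketch).
    static char decode(String name) {
        try {
            if (name.startsWith("#x")) {
                return (char) Integer.parseInt(name.substring(2), 16);
            }
            if (name.startsWith("#")) {
                return (char) Integer.parseInt(name.substring(1));
            }
        } catch (NumberFormatException e) {
            return '\0';
        }
        Character c = PREDEFINED.get(name);
        return c == null ? '\0' : c;
    }
}
```

    The \r\n pair is replaced before the lone \r so that a Windows line ending does not turn into two newlines.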

    The code is based on http://www.javaworld.com/javaworld/javatips/javatip128/ with some extra code from Xerces to recognize the encoding.

    • Method Detail

      • parse

        public static void parse(SimpleXMLDocHandler doc,
                                 InputStream in)
                          throws IOException
        Parses the XML document, firing events to the handler.
        Parameters:
        doc - the document handler
        in - the document. The encoding is deduced from the stream. The stream is not closed.
        Throws:
        IOException - on error
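
        As a rough illustration of how an encoding might be deduced from the first bytes of a stream, the sketch below checks for a byte order mark and for a "<?" prefix encoded as UTF-16. It is a simplified assumption, not the detection code this class borrows from Xerces: the class BomSniffer and its method are hypothetical, and UTF-32, EBCDIC, and the encoding declaration inside the XML prolog are ignored.

```java
// Hypothetical helper class, written for illustration only.
final class BomSniffer {
    // Returns the unsigned byte at index i, or -1 if the array is too short.
    private static int at(byte[] head, int i) {
        return i < head.length ? head[i] & 0xFF : -1;
    }

    // Guesses an IANA encoding name from the first bytes of a document.
    // Falls back to UTF-8, the XML default when nothing else is indicated.
    static String guessEncoding(byte[] head) {
        int b0 = at(head, 0), b1 = at(head, 1), b2 = at(head, 2), b3 = at(head, 3);
        // Byte order marks.
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        // No byte order mark: "<?" stored as two-byte code units.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";
        return "UTF-8";
    }
}
```

        A caller would typically read the first four bytes, guess the encoding, and then push those bytes back (for example with java.io.PushbackInputStream) before decoding the rest of the stream.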
      • getJavaEncoding

        public static String getJavaEncoding(String iana)
        Gets the Java encoding name for an IANA encoding name. If no mapping is found, the input is returned unchanged.
        Parameters:
        iana - the IANA encoding
        Returns:
        the Java encoding
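
        The lookup-with-fallback behavior described above might look like the following sketch. The class EncodingNames, the four table entries, and the case-insensitive lookup are assumptions for illustration; the real method's table is not shown in this page.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, written for illustration only.
final class EncodingNames {
    // A small sample of IANA names and their historical Java names.
    private static final Map<String, String> IANA_TO_JAVA = new HashMap<>();
    static {
        IANA_TO_JAVA.put("UTF-8", "UTF8");
        IANA_TO_JAVA.put("US-ASCII", "ASCII");
        IANA_TO_JAVA.put("ISO-8859-1", "ISO8859_1");
        IANA_TO_JAVA.put("WINDOWS-1252", "Cp1252");
    }

    // Returns the mapped Java name, or the input itself when no mapping exists.
    static String toJavaName(String iana) {
        String java = IANA_TO_JAVA.get(iana.toUpperCase(Locale.ROOT));
        return java != null ? java : iana;
    }
}
```

        Returning the input on a miss lets callers pass the result straight to an API that also accepts IANA names.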
      • escapeXML

        public static String escapeXML(String s,
                                       boolean onlyASCII)
        Escapes a string with the appropriate XML codes.
        Parameters:
        s - the string to be escaped
        onlyASCII - if true, characters with codes above 127 are always escaped as &#nn;
        Returns:
        the escaped string
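
        The escaping rules described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the library's source: the class XmlEscaper is hypothetical, it works one UTF-16 code unit at a time (so characters outside the Basic Multilingual Plane would need extra handling), and it does not filter characters that are illegal in XML.

```java
// Hypothetical helper class, written for illustration only.
final class XmlEscaper {
    // Escapes the five XML special characters; when onlyAscii is true,
    // every character above 127 is written as a decimal reference &#nn;.
    static String escape(String s, boolean onlyAscii) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (onlyAscii && c > 127) {
                        sb.append("&#").append((int) c).append(';');
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

        Passing true for the flag produces pure ASCII output, which stays readable regardless of the encoding the document is later written in.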
      • decodeEntity

        public static char decodeEntity(String s)