Package groovy.xml

Class XmlParser

java.lang.Object
groovy.xml.XmlParser
All Implemented Interfaces:
ContentHandler

public class XmlParser extends Object implements ContentHandler
A helper class that parses XML into a tree of Node instances, giving a simple way to process XML. The parser does not preserve the XML InfoSet; if you need that, use W3C DOM, dom4j, JDOM, XOM, or similar. Comments and processing instructions are ignored. Each element becomes a Node carrying its attributes and its children, which are either child Nodes or Strings. This simple model is sufficient for most simple XML processing. Parsing is eager: each parse operation consumes the SAX event stream and builds a complete Node tree before returning.

Example usage:

 import groovy.xml.XmlParser
 def xml = '<root><one a1="uno!"/><two>Some text!</two></root>'
 def rootNode = new XmlParser().parseText(xml)
 assert rootNode.name() == 'root'
 assert rootNode.one[0].@a1 == 'uno!'
 assert rootNode.two.text() == 'Some text!'
 rootNode.children().each { assert it.name() in ['one','two'] }
 
  • Constructor Details

    • XmlParser

      public XmlParser() throws ParserConfigurationException, SAXException
      Creates a non-validating and namespace-aware XmlParser which does not allow DOCTYPE declarations in documents.

      Parser options can be configured via setters before the first parse call. In Groovy, named parameters on the no-argument constructor invoke those setters:

       // Using Groovy named parameters:
       def parser = new XmlParser(namespaceAware: false, trimWhitespace: true)
       
      Throws:
      ParserConfigurationException - if no parser which satisfies the requested configuration can be created.
      SAXException - for SAX errors.
    • XmlParser

      public XmlParser(boolean validating, boolean namespaceAware) throws ParserConfigurationException, SAXException
      Creates an XmlParser which does not allow DOCTYPE declarations in documents.
      Parameters:
      validating - true if the parser should validate documents as they are parsed; false otherwise.
      namespaceAware - true if the parser should provide support for XML namespaces; false otherwise.
      Throws:
      ParserConfigurationException - if no parser which satisfies the requested configuration can be created.
      SAXException - for SAX errors.
    • XmlParser

      public XmlParser(boolean validating, boolean namespaceAware, boolean allowDocTypeDeclaration) throws ParserConfigurationException, SAXException
      Creates an XmlParser.
      Parameters:
      validating - true if the parser should validate documents as they are parsed; false otherwise.
      namespaceAware - true if the parser should provide support for XML namespaces; false otherwise.
      allowDocTypeDeclaration - true if the parser should provide support for DOCTYPE declarations; false otherwise.
      Throws:
      ParserConfigurationException - if no parser which satisfies the requested configuration can be created.
      SAXException - for SAX errors.
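
      As a minimal sketch of the allowDocTypeDeclaration flag, the script below parses the same document with the default constructor and with the three-argument constructor. It assumes the underlying SAX implementation honours the DOCTYPE restriction (the JDK's built-in parser does); the sample document is made up for illustration.

```groovy
import groovy.xml.XmlParser
import org.xml.sax.SAXException

// Illustrative document with an internal DTD subset.
def xml = '''<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]>
<note>hello</note>'''

// Default constructor: DOCTYPE declarations are not allowed.
def rejected = false
try {
    new XmlParser().parseText(xml)
} catch (SAXException e) {
    rejected = true
}
assert rejected

// validating = false, namespaceAware = true, allowDocTypeDeclaration = true
def note = new XmlParser(false, true, true).parseText(xml)
assert note.name() == 'note'
assert note.text() == 'hello'
```

      The rejected parse may also print a fatal-error line to standard error, since no custom ErrorHandler is installed here.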
    • XmlParser

      public XmlParser(XMLReader reader)
      Creates a parser backed by the supplied SAX reader.
      Parameters:
      reader - the XML reader whose features, properties, and handlers will be used
    • XmlParser

      public XmlParser(SAXParser parser) throws SAXException
      Creates a parser backed by the supplied SAX parser.
      Parameters:
      parser - the SAX parser providing the XMLReader used for parsing
      Throws:
      SAXException - if the parser cannot provide an XML reader
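
      The two constructors above accept a pre-configured SAX object, which is useful when the factory needs settings that XmlParser does not expose. A minimal sketch using only standard JAXP calls (the variable names are illustrative):

```groovy
import groovy.xml.XmlParser
import javax.xml.XMLConstants
import javax.xml.parsers.SAXParserFactory

// Configure the JAXP factory ourselves instead of letting XmlParser do it.
def factory = SAXParserFactory.newInstance()
factory.setNamespaceAware(true)
factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)

// One parser backed by an XMLReader, one backed by a SAXParser.
def viaReader = new XmlParser(factory.newSAXParser().getXMLReader())
def viaSaxParser = new XmlParser(factory.newSAXParser())

def xml = '<root><item>1</item></root>'
def fromReader = viaReader.parseText(xml)
def fromSaxParser = viaSaxParser.parseText(xml)
assert fromReader.item.text() == '1'
assert fromSaxParser.item.text() == '1'
```

      Note that with these constructors the caller is responsible for any DOCTYPE or validation settings on the supplied reader or parser.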
  • Method Details

    • isTrimWhitespace

      public boolean isTrimWhitespace()
      Returns the current trim whitespace setting.
      Returns:
      true if whitespace will be trimmed
    • setTrimWhitespace

      public void setTrimWhitespace(boolean trimWhitespace)
      Sets the trim whitespace setting value.
      Parameters:
      trimWhitespace - the desired setting value
    • isKeepIgnorableWhitespace

      public boolean isKeepIgnorableWhitespace()
      Returns the current keep ignorable whitespace setting.
      Returns:
      true if ignorable whitespace will be kept (default false)
    • setKeepIgnorableWhitespace

      public void setKeepIgnorableWhitespace(boolean keepIgnorableWhitespace)
      Sets the keep ignorable whitespace setting value.
      Parameters:
      keepIgnorableWhitespace - the desired new value
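
      The following sketch illustrates how the two whitespace settings affect the resulting tree. Both flags are set explicitly in every case so the outcome does not depend on their defaults; the sample XML is made up for illustration.

```groovy
import groovy.xml.XmlParser

// <a> has padded text; a single space separates </a> and <b/>.
def xml = '<root><a>  padded  </a> <b/></root>'

// Neither flag set: text keeps its padding, whitespace-only text is dropped.
def plain = new XmlParser()
plain.setTrimWhitespace(false)
plain.setKeepIgnorableWhitespace(false)
def plainRoot = plain.parseText(xml)
assert plainRoot.a.text() == '  padded  '
assert !plainRoot.children().any { it == ' ' }

// Trimming on: leading and trailing whitespace is removed from text.
def trimming = new XmlParser()
trimming.setTrimWhitespace(true)
trimming.setKeepIgnorableWhitespace(false)
def trimmedRoot = trimming.parseText(xml)
assert trimmedRoot.a.text() == 'padded'

// Keeping on (and trimming off): the whitespace-only text survives as a String child.
def keeping = new XmlParser()
keeping.setTrimWhitespace(false)
keeping.setKeepIgnorableWhitespace(true)
def keptRoot = keeping.parseText(xml)
assert keptRoot.children().any { it == ' ' }
```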
    • parse

      public Node parse(File file) throws IOException, SAXException
      Parses the content of the given file as XML, turning it into a tree of Nodes.
      Parameters:
      file - the File containing the XML to be parsed
      Returns:
      the root node of the parsed tree of Nodes
      Throws:
      SAXException - Any SAX exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
    • parse

      public Node parse(Path path) throws IOException, SAXException
      Parses the content of the file at the given path as XML, turning it into a tree of Nodes.
      Parameters:
      path - the path of the File containing the XML to be parsed
      Returns:
      the root node of the parsed tree of Nodes
      Throws:
      SAXException - Any SAX exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
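
      A minimal sketch of the File and Path overloads, using a temporary file created just for the demonstration (the file name prefix and contents are illustrative):

```groovy
import groovy.xml.XmlParser
import java.nio.file.Files

// Write a small document to a temporary file.
def path = Files.createTempFile('xmlparser-demo', '.xml')
def fromFile
def fromPath
try {
    Files.write(path, '<root><item id="1"/></root>'.getBytes('UTF-8'))
    fromFile = new XmlParser().parse(path.toFile())   // parse(File)
    fromPath = new XmlParser().parse(path)            // parse(Path)
} finally {
    Files.deleteIfExists(path)
}

assert fromFile.item[0].@id == '1'
assert fromPath.item[0].@id == '1'
```

      Because parsing is eager, the file can be deleted as soon as parse returns; the Node tree holds no reference to it.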
    • parse

      public Node parse(InputSource input) throws IOException, SAXException
      Parses the content of the specified input source into a tree of Nodes.
      Parameters:
      input - the InputSource for the XML to parse
      Returns:
      the root node of the parsed tree of Nodes
      Throws:
      SAXException - Any SAX exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
    • parse

      public Node parse(InputStream input) throws IOException, SAXException
      Parses the content of the specified input stream into a tree of Nodes.

      Note that this method does not give the parser a base URI (system ID) from which to resolve DTDs and other external references.

      Parameters:
      input - an InputStream containing the XML to be parsed
      Returns:
      the root node of the parsed tree of Nodes
      Throws:
      SAXException - Any SAX exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
    • parse

      public Node parse(Reader in) throws IOException, SAXException
      Parses the content of the specified reader into a tree of Nodes.

      Note that this method does not give the parser a base URI (system ID) from which to resolve DTDs and other external references.

      Parameters:
      in - a Reader to read the XML to be parsed
      Returns:
      the root node of the parsed tree of Nodes
      Throws:
      SAXException - Any SAX exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
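
      The stream-based overloads can be sketched as follows. When a base URI is needed, one option is to wrap the stream in an org.xml.sax.InputSource and set its system ID before calling parse(InputSource); the system ID used here is a hypothetical placeholder and is never opened, because the source already carries a character stream.

```groovy
import groovy.xml.XmlParser
import org.xml.sax.InputSource

def xml = '<root><msg>hi</msg></root>'

// parse(Reader) and parse(InputStream): no base URI is available to the parser.
def fromReader = new XmlParser().parse(new StringReader(xml))
def fromStream = new XmlParser().parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))

// parse(InputSource): the system ID supplies a base URI (hypothetical value).
def source = new InputSource(new StringReader(xml))
source.setSystemId('file:///hypothetical/base/doc.xml')
def fromSource = new XmlParser().parse(source)

assert fromReader.msg.text() == 'hi'
assert fromStream.msg.text() == 'hi'
assert fromSource.msg.text() == 'hi'
```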
    • parse

      public Node parse(String uri) throws IOException, SAXException
      Parses the content of the specified URI into a tree of Nodes.
      Parameters:
      uri - a String containing a URI pointing to the XML to be parsed
      Returns:
      the root node of the parsed tree of Nodes
      Throws:
      SAXException - Any SAX exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
    • parseText

      public Node parseText(String text) throws IOException, SAXException
      A helper method to parse the given text as XML.
      Parameters:
      text - the XML text to parse
      Returns:
      the root node of the parsed tree of Nodes
      Throws:
      SAXException - Any SAX exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
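
      As a small illustration of the Throws clause, malformed input surfaces as a SAXParseException (a subclass of SAXException) that carries position information. The broken document below is made up for the example.

```groovy
import groovy.xml.XmlParser
import org.xml.sax.SAXParseException

def problem = null
try {
    // <open> is never closed, so the document is not well-formed.
    new XmlParser().parseText('<root><open></root>')
} catch (SAXParseException e) {
    problem = e
}

assert problem != null
assert problem.lineNumber == 1   // the whole document is on one line
```

      With no custom ErrorHandler installed, the underlying parser may also write a fatal-error line to standard error.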
    • parseTextAs

      public <T> T parseTextAs(Class<T> type, String text)
      Parses the content of the specified XML text into a typed object. Requires jackson-databind on the classpath for type conversion. Supports @JsonProperty and @JsonFormat annotations.
      Type Parameters:
      T - the target type
      Parameters:
      type - the target type
      text - the XML text to parse
      Returns:
      a typed object
      Throws:
      XmlRuntimeException - if parsing or conversion fails, or jackson-databind is absent
      Since:
      6.0.0
    • parseAs

      public <T> T parseAs(Class<T> type, Reader reader)
      Parses XML from a reader into a typed object. Requires jackson-databind on the classpath for type conversion.
      Type Parameters:
      T - the target type
      Parameters:
      type - the target type
      reader - the reader of XML
      Returns:
      a typed object
      Throws:
      XmlRuntimeException - if parsing or conversion fails, or jackson-databind is absent
      Since:
      6.0.0
    • parseAs

      public <T> T parseAs(Class<T> type, InputStream stream)
      Parses XML from an input stream into a typed object. Requires jackson-databind on the classpath for type conversion.
      Type Parameters:
      T - the target type
      Parameters:
      type - the target type
      stream - the input stream of XML
      Returns:
      a typed object
      Throws:
      XmlRuntimeException - if parsing or conversion fails, or jackson-databind is absent
      Since:
      6.0.0
    • parseAs

      public <T> T parseAs(Class<T> type, File file) throws IOException
      Parses XML from a file into a typed object. Requires jackson-databind on the classpath for type conversion.
      Type Parameters:
      T - the target type
      Parameters:
      type - the target type
      file - the XML file
      Returns:
      a typed object
      Throws:
      IOException - if the file cannot be read
      XmlRuntimeException - if parsing or conversion fails, or jackson-databind is absent
      Since:
      6.0.0
    • parseAs

      public <T> T parseAs(Class<T> type, Path path) throws IOException
      Parses XML from a path into a typed object. Requires jackson-databind on the classpath for type conversion.
      Type Parameters:
      T - the target type
      Parameters:
      type - the target type
      path - the path to the XML file
      Returns:
      a typed object
      Throws:
      IOException - if the file cannot be read
      XmlRuntimeException - if parsing or conversion fails, or jackson-databind is absent
      Since:
      6.0.0
    • isNamespaceAware

      public boolean isNamespaceAware()
      Determine if namespace handling is enabled.
      Returns:
      true if namespace handling is enabled
    • setNamespaceAware

      public void setNamespaceAware(boolean namespaceAware)
      Enables or disables namespace handling. Must be set before the first parse call.
      Parameters:
      namespaceAware - the new desired value
      Throws:
      IllegalStateException - if called after parsing has started
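
      The effect of namespace awareness on node names can be sketched as below. The flag is passed through the two-argument constructor here; the namespace URI and element names are illustrative. With namespace handling on, a namespaced element's name() is a QName object; with it off, the name is the raw qualified name as a String.

```groovy
import groovy.xml.XmlParser

def xml = '<x:root xmlns:x="urn:example"><x:child/></x:root>'

// validating = false, namespaceAware = true
def aware = new XmlParser(false, true).parseText(xml)
def qname = aware.name()
assert qname.getNamespaceURI() == 'urn:example'
assert qname.getLocalPart() == 'root'
assert qname.getPrefix() == 'x'

// validating = false, namespaceAware = false
def unaware = new XmlParser(false, false).parseText(xml)
assert unaware.name() == 'x:root'
```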
    • isValidating

      public boolean isValidating()
      Determine if the parser validates documents.
      Returns:
      true if validation is enabled
      Since:
      6.0.0
    • setValidating

      public void setValidating(boolean validating)
      Enables or disables validation. Must be set before the first parse call.
      Parameters:
      validating - the new desired value
      Throws:
      IllegalStateException - if called after parsing has started
      Since:
      6.0.0
    • isAllowDocTypeDeclaration

      public boolean isAllowDocTypeDeclaration()
      Determine if DOCTYPE declarations are allowed.
      Returns:
      true if DOCTYPE declarations are allowed
      Since:
      6.0.0
    • setAllowDocTypeDeclaration

      public void setAllowDocTypeDeclaration(boolean allowDocTypeDeclaration)
      Enables or disables DOCTYPE declaration support. Must be set before the first parse call.
      Parameters:
      allowDocTypeDeclaration - the new desired value
      Throws:
      IllegalStateException - if called after parsing has started
      Since:
      6.0.0
    • getDTDHandler

      public DTDHandler getDTDHandler()
      Returns the SAX DTD handler configured on the underlying reader.
      Returns:
      the configured DTD handler, or null if none has been set
    • getEntityResolver

      public EntityResolver getEntityResolver()
      Returns the SAX entity resolver configured on the underlying reader.
      Returns:
      the configured entity resolver, or null if none has been set
    • getErrorHandler

      public ErrorHandler getErrorHandler()
      Returns the SAX error handler configured on the underlying reader.
      Returns:
      the configured error handler, or null if none has been set
    • getFeature

      public boolean getFeature(String uri) throws SAXNotRecognizedException, SAXNotSupportedException
      Looks up a SAX feature on the underlying reader.
      Parameters:
      uri - the fully qualified SAX feature URI
      Returns:
      true if the feature is enabled
      Throws:
      SAXNotRecognizedException - if the feature name is not recognized
      SAXNotSupportedException - if the feature is recognized but not supported
    • getProperty

      public Object getProperty(String uri) throws SAXNotRecognizedException, SAXNotSupportedException
      Looks up a SAX property on the underlying reader.
      Parameters:
      uri - the fully qualified SAX property URI
      Returns:
      the current value of the property
      Throws:
      SAXNotRecognizedException - if the property name is not recognized
      SAXNotSupportedException - if the property is recognized but not supported
    • setDTDHandler

      public void setDTDHandler(DTDHandler dtdHandler)
      Sets the SAX DTD handler on the underlying reader.
      Parameters:
      dtdHandler - the DTD handler to receive notation and unparsed entity callbacks
    • setEntityResolver

      public void setEntityResolver(EntityResolver entityResolver)
      Sets the SAX entity resolver on the underlying reader.
      Parameters:
      entityResolver - the resolver to use for external entities
    • setErrorHandler

      public void setErrorHandler(ErrorHandler errorHandler)
      Sets the SAX error handler on the underlying reader.
      Parameters:
      errorHandler - the handler to receive parser warnings and errors
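
      A minimal sketch of installing a custom error handler that records what the underlying reader reports. CollectingErrorHandler is a hypothetical helper written for this example, not part of the library.

```groovy
import groovy.xml.XmlParser
import org.xml.sax.ErrorHandler
import org.xml.sax.SAXParseException

// Hypothetical handler: remembers each callback and rethrows fatal errors.
class CollectingErrorHandler implements ErrorHandler {
    final List<String> seen = []
    void warning(SAXParseException e) { seen << 'warning' }
    void error(SAXParseException e) { seen << 'error' }
    void fatalError(SAXParseException e) { seen << 'fatal'; throw e }
}

def handler = new CollectingErrorHandler()
def parser = new XmlParser()
parser.setErrorHandler(handler)

def failed = false
try {
    parser.parseText('<root>')   // missing end tag: not well-formed
} catch (SAXParseException e) {
    failed = true
}

assert failed
assert 'fatal' in handler.seen
```

      Installing a handler like this also replaces the default behaviour of printing errors to standard error.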
    • setFeature

      public void setFeature(String uri, boolean value) throws SAXNotRecognizedException, SAXNotSupportedException
      Enables or disables a SAX feature on the underlying reader.
      Parameters:
      uri - the fully qualified SAX feature URI
      value - the value to apply
      Throws:
      SAXNotRecognizedException - if the feature name is not recognized
      SAXNotSupportedException - if the feature is recognized but not supported
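
      A short sketch of getFeature and setFeature. The namespaces feature URI is a standard SAX2 feature name; the second URI is deliberately made up to show the failure path. Exact behaviour for non-standard features depends on the SAX implementation in use.

```groovy
import groovy.xml.XmlParser
import org.xml.sax.SAXNotRecognizedException
import org.xml.sax.SAXNotSupportedException

def parser = new XmlParser()   // namespace-aware by default

// Standard SAX2 feature reflecting namespace processing.
def namespacesOn = parser.getFeature('http://xml.org/sax/features/namespaces')
assert namespacesOn

// A made-up feature URI should be refused by the underlying reader.
def refused = false
try {
    parser.setFeature('http://example.invalid/features/no-such-feature', true)
} catch (SAXNotRecognizedException | SAXNotSupportedException e) {
    refused = true
}
assert refused
```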
    • setProperty

      public void setProperty(String uri, Object value) throws SAXNotRecognizedException, SAXNotSupportedException
      Sets a SAX property on the underlying reader.
      Parameters:
      uri - the fully qualified SAX property URI
      value - the value to apply
      Throws:
      SAXNotRecognizedException - if the property name is not recognized
      SAXNotSupportedException - if the property is recognized but not supported
    • startDocument

      public void startDocument() throws SAXException
      Resets the current root node before SAX events for a new document begin.
      Specified by:
      startDocument in interface ContentHandler
      Throws:
      SAXException - if the SAX pipeline reports an error
    • endDocument

      public void endDocument() throws SAXException
      Completes the current parse and clears the internal element stack.
      Specified by:
      endDocument in interface ContentHandler
      Throws:
      SAXException - if the SAX pipeline reports an error
    • startElement

      public void startElement(String namespaceURI, String localName, String qName, Attributes list) throws SAXException
      Creates a new Node for the current element and pushes it onto the parse stack.
      Specified by:
      startElement in interface ContentHandler
      Parameters:
      namespaceURI - the namespace URI, or an empty string if namespaces are unavailable
      localName - the local element name
      qName - the qualified element name as reported by SAX
      list - the element attributes
      Throws:
      SAXException - if node creation fails
    • endElement

      public void endElement(String namespaceURI, String localName, String qName) throws SAXException
      Flushes buffered text and pops the current element when its end tag is seen.
      Specified by:
      endElement in interface ContentHandler
      Parameters:
      namespaceURI - the namespace URI, or an empty string if namespaces are unavailable
      localName - the local element name
      qName - the qualified element name as reported by SAX
      Throws:
      SAXException - if text handling fails
    • characters

      public void characters(char[] buffer, int start, int length) throws SAXException
      Buffers character data until the enclosing element boundary is reached.
      Specified by:
      characters in interface ContentHandler
      Parameters:
      buffer - the character buffer supplied by SAX
      start - the start offset in the buffer
      length - the number of characters to read
      Throws:
      SAXException - if the SAX pipeline reports an error
    • startPrefixMapping

      public void startPrefixMapping(String prefix, String namespaceURI) throws SAXException
      Receives namespace prefix mapping notifications. The default implementation does not retain separate prefix state.
      Specified by:
      startPrefixMapping in interface ContentHandler
      Parameters:
      prefix - the declared prefix
      namespaceURI - the namespace URI bound to the prefix
      Throws:
      SAXException - if the SAX pipeline reports an error
    • endPrefixMapping

      public void endPrefixMapping(String prefix) throws SAXException
      Receives namespace prefix scope end notifications. The default implementation performs no action.
      Specified by:
      endPrefixMapping in interface ContentHandler
      Parameters:
      prefix - the prefix leaving scope
      Throws:
      SAXException - if the SAX pipeline reports an error
    • ignorableWhitespace

      public void ignorableWhitespace(char[] buffer, int start, int len) throws SAXException
      Receives ignorable whitespace and optionally preserves it as text content.
      Specified by:
      ignorableWhitespace in interface ContentHandler
      Parameters:
      buffer - the character buffer supplied by SAX
      start - the start offset in the buffer
      len - the number of characters to read
      Throws:
      SAXException - if the SAX pipeline reports an error
    • processingInstruction

      public void processingInstruction(String target, String data) throws SAXException
      Receives processing instruction callbacks. The default implementation ignores processing instructions.
      Specified by:
      processingInstruction in interface ContentHandler
      Parameters:
      target - the processing instruction target
      data - the processing instruction data
      Throws:
      SAXException - if the SAX pipeline reports an error
    • getDocumentLocator

      public Locator getDocumentLocator()
      Returns the document locator last provided by SAX.
      Returns:
      the current locator, or null if parsing has not started
    • setDocumentLocator

      public void setDocumentLocator(Locator locator)
      Stores the locator supplied by SAX for later diagnostics or subclass use.
      Specified by:
      setDocumentLocator in interface ContentHandler
      Parameters:
      locator - the document locator for the current parse
    • skippedEntity

      public void skippedEntity(String name) throws SAXException
      Receives skipped entity notifications. The default implementation performs no action.
      Specified by:
      skippedEntity in interface ContentHandler
      Parameters:
      name - the skipped entity name
      Throws:
      SAXException - if the SAX pipeline reports an error
    • getXMLReader

      protected XMLReader getXMLReader()
      Returns the configured XML reader after registering this parser as its content handler. Subclasses may override to customize reader preparation before parsing begins.
      Returns:
      the XML reader used for subsequent parse operations
    • addTextToNode

      protected void addTextToNode()
      Transfers buffered character data into the current node when an element boundary is reached. Subclasses may override to customize text normalization or whitespace preservation during parsing.
    • createNode

      protected Node createNode(Node parent, Object name, Map attributes)
      Creates a new node with the given parent, name, and attributes. The default implementation returns an instance of groovy.util.Node.
      Parameters:
      parent - the parent node, or null if the node being created is the root node
      name - an Object representing the name of the node (typically an instance of QName)
      attributes - a Map of attribute names to attribute values
      Returns:
      a new Node instance representing the current node
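
      One possible use of this extension point, sketched under the assumption that the underlying reader supplies a Locator: a subclass that records the source line of each element as an extra attribute. LineTrackingParser and the sourceLine attribute name are hypothetical, chosen for this example.

```groovy
import groovy.xml.XmlParser

// Hypothetical subclass: tags every node with the line reported by the SAX locator.
class LineTrackingParser extends XmlParser {
    @Override
    protected Node createNode(Node parent, Object name, Map attributes) {
        def withLine = new LinkedHashMap(attributes)
        def locator = getDocumentLocator()
        if (locator != null) {
            withLine.put('sourceLine', locator.getLineNumber())
        }
        return super.createNode(parent, name, withLine)
    }
}

def xml = '<root>\n  <a/>\n  <b/>\n</root>'
def root = new LineTrackingParser().parseText(xml)

assert root.a[0].@sourceLine == 2
assert root.b[0].@sourceLine == 3
```

      The sketch calls getDocumentLocator() explicitly rather than using Groovy property syntax, because this class also declares getProperty(String) and setProperty(String, Object) for SAX properties.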
    • getElementName

      protected Object getElementName(String namespaceURI, String localName, String qName)
      Returns the name object for an element, given its namespaceURI, localName, and qName.
      Parameters:
      namespaceURI - the namespace URI
      localName - the local name
      qName - the qualified name
      Returns:
      the newly created representation of the name