Class HTMLTagBalancer
java.lang.Object
org.cyberneko.html.HTMLTagBalancer
All Implemented Interfaces:
org.apache.xerces.xni.parser.XMLComponent, org.apache.xerces.xni.parser.XMLDocumentFilter, org.apache.xerces.xni.parser.XMLDocumentSource, org.apache.xerces.xni.XMLDocumentHandler, org.cyberneko.html.HTMLComponent
public class HTMLTagBalancer
extends Object
implements org.apache.xerces.xni.parser.XMLDocumentFilter, org.cyberneko.html.HTMLComponent
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
(package private) static class | | Structure to hold information about an element placed in buffer to be consumed later
static class | | Element info for each start element.
static class | HTMLTagBalancer.InfoStack | Unsynchronized stack of element information.
-
Field Summary
Fields
Modifier and Type | Field | Description
protected static final String | AUGMENTATIONS | Include infoset augmentations.
protected static final String | DOCUMENT_FRAGMENT | Document fragment balancing only.
protected static final String | DOCUMENT_FRAGMENT_DEPRECATED | Document fragment balancing only (deprecated).
private List | endElementsBuffer_ |
protected static final String | ERROR_REPORTER | Error reporter.
protected boolean | fAugmentations | Include infoset augmentations.
protected boolean | fDocumentFragment | Document fragment balancing only.
protected org.apache.xerces.xni.XMLDocumentHandler | fDocumentHandler | The document handler.
protected org.apache.xerces.xni.parser.XMLDocumentSource | fDocumentSource | The document source.
protected final HTMLTagBalancer.InfoStack | fElementStack | The element stack.
private final org.apache.xerces.xni.XMLAttributes | fEmptyAttrs | Empty attributes.
protected org.cyberneko.html.HTMLErrorReporter | fErrorReporter | Error reporter.
protected boolean | fIgnoreOutsideContent | Ignore outside content.
private final org.cyberneko.html.HTMLAugmentations | fInfosetAugs | Augmentations.
protected final HTMLTagBalancer.InfoStack | fInlineStack | The inline stack.
protected short | fNamesAttrs | Modify HTML attribute names.
protected short | fNamesElems | Modify HTML element names.
protected boolean | fNamespaces | Namespaces.
protected boolean | fOpenedForm | True if a form is in the stack (allows the opening of nested forms to be discarded).
private boolean | forcedEndElement_ |
private boolean | forcedStartElement_ |
private final org.apache.xerces.xni.QName | fQName | A qualified name.
static final String | FRAGMENT_CONTEXT_STACK | EXPERIMENTAL: may change in next release. Name of the property holding the stack of elements in which context a document fragment should be parsed.
private org.apache.xerces.xni.QName[] | fragmentContextStack_ | Stack of elements determining the context in which a document fragment should be parsed.
private int | fragmentContextStackSize_ |
protected boolean | fReportErrors | Report errors.
protected boolean | fSeenAnything | True if seen anything.
protected boolean | fSeenBodyElement | True if seen <body> element.
protected boolean | fSeenDoctype | True if the doctype declaration has been seen.
protected boolean | fSeenHeadElement | True if seen <head> element.
protected boolean | fSeenRootElement | True if root element has been seen.
protected boolean | fSeenRootElementEnd | True if seen the end of the document element.
protected static final String | IGNORE_OUTSIDE_CONTENT | Ignore outside content.
private org.cyberneko.html.LostText | lostText_ |
protected static final String | NAMES_ATTRS | Modify HTML attribute names: { "upper", "lower", "default" }.
protected static final String | NAMES_ELEMS | Modify HTML element names: { "upper", "lower", "default" }.
protected static final short | NAMES_LOWERCASE | Lowercase HTML names.
protected static final short | NAMES_MATCH | Match HTML element names.
protected static final short | NAMES_NO_CHANGE | Don't modify HTML names.
protected static final short | NAMES_UPPERCASE | Uppercase HTML names.
protected static final String | NAMESPACES | Namespaces.
private static final String[] | RECOGNIZED_FEATURES | Recognized features.
private static final Boolean[] | RECOGNIZED_FEATURES_DEFAULTS | Recognized features defaults.
private static final String[] | RECOGNIZED_PROPERTIES | Recognized properties.
private static final Object[] | RECOGNIZED_PROPERTIES_DEFAULTS | Recognized properties defaults.
protected static final String | REPORT_ERRORS | Report errors.
protected static final org.cyberneko.html.HTMLEventInfo | SYNTHESIZED_ITEM | Synthesized event info item.
protected org.cyberneko.html.HTMLTagBalancingListener | tagBalancingListener |
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
protected final void | callEndElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs) | Call document handler end element.
protected final void | callStartElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs) | Call document handler start element.
void | characters(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) | Characters.
void | comment(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) | Comment.
private void | consumeBufferedEndElements() | Consume elements that have been buffered, like that are first consumed at the end of document
private void | |
private org.apache.xerces.xni.QName | createQName(String tagName) |
void | doctypeDecl(String rootElementName, String publicId, String systemId, org.apache.xerces.xni.Augmentations augs) | Doctype declaration.
protected final org.apache.xerces.xni.XMLAttributes | | Returns a set of empty attributes.
void | emptyElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs) | Empty element.
void | endCDATA(org.apache.xerces.xni.Augmentations augs) | End CDATA section.
void | endDocument(org.apache.xerces.xni.Augmentations augs) | End document.
void | endElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs) | End element.
void | endGeneralEntity(String name, org.apache.xerces.xni.Augmentations augs) | End entity.
void | endPrefixMapping(String prefix, org.apache.xerces.xni.Augmentations augs) | End prefix mapping.
private void | | Generates a missing (which creates missing when needed)
private boolean | forceStartElement(org.apache.xerces.xni.QName elem, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs) | Forces an element start, taking care to set the information to allow startElement to "see" that the element has been forced.
org.apache.xerces.xni.XMLDocumentHandler | getDocumentHandler() | Returns the document handler.
org.apache.xerces.xni.parser.XMLDocumentSource | | Returns the document source.
protected HTMLElements.Element | getElement(org.apache.xerces.xni.QName elementName) | Returns an HTML element.
protected final int | getElementDepth(HTMLElements.Element element) | Returns the depth of the open tag associated with the specified element name or -1 if no matching element is found.
 | getFeatureDefault(String featureId) | Returns the default state for a feature.
protected static final short | getNamesValue(String value) | Converts HTML names string value to constant value.
protected int | getParentDepth(HTMLElements.Element[] parents, short bounds) | Returns the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.
 | getPropertyDefault(String propertyId) | Returns the default state for a property.
String[] | getRecognizedFeatures() | Returns recognized features.
String[] | getRecognizedProperties() | Returns recognized properties.
void | ignorableWhitespace(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) | Ignorable whitespace.
protected static final String | modifyName(String name, short mode) | Modifies the given name based on the specified mode.
private void | notifyDiscardedEndElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs) | Notifies the tagBalancingListener (if any) of an ignored end element
private void | notifyDiscardedStartElement(org.apache.xerces.xni.QName elem, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs) | Notifies the tagBalancingListener (if any) of an ignored start element
void | processingInstruction(String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs) | Processing instruction.
void | reset(org.apache.xerces.xni.parser.XMLComponentManager manager) | Resets the component.
void | setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler) | Sets the document handler.
void | setDocumentSource(org.apache.xerces.xni.parser.XMLDocumentSource source) | Sets the document source.
void | setFeature(String featureId, boolean state) | Sets a feature.
void | setProperty(String propertyId, Object value) | Sets a property.
(package private) void | setTagBalancingListener(org.cyberneko.html.HTMLTagBalancingListener tagBalancingListener) |
void | startCDATA(org.apache.xerces.xni.Augmentations augs) | Start CDATA section.
void | startDocument(org.apache.xerces.xni.XMLLocator locator, String encoding, org.apache.xerces.xni.Augmentations augs) | Start document.
void | startDocument(org.apache.xerces.xni.XMLLocator locator, String encoding, org.apache.xerces.xni.NamespaceContext nscontext, org.apache.xerces.xni.Augmentations augs) | Start document.
void | startElement(org.apache.xerces.xni.QName elem, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs) | Start element.
void | startGeneralEntity(String name, org.apache.xerces.xni.XMLResourceIdentifier id, String encoding, org.apache.xerces.xni.Augmentations augs) | Start entity.
void | startPrefixMapping(String prefix, String uri, org.apache.xerces.xni.Augmentations augs) | Start prefix mapping.
protected final org.apache.xerces.xni.Augmentations | | Returns an augmentations object with a synthesized item added.
void | | Text declaration.
void | xmlDecl(String version, String encoding, String standalone, org.apache.xerces.xni.Augmentations augs) | XML declaration.
-
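The contract stated for getElementDepth (a depth for the matching open tag, or -1 when no matching element is found) can be illustrated with a minimal, self-contained sketch. This is not the NekoHTML implementation: the class name OpenTagStack, the use of plain strings instead of HTMLElements.Element, and the choice to count depth from the innermost open element (1 = top of the stack) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an element stack; not part of NekoHTML. */
final class OpenTagStack {
    private final List<String> open = new ArrayList<>();

    /** Records a start tag as the new innermost open element. */
    void push(String name) {
        open.add(name);
    }

    /**
     * Returns how many open elements would have to be closed to close the
     * nearest open element called {@code name} (1 = innermost), or -1 if no
     * such element is open. Counting from the top is an assumption of this sketch.
     */
    int depthOf(String name) {
        for (int i = open.size() - 1; i >= 0; i--) {
            if (open.get(i).equalsIgnoreCase(name)) {
                return open.size() - i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        OpenTagStack stack = new OpenTagStack();
        stack.push("html");
        stack.push("body");
        stack.push("p");
        System.out.println(stack.depthOf("p"));     // prints 1
        System.out.println(stack.depthOf("html"));  // prints 3
        System.out.println(stack.depthOf("table")); // prints -1
    }
}
```

A balancer could use such a depth to decide how many end tags to synthesize before honoring an end tag that does not match the innermost open element.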
Field Details
-
NAMESPACES
-
AUGMENTATIONS
-
REPORT_ERRORS
-
DOCUMENT_FRAGMENT_DEPRECATED
Document fragment balancing only (deprecated).
-
DOCUMENT_FRAGMENT
-
IGNORE_OUTSIDE_CONTENT
-
RECOGNIZED_FEATURES
Recognized features.
-
RECOGNIZED_FEATURES_DEFAULTS
Recognized features defaults.
-
NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.
-
NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.
-
ERROR_REPORTER
-
FRAGMENT_CONTEXT_STACK
EXPERIMENTAL: may change in next release
Name of the property holding the stack of elements in which context a document fragment should be parsed.
-
RECOGNIZED_PROPERTIES
Recognized properties.
-
RECOGNIZED_PROPERTIES_DEFAULTS
Recognized properties defaults.
-
NAMES_NO_CHANGE
protected static final short NAMES_NO_CHANGE
Don't modify HTML names.
-
NAMES_MATCH
protected static final short NAMES_MATCH
Match HTML element names.
-
NAMES_UPPERCASE
protected static final short NAMES_UPPERCASE
Uppercase HTML names.
-
NAMES_LOWERCASE
protected static final short NAMES_LOWERCASE
Lowercase HTML names.
-
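The name-handling settings ("upper", "lower", "default") and the NAMES_* constants suggest a two-step pattern: convert the property string to a short constant once, then apply that mode to each name. The sketch below shows that pattern in isolation. The class NameModes is hypothetical, the numeric constant values are assumptions (the actual values are not given here), and the method bodies are one plausible reading of the getNamesValue and modifyName descriptions rather than the library's code.

```java
import java.util.Locale;

/** Illustrative sketch only; the constant values below are assumptions. */
final class NameModes {
    static final short NAMES_NO_CHANGE = 0;
    static final short NAMES_UPPERCASE = 1;
    static final short NAMES_LOWERCASE = 2;

    /** Maps "upper" or "lower" to a mode constant; any other value means no change. */
    static short getNamesValue(String value) {
        if ("upper".equals(value)) {
            return NAMES_UPPERCASE;
        }
        if ("lower".equals(value)) {
            return NAMES_LOWERCASE;
        }
        return NAMES_NO_CHANGE;
    }

    /** Returns the name converted according to the mode, or unchanged for other modes. */
    static String modifyName(String name, short mode) {
        switch (mode) {
            case NAMES_UPPERCASE:
                return name.toUpperCase(Locale.ROOT);
            case NAMES_LOWERCASE:
                return name.toLowerCase(Locale.ROOT);
            default:
                return name;
        }
    }

    public static void main(String[] args) {
        System.out.println(modifyName("Body", getNamesValue("upper")));   // prints BODY
        System.out.println(modifyName("Body", getNamesValue("lower")));   // prints body
        System.out.println(modifyName("Body", getNamesValue("default"))); // prints Body
    }
}
```

Resolving the string to a constant up front keeps the per-element work to a single switch instead of repeated string comparisons.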
SYNTHESIZED_ITEM
protected static final org.cyberneko.html.HTMLEventInfo SYNTHESIZED_ITEM
Synthesized event info item.
-
fNamespaces
protected boolean fNamespaces
Namespaces.
-
fAugmentations
protected boolean fAugmentations
Include infoset augmentations.
-
fReportErrors
protected boolean fReportErrors
Report errors.
-
fDocumentFragment
protected boolean fDocumentFragment
Document fragment balancing only.
-
fIgnoreOutsideContent
protected boolean fIgnoreOutsideContent
Ignore outside content.
-
fNamesElems
protected short fNamesElems
Modify HTML element names.
-
fNamesAttrs
protected short fNamesAttrs
Modify HTML attribute names.
-
fErrorReporter
protected org.cyberneko.html.HTMLErrorReporter fErrorReporter
Error reporter.
-
fDocumentSource
protected org.apache.xerces.xni.parser.XMLDocumentSource fDocumentSource
The document source.
-
fDocumentHandler
protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
The document handler.
-
fElementStack
The element stack.
-
fInlineStack
The inline stack.
-
fSeenAnything
protected boolean fSeenAnything
True if seen anything. Important for xml declaration.
-
fSeenDoctype
protected boolean fSeenDoctype
True if the doctype declaration has been seen.
-
fSeenRootElement
protected boolean fSeenRootElement
True if root element has been seen.
-
fSeenRootElementEnd
protected boolean fSeenRootElementEnd
True if seen the end of the document element. In other words, this variable is set to false until the end </HTML> tag is seen (or synthesized). This is used to ensure that extraneous events after the end of the document element do not make the document stream ill-formed.
-
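The role described for fSeenRootElementEnd, a flag that stays false until the root end tag is seen and then keeps later events from making the stream ill-formed, can be sketched as follows. The class RootEndGuard and its string-based events are hypothetical, and simply dropping late events is an assumption of this sketch; the real class may treat such content differently (for example depending on the ignore-outside-content setting).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical filter that forwards events only until the root element ends. */
final class RootEndGuard {
    // Stays false until the end tag of the root element has been forwarded.
    private boolean seenRootElementEnd;
    private final List<String> forwarded = new ArrayList<>();

    void startElement(String name) {
        if (seenRootElementEnd) {
            return; // extraneous event after the document element: drop it
        }
        forwarded.add("<" + name + ">");
    }

    void endElement(String name) {
        if (seenRootElementEnd) {
            return; // extraneous event after the document element: drop it
        }
        forwarded.add("</" + name + ">");
        if (name.equalsIgnoreCase("html")) {
            seenRootElementEnd = true; // assumes "html" is the root element
        }
    }

    List<String> forwarded() {
        return forwarded;
    }

    public static void main(String[] args) {
        RootEndGuard guard = new RootEndGuard();
        guard.startElement("html");
        guard.startElement("body");
        guard.endElement("body");
        guard.endElement("html");
        guard.startElement("p"); // arrives after </html>
        guard.endElement("p");
        System.out.println(guard.forwarded()); // prints [<html>, <body>, </body>, </html>]
    }
}
```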
fSeenHeadElement
protected boolean fSeenHeadElement
True if seen <head> element.
-
fSeenBodyElement
protected boolean fSeenBodyElement
True if seen <body> element.
-
fOpenedForm
protected boolean fOpenedForm
True if a form is in the stack (allows the opening of nested forms to be discarded).
-
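As an illustration of what a flag like fOpenedForm makes possible, the sketch below discards a form start tag that arrives while another form is still open. The class FormGuard, its string-based events, and the rule that any form end tag clears the flag are assumptions for illustration, not the library's logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: discard the opening of a form nested inside another form. */
final class FormGuard {
    private boolean openedForm;          // true while a form is open
    private int discarded;               // number of nested form starts dropped
    private final List<String> forwarded = new ArrayList<>();

    /** Returns true if the start tag was forwarded, false if it was discarded. */
    boolean startElement(String name) {
        if (name.equalsIgnoreCase("form")) {
            if (openedForm) {
                discarded++;
                return false;            // nested form: drop the start tag
            }
            openedForm = true;
        }
        forwarded.add(name);
        return true;
    }

    /** Assumption of this sketch: any form end tag closes the single open form. */
    void endElement(String name) {
        if (name.equalsIgnoreCase("form")) {
            openedForm = false;
        }
    }

    int discardedCount() {
        return discarded;
    }

    List<String> forwarded() {
        return forwarded;
    }

    public static void main(String[] args) {
        FormGuard guard = new FormGuard();
        System.out.println(guard.startElement("form"));  // prints true
        System.out.println(guard.startElement("input")); // prints true
        System.out.println(guard.startElement("form"));  // prints false
        System.out.println(guard.discardedCount());      // prints 1
    }
}
```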
fQName
private final org.apache.xerces.xni.QName fQName
A qualified name.
-
fEmptyAttrs
private final org.apache.xerces.xni.XMLAttributes fEmptyAttrs
Empty attributes.
-
fInfosetAugs
private final org.cyberneko.html.HTMLAugmentations fInfosetAugs
Augmentations.
-
tagBalancingListener
protected org.cyberneko.html.HTMLTagBalancingListener tagBalancingListener
-
lostText_
private org.cyberneko.html.LostText lostText_
-
forcedStartElement_
private boolean forcedStartElement_
-
forcedEndElement_
private boolean forcedEndElement_
-
fragmentContextStack_
private org.apache.xerces.xni.QName[] fragmentContextStack_
Stack of elements determining the context in which a document fragment should be parsed
-
fragmentContextStackSize_
private int fragmentContextStackSize_
-
endElementsBuffer_
-
-
Constructor Details
-
HTMLTagBalancer
public HTMLTagBalancer()
-
-
Method Details
-
getFeatureDefault
-
getPropertyDefault
-
getRecognizedFeatures
Returns recognized features.
Specified by:
getRecognizedFeatures in interface org.apache.xerces.xni.parser.XMLComponent
-
getRecognizedProperties
Returns recognized properties.
Specified by:
getRecognizedProperties in interface org.apache.xerces.xni.parser.XMLComponent
-
reset
public void reset(org.apache.xerces.xni.parser.XMLComponentManager manager) throws org.apache.xerces.xni.parser.XMLConfigurationException
Resets the component.
Specified by:
reset in interface org.apache.xerces.xni.parser.XMLComponent
Throws:
org.apache.xerces.xni.parser.XMLConfigurationException
-
setFeature
public void setFeature(String featureId, boolean state) throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets a feature.
Specified by:
setFeature in interface org.apache.xerces.xni.parser.XMLComponent
Throws:
org.apache.xerces.xni.parser.XMLConfigurationException
-
setProperty
-
setDocumentHandler
public void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler)
Sets the document handler.
Specified by:
setDocumentHandler in interface org.apache.xerces.xni.parser.XMLDocumentSource
-
getDocumentHandler
public org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
Returns the document handler.
Specified by:
getDocumentHandler in interface org.apache.xerces.xni.parser.XMLDocumentSource
-
startDocument
public void startDocument(org.apache.xerces.xni.XMLLocator locator, String encoding, org.apache.xerces.xni.NamespaceContext nscontext, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start document.
Specified by:
startDocument in interface org.apache.xerces.xni.XMLDocumentHandler
Throws:
org.apache.xerces.xni.XNIException
-
xmlDecl
-
doctypeDecl
public void doctypeDecl(String rootElementName, String publicId, String systemId, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Doctype declaration.
Specified by:
doctypeDecl in interface org.apache.xerces.xni.XMLDocumentHandler
Throws:
org.apache.xerces.xni.XNIException
-
endDocument
public void endDocument(org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
End document.
Specified by:
endDocument in interface org.apache.xerces.xni.XMLDocumentHandler
Throws:
org.apache.xerces.xni.XNIException
-
consumeBufferedEndElements
private void consumeBufferedEndElements()
Consume elements that have been buffered, like
-
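The idea behind consumeBufferedEndElements, holding back certain end elements and only emitting them at the end of the document, can be sketched in isolation. Everything in the block is hypothetical: the class EndElementBuffer, the string-based events, and in particular the choice of which element names are deferred, which the caller supplies here because this page does not say which ones the real class buffers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of deferring selected end elements until endDocument. */
final class EndElementBuffer {
    private final Set<String> deferredNames;                   // lower-case names to hold back
    private final List<String> buffered = new ArrayList<>();   // end elements seen but not yet emitted
    private final List<String> output = new ArrayList<>();

    EndElementBuffer(String... deferredNames) {
        this.deferredNames = new HashSet<>(Arrays.asList(deferredNames));
    }

    void startElement(String name) {
        output.add("<" + name + ">");
    }

    void characters(String text) {
        output.add(text);
    }

    void endElement(String name) {
        if (deferredNames.contains(name.toLowerCase(Locale.ROOT))) {
            buffered.add(name);            // hold back until the document ends
        } else {
            output.add("</" + name + ">");
        }
    }

    /** Emits the buffered end elements in the order they were seen. */
    void endDocument() {
        for (String name : buffered) {
            output.add("</" + name + ">");
        }
        buffered.clear();
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        EndElementBuffer b = new EndElementBuffer("body", "html"); // example choice only
        b.startElement("html");
        b.startElement("body");
        b.endElement("body");
        b.endElement("html");
        b.characters("late text");
        b.endDocument();
        System.out.println(b.output()); // prints [<html>, <body>, late text, </body>, </html>]
    }
}
```

Deferring the end tags means content that arrives late still lands inside the elements it belongs to, at the cost of keeping a small buffer until the document ends.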