Package org.htmlcleaner
Class TagNode
java.lang.Object
org.htmlcleaner.BaseTokenImpl
org.htmlcleaner.BaseHtmlNode
org.htmlcleaner.TagToken
org.htmlcleaner.TagNode
- Direct Known Subclasses:
ProxyTagNode, Serializer.HeadlessTagNode
XML tag node: the basic node of the cleaned HTML tree. It also represents a start tag token after the HTML parsing phase and before the cleaning phase. After cleaning, the tree consists of tag nodes (TagNode class), content (text nodes, ContentNode), comments (CommentNode) and, optionally, a doctype node (DoctypeToken).
-
Field Summary
Fields (modifier and type, field, description):
- private final LinkedHashMap<String, String> attributes
- private boolean autoGenerated: Used to indicate a start tag that was auto generated because TagInfo.isContinueAfter(String)(closedTag.getName()) returned true.
- private DoctypeToken docType
- private boolean foreignMarkupFlagSet: This flag is set if foreignMarkup is set; if it is false, the tagnode tree has not been built, so it isn't known whether this node is an HTML node or foreign markup such as SVG.
- private final boolean isCopy: Indicates that the node is a copy of another node.
- private boolean isForeignMarkup: This flag is set if we are using the namespace-aware setting and the tagnode belongs to a non-HTML namespace.
- private boolean isFormed
- private boolean isTrimAttributeValues: This flag is set if attribute values should be trimmed.
- private boolean pruned: Indicates that the node was marked to be pruned out of the tree.
Fields inherited from class org.htmlcleaner.BaseHtmlNode
parent
-
Constructor Summary
Constructors -
Method Summary
Methods (modifier and type, method, description):
- void addAttribute(String attName, String attValue): Adds the specified attribute to this tag or overrides an existing one.
- void addChild(Object child)
- void addChildren(List newChildren): Adds all elements from the specified list to this node.
- (package private) void addItemForMoving(Object item)
- void addNamespaceDeclaration(String nsPrefix, String nsURI): Adds a namespace declaration to the node.
- attributesToLowerCase(): Returns a copy of the set of attributes for this node with lowercase names.
- (package private) void collectNamespacePrefixesOnPath(Set<String> prefixes): Collects all prefixes in namespace declarations up the path to the document root from the specified node.
- Object[] evaluateXPath(String xPathExpression): Evaluates an XPath expression on the given node.
- private TagNode findElement(ITagNodeCondition condition, boolean isRecursive): Finds the first element in the tree that satisfies the specified condition.
- findElementByAttValue(String attName, String attValue, boolean isRecursive, boolean isCaseSensitive)
- findElementByName(String findName, boolean isRecursive)
- findElementHavingAttribute(String attName, boolean isRecursive)
- findMatchingTagNodes(ITagNodeCondition condition, boolean isRecursive): Gets all elements in the tree that satisfy the specified condition.
- getAllChildren()
- TagNode[] getAllElements(boolean isRecursive)
- getAllElementsList(boolean isRecursive)
- getAttributeByName(String attName)
- getAttributes(): Returns the attributes of the tagnode.
- getAttributesInLowerCase(): Returns the attributes of the tagnode in lower case.
- int getChildIndex(HtmlNode child)
- getChildren(): Deprecated.
- getChildTagList()
- TagNode[] getChildTags()
- getDocType()
- getElementList(ITagNodeCondition condition, boolean isRecursive): Gets all elements in the tree that satisfy the specified condition.
- getElementListByAttValue(String attName, String attValue, boolean isRecursive, boolean isCaseSensitive)
- getElementListByName(String findName, boolean isRecursive)
- getElementListHavingAttribute(String attName, boolean isRecursive)
- private TagNode[] getElements(ITagNodeCondition condition, boolean isRecursive)
- TagNode[] getElementsByAttValue(String attName, String attValue, boolean isRecursive, boolean isCaseSensitive)
- TagNode[] getElementsByName(String findName, boolean isRecursive)
- TagNode[] getElementsHavingAttribute(String attName, boolean isRecursive)
- getItemsToMove()
- getName()
- getNamespaceDeclarations()
- (package private) String getNamespaceURIOnPath(String nsPrefix)
- getText()
- private void handleInterruption(): Called whenever the thread is interrupted.
- boolean hasAttribute(String attName): Checks existence of the specified attribute.
- boolean hasChildren()
- void insertChild(int index, HtmlNode childToAdd): Inserts the specified node at the specified position in the array of children.
- void insertChildAfter(HtmlNode node, HtmlNode nodeToInsert): Inserts the specified node in the list of children after the specified child.
- void insertChildBefore(HtmlNode node, HtmlNode nodeToInsert): Inserts the specified node in the list of children before the specified child.
- boolean isAutoGenerated()
- boolean isCopy()
- boolean isEmpty()
- boolean isForeignMarkup()
- (package private) boolean isFormed()
- boolean isPruned()
- boolean isTrimAttributeValues()
- makeCopy()
- void removeAllChildren(): Removes all children (subelements and text content).
- void removeAttribute(String attName): Removes the specified attribute from this tag.
- boolean removeChild(Object child): Removes the specified child element from this node.
- boolean removeFromTree(): Removes this node from the tree.
- private void replaceAttributes(Map<String, String> attributes): Clears existing attributes and puts replacement attributes.
- void serialize(Serializer serializer, Writer writer)
- void setAttributes(Map<String, String> attributes): Replaces the current set of attributes with a new set.
- void setAutoGenerated(boolean autoGenerated)
- void setChildren(List<? extends BaseToken> children)
- void setDocType(DoctypeToken docType)
- void setForeignMarkup(boolean isForeignMarkup)
- (package private) void setFormed()
- (package private) void setFormed(boolean isFormed)
- (package private) void setItemsToMove(List<BaseToken> itemsToMove)
- void setPruned(boolean pruned)
- void setTrimAttributeValues(boolean isTrimAttributeValues)
- void traverse(TagNodeVisitor visitor): Traverses the tree and performs the visitor's action on each node.
- private boolean traverseInternally(TagNodeVisitor visitor)
Methods inherited from class org.htmlcleaner.BaseHtmlNode
getParent, getSiblings, setParent
Methods inherited from class org.htmlcleaner.BaseTokenImpl
getCol, getRow, setCol, setRow
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.htmlcleaner.HtmlNode
getParent, getSiblings, setParent
-
Field Details
-
attributes
-
children
-
docType
-
itemsToMove
-
nsDeclarations
-
isFormed
private transient boolean isFormed -
autoGenerated
private boolean autoGenerated
Used to indicate a start tag that was auto generated because TagInfo.isContinueAfter(String)(closedTag.getName()) returned true. For example,
<b><i>foo</b>bar
would result in a new <i> being created, resulting in
<b><i>foo</i></b><i>bar</i>
The second opening <i> tag is marked as autogenerated. This allows the autogenerated tag to be removed if it is unneeded.
-
isForeignMarkup
private boolean isForeignMarkup
This flag is set if we are using the namespace-aware setting and the tagnode belongs to a non-HTML namespace.
-
foreignMarkupFlagSet
private boolean foreignMarkupFlagSet
This flag is set if foreignMarkup is set; if it is false, the tagnode tree has not been built, so it isn't known whether this node is an HTML node or foreign markup such as SVG.
-
isTrimAttributeValues
private boolean isTrimAttributeValues
This flag is set if attribute values should be trimmed.
-
pruned
private boolean pruned
Indicates that the node was marked to be pruned out of the tree.
-
isCopy
private final boolean isCopy
Indicates that the node is a copy of another node.
- See Also:
-
-
Constructor Details
-
TagNode
-
TagNode
-
-
Method Details
-
getName
-
getAttributeByName
- Parameters:
attName -
- Returns:
- Value of the specified attribute, or null if this tag doesn't contain it.
-
getAttributes
Returns the attributes of the tagnode.
- Returns:
- Map instance containing all attribute name/value pairs.
-
getAttributesInLowerCase
Returns the attributes of the tagnode in lower case.
- Returns:
- Map instance containing all attribute name/value pairs, with attribute names transformed to lower case.
-
setAttributes
Replaces the current set of attributes with a new set.
- Parameters:
attributes -
-
replaceAttributes
Clears existing attributes and puts replacement attributes.
- Parameters:
attributes - the attributes to set
-
hasAttribute
Checks existence of the specified attribute.
- Parameters:
attName -
- Returns:
- true if the TagNode has the attribute
-
addAttribute
Adds the specified attribute to this tag, or overrides an existing one.
- Specified by:
addAttribute in class TagToken
- Parameters:
attName -
attValue -
-
removeAttribute
Removes the specified attribute from this tag.
- Parameters:
attName -
-
getChildren
Deprecated. Use getChildTagList(); this method will be refactored and possibly removed in future versions. TODO: This method should be refactored because it does not properly match the commonly used Java getter/setter strategy.
- Returns:
- List of child TagNode objects.
-
setChildren
-
getAllChildren
-
getChildTagList
- Returns:
- List of child TagNode objects.
-
hasChildren
public boolean hasChildren()
- Returns:
- Whether this node has child elements or not.
-
getChildTags
- Returns:
- An array of child TagNode instances.
-
getText
- Returns:
- Text content of this node and its subelements.
-
getChildIndex
- Parameters:
child - Child to find the index of
- Returns:
- Index of the specified child node inside this node's children, or -1 if the node is not a child
-
insertChild
Inserts the specified node at the specified position in the array of children.
- Parameters:
index -
childToAdd -
-
insertChildBefore
Inserts the specified node in the list of children before the specified child.
- Parameters:
node - Child before which to insert the new node
nodeToInsert - Node to be inserted at the specified position
-
insertChildAfter
Inserts the specified node in the list of children after the specified child.
- Parameters:
node - Child after which to insert the new node
nodeToInsert - Node to be inserted at the specified position
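The positional semantics of getChildIndex, insertChild, insertChildBefore and insertChildAfter can be illustrated with a small standalone model. This is only a sketch: ChildListSketch and snapshot() are hypothetical names, children are matched by identity, and leaving the list unchanged when the reference child is missing is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone model of a node's child list; a hypothetical stand-in,
// not HtmlCleaner's TagNode.
final class ChildListSketch {
    private final List<Object> children = new ArrayList<>();

    void addChild(Object child) {
        children.add(child);
    }

    // Index of the child (matched by identity), or -1 if it is not a child.
    int getChildIndex(Object child) {
        for (int i = 0; i < children.size(); i++) {
            if (children.get(i) == child) {
                return i;
            }
        }
        return -1;
    }

    void insertChild(int index, Object childToAdd) {
        children.add(index, childToAdd);
    }

    // Assumption: a missing reference child leaves the list unchanged.
    void insertChildBefore(Object node, Object nodeToInsert) {
        int index = getChildIndex(node);
        if (index >= 0) {
            insertChild(index, nodeToInsert);
        }
    }

    // Same lookup, but the new node lands one position later.
    void insertChildAfter(Object node, Object nodeToInsert) {
        int index = getChildIndex(node);
        if (index >= 0) {
            insertChild(index + 1, nodeToInsert);
        }
    }

    // Defensive copy so callers cannot modify the internal list.
    List<Object> snapshot() {
        return new ArrayList<>(children);
    }

    public static void main(String[] args) {
        ChildListSketch parent = new ChildListSketch();
        parent.addChild("a");
        parent.addChild("c");
        parent.insertChildAfter("a", "b");
        parent.insertChildBefore("a", "start");
        System.out.println(parent.snapshot()); // prints [start, a, b, c]
    }
}
```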
-
getDocType
-
setDocType
-
addChild
-
addChildren
Adds all elements from the specified list to this node.
- Parameters:
newChildren -
-
findElement
Finds the first element in the tree that satisfies the specified condition.
- Parameters:
condition -
isRecursive -
- Returns:
- First TagNode found, or null if there is no such element.
-
findMatchingTagNodes
Gets all elements in the tree that satisfy the specified condition.
- Parameters:
condition -
isRecursive -
- Returns:
- List of TagNode instances.
-
getElementList
Gets all elements in the tree that satisfy the specified condition.
- Parameters:
condition -
isRecursive -
- Returns:
- List of TagNode instances that satisfy the specified condition.
-
getElements
- Parameters:
condition -
isRecursive -
- Returns:
- The array of all subelements that satisfy the specified condition.
-
getAllElementsList
-
getAllElements
-
findElementByName
-
getElementListByName
-
getElementsByName
-
findElementHavingAttribute
-
getElementListHavingAttribute
-
getElementsHavingAttribute
-
findElementByAttValue
-
getElementListByAttValue
-
getElementsByAttValue
-
evaluateXPath
Evaluates an XPath expression on the given node.
This is not a fully supported XPath parser and evaluator. The examples below show the supported elements:
- //div//a
- //div//a[@id][@class]
- /body/*[1]/@type
- //div[3]//a[@id][@href='r/n4']
- (//div[last() >= 4]//./div[position() = last()])[position() > 22]//li[2]//a
- //div[2]/@*[2]
- data(//div//a[@id][@class])
- //p/last()
- //body//div[3][@class]//span[12.2<position()]/@id
- data(//a['v' < @id])
- Parameters:
xPathExpression -
- Returns:
- result of XPather evaluation.
- Throws:
XPatherException
-
removeFromTree
public boolean removeFromTree()
Removes this node from the tree.
- Returns:
- True if the element is removed (that is, if it is not the root node).
-
removeChild
Removes the specified child element from this node.
- Parameters:
child -
- Returns:
- True if the child object existed in the children list.
-
removeAllChildren
public void removeAllChildren()
Removes all children (subelements and text content).
-
addItemForMoving
-
getItemsToMove
-
setItemsToMove
-
isFormed
boolean isFormed() -
setFormed
void setFormed(boolean isFormed) -
setFormed
void setFormed() -
setAutoGenerated
public void setAutoGenerated(boolean autoGenerated)
- Parameters:
autoGenerated - the autoGenerated to set
-
isAutoGenerated
public boolean isAutoGenerated()
- Returns:
- the autoGenerated
-
isPruned
public boolean isPruned()
- Returns:
- true, if node was marked to be pruned.
-
setPruned
public void setPruned(boolean pruned) -
isEmpty
public boolean isEmpty() -
addNamespaceDeclaration
Adds a namespace declaration to the node.
- Parameters:
nsPrefix - Namespace prefix
nsURI - Namespace URI
-
collectNamespacePrefixesOnPath
Collects all prefixes in namespace declarations up the path to the document root from the specified node.
- Parameters:
prefixes - Set of prefixes to be collected
-
getNamespaceURIOnPath
-
getNamespaceDeclarations
- Returns:
- Map of namespace declarations for this node
-
serialize
- Specified by:
serialize in interface BaseToken
- Overrides:
serialize in class BaseHtmlNode
- Throws:
IOException
-
makeCopy
-
isCopy
public boolean isCopy() -
traverse
Traverses the tree and performs the visitor's action on each node. It stops when it has visited the whole tree or when the visitor returns false.
- Parameters:
visitor - TagNodeVisitor implementation
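The stop-on-false contract described for traverse can be modelled in a few lines of standalone Java. This is a sketch under stated assumptions, not HtmlCleaner's implementation: Node, Visitor and visitUntil are hypothetical stand-ins for TagNode, TagNodeVisitor and a caller, and visiting a node before its children is assumed.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone model of the traversal contract; Node and Visitor are
// hypothetical stand-ins, not HtmlCleaner's TagNode and TagNodeVisitor.
final class TraverseSketch {

    interface Visitor {
        // Return false to stop the whole traversal.
        boolean visit(Node parent, Node node);
    }

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            for (Node kid : kids) {
                children.add(kid);
            }
        }

        void traverse(Visitor visitor) {
            traverseInternally(null, visitor);
        }

        // Returns false as soon as the visitor asks to stop, so that
        // every enclosing call unwinds without visiting further nodes.
        private boolean traverseInternally(Node parent, Visitor visitor) {
            if (!visitor.visit(parent, this)) {
                return false;
            }
            for (Node child : children) {
                if (!child.traverseInternally(this, visitor)) {
                    return false;
                }
            }
            return true;
        }
    }

    // Collects visited names in order, stopping once stopAt has been seen.
    static List<String> visitUntil(Node root, String stopAt) {
        List<String> seen = new ArrayList<>();
        root.traverse((parent, node) -> {
            seen.add(node.name);
            return !node.name.equals(stopAt);
        });
        return seen;
    }

    public static void main(String[] args) {
        Node root = new Node("html",
                new Node("head"),
                new Node("body", new Node("div"), new Node("p")));
        System.out.println(visitUntil(root, "div")); // prints [html, head, body, div]
    }
}
```

Because the visitor returns false at the div node, the sibling p is never visited.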
-
traverseInternally
-
isForeignMarkup
public boolean isForeignMarkup()
- Returns:
- the isForeignMarkup
-
setForeignMarkup
public void setForeignMarkup(boolean isForeignMarkup)
- Parameters:
isForeignMarkup - the isForeignMarkup to set
-
isTrimAttributeValues
public boolean isTrimAttributeValues()
- Returns:
- the isTrimAttributeValues
-
setTrimAttributeValues
public void setTrimAttributeValues(boolean isTrimAttributeValues)
- Parameters:
isTrimAttributeValues - the isTrimAttributeValues to set
-
attributesToLowerCase
Returns a copy of the set of attributes for this node with lowercase names. Where there are duplicate attributes (e.g. class, CLASS) the first value is retained.
- Returns:
- a map of attributes in key/value pairs with names in lowercase
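The "first value is retained" rule for duplicate names can be sketched as follows. This is a standalone illustration rather than the library's code: the class and method names are hypothetical, and the use of an insertion-ordered map and Locale.ROOT for lowercasing are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Standalone model of the documented behaviour; not HtmlCleaner's own code.
final class LowerCaseAttributesSketch {

    // Returns a copy with lowercase names; on a name clash the value
    // that was encountered first is kept.
    static Map<String, String> toLowerCaseNames(Map<String, String> attributes) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            String name = entry.getKey().toLowerCase(Locale.ROOT);
            // putIfAbsent leaves an earlier value for the same name untouched.
            result.putIfAbsent(name, entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("class", "first");
        attributes.put("CLASS", "second");
        attributes.put("ID", "main");
        System.out.println(toLowerCaseNames(attributes)); // prints {class=first, id=main}
    }
}
```

The input map is left unmodified, which matches the "returns a copy" wording above.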
-
handleInterruption
private void handleInterruption()
Called whenever the thread is interrupted. Currently this is a placeholder, but it could hold cleanup methods and user interaction.