- java.lang.Object
-
- org.jgrapht.alg.similarity.ZhangShashaTreeEditDistance<V,E>
-
- Type Parameters:
V- graph vertex typeE- graph edge type
public class ZhangShashaTreeEditDistance<V,E> extends java.lang.ObjectDynamic programming algorithm for computing edit distance between trees.The algorithm is originally described in Zhang, Kaizhong & Shasha, Dennis. (1989). Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems. SIAM J. Comput.. 18. 1245-1262. 10.1137/0218082.
The time complexity of the algorithm is $O(|T_1|\cdot|T_2|\cdot min(depth(T_1),leaves(T_1)) \cdot min(depth(T_2),leaves(T_2)))$. Space complexity is $O(|T_1|\cdot |T_2|)$, where $|T_1|$ and $|T_2|$ denote number of vertices in trees $T_1$ and $T_2$ correspondingly, $leaves()$ function returns number of leaf vertices in a tree.
The tree edit distance problem is defined in a following way. Consider $2$ trees $T_1$ and $T_2$ with root vertices $r_1$ and $r_2$ correspondingly. For those trees there are 3 elementary modification actions:
- Remove a vertex $v$ from $T_1$.
- Insert a vertex $v$ into $T_2$.
- Change vertex $v_1$ in $T_1$ to vertex $v_2$ in $T_2$.
The algorithm is based on a dynamic programming principle and assigns a label to each vertex in the trees which is equal to its index in post-oder traversal. It also uses a notion of a keyroot which is defined as a vertex in a tree which has a left sibling. Additionally a special $l()$ function is introduced with returns for every vertex the index of its leftmost child wrt the post-order traversal in the tree.
Solving the tree edit problem distance is divided into computing edit distance for every pair of subtrees rooted at vertices $v_1$ and $v_2$ where $v_1$ is a keyroot in the first tree and $v_2$ is a keyroot in the second tree.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private classZhangShashaTreeEditDistance.CacheEntryAuxiliary class which is used intreeDistance()function to store intermediate edit operations during dynamic programming computation.static classZhangShashaTreeEditDistance.EditOperation<V>Represents elementary action which changes the structure of a tree.static classZhangShashaTreeEditDistance.OperationTypeType of an edit operation.private classZhangShashaTreeEditDistance.TreeOrderingAuxiliary class which for computes keyroot vertices, tree ordering and $l()$ function for a particular tree.
-
Field Summary
Fields Modifier and Type Field Description private booleanalgorithmExecutedHelper field which indicates whether the algorithm has already been executed fortree1andtree2.private java.util.function.ToDoubleBiFunction<V,V>changeCostFunction which computes cost of changing a vertex $v1$ intree1to vertex $v2$ intree2.private java.util.List<java.util.List<java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>>>>editOperationListsArray with lists of edit operations which transform subtrees oftree1into subtreestree2.private java.util.function.ToDoubleFunction<V>insertCostFunction which computes cost of inserting a particular vertex intotree2.private java.util.function.ToDoubleFunction<V>removeCostFunction which computes cost of removing a particular vertex from {2code tree1}.private Vroot1Root vertex of thetree1.private Vroot2Root vertex of thetree2.private Graph<V,E>tree1First tree for which the distance is computed by this algorithm.private Graph<V,E>tree2Second tree for which the distance is computed by this algorithm.private double[][]treeDistancesArray with edit distances between subtrees oftree1andtree2.
-
Constructor Summary
Constructors Constructor Description ZhangShashaTreeEditDistance(Graph<V,E> tree1, V root1, Graph<V,E> tree2, V root2)Constructs an instance of the algorithm for the giventree1,root1,tree2androot2.ZhangShashaTreeEditDistance(Graph<V,E> tree1, V root1, Graph<V,E> tree2, V root2, java.util.function.ToDoubleFunction<V> insertCost, java.util.function.ToDoubleFunction<V> removeCost, java.util.function.ToDoubleBiFunction<V,V> changeCost)Constructs an instance of the algorithm for the giventree1,root1,tree2,root2,insertCost,removeCostandchangeCost.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description doublegetDistance()Computes edit distance fortree1andtree2.java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>>getEditOperationLists()Computes a list of edit operations which transformtree1intotree2.private voidlazyRunAlgorithm()Performs lazy computations of this algorithm and stores cached data intreeDistancesandeditOperationList.private java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>>restoreOperationsList(java.util.List<java.util.List<ZhangShashaTreeEditDistance.CacheEntry>> cachedOperations, int i, int j)Restores list of edit operations which have been cached incachedOperationsduring the edit distance computation.private voidtreeDistance(int i, int j, ZhangShashaTreeEditDistance.TreeOrdering ordering1, ZhangShashaTreeEditDistance.TreeOrdering ordering2)Computes edit distance and list of edit operations for vertex $v1$ fromtree1which has tree ordering index equal to $i$ and vertex $v2$ fromtree2which has tree ordering index equal to $j$.
-
-
-
Field Detail
-
root1
private V root1
Root vertex of thetree1.
-
root2
private V root2
Root vertex of thetree2.
-
insertCost
private java.util.function.ToDoubleFunction<V> insertCost
Function which computes cost of inserting a particular vertex intotree2.
-
removeCost
private java.util.function.ToDoubleFunction<V> removeCost
Function which computes cost of removing a particular vertex from {2code tree1}.
-
changeCost
private java.util.function.ToDoubleBiFunction<V,V> changeCost
Function which computes cost of changing a vertex $v1$ intree1to vertex $v2$ intree2.
-
treeDistances
private double[][] treeDistances
Array with edit distances between subtrees oftree1andtree2. Formally, $treeDistances[i][j]$ stores edit distance between subtree oftree1rooted at vertex $i+1$ and subtree oftree2rooted at vertex $j+1$, where $i$ and $j$ are vertex indices from the corresponding tree orderings.
-
editOperationLists
private java.util.List<java.util.List<java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>>>> editOperationLists
Array with lists of edit operations which transform subtrees oftree1into subtreestree2. Formally, editOperationLists[i][j]$ stores a list of edit operations which transform subtreetree1rooted at vertex $i$ into subtree oftree2rooted at vertex $j$, where $i$ and $j$ are vertex indices from the corresponding tree orderings.
-
algorithmExecuted
private boolean algorithmExecuted
Helper field which indicates whether the algorithm has already been executed fortree1andtree2.
-
-
Constructor Detail
-
ZhangShashaTreeEditDistance
public ZhangShashaTreeEditDistance(Graph<V,E> tree1, V root1, Graph<V,E> tree2, V root2)
Constructs an instance of the algorithm for the giventree1,root1,tree2androot2. This constructor sets following default values for the distance functions. TheinsertCostandremoveCostalways return $1.0$, thechangeCostreturn $0.0$ if vertices are equal and1.0otherwise.- Parameters:
tree1- a treeroot1- root vertex oftree1tree2- a treeroot2- root vertex oftree2
-
ZhangShashaTreeEditDistance
public ZhangShashaTreeEditDistance(Graph<V,E> tree1, V root1, Graph<V,E> tree2, V root2, java.util.function.ToDoubleFunction<V> insertCost, java.util.function.ToDoubleFunction<V> removeCost, java.util.function.ToDoubleBiFunction<V,V> changeCost)
Constructs an instance of the algorithm for the giventree1,root1,tree2,root2,insertCost,removeCostandchangeCost.- Parameters:
tree1- a treeroot1- root vertex oftree1tree2- a treeroot2- root vertex oftree2insertCost- cost function for inserting a node intotree1removeCost- cost function for removing a node fromtree2changeCost- cost function of changing a node intree1to a node intree2
-
-
Method Detail
-
getDistance
public double getDistance()
Computes edit distance fortree1andtree2.- Returns:
- edit distance between
tree1andtree2
-
getEditOperationLists
public java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>> getEditOperationLists()
Computes a list of edit operations which transformtree1intotree2.- Returns:
- list of edit operations
-
lazyRunAlgorithm
private void lazyRunAlgorithm()
Performs lazy computations of this algorithm and stores cached data intreeDistancesandeditOperationList.
-
treeDistance
private void treeDistance(int i, int j, ZhangShashaTreeEditDistance.TreeOrdering ordering1, ZhangShashaTreeEditDistance.TreeOrdering ordering2)Computes edit distance and list of edit operations for vertex $v1$ fromtree1which has tree ordering index equal to $i$ and vertex $v2$ fromtree2which has tree ordering index equal to $j$. Both $v1$ and $v2$ must be keyroots in the corresponding trees.- Parameters:
i- ordering index of a keyroot intree1j- ordering index of a keywoot intree2ordering1- ordering oftree1ordering2- ordering oftree2
-
restoreOperationsList
private java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>> restoreOperationsList(java.util.List<java.util.List<ZhangShashaTreeEditDistance.CacheEntry>> cachedOperations, int i, int j)
Restores list of edit operations which have been cached incachedOperationsduring the edit distance computation. Starting from a cache entry at index $(i,j)$.- Parameters:
cachedOperations- 2-dimensional list with cached operationsi- starting operation indexj- starting operation index- Returns:
- list of edit operations
-
-