- Type Parameters:
V- graph vertex typeE- graph edge type
The algorithm is originally described in Zhang, Kaizhong & Shasha, Dennis. (1989). Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems. SIAM J. Comput.. 18. 1245-1262. 10.1137/0218082.
The time complexity of the algorithm is $O(|T_1|\cdot|T_2|\cdot min(depth(T_1),leaves(T_1)) \cdot min(depth(T_2),leaves(T_2)))$. Space complexity is $O(|T_1|\cdot |T_2|)$, where $|T_1|$ and $|T_2|$ denote number of vertices in trees $T_1$ and $T_2$ correspondingly, $leaves()$ function returns number of leaf vertices in a tree.
The tree edit distance problem is defined in a following way. Consider $2$ trees $T_1$ and $T_2$ with root vertices $r_1$ and $r_2$ correspondingly. For those trees there are 3 elementary modification actions:
- Remove a vertex $v$ from $T_1$.
- Insert a vertex $v$ into $T_2$.
- Change vertex $v_1$ in $T_1$ to vertex $v_2$ in $T_2$.
The algorithm is based on a dynamic programming principle and assigns a label to each vertex in the trees which is equal to its index in post-oder traversal. It also uses a notion of a keyroot which is defined as a vertex in a tree which has a left sibling. Additionally a special $l()$ function is introduced with returns for every vertex the index of its leftmost child wrt the post-order traversal in the tree.
Solving the tree edit problem distance is divided into computing edit distance for every pair of subtrees rooted at vertices $v_1$ and $v_2$ where $v_1$ is a keyroot in the first tree and $v_2$ is a keyroot in the second tree.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate classAuxiliary class which is used intreeDistance()function to store intermediate edit operations during dynamic programming computation.static classRepresents elementary action which changes the structure of a tree.static enumType of an edit operation.private classAuxiliary class which for computes keyroot vertices, tree ordering and $l()$ function for a particular tree. -
Field Summary
FieldsModifier and TypeFieldDescriptionprivate booleanHelper field which indicates whether the algorithm has already been executed fortree1andtree2.private ToDoubleBiFunction<V, V> Function which computes cost of changing a vertex $v1$ intree1to vertex $v2$ intree2.private List<List<List<ZhangShashaTreeEditDistance.EditOperation<V>>>> Array with lists of edit operations which transform subtrees oftree1into subtreestree2.private ToDoubleFunction<V> Function which computes cost of inserting a particular vertex intotree2.private ToDoubleFunction<V> Function which computes cost of removing a particular vertex from {2code tree1}.private VRoot vertex of thetree1.private VRoot vertex of thetree2.First tree for which the distance is computed by this algorithm.Second tree for which the distance is computed by this algorithm.private double[][]Array with edit distances between subtrees oftree1andtree2. -
Constructor Summary
ConstructorsConstructorDescriptionConstructs an instance of the algorithm for the giventree1,root1,tree2androot2.ZhangShashaTreeEditDistance(Graph<V, E> tree1, V root1, Graph<V, E> tree2, V root2, ToDoubleFunction<V> insertCost, ToDoubleFunction<V> removeCost, ToDoubleBiFunction<V, V> changeCost) Constructs an instance of the algorithm for the giventree1,root1,tree2,root2,insertCost,removeCostandchangeCost. -
Method Summary
Modifier and TypeMethodDescriptiondoubleComputes edit distance fortree1andtree2.Computes a list of edit operations which transformtree1intotree2.private voidPerforms lazy computations of this algorithm and stores cached data intreeDistancesandeditOperationList.private List<ZhangShashaTreeEditDistance.EditOperation<V>> restoreOperationsList(List<List<ZhangShashaTreeEditDistance<V, E>.CacheEntry>> cachedOperations, int i, int j) Restores list of edit operations which have been cached incachedOperationsduring the edit distance computation.private voidtreeDistance(int i, int j, ZhangShashaTreeEditDistance<V, E>.TreeOrdering ordering1, ZhangShashaTreeEditDistance<V, E>.TreeOrdering ordering2) Computes edit distance and list of edit operations for vertex $v1$ fromtree1which has tree ordering index equal to $i$ and vertex $v2$ fromtree2which has tree ordering index equal to $j$.
-
Field Details
-
tree1
First tree for which the distance is computed by this algorithm. -
root1
Root vertex of thetree1. -
tree2
Second tree for which the distance is computed by this algorithm. -
root2
Root vertex of thetree2. -
insertCost
Function which computes cost of inserting a particular vertex intotree2. -
removeCost
Function which computes cost of removing a particular vertex from {2code tree1}. -
changeCost
Function which computes cost of changing a vertex $v1$ intree1to vertex $v2$ intree2. -
treeDistances
private double[][] treeDistancesArray with edit distances between subtrees oftree1andtree2. Formally, $treeDistances[i][j]$ stores edit distance between subtree oftree1rooted at vertex $i+1$ and subtree oftree2rooted at vertex $j+1$, where $i$ and $j$ are vertex indices from the corresponding tree orderings. -
editOperationLists
Array with lists of edit operations which transform subtrees oftree1into subtreestree2. Formally, editOperationLists[i][j]$ stores a list of edit operations which transform subtreetree1rooted at vertex $i$ into subtree oftree2rooted at vertex $j$, where $i$ and $j$ are vertex indices from the corresponding tree orderings. -
algorithmExecuted
private boolean algorithmExecutedHelper field which indicates whether the algorithm has already been executed fortree1andtree2.
-
-
Constructor Details
-
ZhangShashaTreeEditDistance
Constructs an instance of the algorithm for the giventree1,root1,tree2androot2. This constructor sets following default values for the distance functions. TheinsertCostandremoveCostalways return $1.0$, thechangeCostreturn $0.0$ if vertices are equal and1.0otherwise.- Parameters:
tree1- a treeroot1- root vertex oftree1tree2- a treeroot2- root vertex oftree2
-
ZhangShashaTreeEditDistance
public ZhangShashaTreeEditDistance(Graph<V, E> tree1, V root1, Graph<V, E> tree2, V root2, ToDoubleFunction<V> insertCost, ToDoubleFunction<V> removeCost, ToDoubleBiFunction<V, V> changeCost) Constructs an instance of the algorithm for the giventree1,root1,tree2,root2,insertCost,removeCostandchangeCost.- Parameters:
tree1- a treeroot1- root vertex oftree1tree2- a treeroot2- root vertex oftree2insertCost- cost function for inserting a node intotree1removeCost- cost function for removing a node fromtree2changeCost- cost function of changing a node intree1to a node intree2
-
-
Method Details
-
getDistance
public double getDistance()Computes edit distance fortree1andtree2.- Returns:
- edit distance between
tree1andtree2
-
getEditOperationLists
Computes a list of edit operations which transformtree1intotree2.- Returns:
- list of edit operations
-
lazyRunAlgorithm
private void lazyRunAlgorithm()Performs lazy computations of this algorithm and stores cached data intreeDistancesandeditOperationList. -
treeDistance
private void treeDistance(int i, int j, ZhangShashaTreeEditDistance<V, E>.TreeOrdering ordering1, ZhangShashaTreeEditDistance<V, E>.TreeOrdering ordering2) Computes edit distance and list of edit operations for vertex $v1$ fromtree1which has tree ordering index equal to $i$ and vertex $v2$ fromtree2which has tree ordering index equal to $j$. Both $v1$ and $v2$ must be keyroots in the corresponding trees.- Parameters:
i- ordering index of a keyroot intree1j- ordering index of a keywoot intree2ordering1- ordering oftree1ordering2- ordering oftree2
-
restoreOperationsList
private List<ZhangShashaTreeEditDistance.EditOperation<V>> restoreOperationsList(List<List<ZhangShashaTreeEditDistance<V, E>.CacheEntry>> cachedOperations, int i, int j) Restores list of edit operations which have been cached incachedOperationsduring the edit distance computation. Starting from a cache entry at index $(i,j)$.- Parameters:
cachedOperations- 2-dimensional list with cached operationsi- starting operation indexj- starting operation index- Returns:
- list of edit operations
-