Package com.hankcs.algorithm
Class AhoCorasickDoubleArrayTrie<V>
- java.lang.Object
-
- com.hankcs.algorithm.AhoCorasickDoubleArrayTrie<V>
-
- All Implemented Interfaces:
java.io.Serializable
public class AhoCorasickDoubleArrayTrie<V> extends java.lang.Object implements java.io.SerializableAn implementation of Aho Corasick algorithm based on Double Array Trie- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private classAhoCorasickDoubleArrayTrie.BuilderA builder to build the AhoCorasickDoubleArrayTriestatic classAhoCorasickDoubleArrayTrie.Hit<V>A result outputstatic interfaceAhoCorasickDoubleArrayTrie.IHit<V>Processor handles the output when hit a keywordstatic interfaceAhoCorasickDoubleArrayTrie.IHitCancellable<V>Callback that allows to cancel the search process.static interfaceAhoCorasickDoubleArrayTrie.IHitFull<V>Processor handles the output when hit a keyword, with more detail
-
Field Summary
Fields Modifier and Type Field Description protected int[]basebase array of the Double Array Trie structureprotected int[]checkcheck array of the Double Array Trie structureprotected int[]failfail table of the Aho Corasick automataprotected int[]lthe length of every keyprotected int[][]outputoutput table of the Aho Corasick automataprotected intsizethe size of base and check arrayprotected V[]vouter value array
-
Constructor Summary
Constructors Constructor Description AhoCorasickDoubleArrayTrie()
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description voidbuild(java.util.Map<java.lang.String,V> map)Build a AhoCorasickDoubleArrayTrie from a mapprivate intexactMatchSearch(char[] keyChars, int pos, int len, int nodePos)match exactly by a keyintexactMatchSearch(java.lang.String key)match exactly by a keyprivate intexactMatchSearch(java.lang.String key, int pos, int len, int nodePos)match exactly by a keyAhoCorasickDoubleArrayTrie.Hit<V>findFirst(java.lang.String text)Search first match in stringVget(int index)Pick the value by index in value array
Notice that to be more efficiently, this method DO NOT check the parameterVget(java.lang.String key)Get value by a String key, just like a map.get() methodprivate intgetState(int currentState, char character)transmit state, supports failure functionvoidload(java.io.ObjectInputStream in)Loadbooleanmatches(java.lang.String text)Checks that string contains at least one substringvoidparseText(char[] text, AhoCorasickDoubleArrayTrie.IHit<V> processor)Parse textvoidparseText(char[] text, AhoCorasickDoubleArrayTrie.IHitFull<V> processor)Parse textjava.util.List<AhoCorasickDoubleArrayTrie.Hit<V>>parseText(java.lang.CharSequence text)Parse textvoidparseText(java.lang.CharSequence text, AhoCorasickDoubleArrayTrie.IHit<V> processor)Parse textvoidparseText(java.lang.CharSequence text, AhoCorasickDoubleArrayTrie.IHitCancellable<V> processor)Parse textvoidsave(java.io.ObjectOutputStream out)Savebooleanset(java.lang.String key, V value)Update a value corresponding to a keyintsize()Get the size of the keywordsprivate voidstoreEmits(int position, int currentState, java.util.List<AhoCorasickDoubleArrayTrie.Hit<V>> collectedEmits)store outputprotected inttransition(int current, char c)transition of a stateprotected inttransitionWithRoot(int nodePos, char c)transition of a state, if the state is root and it failed, then returns the root
-
-
-
Field Detail
-
check
protected int[] check
check array of the Double Array Trie structure
-
base
protected int[] base
base array of the Double Array Trie structure
-
fail
protected int[] fail
fail table of the Aho Corasick automata
-
output
protected int[][] output
output table of the Aho Corasick automata
-
v
protected V[] v
outer value array
-
l
protected int[] l
the length of every key
-
size
protected int size
the size of base and check array
-
-
Method Detail
-
parseText
public java.util.List<AhoCorasickDoubleArrayTrie.Hit<V>> parseText(java.lang.CharSequence text)
Parse text- Parameters:
text- The text- Returns:
- a list of outputs
-
parseText
public void parseText(java.lang.CharSequence text, AhoCorasickDoubleArrayTrie.IHit<V> processor)Parse text- Parameters:
text- The textprocessor- A processor which handles the output
-
parseText
public void parseText(java.lang.CharSequence text, AhoCorasickDoubleArrayTrie.IHitCancellable<V> processor)Parse text- Parameters:
text- The textprocessor- A processor which handles the output
-
parseText
public void parseText(char[] text, AhoCorasickDoubleArrayTrie.IHit<V> processor)Parse text- Parameters:
text- The textprocessor- A processor which handles the output
-
parseText
public void parseText(char[] text, AhoCorasickDoubleArrayTrie.IHitFull<V> processor)Parse text- Parameters:
text- The textprocessor- A processor which handles the output
-
matches
public boolean matches(java.lang.String text)
Checks that string contains at least one substring- Parameters:
text- source text to check- Returns:
trueif string contains at least one substring
-
findFirst
public AhoCorasickDoubleArrayTrie.Hit<V> findFirst(java.lang.String text)
Search first match in string- Parameters:
text- source text to check- Returns:
- first match or
nullif there are no matches
-
save
public void save(java.io.ObjectOutputStream out) throws java.io.IOExceptionSave- Parameters:
out- An ObjectOutputStream object- Throws:
java.io.IOException- Some IOException
-
load
public void load(java.io.ObjectInputStream in) throws java.io.IOException, java.lang.ClassNotFoundExceptionLoad- Parameters:
in- An ObjectInputStream object- Throws:
java.io.IOExceptionjava.lang.ClassNotFoundException
-
get
public V get(java.lang.String key)
Get value by a String key, just like a map.get() method- Parameters:
key- The key- Returns:
-
set
public boolean set(java.lang.String key, V value)Update a value corresponding to a key- Parameters:
key- the keyvalue- the value- Returns:
- successful or not(failure if there is no key)
-
get
public V get(int index)
Pick the value by index in value array
Notice that to be more efficiently, this method DO NOT check the parameter- Parameters:
index- The index- Returns:
- The value
-
getState
private int getState(int currentState, char character)transmit state, supports failure function- Parameters:
currentState-character-- Returns:
-
storeEmits
private void storeEmits(int position, int currentState, java.util.List<AhoCorasickDoubleArrayTrie.Hit<V>> collectedEmits)store output- Parameters:
position-currentState-collectedEmits-
-
transition
protected int transition(int current, char c)transition of a state- Parameters:
current-c-- Returns:
-
transitionWithRoot
protected int transitionWithRoot(int nodePos, char c)transition of a state, if the state is root and it failed, then returns the root- Parameters:
nodePos-c-- Returns:
-
build
public void build(java.util.Map<java.lang.String,V> map)
Build a AhoCorasickDoubleArrayTrie from a map- Parameters:
map- a map containing key-value pairs
-
exactMatchSearch
public int exactMatchSearch(java.lang.String key)
match exactly by a key- Parameters:
key- the key- Returns:
- the index of the key, you can use it as a perfect hash function
-
exactMatchSearch
private int exactMatchSearch(java.lang.String key, int pos, int len, int nodePos)match exactly by a key- Parameters:
key-pos-len-nodePos-- Returns:
-
exactMatchSearch
private int exactMatchSearch(char[] keyChars, int pos, int len, int nodePos)match exactly by a key- Parameters:
keyChars- the char array of the keypos- the begin index of char arraylen- the length of the keynodePos- the starting position of the node for searching- Returns:
- the value index of the key, minus indicates null
-
size
public int size()
Get the size of the keywords- Returns:
-
-