Class CFSA2Serializer
java.lang.Object
morfologik.fsa.builders.CFSA2Serializer
- All Implemented Interfaces:
FSASerializer
Serializes in-memory FSA graphs to CFSA2.
The automaton can also be serialized with the numbers required for perfect hashing; see the withNumbers() method.
Field Summary
Fields (modifier and type, field: description):
- flags: Supported flags.
- private byte[] labelsIndex: The most frequent labels for integrating with the flags field.
- private int[] labelsInvIndex: Inverted index of labels to be integrated with the flags field.
- private final Logger logger
- private static final int NO_STATE: No-state id.
- private com.carrotsearch.hppc.IntIntHashMap numbers: A hash map of [state, right-language-count] pairs.
- private com.carrotsearch.hppc.IntIntHashMap offsets: A hash map of [state, offset] pairs.
- private final byte[] scratch: Scratch array for serializing v-ints.
- private boolean withNumbers: true if we should serialize with numbers.
Constructor Summary
Constructors:
- CFSA2Serializer()
Method Summary
Methods (modifier and type, method: description):
- private int[] computeFirstStates(com.carrotsearch.hppc.IntIntHashMap inlinkCount, int maxStates, int minInlinkCount): Compute the set of states that should be linearized first to minimize the goto length of the other states.
- private com.carrotsearch.hppc.IntIntHashMap computeInlinkCount(FSA fsa): Compute the in-link count for each state.
- private void computeLabelsIndex(FSA fsa): Compute a set of labels to be integrated with the flags field.
- private int emitArc(OutputStream os, int flags, byte label, int targetOffset)
- private int emitNodeArcs(FSA fsa, OutputStream os, int state, int nextState): Emit all arcs of a single node.
- private int emitNodeData(OutputStream os, int number)
- private int emitNodes(FSA fsa, OutputStream os, com.carrotsearch.hppc.IntArrayList linearized): Update arc offsets assuming the given goto length.
- getFlags(): Return supported flags.
- private com.carrotsearch.hppc.IntArrayList linearize: Linearization of states.
- private int linearizeAndCalculateOffsets(FSA fsa, com.carrotsearch.hppc.IntArrayList states, com.carrotsearch.hppc.IntArrayList linearized, com.carrotsearch.hppc.IntIntHashMap offsets): Linearize all states, putting states in front of the automaton and calculating stable state offsets.
- private void linearizeState(FSA fsa, com.carrotsearch.hppc.IntStack nodes, com.carrotsearch.hppc.IntArrayList linearized, BitSet visited, int node): Add a state to the linearized list.
- private void log
- <T extends OutputStream> T serialize(FSA fsa, T os)
- withAnnotationSeparator(byte annotationSeparator): Sets the annotation separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
- withFiller(byte filler): Sets the filler separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
- withNumbers(): Serialize the automaton with the number of right-language sequences in each node.
- (package private) static int writeVInt(byte[] array, int offset, int value): Write a v-int to a byte array.
Field Details
- logger
- flags
- NO_STATE
  private static final int NO_STATE
  No-state id.
- withNumbers
  private boolean withNumbers
  true if we should serialize with numbers.
- offsets
  private com.carrotsearch.hppc.IntIntHashMap offsets
  A hash map of [state, offset] pairs.
- numbers
  private com.carrotsearch.hppc.IntIntHashMap numbers
  A hash map of [state, right-language-count] pairs.
- scratch
  private final byte[] scratch
  Scratch array for serializing v-ints.
- labelsIndex
  private byte[] labelsIndex
  The most frequent labels for integrating with the flags field.
- labelsInvIndex
  private int[] labelsInvIndex
  Inverted index of labels to be integrated with the flags field. The entry for label i holds that label's index, or zero if the label is not integrated.
Constructor Details
- CFSA2Serializer
  public CFSA2Serializer()
Method Details
- withNumbers
  Serialize the automaton with the number of right-language sequences in each node. This is required to implement perfect hashing. The numbering also preserves the order of input sequences.
  Specified by: withNumbers in interface FSASerializer
  Returns: the same object, for easier call chaining.
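The right-language count of a node is the number of sequences accepted from that node, and it is what makes perfect hashing possible: while walking a word, add up the counts behind every smaller-labelled arc you skip (plus one for each shorter word that ends on the way), and the total is the word's rank in sorted order. The sketch below shows the idea on a plain, unminimized trie whose arcs carry a "final" bit, not on a real CFSA2 automaton; the class RightLanguageHash and its methods are hypothetical and not part of morfologik.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: perfect hashing with per-state right-language counts.
 * Uses an unminimized trie whose arcs carry a "final" bit (a word ends on a
 * final arc). Not morfologik's API.
 */
final class RightLanguageHash {
  static final class Arc {
    final int target;
    boolean isFinal;
    Arc(int target) { this.target = target; }
  }

  // Arcs of each state, sorted by label; state 0 is the root.
  final List<TreeMap<Character, Arc>> states = new ArrayList<>();
  // counts[s] = number of sequences accepted starting from state s.
  final int[] counts;

  RightLanguageHash(String... words) {
    states.add(new TreeMap<>());
    for (String w : words) { // empty words are ignored
      int s = 0;
      for (int i = 0; i < w.length(); i++) {
        TreeMap<Character, Arc> arcs = states.get(s);
        Arc arc = arcs.get(w.charAt(i));
        if (arc == null) {
          states.add(new TreeMap<>());
          arc = new Arc(states.size() - 1);
          arcs.put(w.charAt(i), arc);
        }
        if (i == w.length() - 1) arc.isFinal = true;
        s = arc.target;
      }
    }
    counts = new int[states.size()];
    // Children always get larger ids than parents, so a reverse scan
    // computes every child's count before its parent's.
    for (int s = states.size() - 1; s >= 0; s--) {
      for (Arc a : states.get(s).values()) {
        counts[s] += (a.isFinal ? 1 : 0) + counts[a.target];
      }
    }
  }

  /** Returns the word's 0-based rank in sorted order, or -1 if absent. */
  int hash(String word) {
    int index = 0;
    int s = 0;
    for (int i = 0; i < word.length(); i++) {
      char c = word.charAt(i);
      Arc match = null;
      for (Map.Entry<Character, Arc> e : states.get(s).entrySet()) {
        if (e.getKey() == c) { match = e.getValue(); break; }
        if (e.getKey() > c) break;
        // Every sequence through a smaller label sorts before the word.
        Arc a = e.getValue();
        index += (a.isFinal ? 1 : 0) + counts[a.target];
      }
      if (match == null) return -1;
      if (i == word.length() - 1) return match.isFinal ? index : -1;
      if (match.isFinal) index++; // the shorter word ending here sorts first
      s = match.target;
    }
    return -1; // empty word
  }
}
```

For the words ab, abc, b and ba this assigns ranks 0 to 3 in sorted order. Because the counts depend only on the automaton's structure, they can be computed once and stored per node, which matches the numbers field ([state, right-language-count] pairs).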
- serialize
  Specified by: serialize in interface FSASerializer
  Type Parameters:
    T - A subclass of OutputStream, returned for chaining.
  Parameters:
    fsa - The automaton to serialize.
    os - The output stream to serialize to.
  Returns: os, for chaining.
  Throws: IOException - Rethrown if an I/O error occurs.
- computeLabelsIndex
  Compute a set of labels to be integrated with the flags field.
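One way to pick "the most frequent labels" is to count label occurrences over all arcs, keep the top few, and record each chosen label's 1-based slot in a 256-entry inverse table so that zero can mean "not integrated", as the labelsInvIndex description requires. This is a sketch under assumptions: the number of available slots (maxSlots here) depends on how many bits the real flags byte leaves free, which this page does not state, and LabelsIndexSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of choosing labels to fold into the flags byte. */
final class LabelsIndexSketch {
  final byte[] labelsIndex;                  // chosen labels, most frequent first
  final int[] labelsInvIndex = new int[256]; // label -> 1-based slot, 0 = not integrated

  /** arcLabels: the label of every arc; maxSlots: assumed number of free codes in the flags byte. */
  LabelsIndexSketch(byte[] arcLabels, int maxSlots) {
    int[] freq = new int[256];
    for (byte b : arcLabels) freq[b & 0xFF]++;

    List<Integer> used = new ArrayList<>();
    for (int label = 0; label < 256; label++) {
      if (freq[label] > 0) used.add(label);
    }
    // Most frequent first; ties go to the smaller label so the result is deterministic.
    used.sort((a, b) -> freq[a] != freq[b]
        ? Integer.compare(freq[b], freq[a])
        : Integer.compare(a, b));

    int n = Math.min(maxSlots, used.size());
    labelsIndex = new byte[n];
    for (int i = 0; i < n; i++) {
      int label = used.get(i);
      labelsIndex[i] = (byte) label;
      labelsInvIndex[label] = i + 1;
    }
  }
}
```

An arc whose label has a non-zero inverse entry could then carry that slot inside its flags field instead of a separate label byte; how CFSA2 actually packs those bits is not described on this page.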
- getFlags
  Return supported flags.
  Specified by: getFlags in interface FSASerializer
  Returns: the set of flags supported by the serializer (and the output automaton).
- linearize
  Linearization of states.
  Throws: IOException
- log
- linearizeAndCalculateOffsets
  private int linearizeAndCalculateOffsets(FSA fsa, com.carrotsearch.hppc.IntArrayList states, com.carrotsearch.hppc.IntArrayList linearized, com.carrotsearch.hppc.IntIntHashMap offsets) throws IOException
  Linearize all states, putting states in front of the automaton and calculating stable state offsets.
  Throws: IOException
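Arc targets are written as v-ints, so a node's size depends on the offsets of its targets, and those offsets in turn depend on the sizes of the nodes before them. A simple way to reach stable offsets is to recompute them until nothing changes. The sketch assumes a hypothetical arc layout of one flags byte, one label byte and a v-int target offset; the real CFSA2 layout is more compact (for example through the label integration described for computeLabelsIndex), and StableOffsets is an illustrative name.

```java
/** Hypothetical sketch: fixed-point computation of node offsets with v-int targets. */
final class StableOffsets {
  /** Number of bytes a non-negative value takes as a 7-bits-per-byte v-int. */
  static int vIntLength(int value) {
    int len = 1;
    while (value > 0x7F) {
      value >>>= 7;
      len++;
    }
    return len;
  }

  /**
   * targets[s] lists the arc targets of state s; states are already in
   * output order, so state ids equal positions. Assumed arc layout:
   * 1 flags byte + 1 label byte + v-int target offset. A state without
   * arcs takes no bytes in this simplified model.
   */
  static int[] computeOffsets(int[][] targets) {
    int[] offsets = new int[targets.length]; // start from all zeros
    boolean changed = true;
    while (changed) {
      changed = false;
      int pos = 0;
      for (int s = 0; s < targets.length; s++) {
        if (offsets[s] != pos) {
          offsets[s] = pos;
          changed = true;
        }
        for (int t : targets[s]) {
          pos += 2 + vIntLength(offsets[t]);
        }
      }
    }
    return offsets;
  }
}
```

Starting from all-zero offsets, each pass can only make offsets (and therefore v-ints) grow, never shrink, and a 32-bit offset needs at most five bytes, so the loop terminates.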
- linearizeState
  Add a state to the linearized list.
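The parameters of linearizeState (a stack of nodes, the linearized list and a visited BitSet) point to an iterative depth-first traversal. A minimal sketch of that pattern, assuming the automaton is given as int[][] target lists and using java.util.ArrayDeque in place of the HPPC IntStack; LinearizeSketch is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of an iterative depth-first linearization. */
final class LinearizeSketch {
  /** Returns reachable states in depth-first order, each exactly once. */
  static List<Integer> linearize(int[][] targets, int root) {
    List<Integer> linearized = new ArrayList<>();
    BitSet visited = new BitSet(targets.length);
    Deque<Integer> nodes = new ArrayDeque<>(); // used as a stack
    nodes.push(root);
    while (!nodes.isEmpty()) {
      int node = nodes.pop();
      if (visited.get(node)) continue; // pushed twice via different arcs
      visited.set(node);
      linearized.add(node);
      // Push in reverse so the first arc's target is popped next.
      for (int i = targets[node].length - 1; i >= 0; i--) {
        int t = targets[node][i];
        if (!visited.get(t)) nodes.push(t);
      }
    }
    return linearized;
  }
}
```

For a graph 0 -> {1, 2}, 1 -> {3}, 2 -> {3} this yields 0, 1, 3, 2: the shared state 3 is placed once, right after the first state that reaches it.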
- computeFirstStates
  private int[] computeFirstStates(com.carrotsearch.hppc.IntIntHashMap inlinkCount, int maxStates, int minInlinkCount)
  Compute the set of states that should be linearized first to minimize the goto length of the other states.
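Arcs pointing at a state placed near the start of the output get small target offsets, and small offsets need fewer v-int bytes. A plausible reading of computeFirstStates is therefore: keep the states whose in-link count reaches minInlinkCount, order them by in-link count (highest first), and take at most maxStates. The sketch below follows that reading as an assumption; it uses a plain int[] instead of an IntIntHashMap, and FirstStatesSketch is hypothetical.

```java
import java.util.stream.IntStream;

/** Hypothetical sketch of picking states to place at the front of the output. */
final class FirstStatesSketch {
  /** inlinkCount[s] = number of arcs pointing at state s. */
  static int[] computeFirstStates(int[] inlinkCount, int maxStates, int minInlinkCount) {
    return IntStream.range(0, inlinkCount.length)
        .filter(s -> inlinkCount[s] >= minInlinkCount)
        .boxed()
        // Most in-links first; ties go to the smaller state id.
        .sorted((a, b) -> inlinkCount[a] != inlinkCount[b]
            ? Integer.compare(inlinkCount[b], inlinkCount[a])
            : Integer.compare(a, b))
        .limit(maxStates)
        .mapToInt(Integer::intValue)
        .toArray();
  }
}
```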
- computeInlinkCount
  Compute the in-link count for each state.
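In-link counts can be gathered in a single traversal from the root: each reachable state is expanded once, and every one of its arcs adds one to its target's count. A minimal sketch, assuming int[][] target lists in place of an FSA; InlinkCountSketch is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

/** Hypothetical sketch of counting incoming arcs per state. */
final class InlinkCountSketch {
  /** Counts incoming arcs for every state reachable from root. */
  static int[] computeInlinkCount(int[][] targets, int root) {
    int[] inlinks = new int[targets.length];
    BitSet visited = new BitSet(targets.length);
    Deque<Integer> queue = new ArrayDeque<>();
    queue.add(root);
    visited.set(root);
    while (!queue.isEmpty()) {
      int s = queue.poll();
      for (int t : targets[s]) {
        inlinks[t]++; // every arc counts, even if t was seen before
        if (!visited.get(t)) {
          visited.set(t);
          queue.add(t);
        }
      }
    }
    return inlinks;
  }
}
```

Arcs leaving unreachable states are never visited, so they do not inflate the counts.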
- emitNodes
  private int emitNodes(FSA fsa, OutputStream os, com.carrotsearch.hppc.IntArrayList linearized) throws IOException
  Update arc offsets assuming the given goto length.
  Throws: IOException
- emitNodeArcs
  Emit all arcs of a single node.
  Throws: IOException
- emitArc
  Throws: IOException
- emitNodeData
  Throws: IOException
- withFiller
  Description copied from interface: FSASerializer
  Sets the filler separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
  Specified by: withFiller in interface FSASerializer
  Parameters:
    filler - The filler separator byte.
  Returns: this, for call chaining.
- withAnnotationSeparator
  Description copied from interface: FSASerializer
  Sets the annotation separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
  Specified by: withAnnotationSeparator in interface FSASerializer
  Parameters:
    annotationSeparator - The annotation separator byte.
  Returns: this, for call chaining.
- writeVInt
  static int writeVInt(byte[] array, int offset, int value)
  Write a v-int to a byte array.
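A v-int stores a non-negative integer in 7-bit groups, setting the high bit of every byte except the last. The sketch below assumes the common low-group-first order (as in LEB128) and that the returned int is the offset just past the written bytes; this page specifies neither detail. A 32-bit value needs at most five bytes. VIntSketch and its readVInt helper are hypothetical, the latter added only to check the round trip.

```java
/** Hypothetical sketch of v-int encoding into a byte array. */
final class VIntSketch {
  /** Writes value (must be >= 0) at offset, low 7-bit group first; returns the new offset. */
  static int writeVInt(byte[] array, int offset, int value) {
    if (value < 0) throw new IllegalArgumentException("negative: " + value);
    while (value > 0x7F) {
      array[offset++] = (byte) (0x80 | (value & 0x7F)); // continuation bit set
      value >>>= 7;
    }
    array[offset++] = (byte) value; // last byte, high bit clear
    return offset;
  }

  /** Reads back a value written by writeVInt starting at offset. */
  static int readVInt(byte[] array, int offset) {
    int value = 0;
    for (int shift = 0; ; shift += 7) {
      byte b = array[offset++];
      value |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) return value;
    }
  }
}
```

With this layout, 300 is written as the two bytes 0xAC 0x02.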