Package morfologik.fsa.builders
Class CFSA2Serializer
- java.lang.Object
-
- morfologik.fsa.builders.CFSA2Serializer
-
- All Implemented Interfaces:
FSASerializer
public final class CFSA2Serializer extends java.lang.Object implements FSASerializer
Serializes in-memoryFSAgraphs toCFSA2.It is possible to serialize the automaton with numbers required for perfect hashing. See
withNumbers()method.- See Also:
CFSA2
-
-
Field Summary
Fields Modifier and Type Field Description private static java.util.EnumSet<FSAFlags>flagsSupported flags.private byte[]labelsIndexThe most frequent labels for integrating with the flags field.private int[]labelsInvIndexInverted index of labels to be integrated with flags field.private java.util.logging.Loggerloggerprivate static intNO_STATENo-state id.private com.carrotsearch.hppc.IntIntHashMapnumbersA hash map of [state, right-language-count] pairs.private com.carrotsearch.hppc.IntIntHashMapoffsetsA hash map of [state, offset] pairs.private byte[]scratchScratch array for serializing vints.private booleanwithNumberstrueif we should serialize with numbers.
-
Constructor Summary
Constructors Constructor Description CFSA2Serializer()
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private int[]computeFirstStates(com.carrotsearch.hppc.IntIntHashMap inlinkCount, int maxStates, int minInlinkCount)Compute the set of states that should be linearized first to minimize other states goto length.private com.carrotsearch.hppc.IntIntHashMapcomputeInlinkCount(FSA fsa)Compute in-link count for each state.private voidcomputeLabelsIndex(FSA fsa)Compute a set of labels to be integrated with the flags field.private intemitArc(java.io.OutputStream os, int flags, byte label, int targetOffset)private intemitNodeArcs(FSA fsa, java.io.OutputStream os, int state, int nextState)Emit all arcs of a single node.private intemitNodeData(java.io.OutputStream os, int number)private intemitNodes(FSA fsa, java.io.OutputStream os, com.carrotsearch.hppc.IntArrayList linearized)Update arc offsets assuming the given goto length.java.util.Set<FSAFlags>getFlags()Return supported flags.private com.carrotsearch.hppc.IntArrayListlinearize(FSA fsa)Linearization of states.private intlinearizeAndCalculateOffsets(FSA fsa, com.carrotsearch.hppc.IntArrayList states, com.carrotsearch.hppc.IntArrayList linearized, com.carrotsearch.hppc.IntIntHashMap offsets)Linearize all states, puttingstatesin front of the automaton and calculating stable state offsets.private voidlinearizeState(FSA fsa, com.carrotsearch.hppc.IntStack nodes, com.carrotsearch.hppc.IntArrayList linearized, java.util.BitSet visited, int node)Add a state to linearized list.private voidlog(java.util.logging.Level level, java.lang.String msg, java.lang.Object... args)<T extends java.io.OutputStream>
Tserialize(FSA fsa, T os)CFSA2SerializerwithAnnotationSeparator(byte annotationSeparator)Sets the annotation separator (only ifFSASerializer.getFlags()returnsFSAFlags.SEPARATORS).CFSA2SerializerwithFiller(byte filler)Sets the filler separator (only ifFSASerializer.getFlags()returnsFSAFlags.SEPARATORS).CFSA2SerializerwithNumbers()Serialize the automaton with the number of right-language sequences in each node.(package private) static intwriteVInt(byte[] array, int offset, int value)Write a v-int to a byte array.
-
-
-
Field Detail
-
logger
private final java.util.logging.Logger logger
-
flags
private static final java.util.EnumSet<FSAFlags> flags
Supported flags.
-
NO_STATE
private static final int NO_STATE
No-state id.- See Also:
- Constant Field Values
-
withNumbers
private boolean withNumbers
trueif we should serialize with numbers.- See Also:
withNumbers()
-
offsets
private com.carrotsearch.hppc.IntIntHashMap offsets
A hash map of [state, offset] pairs.
-
numbers
private com.carrotsearch.hppc.IntIntHashMap numbers
A hash map of [state, right-language-count] pairs.
-
scratch
private final byte[] scratch
Scratch array for serializing vints.
-
labelsIndex
private byte[] labelsIndex
The most frequent labels for integrating with the flags field.
-
labelsInvIndex
private int[] labelsInvIndex
Inverted index of labels to be integrated with flags field. A label at indexihas the index or zero (no integration).
-
-
Method Detail
-
withNumbers
public CFSA2Serializer withNumbers()
Serialize the automaton with the number of right-language sequences in each node. This is required to implement perfect hashing. The numbering also preserves the order of input sequences.- Specified by:
withNumbersin interfaceFSASerializer- Returns:
- Returns the same object for easier call chaining.
-
serialize
public <T extends java.io.OutputStream> T serialize(FSA fsa, T os) throws java.io.IOException
- Specified by:
serializein interfaceFSASerializer- Type Parameters:
T- A subclass ofOutputStream, returned for chaining.- Parameters:
fsa- The automaton to serialize.os- The output stream to serialize to.- Returns:
- Returns
osfor chaining. - Throws:
java.io.IOException- Rethrown if an I/O error occurs.- See Also:
withNumbers()
-
computeLabelsIndex
private void computeLabelsIndex(FSA fsa)
Compute a set of labels to be integrated with the flags field.
-
getFlags
public java.util.Set<FSAFlags> getFlags()
Return supported flags.- Specified by:
getFlagsin interfaceFSASerializer- Returns:
- Returns the set of flags supported by the serializer (and the output automaton).
-
linearize
private com.carrotsearch.hppc.IntArrayList linearize(FSA fsa) throws java.io.IOException
Linearization of states.- Throws:
java.io.IOException
-
log
private void log(java.util.logging.Level level, java.lang.String msg, java.lang.Object... args)
-
linearizeAndCalculateOffsets
private int linearizeAndCalculateOffsets(FSA fsa, com.carrotsearch.hppc.IntArrayList states, com.carrotsearch.hppc.IntArrayList linearized, com.carrotsearch.hppc.IntIntHashMap offsets) throws java.io.IOException
Linearize all states, puttingstatesin front of the automaton and calculating stable state offsets.- Throws:
java.io.IOException
-
linearizeState
private void linearizeState(FSA fsa, com.carrotsearch.hppc.IntStack nodes, com.carrotsearch.hppc.IntArrayList linearized, java.util.BitSet visited, int node)
Add a state to linearized list.
-
computeFirstStates
private int[] computeFirstStates(com.carrotsearch.hppc.IntIntHashMap inlinkCount, int maxStates, int minInlinkCount)Compute the set of states that should be linearized first to minimize other states goto length.
-
computeInlinkCount
private com.carrotsearch.hppc.IntIntHashMap computeInlinkCount(FSA fsa)
Compute in-link count for each state.
-
emitNodes
private int emitNodes(FSA fsa, java.io.OutputStream os, com.carrotsearch.hppc.IntArrayList linearized) throws java.io.IOException
Update arc offsets assuming the given goto length.- Throws:
java.io.IOException
-
emitNodeArcs
private int emitNodeArcs(FSA fsa, java.io.OutputStream os, int state, int nextState) throws java.io.IOException
Emit all arcs of a single node.- Throws:
java.io.IOException
-
emitArc
private int emitArc(java.io.OutputStream os, int flags, byte label, int targetOffset) throws java.io.IOException- Throws:
java.io.IOException
-
emitNodeData
private int emitNodeData(java.io.OutputStream os, int number) throws java.io.IOException- Throws:
java.io.IOException
-
withFiller
public CFSA2Serializer withFiller(byte filler)
Description copied from interface:FSASerializerSets the filler separator (only ifFSASerializer.getFlags()returnsFSAFlags.SEPARATORS).- Specified by:
withFillerin interfaceFSASerializer- Parameters:
filler- The filler separator byte.- Returns:
- Returns
thisfor call chaining.
-
withAnnotationSeparator
public CFSA2Serializer withAnnotationSeparator(byte annotationSeparator)
Description copied from interface:FSASerializerSets the annotation separator (only ifFSASerializer.getFlags()returnsFSAFlags.SEPARATORS).- Specified by:
withAnnotationSeparatorin interfaceFSASerializer- Parameters:
annotationSeparator- The filler separator byte.- Returns:
- Returns
thisfor call chaining.
-
writeVInt
static int writeVInt(byte[] array, int offset, int value)Write a v-int to a byte array.
-
-