Package morfologik.fsa.builders
Class FSA5Serializer
java.lang.Object
morfologik.fsa.builders.FSA5Serializer
- All Implemented Interfaces:
FSASerializer
Serializes in-memory FSA graphs to a binary format compatible with the FSA5 format of Jan Daciuk's fsa package.
It is possible to serialize the automaton with the numbers required for perfect hashing; see the withNumbers() method.
- See Also:
-
Field Summary
Fields
- public byte annotationByte
- public byte fillerByte
- flags: Supported flags.
- private static final int MAX_ARC_SIZE: Maximum number of bytes for a serialized arc.
- private static final int MAX_NODE_DATA_SIZE: Maximum number of bytes for per-node data.
- private com.carrotsearch.hppc.IntIntHashMap numbers: A hash map of [state, right-language-count] pairs.
- private com.carrotsearch.hppc.IntIntHashMap offsets: A hash map of [state, offset] pairs.
- private static final int SIZEOF_FLAGS: Number of bytes for the arc's flags header (arc representation without the goto address).
- private boolean withNumbers: true if we should serialize with numbers.
-
Constructor Summary
Constructors
- FSA5Serializer()
-
Method Summary
- private int emitArc(ByteBuffer bb, OutputStream os, int gtl, int flags, byte label, int targetOffset)
- private boolean emitArcs(FSA fsa, OutputStream os, int[] linearized, int gtl, int nodeDataLength): Update arc offsets assuming the given goto length.
- private int emitNodeData(ByteBuffer bb, OutputStream os, int nodeDataLength, int number)
- getFlags(): Return supported flags.
- private int[] linearize: Linearization of states.
- <T extends OutputStream> T serialize(FSA fsa, T os): Serialize the automaton, starting from its root state, to an output stream in FSA5 format.
- withAnnotationSeparator(byte annotationSeparator): Sets the annotation separator (only if the set returned by FSASerializer.getFlags() contains FSAFlags.SEPARATORS).
- withFiller(byte filler): Sets the filler separator (only if the set returned by FSASerializer.getFlags() contains FSAFlags.SEPARATORS).
- withNumbers(): Serialize the automaton with the number of right-language sequences in each node.
-
Field Details
-
MAX_ARC_SIZE
private static final int MAX_ARC_SIZE
Maximum number of bytes for a serialized arc.
- See Also:
-
MAX_NODE_DATA_SIZE
private static final int MAX_NODE_DATA_SIZE
Maximum number of bytes for per-node data.
- See Also:
-
SIZEOF_FLAGS
private static final int SIZEOF_FLAGS
Number of bytes for the arc's flags header (arc representation without the goto address).
- See Also:
-
flags
Supported flags.
-
fillerByte
public byte fillerByte
- See Also:
-
annotationByte
public byte annotationByte
- See Also:
-
withNumbers
private boolean withNumbers
true if we should serialize with numbers.
- See Also:
-
offsets
private com.carrotsearch.hppc.IntIntHashMap offsets
A hash map of [state, offset] pairs.
-
numbers
private com.carrotsearch.hppc.IntIntHashMap numbers
A hash map of [state, right-language-count] pairs.
-
-
Constructor Details
-
FSA5Serializer
public FSA5Serializer()
-
-
Method Details
-
withNumbers
Serialize the automaton with the number of right-language sequences in each node. This is required to implement perfect hashing. The numbering also preserves the order of input sequences.
- Specified by:
withNumbers in interface FSASerializer
- Returns:
Returns the same object for easier call chaining.
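Why per-node right-language counts enable perfect hashing can be shown with a minimal, self-contained sketch. It does not use the morfologik API: PerfectHashSketch and its State class are hypothetical, the automaton is a plain trie built from the input words, and finality is stored on states rather than on arcs. A word's index is the number of accepted sequences that sort before it, obtained by adding the counts of all smaller-labelled branches along the word's path.

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of perfect hashing with right-language counts (not the morfologik API).
final class PerfectHashSketch {
    // A state of an acyclic automaton; finality is stored on states in this sketch.
    static final class State {
        final TreeMap<Character, State> arcs = new TreeMap<>(); // outgoing arcs sorted by label
        boolean isFinal;
        int rightLanguageCount; // number of sequences accepted starting from this state
    }

    final State root = new State();

    // Builds a trie (acyclic but not minimal) and computes the per-state counts.
    PerfectHashSketch(List<String> words) {
        for (String w : words) {
            State s = root;
            for (char c : w.toCharArray()) {
                s = s.arcs.computeIfAbsent(c, k -> new State());
            }
            s.isFinal = true;
        }
        count(root);
    }

    // Post-order pass: R(s) = (final ? 1 : 0) + sum of R(target) over outgoing arcs.
    private static int count(State s) {
        int n = s.isFinal ? 1 : 0;
        for (State t : s.arcs.values()) {
            n += count(t);
        }
        s.rightLanguageCount = n;
        return n;
    }

    // Returns the 0-based position of word in sorted order, or -1 if it is not accepted.
    int index(String word) {
        int idx = 0;
        State s = root;
        for (char c : word.toCharArray()) {
            if (s.isFinal) {
                idx++; // the current prefix is itself a word and sorts before any extension
            }
            // Skip every sequence reachable through arcs with a smaller label.
            for (State smaller : s.arcs.headMap(c).values()) {
                idx += smaller.rightLanguageCount;
            }
            s = s.arcs.get(c);
            if (s == null) {
                return -1;
            }
        }
        return s.isFinal ? idx : -1;
    }
}
```

With the words abc, abd, b and ba, index("ba") yields 3, the word's position in sorted order, which matches the order-preserving property described above. On a minimized automaton, where states are shared, the recursive count would need memoization to avoid revisiting states.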
-
withFiller
Sets the filler separator (only if the set returned by FSASerializer.getFlags() contains FSAFlags.SEPARATORS).
- Specified by:
withFiller in interface FSASerializer
- Parameters:
filler - The filler separator byte.
- Returns:
Returns this for call chaining.
-
withAnnotationSeparator
Sets the annotation separator (only if the set returned by FSASerializer.getFlags() contains FSAFlags.SEPARATORS).
- Specified by:
withAnnotationSeparator in interface FSASerializer
- Parameters:
annotationSeparator - The annotation separator byte.
- Returns:
Returns this for call chaining.
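The fluent, SEPARATORS-gated setters can be illustrated with a hypothetical configuration class. SketchFlag, SerializerConfigSketch and the default bytes are assumptions made for this sketch; in particular, throwing UnsupportedOperationException when SEPARATORS is not supported is one possible choice, not documented behavior of FSA5Serializer.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for a serializer's flag set (not morfologik's FSAFlags).
enum SketchFlag { NUMBERS, SEPARATORS }

// Minimal sketch of fluent, flag-gated setters; not the morfologik implementation.
final class SerializerConfigSketch {
    private final EnumSet<SketchFlag> supported = EnumSet.noneOf(SketchFlag.class);
    private byte fillerByte = '_';     // placeholder default chosen for this sketch
    private byte annotationByte = '+'; // placeholder default chosen for this sketch
    private boolean withNumbers;

    SerializerConfigSketch(Set<SketchFlag> supportedFlags) {
        supported.addAll(supportedFlags);
    }

    Set<SketchFlag> getFlags() {
        return EnumSet.copyOf(supported); // defensive copy
    }

    // Assumption: separator settings are rejected when SEPARATORS is not supported.
    private void requireSeparators() {
        if (!supported.contains(SketchFlag.SEPARATORS)) {
            throw new UnsupportedOperationException("SEPARATORS flag not supported");
        }
    }

    SerializerConfigSketch withFiller(byte filler) {
        requireSeparators();
        fillerByte = filler;
        return this; // same object, so calls can be chained
    }

    SerializerConfigSketch withAnnotationSeparator(byte annotationSeparator) {
        requireSeparators();
        annotationByte = annotationSeparator;
        return this;
    }

    SerializerConfigSketch withNumbers() {
        withNumbers = true;
        return this;
    }

    byte fillerByte() { return fillerByte; }
    byte annotationByte() { return annotationByte; }
    boolean numbers() { return withNumbers; }
}
```

Returning this from every setter lets a caller configure the serializer in one expression, for example cfg.withFiller((byte) '#').withAnnotationSeparator((byte) '|').withNumbers().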
-
serialize
Serialize the automaton, starting from its root state, to an output stream in FSA5 format.
- Specified by:
serialize in interface FSASerializer
- Type Parameters:
T - A subclass of OutputStream, returned for chaining.
- Parameters:
fsa - The automaton to serialize.
os - The output stream to serialize to.
- Returns:
Returns os for chaining.
- Throws:
IOException - Rethrown if an I/O error occurs.
- See Also:
-
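The generic signature <T extends OutputStream> T serialize(FSA fsa, T os) lets a caller keep the concrete stream type without a cast. The sketch below shows the pattern in isolation; ChainingSerializerSketch is hypothetical and writes a raw byte array in place of an automaton.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of the "return the stream you were given" pattern.
final class ChainingSerializerSketch {
    // The payload stands in for a serialized automaton.
    <T extends OutputStream> T serialize(byte[] payload, T os) throws IOException {
        os.write(payload);
        return os; // same instance, with its static type preserved
    }

    static byte[] demo() throws IOException {
        // No cast needed: T is inferred as ByteArrayOutputStream.
        return new ChainingSerializerSketch()
                .serialize(new byte[] {1, 2, 3}, new ByteArrayOutputStream())
                .toByteArray();
    }
}
```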
getFlags
Return supported flags.
- Specified by:
getFlags in interface FSASerializer
- Returns:
Returns the set of flags supported by the serializer (and the output automaton).
-
linearize
Linearization of states.
-
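One plausible way to linearize states is an iterative depth-first traversal that places each reachable state exactly once, which matters because states in a minimal automaton are shared. This is a sketch under assumptions: states are plain int identifiers, targets[s] lists the target states of the arcs leaving s, and the actual order produced by linearize is not specified in this documentation.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch: emit each reachable state once, in iterative depth-first order.
final class LinearizeSketch {
    static int[] linearize(int root, int[][] targets) {
        int[] order = new int[targets.length];
        int last = 0;
        BitSet visited = new BitSet(targets.length);
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            int s = stack.pop();
            if (visited.get(s)) {
                continue; // already placed (a state shared by several arcs)
            }
            visited.set(s);
            order[last++] = s;
            for (int t : targets[s]) {
                if (!visited.get(t)) {
                    stack.push(t);
                }
            }
        }
        return Arrays.copyOf(order, last); // unreachable states are not included
    }
}
```

The position of a state in the returned array is what later determines its byte offset in the output.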
emitArcs
private boolean emitArcs(FSA fsa, OutputStream os, int[] linearized, int gtl, int nodeDataLength) throws IOException
Update arc offsets assuming the given goto length.
- Throws:
IOException
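Updating arc offsets for a given goto length suggests a fixed-point search: a larger goto length makes every arc bigger, which moves state offsets, which in turn may need more address bytes. The sketch models this with a deliberately simplified, hypothetical cost model (every arc takes one label byte plus gtl address bytes, with FLAG_BITS low bits reserved for flags) and uses java.util.HashMap in place of the com.carrotsearch.hppc.IntIntHashMap offsets field. It is not the real FSA5 layout, and reading the boolean result as "all offsets fit" is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of choosing a goto length (gtl); not the FSA5 byte layout.
final class GotoLengthSketch {
    static final int FLAG_BITS = 3; // assumption: low bits of the address field hold flags

    // Fills 'offsets' for the given gtl and reports whether every target offset fits.
    static boolean assignOffsets(int[] linearized, int[][] targets, int gtl, Map<Integer, Integer> offsets) {
        offsets.clear();
        int offset = 0;
        for (int s : linearized) {
            offsets.put(s, offset);
            offset += targets[s].length * (1 + gtl); // label byte + gtl address bytes per arc
        }
        long maxAddress = (1L << (8 * gtl - FLAG_BITS)) - 1;
        for (int s : linearized) {
            for (int t : targets[s]) {
                if (offsets.get(t) > maxAddress) {
                    return false; // this gtl is too short for at least one target
                }
            }
        }
        return true;
    }

    // Smallest gtl in 1..4 for which all target offsets fit.
    static int minimalGotoLength(int[] linearized, int[][] targets) {
        Map<Integer, Integer> offsets = new HashMap<>();
        for (int gtl = 1; gtl <= 4; gtl++) {
            if (assignOffsets(linearized, targets, gtl, offsets)) {
                return gtl;
            }
        }
        throw new IllegalStateException("automaton too large for this sketch");
    }
}
```

For a chain of 17 states, the last state's offset is 32 with gtl = 1, which exceeds the 5-bit address range of this model, so minimalGotoLength returns 2.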
-
emitArc
private int emitArc(ByteBuffer bb, OutputStream os, int gtl, int flags, byte label, int targetOffset) throws IOException
- Throws:
IOException
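The parameters of emitArc (a scratch ByteBuffer, the goto length, a flags word, the label and the target offset) fit an encoding of the following shape. The byte layout is an assumption for illustration, not a verified description of FSA5: a label byte, then an address field holding (targetOffset << 3) | flags in gtl little-endian bytes, shortened to SIZEOF_FLAGS bytes when a hypothetical BIT_TARGET_NEXT flag says the target immediately follows.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Hypothetical arc encoding for illustration:
// [label][address field = (targetOffset << 3) | flags, little-endian, gtl bytes].
final class ArcEncodingSketch {
    // Hypothetical flag bits occupying the low 3 bits of the address field.
    static final int BIT_FINAL = 1;
    static final int BIT_LAST = 2;
    static final int BIT_TARGET_NEXT = 4;
    // Bytes written for the flags alone, when the target address is omitted.
    static final int SIZEOF_FLAGS = 1;

    // Encodes one arc into the scratch buffer, copies it to os, returns the byte count.
    static int emitArc(ByteBuffer bb, OutputStream os, int gtl, int flags, byte label, int targetOffset)
            throws IOException {
        boolean targetNext = (flags & BIT_TARGET_NEXT) != 0;
        // When the target state immediately follows, only the flags are stored.
        int fieldBytes = targetNext ? SIZEOF_FLAGS : gtl;
        long field = targetNext ? flags : ((long) targetOffset << 3) | flags;
        bb.put(label);
        for (int i = 0; i < fieldBytes; i++) {
            bb.put((byte) field); // least significant byte first
            field >>>= 8;
        }
        bb.flip();
        int bytes = bb.remaining();
        os.write(bb.array(), bb.arrayOffset() + bb.position(), bytes);
        bb.clear(); // ready for the next arc
        return bytes;
    }
}
```

With gtl = 2, flags BIT_FINAL | BIT_LAST and target offset 100, this sketch writes the three bytes 'a', 0x23, 0x03, because (100 << 3) | 3 = 0x323.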
-
emitNodeData
private int emitNodeData(ByteBuffer bb, OutputStream os, int nodeDataLength, int number) throws IOException
- Throws:
IOException
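Per-node data written by emitNodeData is a number stored in nodeDataLength bytes. The sketch assumes little-endian byte order and derives the length from the largest number to be stored, using zero bytes when numbers are disabled. NodeDataSketch and both rules are assumptions for illustration, and the 4-byte cap belongs to this int-based sketch only (MAX_NODE_DATA_SIZE may differ).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch of per-node number encoding.
final class NodeDataSketch {
    // Smallest byte count that can hold maxNumber; 0 when numbers are not serialized.
    static int nodeDataLength(boolean withNumbers, int maxNumber) {
        if (!withNumbers) {
            return 0;
        }
        int bytes = 1;
        while (bytes < 4 && (maxNumber >>> (8 * bytes)) != 0) {
            bytes++;
        }
        return bytes;
    }

    // Writes 'number' as nodeDataLength little-endian bytes and returns the byte count.
    static int emitNodeData(ByteBuffer bb, OutputStream os, int nodeDataLength, int number) throws IOException {
        for (int i = 0; i < nodeDataLength; i++) {
            bb.put((byte) number); // least significant byte first
            number >>>= 8;
        }
        bb.flip();
        os.write(bb.array(), bb.arrayOffset() + bb.position(), bb.remaining());
        bb.clear();
        return nodeDataLength;
    }
}
```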
-