Package morfologik.fsa.builders
Class FSA5Serializer
- java.lang.Object
  - morfologik.fsa.builders.FSA5Serializer
- All Implemented Interfaces:
FSASerializer
public final class FSA5Serializer extends java.lang.Object implements FSASerializer
Serializes in-memory FSA graphs to a binary format compatible with the FSA5 format of Jan Daciuk's fsa package. It is possible to serialize the automaton with the numbers required for perfect hashing; see the withNumbers() method.
- See Also:
FSA5, FSA.read(java.io.InputStream)
-
Field Summary
Fields
Modifier and Type | Field | Description
byte | annotationByte |
byte | fillerByte |
private static java.util.EnumSet<FSAFlags> | flags | Supported flags.
private static int | MAX_ARC_SIZE | Maximum number of bytes for a serialized arc.
private static int | MAX_NODE_DATA_SIZE | Maximum number of bytes for per-node data.
private com.carrotsearch.hppc.IntIntHashMap | numbers | A hash map of [state, right-language-count] pairs.
private com.carrotsearch.hppc.IntIntHashMap | offsets | A hash map of [state, offset] pairs.
private static int | SIZEOF_FLAGS | Number of bytes for the arc's flags header (arc representation without the goto address).
private boolean | withNumbers | true if the automaton should be serialized with numbers.
-
Constructor Summary
Constructors
Constructor | Description
FSA5Serializer() |
-
Method Summary
Methods
Modifier and Type | Method | Description
private int | emitArc(java.nio.ByteBuffer bb, java.io.OutputStream os, int gtl, int flags, byte label, int targetOffset) |
private boolean | emitArcs(FSA fsa, java.io.OutputStream os, int[] linearized, int gtl, int nodeDataLength) | Update arc offsets assuming the given goto length.
private int | emitNodeData(java.nio.ByteBuffer bb, java.io.OutputStream os, int nodeDataLength, int number) |
java.util.Set<FSAFlags> | getFlags() | Return supported flags.
private int[] | linearize(FSA fsa) | Linearization of states.
<T extends java.io.OutputStream> T | serialize(FSA fsa, T os) | Serialize the automaton, starting from its root state, to an output stream in FSA5 format.
FSA5Serializer | withAnnotationSeparator(byte annotationSeparator) | Sets the annotation separator (only if FSASerializer.getFlags() includes FSAFlags.SEPARATORS).
FSA5Serializer | withFiller(byte filler) | Sets the filler separator (only if FSASerializer.getFlags() includes FSAFlags.SEPARATORS).
FSA5Serializer | withNumbers() | Serialize the automaton with the number of right-language sequences in each node.
-
Field Detail
-
MAX_ARC_SIZE
private static final int MAX_ARC_SIZE
Maximum number of bytes for a serialized arc.
- See Also:
- Constant Field Values
-
MAX_NODE_DATA_SIZE
private static final int MAX_NODE_DATA_SIZE
Maximum number of bytes for per-node data.
- See Also:
- Constant Field Values
-
SIZEOF_FLAGS
private static final int SIZEOF_FLAGS
Number of bytes for the arc's flags header (arc representation without the goto address).
- See Also:
- Constant Field Values
-
flags
private static final java.util.EnumSet<FSAFlags> flags
Supported flags.
-
fillerByte
public byte fillerByte
- See Also:
FSA5.filler
-
annotationByte
public byte annotationByte
- See Also:
FSA5.annotation
-
withNumbers
private boolean withNumbers
true if the automaton should be serialized with numbers.
- See Also:
withNumbers()
-
offsets
private com.carrotsearch.hppc.IntIntHashMap offsets
A hash map of [state, offset] pairs.
-
numbers
private com.carrotsearch.hppc.IntIntHashMap numbers
A hash map of [state, right-language-count] pairs.
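The right-language count of a state is the number of sequences accepted from that state onward. A minimal sketch of how such counts can be computed, assuming a hypothetical toy automaton (`CountArc`, where `target == -1` marks an arc leading to no further state and `isFinal` marks an arc that ends an accepted sequence) instead of morfologik's `FSA` API, with `HashMap<Integer, Integer>` standing in for HPPC's `IntIntHashMap`. Memoization matters because a minimized automaton shares suffix states between many paths.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy arc (not morfologik's representation). */
final class CountArc {
    final byte label;
    final int target;      // -1: the arc leads to no further state
    final boolean isFinal; // a sequence may end on this arc

    CountArc(char label, int target, boolean isFinal) {
        this.label = (byte) label;
        this.target = target;
        this.isFinal = isFinal;
    }
}

final class RightLanguageSketch {
    /** Number of sequences accepted from {@code node}; shared states are computed once via the memo. */
    static int count(List<List<CountArc>> arcs, int node, Map<Integer, Integer> memo) {
        Integer cached = memo.get(node);
        if (cached != null) {
            return cached;
        }
        int total = 0;
        for (CountArc arc : arcs.get(node)) {
            if (arc.isFinal) {
                total++; // the sequence that ends on this arc
            }
            if (arc.target >= 0) {
                total += count(arcs, arc.target, memo); // sequences that continue past it
            }
        }
        memo.put(node, total);
        return total;
    }

    public static void main(String[] args) {
        // State 0 is the root; the automaton accepts "ab", "ac" and "b".
        List<List<CountArc>> arcs = Arrays.asList(
                Arrays.asList(new CountArc('a', 1, false), new CountArc('b', -1, true)),
                Arrays.asList(new CountArc('b', -1, true), new CountArc('c', -1, true)));
        Map<Integer, Integer> memo = new HashMap<>();
        System.out.println(count(arcs, 0, memo)); // prints 3
        System.out.println(memo.get(1));          // prints 2
    }
}
```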
-
Method Detail
-
withNumbers
public FSA5Serializer withNumbers()
Serialize the automaton with the number of right-language sequences in each node. This is required to implement perfect hashing. The numbering also preserves the order of input sequences.
- Specified by:
withNumbers in interface FSASerializer
- Returns:
- Returns the same object for easier call chaining.
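With a right-language count available for every state, the position of a sequence among all accepted sequences can be computed in a single walk: every arc skipped before the matching label contributes all sequences that pass through it. The sketch below shows this classic counting scheme on a hypothetical in-memory automaton (`HashArc`, `PerfectHashSketch`, root assumed to be state 0, arcs assumed sorted by label); it is an illustration of the technique, not morfologik's own lookup code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy arc; the arcs of each state must be sorted by label. */
final class HashArc {
    final byte label;
    final int target;      // -1: no further state
    final boolean isFinal; // a sequence may end on this arc

    HashArc(char label, int target, boolean isFinal) {
        this.label = (byte) label;
        this.target = target;
        this.isFinal = isFinal;
    }
}

final class PerfectHashSketch {
    private final List<List<HashArc>> arcs;
    private final Map<Integer, Integer> counts = new HashMap<>();

    PerfectHashSketch(List<List<HashArc>> arcs) {
        this.arcs = arcs;
    }

    /** Right-language count of a state (memoized). */
    int rightLanguage(int node) {
        Integer cached = counts.get(node);
        if (cached != null) {
            return cached;
        }
        int total = 0;
        for (HashArc arc : arcs.get(node)) {
            total += (arc.isFinal ? 1 : 0) + (arc.target >= 0 ? rightLanguage(arc.target) : 0);
        }
        counts.put(node, total);
        return total;
    }

    /** 0-based rank of {@code seq} among accepted sequences in label order, or -1 if not accepted. */
    int index(byte[] seq) {
        int node = 0;
        int hash = 0;
        for (int i = 0; i < seq.length; i++) {
            HashArc match = null;
            for (HashArc arc : arcs.get(node)) {
                if (arc.label == seq[i]) {
                    match = arc;
                    break;
                }
                // Every sequence through an arc with a smaller label sorts before seq.
                hash += (arc.isFinal ? 1 : 0) + (arc.target >= 0 ? rightLanguage(arc.target) : 0);
            }
            if (match == null) {
                return -1;
            }
            if (i == seq.length - 1) {
                return match.isFinal ? hash : -1;
            }
            if (match.isFinal) {
                hash++; // the accepted prefix seq[0..i] sorts before seq
            }
            if (match.target < 0) {
                return -1;
            }
            node = match.target;
        }
        return -1; // the empty sequence is never accepted in this model
    }

    public static void main(String[] args) {
        // Accepts "ab", "ac" and "b" (ranks 0, 1, 2).
        PerfectHashSketch ph = new PerfectHashSketch(Arrays.asList(
                Arrays.asList(new HashArc('a', 1, false), new HashArc('b', -1, true)),
                Arrays.asList(new HashArc('b', -1, true), new HashArc('c', -1, true))));
        System.out.println(ph.index("ac".getBytes(StandardCharsets.US_ASCII))); // prints 1
    }
}
```

Because ranks follow label order, sorted input sequences map to consecutive integers starting at 0, which is what makes the hash both perfect and order-preserving.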
-
withFiller
public FSA5Serializer withFiller(byte filler)
Sets the filler separator (only if FSASerializer.getFlags() includes FSAFlags.SEPARATORS).
- Specified by:
withFiller in interface FSASerializer
- Parameters:
filler - The filler separator byte.
- Returns:
- Returns this for call chaining.
-
withAnnotationSeparator
public FSA5Serializer withAnnotationSeparator(byte annotationSeparator)
Sets the annotation separator (only if FSASerializer.getFlags() includes FSAFlags.SEPARATORS).
- Specified by:
withAnnotationSeparator in interface FSASerializer
- Parameters:
annotationSeparator - The annotation separator byte.
- Returns:
- Returns this for call chaining.
-
serialize
public <T extends java.io.OutputStream> T serialize(FSA fsa, T os) throws java.io.IOException
Serialize the automaton, starting from its root state, to an output stream in FSA5 format.
- Specified by:
serialize in interface FSASerializer
- Type Parameters:
T - A subclass of OutputStream, returned for chaining.
- Parameters:
fsa - The automaton to serialize.
os - The output stream to serialize to.
- Returns:
- Returns os for chaining.
- Throws:
java.io.IOException - Rethrown if an I/O error occurs.
- See Also:
withNumbers()
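Because serialize returns the same T it was given, the caller keeps the concrete stream type; with the signatures documented on this page, `new FSA5Serializer().withNumbers().serialize(fsa, new ByteArrayOutputStream()).toByteArray()` type-checks without a cast. The sketch below isolates that generic pattern with a hypothetical `ChainingSketch.serialize` that writes a raw payload rather than an automaton.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in: only the generic shape mirrors serialize(FSA fsa, T os). */
final class ChainingSketch {
    static <T extends OutputStream> T serialize(byte[] payload, T os) throws IOException {
        os.write(payload);
        return os; // same static type as the argument, so the caller needs no cast
    }

    public static void main(String[] args) throws IOException {
        // toByteArray() is reachable because T is inferred as ByteArrayOutputStream.
        byte[] bytes = serialize(new byte[] {1, 2, 3}, new ByteArrayOutputStream()).toByteArray();
        System.out.println(bytes.length); // prints 3
    }
}
```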
-
getFlags
public java.util.Set<FSAFlags> getFlags()
Return supported flags.
- Specified by:
getFlags in interface FSASerializer
- Returns:
- Returns the set of flags supported by the serializer (and the output automaton).
-
linearize
private int[] linearize(FSA fsa)
Linearization of states.
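The page says only "Linearization of states." One plausible reading is that every state reachable from the root is given a position in the output, so that offsets can later be assigned in that order. The sketch below assumes a stack-based depth-first traversal over a hypothetical adjacency array (`targets[node]` lists the target of each arc, `-1` for an arc with no target state); the order the real serializer produces may differ.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Deque;

final class LinearizeSketch {
    /**
     * Orders every state reachable from {@code root}, each exactly once.
     * targets[node] lists the target of each outgoing arc; -1 marks an arc with no target state.
     */
    static int[] linearize(int[][] targets, int root) {
        int[] order = new int[targets.length];
        int last = 0;
        BitSet visited = new BitSet(targets.length);
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            int node = stack.pop();
            if (visited.get(node)) {
                continue; // shared suffix states can be pushed more than once
            }
            visited.set(node);
            order[last++] = node;
            for (int target : targets[node]) {
                if (target >= 0 && !visited.get(target)) {
                    stack.push(target);
                }
            }
        }
        return Arrays.copyOf(order, last); // unreachable states are left out
    }

    public static void main(String[] args) {
        int[][] targets = {{1, 2}, {3}, {3}, {}, {0}}; // state 4 is unreachable from 0
        System.out.println(Arrays.toString(linearize(targets, 0))); // prints [0, 2, 3, 1]
    }
}
```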
-
emitArcs
private boolean emitArcs(FSA fsa, java.io.OutputStream os, int[] linearized, int gtl, int nodeDataLength) throws java.io.IOException
Update arc offsets assuming the given goto length.
- Throws:
java.io.IOException
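Arc sizes depend on the goto length (the number of bytes used for a target address), while the largest address depends on how large the arcs are, so a serializer has to settle on a goto length that is consistent with the layout it produces. A sketch of that search under stated assumptions: each arc is one label byte plus `gtl` goto bytes, each state is preceded by `nodeDataLength` bytes, and the low 3 bits of a goto field are reserved for flags. All names are hypothetical, and the compact encoding for arcs pointing at the next state is ignored.

```java
import java.util.Arrays;

final class GotoLengthSketch {
    static final int FLAG_BITS = 3; // assumption: low bits of each goto field carry arc flags

    /** Offset of each state, in linearized order, for a given goto length (gtl). */
    static int[] offsets(int[] arcCounts, int[] linearized, int gtl, int nodeDataLength) {
        int[] result = new int[linearized.length];
        int offset = 0;
        for (int i = 0; i < linearized.length; i++) {
            result[i] = offset;
            // Per-state data, then one label byte plus gtl goto bytes per arc.
            offset += nodeDataLength + arcCounts[linearized[i]] * (1 + gtl);
        }
        return result;
    }

    /** True if every offset, shifted past the flag bits, fits in gtl bytes. */
    static boolean fits(int[] offsets, int gtl) {
        long limit = 1L << (8 * gtl);
        for (int offset : offsets) {
            if (((long) offset << FLAG_BITS) >= limit) {
                return false;
            }
        }
        return true;
    }

    /** Smallest goto length whose own layout can address every state. */
    static int chooseGotoLength(int[] arcCounts, int[] linearized, int nodeDataLength) {
        for (int gtl = 1; gtl <= 5; gtl++) {
            if (fits(offsets(arcCounts, linearized, gtl, nodeDataLength), gtl)) {
                return gtl;
            }
        }
        throw new IllegalStateException("automaton too large for this sketch");
    }

    public static void main(String[] args) {
        int[] arcCounts = new int[20];
        int[] linearized = new int[20];
        Arrays.fill(arcCounts, 2);
        for (int i = 0; i < 20; i++) {
            linearized[i] = i;
        }
        // With gtl = 1 the last state sits at offset 76, and 76 << 3 = 608 does not fit in one byte.
        System.out.println(chooseGotoLength(arcCounts, linearized, 0)); // prints 2
    }
}
```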
-
emitArc
private int emitArc(java.nio.ByteBuffer bb, java.io.OutputStream os, int gtl, int flags, byte label, int targetOffset) throws java.io.IOException
- Throws:
java.io.IOException
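A sketch of what emitting a single arc could look like with the same parameters, assuming the label is written first and the target offset and flags are packed little-endian into `gtl` bytes as `(targetOffset << 3) | flags`. The flag constants below are local assumptions for illustration, not references to the fields of FSA5. In this sketch a `null` stream only measures the arc, which is convenient when sizes are needed before the real write.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

final class ArcEncodingSketch {
    // Assumed flag layout, for this sketch only.
    static final int BIT_FINAL_ARC = 1;
    static final int BIT_LAST_ARC = 1 << 1;
    static final int FLAG_BITS = 3;

    /**
     * Writes the label, then (targetOffset << FLAG_BITS | flags) as gtl little-endian bytes.
     * Returns the arc size; with os == null nothing is written.
     */
    static int emitArc(ByteBuffer bb, OutputStream os, int gtl, int flags, byte label, int targetOffset)
            throws IOException {
        if (gtl < 1 || gtl > 7) {
            throw new IllegalArgumentException("unsupported goto length: " + gtl);
        }
        long combined = ((long) targetOffset << FLAG_BITS) | flags;
        if ((combined >>> (8 * gtl)) != 0) {
            throw new IllegalArgumentException("target offset does not fit in " + gtl + " bytes");
        }
        bb.clear();
        bb.put(label);
        for (int i = 0; i < gtl; i++) {
            bb.put((byte) (combined >>> (8 * i))); // least significant byte first
        }
        bb.flip();
        int size = bb.remaining();
        if (os != null) {
            os.write(bb.array(), bb.arrayOffset() + bb.position(), size);
        }
        return size;
    }

    /** Reads back a gtl-byte little-endian goto field (used here only to check the encoding). */
    static long decodeGoto(byte[] bytes, int start, int gtl) {
        long value = 0;
        for (int i = gtl - 1; i >= 0; i--) {
            value = (value << 8) | (bytes[start + i] & 0xff);
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int size = emitArc(ByteBuffer.allocate(16), out, 2, BIT_FINAL_ARC | BIT_LAST_ARC, (byte) 'x', 100);
        long gotoField = decodeGoto(out.toByteArray(), 1, 2);
        System.out.println(size + " " + (gotoField >>> FLAG_BITS) + " " + (gotoField & 7)); // prints 3 100 3
    }
}
```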
-
emitNodeData
private int emitNodeData(java.nio.ByteBuffer bb, java.io.OutputStream os, int nodeDataLength, int number) throws java.io.IOException
- Throws:
java.io.IOException
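When withNumbers() is enabled, the per-node data is the state's right-language count. A sketch under two assumptions: the count is written little-endian in exactly `nodeDataLength` bytes, and `nodeDataLength` is chosen as the fewest bytes able to hold the largest count (the root's). The helper `bytesNeeded` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

final class NodeDataSketch {
    /** Fewest bytes that can hold maxNumber, e.g. the root's right-language count. */
    static int bytesNeeded(int maxNumber) {
        int bytes = 0;
        for (int v = maxNumber; v != 0; v >>>= 8) {
            bytes++;
        }
        return bytes;
    }

    /** Writes number as nodeDataLength little-endian bytes; returns the number of bytes written. */
    static int emitNodeData(ByteBuffer bb, OutputStream os, int nodeDataLength, int number) throws IOException {
        if (nodeDataLength < 0 || nodeDataLength > 4) {
            throw new IllegalArgumentException("unsupported node data length: " + nodeDataLength);
        }
        if (nodeDataLength < 4 && (number >>> (8 * nodeDataLength)) != 0) {
            throw new IllegalArgumentException(number + " does not fit in " + nodeDataLength + " bytes");
        }
        bb.clear();
        for (int i = 0; i < nodeDataLength; i++) {
            bb.put((byte) (number >>> (8 * i))); // least significant byte first
        }
        bb.flip();
        int size = bb.remaining();
        if (os != null && size > 0) {
            os.write(bb.array(), bb.arrayOffset() + bb.position(), size);
        }
        return size;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int nodeDataLength = bytesNeeded(300); // counts up to 300 need 2 bytes
        emitNodeData(ByteBuffer.allocate(8), out, nodeDataLength, 300);
        System.out.println(out.size()); // prints 2
    }
}
```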
-