Package morfologik.fsa.builders
Class FSA5Serializer
java.lang.Object
morfologik.fsa.builders.FSA5Serializer
- All Implemented Interfaces:
FSASerializer
Serializes in-memory FSA graphs to a binary format compatible with the FSA5 format of Jan Daciuk's fsa package.
The automaton can also be serialized with the numbers required for perfect hashing; see the withNumbers() method.
-
Field Summary
Fields
Modifier and Type | Field | Description
byte | annotationByte |
byte | fillerByte |
| flags | Supported flags.
private static final int | MAX_ARC_SIZE | Maximum number of bytes for a serialized arc.
private static final int | MAX_NODE_DATA_SIZE | Maximum number of bytes for per-node data.
private com.carrotsearch.hppc.IntIntHashMap | numbers | A hash map of [state, right-language-count] pairs.
private com.carrotsearch.hppc.IntIntHashMap | offsets | A hash map of [state, offset] pairs.
private static final int | SIZEOF_FLAGS | Number of bytes for the arc's flags header (arc representation without the goto address).
private boolean | withNumbers | true if we should serialize with numbers.
-
Constructor Summary
Constructors
FSA5Serializer()
-
Method Summary
Modifier and Type | Method | Description
private int | emitArc(ByteBuffer bb, OutputStream os, int gtl, int flags, byte label, int targetOffset) |
private boolean | emitArcs(FSA fsa, OutputStream os, int[] linearized, int gtl, int nodeDataLength) | Update arc offsets assuming the given goto length.
private int | emitNodeData(ByteBuffer bb, OutputStream os, int nodeDataLength, int number) |
| getFlags() | Return supported flags.
private int[] | linearize | Linearization of states.
<T extends OutputStream> T | serialize | Serialize root states to an output stream in FSA5 format.
| withAnnotationSeparator(byte annotationSeparator) | Sets the annotation separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
| withFiller(byte filler) | Sets the filler separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
| withNumbers() | Serialize the automaton with the number of right-language sequences in each node.
-
Field Details
-
MAX_ARC_SIZE
private static final int MAX_ARC_SIZE
Maximum number of bytes for a serialized arc.
-
MAX_NODE_DATA_SIZE
private static final int MAX_NODE_DATA_SIZE
Maximum number of bytes for per-node data.
-
SIZEOF_FLAGS
private static final int SIZEOF_FLAGS
Number of bytes for the arc's flags header (arc representation without the goto address).
-
flags
Supported flags.
-
fillerByte
public byte fillerByte
-
annotationByte
public byte annotationByte
-
withNumbers
private boolean withNumbers
true if we should serialize with numbers.
-
offsets
private com.carrotsearch.hppc.IntIntHashMap offsets
A hash map of [state, offset] pairs.
-
numbers
private com.carrotsearch.hppc.IntIntHashMap numbers
A hash map of [state, right-language-count] pairs.
-
-
Constructor Details
-
FSA5Serializer
public FSA5Serializer()
-
-
Method Details
-
withNumbers
Serialize the automaton with the number of right-language sequences in each node. This is required to implement perfect hashing. The numbering also preserves the order of input sequences.
- Specified by:
withNumbers in interface FSASerializer
- Returns:
The same object, for easier call chaining.
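To illustrate why per-node right-language counts are enough for perfect hashing, here is a minimal sketch on a plain trie. It is not the FSA5 binary layout or the library's implementation: the class RightLanguageNumbers, its Node type, and the state-level isFinal flag are all hypothetical, introduced only for this example. Each node stores how many accepted sequences are reachable from it, and a sequence's hash is computed by summing the counts of the subtrees skipped while walking its path.

```java
import java.util.TreeMap;

public class RightLanguageNumbers {
    /** A trie node; a hypothetical stand-in for an automaton state. */
    static final class Node {
        final TreeMap<Character, Node> arcs = new TreeMap<>();
        boolean isFinal;
        int rightLanguageCount; // accepted sequences reachable from this node
    }

    /** Builds a trie from the given words and numbers every node. */
    static Node build(String... words) {
        Node root = new Node();
        for (String w : words) {
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.arcs.computeIfAbsent(c, k -> new Node());
            }
            n.isFinal = true;
        }
        number(root);
        return root;
    }

    /** Post-order pass: a node's count is its own finality plus its targets' counts. */
    static int number(Node n) {
        int count = n.isFinal ? 1 : 0;
        for (Node target : n.arcs.values()) {
            count += number(target);
        }
        n.rightLanguageCount = count;
        return count;
    }

    /** Returns the zero-based rank of the word in sorted order, or -1 if it is absent. */
    static int perfectHash(Node root, String word) {
        int index = 0;
        Node n = root;
        for (char c : word.toCharArray()) {
            if (n.isFinal) {
                index++; // the shorter sequence ending here sorts before this one
            }
            Node next = n.arcs.get(c);
            if (next == null) {
                return -1;
            }
            // Skip every sequence that continues through an arc with a smaller label.
            for (Node skipped : n.arcs.headMap(c).values()) {
                index += skipped.rightLanguageCount;
            }
            n = next;
        }
        return n.isFinal ? index : -1;
    }

    public static void main(String[] args) {
        Node root = build("a", "ab", "b", "ba");
        for (String w : new String[] {"a", "ab", "b", "ba", "c"}) {
            System.out.println(w + " -> " + perfectHash(root, w));
        }
    }
}
```

Because the sketch visits arcs in label order, the computed index equals the position of the sequence in sorted input, which mirrors the statement that the numbering preserves the order of input sequences.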
-
withFiller
Sets the filler separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
- Specified by:
withFiller in interface FSASerializer
- Parameters:
filler - The filler separator byte.
- Returns:
this, for call chaining.
-
withAnnotationSeparator
Sets the annotation separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
- Specified by:
withAnnotationSeparator in interface FSASerializer
- Parameters:
annotationSeparator - The annotation separator byte.
- Returns:
this, for call chaining.
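The with... methods return the object they were called on, so several settings can be applied in one expression. The sketch below shows that pattern only. Options is a hypothetical holder and not the real FSA5Serializer, and the '_' and '+' bytes are illustrative values chosen for the example rather than documented defaults.

```java
public class FluentOptionsSketch {
    /** Hypothetical options holder that mimics the chaining style described above. */
    static final class Options {
        byte fillerByte;
        byte annotationByte;
        boolean withNumbers;

        Options withFiller(byte filler) {
            this.fillerByte = filler;
            return this; // returning this is what enables chaining
        }

        Options withAnnotationSeparator(byte annotationSeparator) {
            this.annotationByte = annotationSeparator;
            return this;
        }

        Options withNumbers() {
            this.withNumbers = true;
            return this;
        }
    }

    /** Applies three settings in a single chained expression. */
    static Options example() {
        return new Options()
            .withFiller((byte) '_')
            .withAnnotationSeparator((byte) '+')
            .withNumbers();
    }

    public static void main(String[] args) {
        Options o = example();
        System.out.println((char) o.fillerByte + " " + (char) o.annotationByte + " " + o.withNumbers);
    }
}
```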
-
serialize
Serialize root states to an output stream in FSA5 format.
- Specified by:
serialize in interface FSASerializer
- Type Parameters:
T - A subclass of OutputStream, returned for chaining.
- Parameters:
fsa - The automaton to serialize.
os - The output stream to serialize to.
- Returns:
os, for chaining.
- Throws:
IOException - Rethrown if an I/O error occurs.
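Returning the stream as the type parameter T means the caller gets back the concrete stream type it passed in, with no cast. The sketch below demonstrates only that generic pattern. writeAll is a hypothetical stand-in that copies a byte array, not the serializer itself, and demo is a helper added so the example can be called without handling a checked exception.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class ChainingSketch {
    /** Hypothetical stand-in for a serializer: writes the payload and returns the same stream. */
    static <T extends OutputStream> T writeAll(byte[] payload, T os) throws IOException {
        os.write(payload);
        return os;
    }

    /** Chains directly into a ByteArrayOutputStream method on the returned stream. */
    static byte[] demo() {
        try {
            // T is inferred as ByteArrayOutputStream, so toByteArray() is available.
            return writeAll(new byte[] {1, 2, 3}, new ByteArrayOutputStream()).toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo().length);
    }
}
```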
-
getFlags
Return supported flags.
- Specified by:
getFlags in interface FSASerializer
- Returns:
The set of flags supported by the serializer (and the output automaton).
-
linearize
Linearization of states.
-
emitArcs
private boolean emitArcs(FSA fsa, OutputStream os, int[] linearized, int gtl, int nodeDataLength) throws IOException
Update arc offsets assuming the given goto length.
- Throws:
IOException
-
emitArc
private int emitArc(ByteBuffer bb, OutputStream os, int gtl, int flags, byte label, int targetOffset) throws IOException
- Throws:
IOException
-
emitNodeData
private int emitNodeData(ByteBuffer bb, OutputStream os, int nodeDataLength, int number) throws IOException
- Throws:
IOException
-