Interface ISequenceEncoder
- All Known Implementing Classes:
NoEncoder, TrimInfixAndSuffixEncoder, TrimPrefixAndSuffixEncoder, TrimSuffixEncoder
public interface ISequenceEncoder
The logic of encoding one sequence of bytes relative to another sequence of
bytes. The "base" form and the "derived" form are typically the stem of
a word and the inflected form of a word.
Encoding the derived form relative to the base form makes the data for the automaton smaller and more repetitive, which results in better compression.
See example implementation for details.
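To make the idea of encoding one byte sequence relative to another concrete, here is a minimal sketch of a suffix-trimming scheme. The class name SuffixTrimSketch and the byte layout (one count byte followed by the suffix to append) are assumptions made for illustration; they are not the format used by TrimSuffixEncoder or any other implementing class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical suffix-trimming scheme; NOT the library's actual byte format. */
final class SuffixTrimSketch {
    /** Encodes target relative to source as: [bytes to drop from source's end][suffix to append]. */
    static byte[] encode(byte[] source, byte[] target) {
        // Length of the longest common prefix of source and target.
        int shared = 0;
        int max = Math.min(source.length, target.length);
        while (shared < max && source[shared] == target[shared]) {
            shared++;
        }
        int drop = source.length - shared;
        if (drop > 255) {
            throw new IllegalArgumentException("More than 255 bytes to drop");
        }
        byte[] encoded = new byte[1 + target.length - shared];
        encoded[0] = (byte) drop;
        System.arraycopy(target, shared, encoded, 1, target.length - shared);
        return encoded;
    }

    /** Rebuilds target from source and the encoded form produced by encode(). */
    static byte[] decode(byte[] source, byte[] encoded) {
        int drop = encoded[0] & 0xFF;
        int keep = source.length - drop;
        // Keep the shared prefix of source, then append the stored suffix.
        byte[] target = Arrays.copyOf(source, keep + encoded.length - 1);
        System.arraycopy(encoded, 1, target, keep, encoded.length - 1);
        return target;
    }

    public static void main(String[] args) {
        byte[] walking = "walking".getBytes(StandardCharsets.UTF_8);
        byte[] talking = "talking".getBytes(StandardCharsets.UTF_8);
        byte[] walk = "walk".getBytes(StandardCharsets.UTF_8);
        byte[] talk = "talk".getBytes(StandardCharsets.UTF_8);

        // Different word pairs collapse to the same encoded bytes.
        System.out.println(Arrays.toString(encode(walking, walk))); // [3]
        System.out.println(Arrays.toString(encode(talking, talk))); // [3]
        System.out.println(new String(decode(walking, encode(walking, walk)), StandardCharsets.UTF_8)); // walk
    }
}
```

Because many unrelated word pairs produce identical encoded bytes under a scheme like this, the data stored in the automaton becomes shorter and more repetitive.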
-
Method Summary
Modifier and Type | Method | Description
ByteBuffer | decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded) |
ByteBuffer | encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target) |
int | prefixBytes() | Deprecated.
-
Method Details
-
encode
- Parameters:
reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
source - The source byte sequence.
target - The target byte sequence to encode relative to source.
- Returns:
The ByteBuffer with the encoded target.
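The reuse parameter means that the returned buffer may or may not be the instance that was passed in, so callers should always continue with the return value. The sketch below illustrates that contract with a hypothetical helper, ReuseSketch.copyInto. It assumes that the buffer is cleared before writing and that "enough space" is judged by capacity; an actual implementation may check this differently.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper illustrating the reuse-or-allocate contract. */
final class ReuseSketch {
    /** Writes payload into reuse if it is large enough, otherwise into a new buffer. */
    static ByteBuffer copyInto(ByteBuffer reuse, byte[] payload) {
        ByteBuffer out = reuse;
        // Assumption: a null or too-small buffer triggers a fresh allocation.
        if (out == null || out.capacity() < payload.length) {
            out = ByteBuffer.allocate(payload.length);
        }
        out.clear();      // discard any previous content
        out.put(payload);
        out.flip();       // make the written bytes readable from position 0
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4};
        ByteBuffer small = ByteBuffer.allocate(2);
        ByteBuffer big = ByteBuffer.allocate(16);
        System.out.println(copyInto(small, payload) == small); // false: a new buffer was allocated
        System.out.println(copyInto(big, payload) == big);     // true: the buffer was reused
    }
}
```

In a loop over many sequences, assigning the result back to the same variable (for example, buf = copyInto(buf, payload)) keeps allocations to a minimum while staying correct when a larger buffer has to be created.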
-
decode
- Parameters:
reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
source - The source byte sequence.
encoded - The previously encoded byte sequence.
- Returns:
The ByteBuffer with the decoded target.
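Taken together, encode and decode are expected to round-trip: decoding an encoded sequence against the same source should yield the original target. The sketch below checks that property against a local stand-in interface, SequenceEncoderSketch, which mirrors the two method signatures documented above, and a hypothetical identity implementation that ignores the source. Passing null for reuse is an assumption of this sketch, not documented behavior.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Local stand-in mirroring the two non-deprecated methods documented above. */
interface SequenceEncoderSketch {
    ByteBuffer encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target);
    ByteBuffer decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded);
}

/** Hypothetical identity scheme: the encoded form is a copy of the target. */
final class IdentityEncoderSketch implements SequenceEncoderSketch {
    private static ByteBuffer copy(ByteBuffer reuse, ByteBuffer data) {
        int needed = data.remaining();
        ByteBuffer out = (reuse != null && reuse.capacity() >= needed)
            ? reuse
            : ByteBuffer.allocate(needed);
        out.clear();
        out.put(data.duplicate()); // duplicate() leaves the caller's position untouched
        out.flip();
        return out;
    }

    @Override
    public ByteBuffer encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target) {
        return copy(reuse, target);
    }

    @Override
    public ByteBuffer decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded) {
        return copy(reuse, encoded);
    }
}

final class RoundTripDemo {
    /** Returns true if decode(source, encode(source, target)) reproduces target. */
    static boolean roundTrips(SequenceEncoderSketch encoder, String source, String target) {
        ByteBuffer src = ByteBuffer.wrap(source.getBytes(StandardCharsets.UTF_8));
        ByteBuffer tgt = ByteBuffer.wrap(target.getBytes(StandardCharsets.UTF_8));
        ByteBuffer encoded = encoder.encode(null, src, tgt);
        ByteBuffer decoded = encoder.decode(null, src, encoded);
        return decoded.equals(tgt); // compares the remaining bytes of both buffers
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(new IdentityEncoderSketch(), "walking", "walk")); // true
    }
}
```

The same check can serve as a quick sanity test for any encoder that follows these two signatures.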
-
prefixBytes
Deprecated.
The number of prefix bytes in the encoded form that should be ignored (needed for separator lookup). An ugly workaround for GH-85; it should be fixed by knowing in advance whether the dictionary contains tags, so that the separator can be scanned for right-to-left.
- See Also:
-