Package morfologik.stemming
Interface ISequenceEncoder
- All Known Implementing Classes:
NoEncoder, TrimInfixAndSuffixEncoder, TrimPrefixAndSuffixEncoder, TrimSuffixEncoder
public interface ISequenceEncoder
The logic of encoding one sequence of bytes relative to another sequence of bytes. The "base" form and the "derived" form are typically the stem of a word and the inflected form of a word.
Derived form encoding helps in making the data for the automaton smaller and more repetitive (which results in higher compression rates).
See example implementation for details.
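To make the idea of relative encoding concrete, here is a minimal sketch of a suffix-trimming scheme. The class SuffixTrimSketch, its method, and the byte layout (one count byte followed by the bytes to append) are assumptions made for illustration; they are not taken from this page and may differ from what the implementing classes actually do. The sketch also assumes the inflected form is the source and the stem is the target.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not part of morfologik. */
final class SuffixTrimSketch {
    /**
     * Encodes target relative to source as: one byte ('A' + number of bytes
     * to drop from the end of source), followed by the bytes to append.
     */
    static byte[] encode(byte[] source, byte[] target) {
        // Length of the prefix shared by both sequences.
        int shared = 0;
        int max = Math.min(source.length, target.length);
        while (shared < max && source[shared] == target[shared]) {
            shared++;
        }

        int trim = source.length - shared;
        if (trim > 255 - 'A') {
            // The count must fit in the single leading byte.
            throw new IllegalArgumentException("Too many bytes to trim: " + trim);
        }

        byte[] out = new byte[1 + target.length - shared];
        out[0] = (byte) ('A' + trim);
        System.arraycopy(target, shared, out, 1, target.length - shared);
        return out;
    }

    static String encode(String source, String target) {
        byte[] code = encode(
            source.getBytes(StandardCharsets.UTF_8),
            target.getBytes(StandardCharsets.UTF_8));
        return new String(code, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("walking", "walk")); // prints "D"
        System.out.println(encode("talking", "talk")); // prints "D"
        System.out.println(encode("mice", "mouse"));   // prints "Douse"
    }
}
```

Under this scheme, different word pairs that follow the same inflection pattern produce the same short code ("D" for both "walking" and "talking"), which is the kind of repetition the description above refers to.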
Method Summary
Modifier and Type, Method, Description:
- java.nio.ByteBuffer decode(java.nio.ByteBuffer reuse, java.nio.ByteBuffer source, java.nio.ByteBuffer encoded): Decodes encoded relative to source, optionally reusing the provided ByteBuffer.
- java.nio.ByteBuffer encode(java.nio.ByteBuffer reuse, java.nio.ByteBuffer source, java.nio.ByteBuffer target): Encodes target relative to source, optionally reusing the provided ByteBuffer.
- int prefixBytes(): Deprecated.
Method Detail
-
encode
java.nio.ByteBuffer encode(java.nio.ByteBuffer reuse, java.nio.ByteBuffer source, java.nio.ByteBuffer target)
Encodes target relative to source, optionally reusing the provided ByteBuffer.
- Parameters:
reuse - The ByteBuffer to reuse for the result; a new one is allocated if it does not have enough remaining space.
source - The source byte sequence.
target - The target byte sequence to encode relative to source.
- Returns:
- The ByteBuffer with the encoded target.
-
decode
java.nio.ByteBuffer decode(java.nio.ByteBuffer reuse, java.nio.ByteBuffer source, java.nio.ByteBuffer encoded)
Decodes encoded relative to source, optionally reusing the provided ByteBuffer.
- Parameters:
reuse - The ByteBuffer to reuse for the result; a new one is allocated if it does not have enough remaining space.
source - The source byte sequence.
encoded - The previously encoded byte sequence.
- Returns:
- The ByteBuffer with the decoded target.
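The encode and decode contracts can be sketched as a round trip over ByteBuffers. Everything in the class below is hypothetical: RoundTripSketch, the ensure helper, and the one-count-byte layout are assumptions for illustration, not the library's implementation. In particular, the sketch interprets "not enough remaining space" as "capacity too small after clearing the buffer", which is a guess about the intended behavior.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; none of these names are part of morfologik. */
final class RoundTripSketch {
    /** Returns a cleared buffer with room for `needed` bytes, reusing `reuse` when it is large enough. */
    static ByteBuffer ensure(ByteBuffer reuse, int needed) {
        if (reuse == null || reuse.capacity() < needed) {
            return ByteBuffer.allocate(needed);
        }
        reuse.clear();
        return reuse;
    }

    /** Encodes target relative to source: 'A' + bytes to trim from source, then bytes to append. */
    static ByteBuffer encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target) {
        int sLen = source.remaining();
        int tLen = target.remaining();
        int shared = 0;
        int max = Math.min(sLen, tLen);
        // Absolute gets leave the positions of source and target untouched.
        while (shared < max
            && source.get(source.position() + shared) == target.get(target.position() + shared)) {
            shared++;
        }
        int trim = sLen - shared;
        if (trim > 255 - 'A') {
            throw new IllegalArgumentException("Too many bytes to trim: " + trim);
        }
        ByteBuffer out = ensure(reuse, 1 + tLen - shared);
        out.put((byte) ('A' + trim));
        for (int i = shared; i < tLen; i++) {
            out.put(target.get(target.position() + i));
        }
        out.flip();
        return out;
    }

    /** Reverses encode: keeps the untrimmed part of source, then appends the encoded suffix. */
    static ByteBuffer decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded) {
        int trim = (encoded.get(encoded.position()) & 0xFF) - 'A';
        int keep = source.remaining() - trim;
        int appendLen = encoded.remaining() - 1;
        ByteBuffer out = ensure(reuse, keep + appendLen);
        for (int i = 0; i < keep; i++) {
            out.put(source.get(source.position() + i));
        }
        for (int i = 1; i <= appendLen; i++) {
            out.put(encoded.get(encoded.position() + i));
        }
        out.flip();
        return out;
    }

    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    static String text(ByteBuffer b) {
        // Decode a duplicate so the caller's position is not advanced.
        return StandardCharsets.UTF_8.decode(b.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer scratch = ByteBuffer.allocate(16);
        ByteBuffer source = utf8("mice");
        ByteBuffer encoded = encode(null, source, utf8("mouse"));
        ByteBuffer decoded = decode(scratch, source, encoded);
        System.out.println(text(encoded) + " -> " + text(decoded)); // prints "Douse -> mouse"
        System.out.println(decoded == scratch);                     // prints "true"
    }
}
```

Callers should always use the returned buffer rather than the one they passed in as reuse, since a fresh buffer is returned whenever the provided one is too small.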
-
prefixBytes
@Deprecated int prefixBytes()
Deprecated.
The number of prefix bytes in the encoded form that should be ignored (needed for separator lookup). An ugly workaround for GH-85; it should be fixed by knowing in advance whether the dictionary contains tags, so that the separator can be scanned for right-to-left.
- See Also:
- "https://github.com/morfologik/morfologik-stemming/issues/85"
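Why skipping prefix bytes matters for separator lookup can be sketched as follows. The entry layout, the '+' separator, and the helper indexOfSeparator are all hypothetical and contrived for illustration; the sketch only assumes that a leading control byte of an encoded form can happen to equal the separator byte.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; not part of morfologik. */
final class SeparatorScanSketch {
    /** Returns the index of the first separator at or after `skip`, or -1 if there is none. */
    static int indexOfSeparator(byte[] encoded, int skip, byte separator) {
        for (int i = skip; i < encoded.length; i++) {
            if (encoded[i] == separator) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Contrived entry: the leading control byte collides with the '+' separator.
        byte[] entry = "+ed+verb".getBytes(StandardCharsets.UTF_8);
        System.out.println(indexOfSeparator(entry, 0, (byte) '+')); // prints "0" (false match)
        System.out.println(indexOfSeparator(entry, 1, (byte) '+')); // prints "3"
    }
}
```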