Package morfologik.stemming
Class TrimPrefixAndSuffixEncoder
java.lang.Object
morfologik.stemming.TrimPrefixAndSuffixEncoder
All Implemented Interfaces:
ISequenceEncoder
Encodes dst relative to src by trimming the prefix and suffix of src that do not match dst. The output code is (bytes):
{P}{K}{suffix}
where (P - 'A') bytes should be trimmed from the start of src, (K - 'A') bytes should be trimmed from the end of src, and then the suffix should be appended to the resulting byte sequence.
Examples:
src: abc
dst: abcd
encoded: AAd

src: abc
dst: xyz
encoded: ADxyz
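The decoding rule described above can be sketched in a few lines. This is a minimal, standalone illustration that works on plain byte arrays rather than the ByteBuffer API of this class. The class name TrimCodeDecodeSketch and its decode helper are hypothetical, and the sketch assumes a well-formed code and does not model the REMOVE_EVERYTHING limit.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical standalone sketch; not part of morfologik.stemming.
final class TrimCodeDecodeSketch {
    /** Applies a {P}{K}{suffix} code to src and returns the reconstructed dst. */
    static byte[] decode(byte[] src, byte[] encoded) {
        // The first two bytes hold the trim counts, each offset by 'A'.
        int trimStart = (encoded[0] & 0xFF) - 'A';
        int trimEnd = (encoded[1] & 0xFF) - 'A';

        // Keep src[from, to); clamp so that over-trimming leaves an empty stem.
        int from = Math.min(trimStart, src.length);
        int to = Math.max(from, src.length - trimEnd);
        byte[] kept = Arrays.copyOfRange(src, from, to);

        // Append everything that follows the two header bytes.
        byte[] dst = Arrays.copyOf(kept, kept.length + encoded.length - 2);
        System.arraycopy(encoded, 2, dst, kept.length, encoded.length - 2);
        return dst;
    }

    public static void main(String[] args) {
        byte[] src = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] out = decode(src, "ADxyz".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(out, StandardCharsets.US_ASCII)); // prints "xyz"
    }
}
```

Both documented examples round-trip through this sketch: applying AAd to abc yields abcd, and applying ADxyz to abc yields xyz.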
Field Summary
Fields
Modifier and Type | Field | Description
private static final int | REMOVE_EVERYTHING | Maximum encodable single-byte code.
Constructor Summary
Constructors
TrimPrefixAndSuffixEncoder()
Method Summary
Modifier and Type | Method | Description
ByteBuffer | decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded) |
ByteBuffer | encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target) |
int | prefixBytes() | The number of encoded form's prefix bytes that should be ignored (needed for separator lookup).
String | toString() |
Field Details
REMOVE_EVERYTHING
private static final int REMOVE_EVERYTHING
Maximum encodable single-byte code.
Constructor Details
TrimPrefixAndSuffixEncoder
public TrimPrefixAndSuffixEncoder()
Method Details
encode
public ByteBuffer encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target)
Description copied from interface: ISequenceEncoder
Specified by:
encode in interface ISequenceEncoder
Parameters:
reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
source - The source byte sequence.
target - The target byte sequence to encode relative to source.
Returns:
The ByteBuffer with the encoded target.
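One way to produce a {P}{K}{suffix} code is to find the longest run of source that is also a prefix of target, trim everything around that run, and emit the rest of target as the suffix. The sketch below illustrates that idea on plain byte arrays. It is an assumption about a workable strategy, not a description of this method's actual implementation; the class name TrimCodeEncodeSketch is hypothetical, and the sketch ignores the single-byte limit (REMOVE_EVERYTHING), so it is only meaningful for short inputs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical standalone sketch; not part of morfologik.stemming.
final class TrimCodeEncodeSketch {
    /** Encodes dst relative to src as {P}{K}{suffix}. Assumes short inputs. */
    static byte[] encode(byte[] src, byte[] dst) {
        // Find the offset in src whose following bytes share the longest prefix with dst.
        int bestStart = 0;
        int bestLen = 0;
        for (int i = 0; i < src.length; i++) {
            int len = 0;
            while (i + len < src.length && len < dst.length && src[i + len] == dst[len]) {
                len++;
            }
            if (len > bestLen) {
                bestLen = len;
                bestStart = i;
            }
        }

        // Bytes of src to drop before and after the shared run.
        int trimStart = bestStart;
        int trimEnd = src.length - (bestStart + bestLen);

        // Header bytes are the trim counts offset by 'A'; the suffix is the unmatched tail of dst.
        byte[] out = new byte[2 + dst.length - bestLen];
        out[0] = (byte) ('A' + trimStart);
        out[1] = (byte) ('A' + trimEnd);
        System.arraycopy(dst, bestLen, out, 2, dst.length - bestLen);
        return out;
    }

    public static void main(String[] args) {
        byte[] code = encode("abc".getBytes(StandardCharsets.US_ASCII),
                             "abcd".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(code, StandardCharsets.US_ASCII)); // prints "AAd"
    }
}
```

When source and target share nothing, the loop finds no run, so the whole source is trimmed from the end and the entire target becomes the suffix; for abc and xyz this gives ADxyz.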
prefixBytes
public int prefixBytes()
Description copied from interface: ISequenceEncoder
The number of encoded form's prefix bytes that should be ignored (needed for separator lookup). An ugly workaround for GH-85; it should be fixed by knowing in advance whether the dictionary contains tags, so that the separator can be scanned for right-to-left.
Specified by:
prefixBytes in interface ISequenceEncoder
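The reason for skipping prefix bytes can be illustrated with a left-to-right separator scan. The sketch below assumes that an encoded form may be followed by a separator byte and a tag, and that the header bytes {P}{K} can happen to equal the separator. It also assumes a prefix length of 2 for this encoder, matching the two-byte header; this page does not state the value. The class SeparatorScanSketch and its method are hypothetical.

```java
// Hypothetical standalone sketch; not part of morfologik.stemming.
final class SeparatorScanSketch {
    /**
     * Returns the index of the first separator at or after prefixBytes,
     * or -1 if there is none. Bytes before prefixBytes are never inspected.
     */
    static int indexOfSeparator(byte[] entry, byte separator, int prefixBytes) {
        for (int i = prefixBytes; i < entry.length; i++) {
            if (entry[i] == separator) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Header 'A', '_' (K equals the separator), suffix "x", then separator and tag.
        byte[] entry = {'A', '_', 'x', '_', 't', 'a', 'g'};
        System.out.println(indexOfSeparator(entry, (byte) '_', 0)); // prints 1 (wrong split point)
        System.out.println(indexOfSeparator(entry, (byte) '_', 2)); // prints 3 (header skipped)
    }
}
```

A right-to-left scan, as the description suggests, would avoid the problem only if it is known beforehand that a tag and its separator are present.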
decode
public ByteBuffer decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded)
Description copied from interface: ISequenceEncoder
Specified by:
decode in interface ISequenceEncoder
Parameters:
reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
source - The source byte sequence.
encoded - The previously encoded byte sequence.
Returns:
The ByteBuffer with the decoded target.
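The contract of the reuse parameter (use the caller's buffer when it is large enough, otherwise allocate) can be sketched as a small helper. This is only an illustration of the pattern using java.nio.ByteBuffer; the helper name ensureCapacity and the class BufferReuseSketch are hypothetical, and treating a null buffer as "allocate a new one" is an assumption of the sketch, not something this page states.

```java
import java.nio.ByteBuffer;

// Hypothetical standalone sketch; not part of morfologik.stemming.
final class BufferReuseSketch {
    /** Returns a cleared buffer with room for at least `needed` bytes, reusing `reuse` if it fits. */
    static ByteBuffer ensureCapacity(ByteBuffer reuse, int needed) {
        // Assumption: a null or too-small buffer is replaced by a fresh allocation.
        if (reuse == null || reuse.capacity() < needed) {
            return ByteBuffer.allocate(needed);
        }
        // Reset position and limit so the whole capacity is writable again.
        reuse.clear();
        return reuse;
    }

    public static void main(String[] args) {
        ByteBuffer scratch = ByteBuffer.allocate(16);
        System.out.println(ensureCapacity(scratch, 8) == scratch);  // prints true
        System.out.println(ensureCapacity(scratch, 32) == scratch); // prints false
    }
}
```

Because the returned buffer may differ from the one passed in, callers should always continue with the returned reference.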
toString
public String toString()