Package morfologik.stemming
Class DictionaryMetadata
java.lang.Object
morfologik.stemming.DictionaryMetadata
Description of attributes, their types and default values.
Field Summary
Modifier and Type | Field | Description
private final EnumMap<DictionaryAttribute, String> | attributes | All attributes.
private final EnumMap<DictionaryAttribute, Boolean> | boolAttributes | All "enabled" boolean attributes.
private Charset | charset |
private static Map<DictionaryAttribute, String> | DEFAULT_ATTRIBUTES | Default attribute values.
private EncoderType | encoderType | Sequence encoder.
private String | encoding | Encoding used for converting bytes to characters and vice versa.
private LinkedHashMap<Character, List<Character>> | equivalentChars | Equivalent characters (treated like characters with and without diacritics).
private LinkedHashMap<String, String> | inputConversion | Conversion pairs for input conversion, for example to replace ligatures.
private Locale | locale |
static final String | METADATA_FILE_EXTENSION | Expected metadata file extension.
private LinkedHashMap<String, String> | outputConversion | Conversion pairs for output conversion, for example to replace ligatures.
private LinkedHashMap<String, List<String>> | replacementPairs | Replacement pairs for non-obvious candidate search in a speller dictionary.
private static EnumSet<DictionaryAttribute> | REQUIRED_ATTRIBUTES | Required attributes.
private byte | separator | A separator character between fields (stem, lemma, form).
private char | separatorChar |
Constructor Summary
Constructor | Description
DictionaryMetadata(Map<DictionaryAttribute, String> attrs) | Create an instance from an attribute map.
Method Summary
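Modifier and Type | Method | Description
static DictionaryMetadataBuilder | builder() | A shortcut returning a DictionaryMetadataBuilder.
Map<DictionaryAttribute, String> | getAttributes() | Returns all metadata attributes.
CharsetDecoder | getDecoder() | Returns a new CharsetDecoder for the encoding.
CharsetEncoder | getEncoder() | Returns a new CharsetEncoder for the encoding.
Charset | getEncoding() |
LinkedHashMap<Character, List<Character>> | getEquivalentChars() |
static String | getExpectedMetadataFileName(String dictionaryFile) | Returns the expected name of the metadata file, based on the name of the dictionary file.
static Path | getExpectedMetadataLocation(Path dictionary) | Returns the expected location of a metadata file.
LinkedHashMap<String, String> | getInputConversionPairs() |
Locale | getLocale() |
LinkedHashMap<String, String> | getOutputConversionPairs() |
LinkedHashMap<String, List<String>> | getReplacementPairs() |
byte | getSeparator() |
char | getSeparatorAsChar() | Returns the separator byte converted to a single char.
EncoderType | getSequenceEncoderType() | Returns the sequence encoder type.
boolean | isConvertingCase() |
boolean | isFrequencyIncluded() |
boolean | isIgnoringAllUppercase() |
boolean | isIgnoringCamelCase() |
boolean | isIgnoringDiacritics() |
boolean | isIgnoringNumbers() |
boolean | isIgnoringPunctuation() |
boolean | isSupportingRunOnWords() |
static DictionaryMetadata | read(InputStream metadataStream) | Read dictionary metadata from a property file (stream).
void | write(Writer writer) | Write dictionary attributes (metadata).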
Field Details
DEFAULT_ATTRIBUTES
private static Map<DictionaryAttribute, String> DEFAULT_ATTRIBUTES
Default attribute values.
REQUIRED_ATTRIBUTES
private static EnumSet<DictionaryAttribute> REQUIRED_ATTRIBUTES
Required attributes.
separator
private byte separator
A separator character between fields (stem, lemma, form). The character must be within the byte range because the FSA uses bytes internally.
separatorChar
private char separatorChar
encoding
private String encoding
Encoding used for converting bytes to characters and vice versa.
charset
private Charset charset
locale
private Locale locale
replacementPairs
private LinkedHashMap<String, List<String>> replacementPairs
Replacement pairs used to find non-obvious candidates when searching a speller dictionary.
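To make replacement pairs concrete, here is a minimal sketch, assuming each pair maps a substring to the strings that may replace it and that a candidate is produced by rewriting one occurrence at a time. The class `ReplacementCandidates` and its method are hypothetical; the speller may generate and rank candidates differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of morfologik.
class ReplacementCandidates {

    /**
     * For every occurrence of every pair's key, substitutes each listed
     * replacement (one occurrence at a time) and collects the results.
     */
    static List<String> candidates(String word, LinkedHashMap<String, List<String>> pairs) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : pairs.entrySet()) {
            String key = e.getKey();
            if (key.isEmpty()) {
                continue; // an empty key would match everywhere; skip it
            }
            // Walk over every position where the key occurs.
            for (int i = word.indexOf(key); i >= 0; i = word.indexOf(key, i + 1)) {
                for (String replacement : e.getValue()) {
                    out.add(word.substring(0, i) + replacement + word.substring(i + key.length()));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, List<String>> pairs = new LinkedHashMap<>();
        pairs.put("f", List.of("ph"));
        System.out.println(candidates("foto", pairs)); // [photo]
    }
}
```

A plain edit-distance search would treat `foto` and `photo` as two edits apart; a single replacement pair turns that into one targeted rewrite.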
inputConversion
private LinkedHashMap<String, String> inputConversion
Conversion pairs applied to input, for example to replace ligatures.
outputConversion
private LinkedHashMap<String, String> outputConversion
Conversion pairs applied to output, for example to replace ligatures.
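A minimal sketch of how conversion pairs might be applied, assuming each pair is a literal substring replacement performed in the map's insertion order (which a `LinkedHashMap` preserves). `ConversionPairs.apply` is a hypothetical helper, and the ligature pair is only an example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of morfologik.
class ConversionPairs {

    /** Applies each conversion pair, in insertion order, as a literal replace-all. */
    static String apply(String text, LinkedHashMap<String, String> pairs) {
        String result = text;
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> input = new LinkedHashMap<>();
        input.put("\uFB01", "fi");  // the "fi" ligature becomes two letters
        LinkedHashMap<String, String> output = new LinkedHashMap<>();
        output.put("fi", "\uFB01"); // and is restored for display
        String normalized = apply("\uFB01nal", input);
        System.out.println(normalized);                                     // final
        System.out.println(apply(normalized, output).equals("\uFB01nal"));  // true
    }
}
```

Because the pairs run in order, an earlier pair can change the text a later pair sees; an insertion-ordered map keeps that behavior predictable.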
equivalentChars
private LinkedHashMap<Character, List<Character>> equivalentChars
Equivalent characters, treated the same way as characters that differ only by diacritics. For example, Polish ł can be specified as equivalent to l. This implements a feature similar to the MAP option in a Hunspell affix file.
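The sketch below shows one way such equivalences could be checked, assuming the map's keys are characters as typed and its values list the dictionary characters each key may stand for. `EquivalentChars.matches` is hypothetical; the direction of the mapping and the equal-length comparison are assumptions, not documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of morfologik.
class EquivalentChars {

    /**
     * True if the typed word matches the dictionary word when each typed
     * character may also stand for any character listed as its equivalent.
     */
    static boolean matches(String typed, String dictWord,
                           Map<Character, List<Character>> equivalents) {
        if (typed.length() != dictWord.length()) {
            return false;
        }
        for (int i = 0; i < typed.length(); i++) {
            char t = typed.charAt(i);
            char d = dictWord.charAt(i);
            if (t == d) {
                continue;
            }
            List<Character> alts = equivalents.getOrDefault(t, List.of());
            if (!alts.contains(d)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        LinkedHashMap<Character, List<Character>> eq = new LinkedHashMap<>();
        eq.put('l', List.of('\u0142')); // Polish l-with-stroke
        System.out.println(matches("lza", "\u0142za", eq)); // true
        System.out.println(matches("lza", "oza", eq));      // false
    }
}
```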
attributes
private final EnumMap<DictionaryAttribute, String> attributes
All attributes.
boolAttributes
private final EnumMap<DictionaryAttribute, Boolean> boolAttributes
All "enabled" boolean attributes.
encoderType
private EncoderType encoderType
Sequence encoder.
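This page does not describe the encoder formats. As a rough illustration of why a sequence encoder helps, the hypothetical `SuffixEncoding` class stores a base form relative to its inflected form as the number of trailing characters to drop plus a suffix to append. The `drop:suffix` string format is invented for this example and is not morfologik's byte-level encoding.

```java
// Hypothetical illustration; not morfologik's byte-level format.
class SuffixEncoding {

    /** Encodes target relative to source as "<chars to drop>:<suffix to append>". */
    static String encode(String source, String target) {
        int common = 0;
        int max = Math.min(source.length(), target.length());
        while (common < max && source.charAt(common) == target.charAt(common)) {
            common++;
        }
        int drop = source.length() - common;
        return drop + ":" + target.substring(common);
    }

    /** Reverses encode(): drops the given number of trailing chars, then appends the suffix. */
    static String decode(String source, String encoded) {
        int colon = encoded.indexOf(':');
        int drop = Integer.parseInt(encoded.substring(0, colon));
        return source.substring(0, source.length() - drop) + encoded.substring(colon + 1);
    }

    public static void main(String[] args) {
        String code = encode("running", "run");
        System.out.println(code);                    // 4:
        System.out.println(decode("running", code)); // run
    }
}
```

Many inflected forms share the same code (for example, `walked` to `walk` and `jumped` to `jump` both encode as `2:`), which is the kind of repetition a finite-state automaton compresses well.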
METADATA_FILE_EXTENSION
static final String METADATA_FILE_EXTENSION
Expected metadata file extension.
Constructor Details
DictionaryMetadata
public DictionaryMetadata(Map<DictionaryAttribute, String> attrs)
Create an instance from an attribute map.
Parameters:
attrs - A set of DictionaryAttribute keys and their associated values.
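A minimal sketch of what building metadata from an attribute map can involve, assuming the constructor starts from default values, overlays the supplied attributes, and rejects maps that lack a required attribute. The enum `Attr` and the contents of `REQUIRED` and `DEFAULTS` are hypothetical stand-ins for `DictionaryAttribute`, `REQUIRED_ATTRIBUTES`, and `DEFAULT_ATTRIBUTES`; the real sets may differ.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Hypothetical stand-ins for DictionaryAttribute, DEFAULT_ATTRIBUTES and REQUIRED_ATTRIBUTES.
class AttributeMapSketch {

    enum Attr { SEPARATOR, ENCODING, ENCODER, FREQUENCY_INCLUDED }

    // Illustrative choices only; the real required/default sets may differ.
    static final EnumSet<Attr> REQUIRED = EnumSet.of(Attr.SEPARATOR, Attr.ENCODING, Attr.ENCODER);
    static final Map<Attr, String> DEFAULTS = Map.of(Attr.FREQUENCY_INCLUDED, "false");

    /** Copies defaults, overlays the given attributes, then rejects missing required keys. */
    static EnumMap<Attr, String> resolve(Map<Attr, String> attrs) {
        EnumMap<Attr, String> merged = new EnumMap<>(Attr.class);
        merged.putAll(DEFAULTS);
        merged.putAll(attrs);
        EnumSet<Attr> missing = EnumSet.copyOf(REQUIRED);
        missing.removeAll(merged.keySet());
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Missing required attributes: " + missing);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<Attr, String> given = Map.of(
            Attr.SEPARATOR, "+", Attr.ENCODING, "UTF-8", Attr.ENCODER, "SUFFIX");
        System.out.println(resolve(given).get(Attr.FREQUENCY_INCLUDED)); // false
    }
}
```

An `EnumMap` keyed by the attribute enum keeps lookups cheap and iteration in declaration order, which matches the field types listed above.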
Method Details
getAttributes
Returns:
All metadata attributes.
getEncoding
getSeparator
public byte getSeparator()
getLocale
getInputConversionPairs
getOutputConversionPairs
getReplacementPairs
getEquivalentChars
isFrequencyIncluded
public boolean isFrequencyIncluded()
isIgnoringPunctuation
public boolean isIgnoringPunctuation()
isIgnoringNumbers
public boolean isIgnoringNumbers()
isIgnoringCamelCase
public boolean isIgnoringCamelCase()
isIgnoringAllUppercase
public boolean isIgnoringAllUppercase()
isIgnoringDiacritics
public boolean isIgnoringDiacritics()
isConvertingCase
public boolean isConvertingCase()
isSupportingRunOnWords
public boolean isSupportingRunOnWords()
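These `is...()` getters are presumably backed by the boolAttributes field. A hedged sketch of how string attribute values could be turned into such a map, assuming values are spelled `true` or `false` (any case) and that an absent flag counts as disabled; `BooleanAttributes` and its `Flag` enum are hypothetical.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of deriving an "enabled" flag map from string attributes.
class BooleanAttributes {

    enum Flag { FREQUENCY_INCLUDED, IGNORE_NUMBERS, IGNORE_PUNCTUATION, CONVERT_CASE }

    /** Strictly parses "true"/"false" (any case); anything else is rejected. */
    static boolean parseStrict(Flag flag, String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.equals("true")) return true;
        if (v.equals("false")) return false;
        throw new IllegalArgumentException(flag + " must be true or false, got: " + value);
    }

    static EnumMap<Flag, Boolean> toFlags(Map<Flag, String> raw) {
        EnumMap<Flag, Boolean> flags = new EnumMap<>(Flag.class);
        for (Map.Entry<Flag, String> e : raw.entrySet()) {
            flags.put(e.getKey(), parseStrict(e.getKey(), e.getValue()));
        }
        return flags;
    }

    /** Mirrors an isXxx() getter: an absent flag is treated as disabled (an assumption). */
    static boolean isEnabled(EnumMap<Flag, Boolean> flags, Flag flag) {
        return flags.getOrDefault(flag, Boolean.FALSE);
    }

    public static void main(String[] args) {
        EnumMap<Flag, Boolean> flags = toFlags(Map.of(Flag.IGNORE_NUMBERS, "true"));
        System.out.println(isEnabled(flags, Flag.IGNORE_NUMBERS)); // true
        System.out.println(isEnabled(flags, Flag.CONVERT_CASE));   // false
    }
}
```

Strict parsing is a deliberate choice here: `Boolean.parseBoolean` would silently read a typo such as `ture` as `false`.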
getDecoder
Returns:
A new CharsetDecoder for the encoding.
getEncoder
Returns:
A new CharsetEncoder for the encoding.
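The decoder and encoder returned by these methods convert between the dictionary's bytes and Java strings. The sketch below performs the same round trip with plain `java.nio.charset` calls, creating a fresh coder for every call because `CharsetDecoder` and `CharsetEncoder` instances are stateful and not safe for concurrent use, which is one plausible reason these methods return new instances. `CoderRoundTrip` is a hypothetical class and UTF-8 is only an example encoding.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch: converting between bytes and chars with fresh coder instances.
class CoderRoundTrip {

    static String decode(Charset cs, byte[] bytes) throws CharacterCodingException {
        // A new decoder per call: CharsetDecoder instances are stateful and not thread-safe.
        return cs.newDecoder()
                 .onMalformedInput(CodingErrorAction.REPORT)
                 .onUnmappableCharacter(CodingErrorAction.REPORT)
                 .decode(ByteBuffer.wrap(bytes))
                 .toString();
    }

    static byte[] encode(Charset cs, String text) throws CharacterCodingException {
        ByteBuffer buf = cs.newEncoder()
                           .onMalformedInput(CodingErrorAction.REPORT)
                           .onUnmappableCharacter(CodingErrorAction.REPORT)
                           .encode(CharBuffer.wrap(text));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        Charset cs = StandardCharsets.UTF_8;
        byte[] bytes = encode(cs, "\u0142za");
        System.out.println(bytes.length);                          // 4
        System.out.println(decode(cs, bytes).equals("\u0142za"));  // true
    }
}
```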
getSequenceEncoderType
Returns:
The sequence encoder type.
getSeparatorAsChar
public char getSeparatorAsChar()
Returns:
The separator byte converted to a single char.
Throws:
RuntimeException - if the conversion is impossible for some reason (for example, the byte decodes to a surrogate pair, or the FSA's encoding is not available).
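A sketch of the conversion described above, assuming it decodes the single separator byte with the dictionary's charset and accepts the result only if it is exactly one non-surrogate `char`. `SeparatorAsChar` is hypothetical and not morfologik's own code.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch of converting a separator byte to a char; not morfologik's own code.
class SeparatorAsChar {

    static char toChar(byte separator, Charset encoding) {
        try {
            CharBuffer decoded = encoding.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(new byte[] { separator }));
            // Exactly one BMP char is acceptable; anything else cannot be a single-char separator.
            if (decoded.remaining() != 1 || Character.isSurrogate(decoded.get(0))) {
                throw new RuntimeException("Separator byte does not map to a single char: 0x"
                    + Integer.toHexString(separator & 0xFF));
            }
            return decoded.get(0);
        } catch (CharacterCodingException e) {
            throw new RuntimeException("Separator byte cannot be decoded with " + encoding, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toChar((byte) '+', StandardCharsets.UTF_8)); // +
    }
}
```

In UTF-8 only ASCII bytes decode on their own, so a lone lead byte such as `0xC5` is rejected, while a single-byte charset like ISO-8859-1 accepts any byte value.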
builder
Returns:
A shortcut returning a DictionaryMetadataBuilder.
getExpectedMetadataFileName
Returns the expected name of the metadata file, based on the name of the dictionary file. The expected name is resolved by truncating any file extension of dictionaryFile and appending METADATA_FILE_EXTENSION.
Parameters:
dictionaryFile - The name of the dictionary (*.dict) file.
Returns:
The expected name of the metadata file.
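A sketch of the naming rule above. The value of `METADATA_FILE_EXTENSION` is not shown on this page; `info` is assumed here, and the class `MetadataFileName` is hypothetical. The input is treated as a bare file name, so a dot in a directory component is not handled.

```java
// Sketch of the documented rule; the "info" extension is an assumption.
class MetadataFileName {

    // Assumed value of METADATA_FILE_EXTENSION; this page does not state it.
    static final String METADATA_FILE_EXTENSION = "info";

    static String expectedMetadataFileName(String dictionaryFile) {
        int dot = dictionaryFile.lastIndexOf('.');
        // Truncate any extension, then append the metadata extension.
        String base = dot >= 0 ? dictionaryFile.substring(0, dot) : dictionaryFile;
        return base + "." + METADATA_FILE_EXTENSION;
    }

    public static void main(String[] args) {
        System.out.println(expectedMetadataFileName("polish.dict")); // polish.info
        System.out.println(expectedMetadataFileName("polish"));      // polish.info
    }
}
```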
getExpectedMetadataLocation
Parameters:
dictionary - The location of the dictionary file.
Returns:
The expected location of a metadata file.
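Assuming the metadata file is expected next to the dictionary file and follows the same naming rule as getExpectedMetadataFileName, the location can be derived with `Path.resolveSibling`. The class `MetadataLocation` and the `.info` extension are assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: the metadata file is assumed to sit next to the dictionary file.
class MetadataLocation {

    static Path expectedMetadataLocation(Path dictionary) {
        String name = dictionary.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot >= 0 ? name.substring(0, dot) : name;
        // resolveSibling keeps the parent directory, or yields a bare name if there is none.
        return dictionary.resolveSibling(base + ".info"); // ".info" is an assumed extension
    }

    public static void main(String[] args) {
        System.out.println(expectedMetadataLocation(Paths.get("dicts", "polish.dict")));
    }
}
```

Working on the file-name component only avoids accidentally truncating at a dot inside a directory name.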
read
Read dictionary metadata from a property file (stream).
Parameters:
metadataStream - The stream with metadata.
Returns:
The DictionaryMetadata read from the stream (property file).
Throws:
IOException - Thrown if an I/O exception occurs.
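Since the metadata is a property file, `java.util.Properties` can parse it. This sketch returns the raw key/value pairs and leaves out mapping keys onto `DictionaryAttribute` values. The key names `fsa.dict.separator` and `fsa.dict.encoding` are assumptions modeled on morfologik `.info` files, the UTF-8 reader is a choice made for the sketch, and `ReadMetadataSketch` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch; not morfologik's implementation. Key names are assumptions.
class ReadMetadataSketch {

    static Map<String, String> read(InputStream metadataStream) throws IOException {
        Properties props = new Properties();
        // UTF-8 is a choice made for this sketch; Properties.load(InputStream) would assume ISO-8859-1.
        // The reader is not closed here: the caller owns the stream.
        props.load(new InputStreamReader(metadataStream, StandardCharsets.UTF_8));
        Map<String, String> attrs = new TreeMap<>(); // sorted for deterministic output
        for (String key : props.stringPropertyNames()) {
            attrs.put(key, props.getProperty(key).trim());
        }
        return attrs;
    }

    public static void main(String[] args) throws IOException {
        String text = "fsa.dict.separator=+\nfsa.dict.encoding = UTF-8 \n";
        Map<String, String> attrs =
            read(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(attrs); // {fsa.dict.encoding=UTF-8, fsa.dict.separator=+}
    }
}
```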
write
Write dictionary attributes (metadata).
Parameters:
writer - The writer to write to.
Throws:
IOException - Thrown when an I/O error occurs.
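A matching sketch for the write side, assuming attributes are emitted as a property file. `Properties.store(Writer, String)` handles escaping and prefixes a comment and a timestamp line; morfologik's own output format may differ. `WriteMetadataSketch` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;
import java.util.Properties;

// Sketch of writing metadata attributes; not morfologik's own implementation.
class WriteMetadataSketch {

    static void write(Map<String, String> attrs, Writer writer) throws IOException {
        Properties props = new Properties();
        props.putAll(attrs);
        // store() escapes special characters and adds comment and timestamp lines.
        props.store(writer, "Dictionary metadata");
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        write(Map.of("fsa.dict.separator", "+", "fsa.dict.encoding", "UTF-8"), out);
        Properties back = new Properties();
        back.load(new StringReader(out.toString()));
        System.out.println(back.getProperty("fsa.dict.encoding")); // UTF-8
    }
}
```

Reading the output back with `Properties.load`, as `main` does, recovers the same pairs, which is a quick way to check that the written file is well-formed.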