Package org.languagetool
Class Language
- java.lang.Object
-
- org.languagetool.Language
-
- Direct Known Subclasses:
DynamicLanguage, LanguageBuilder.ExtendedLanguage, NoopLanguage, SimpleSentenceTokenizer.AnyLanguage
public abstract class Language extends java.lang.Object
Base class for any supported language (English, German, etc.). Language classes are detected at runtime by searching the classpath for files named META-INF/org/languagetool/language-module.properties. Those files need to contain a key languageClasses which specifies the fully qualified class name(s), e.g. org.languagetool.language.English. Use commas to specify more than one class.
Subclasses should typically use lazy initialization for anything that's costly to set up. This improves startup time for the LanguageTool stand-alone version.
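The lookup of the languageClasses key described above can be illustrated with a minimal sketch. It is not LanguageTool's actual loader: the class LanguageModulePropertiesSketch, its method, and the class name org.example.MyLanguage are hypothetical, and the sketch only shows how a comma-separated value could be read with java.util.Properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of LanguageTool: parses the content of a
// language-module.properties file and returns the class names it lists.
class LanguageModulePropertiesSketch {

  static List<String> parseLanguageClasses(String propertiesContent) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesContent));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    List<String> classNames = new ArrayList<>();
    String value = props.getProperty("languageClasses");
    if (value == null) {
      return classNames;  // key missing: nothing to register
    }
    // Commas separate multiple fully qualified class names.
    for (String name : value.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        classNames.add(trimmed);
      }
    }
    return classNames;
  }

  public static void main(String[] args) {
    // org.example.MyLanguage is a made-up class name used for illustration.
    String content = "languageClasses=org.languagetool.language.English, org.example.MyLanguage\n";
    for (String className : parseLanguageClasses(content)) {
      System.out.println(className);
    }
  }
}
```

A real loader would then instantiate each listed class, for example via reflection. That step is left out here because it depends on the classpath.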
-
-
Field Summary
Fields (Modifier and Type, Field)
private static Disambiguator DEMO_DISAMBIGUATOR
private static Tagger DEMO_TAGGER
private UnifierConfiguration disambiguationUnifierConfig
private java.util.regex.Pattern ignoredCharactersRegex
private boolean noLmWarningPrinted
private java.util.List<AbstractPatternRule> patternRules
private static SentenceTokenizer SENTENCE_TOKENIZER
private UnifierConfiguration unifierConfig
private static WordTokenizer WORD_TOKENIZER
-
Constructor Summary
Language()
-
Method Summary
boolean equals(java.lang.Object o): Considers languages as equal if their language code, including the country and variant codes, are equal.
boolean equalsConsiderVariantsIfSpecified(Language otherLanguage): Return true if this is the same language as the given one, considering country variants only if set for both languages.
@Nullable Chunker getChunker(): Get this language's chunker implementation or null.
java.lang.String getCommonWordsPath(): A file with common words, either in the classpath or as a filename in the file system.
abstract java.lang.String[] getCountries(): Get this language's country options, e.g. US (as in en-US) or PL (as in pl-PL).
java.util.List<java.lang.String> getDefaultDisabledRulesForVariant(): Get disabled rules different from the default ones for this language variant.
java.util.List<java.lang.String> getDefaultEnabledRulesForVariant(): Get enabled rules different from the default ones for this language variant.
@Nullable Language getDefaultLanguageVariant(): Languages that have country variants need to override this to select their most common variant.
Unifier getDisambiguationUnifier(): Get this language's feature unifier used for disambiguation.
UnifierConfiguration getDisambiguationUnifierConfiguration()
Disambiguator getDisambiguator(): Get this language's part-of-speech disambiguator implementation.
java.util.regex.Pattern getIgnoredCharactersRegex()
@Nullable LanguageModel getLanguageModel(java.io.File indexDir)
java.util.Locale getLocale(): Get this language's Java locale, not considering the country code.
java.util.Locale getLocaleWithCountryAndVariant(): Get this language's Java locale, considering language code and country code (if any).
LanguageMaintainedState getMaintainedState(): Information about whether the support for this language in LanguageTool is actively maintained.
abstract @Nullable Contributor[] getMaintainers(): Get the name(s) of the maintainer(s) for this language or null.
abstract java.lang.String getName(): Get this language's name in English, e.g. English or German (Germany).
protected java.util.List<AbstractPatternRule> getPatternRules(): Get the pattern rules as defined in the files returned by getRuleFileNames().
@Nullable Chunker getPostDisambiguationChunker(): Get this language's chunker implementation or null.
int getPriorityForId(java.lang.String id): Returns a priority for Rule or Category Id (default: 0).
java.util.List<Rule> getRelevantLanguageModelCapableRules(java.util.ResourceBundle messages, @Nullable LanguageModel languageModel, UserConfig userConfig, Language motherTongue, java.util.List<Language> altLanguages): Get a list of rules that can optionally use a LanguageModel.
java.util.List<Rule> getRelevantLanguageModelRules(java.util.ResourceBundle messages, LanguageModel languageModel): Get a list of rules that require a LanguageModel.
java.util.List<Rule> getRelevantNeuralNetworkModels(java.util.ResourceBundle messages, java.io.File modelDir): Get a list of rules that load trained neural networks.
abstract java.util.List<Rule> getRelevantRules(java.util.ResourceBundle messages, UserConfig userConfig, Language motherTongue, java.util.List<Language> altLanguages): Get the rule classes that should run for texts in this language.
java.util.List<Rule> getRelevantRulesGlobalConfig(java.util.ResourceBundle messages, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, java.util.List<Language> altLanguages): Get the rule classes that should run for texts in this language.
java.util.List<Rule> getRelevantWord2VecModelRules(java.util.ResourceBundle messages, Word2VecModel word2vecModel): Get a list of rules that require a Word2VecModel.
java.util.List<java.lang.String> getRuleFileNames(): Get the location of the rule file(s) in a form like /org/languagetool/rules/de/grammar.xml, i.e. a path in the classpath.
SentenceTokenizer getSentenceTokenizer(): Get this language's sentence tokenizer implementation.
abstract java.lang.String getShortCode(): Get this language's character code, e.g. en for English.
java.lang.String getShortCodeWithCountryAndVariant(): Get the short name of the language with country and variant (if any), if it is a single-country language.
@Nullable Synthesizer getSynthesizer(): Get this language's part-of-speech synthesizer implementation or null.
Tagger getTagger(): Get this language's part-of-speech tagger implementation.
java.lang.String getTranslatedName(java.util.ResourceBundle messages): Get the name of the language translated to the current locale, if available.
Unifier getUnifier(): Get this language's feature unifier.
UnifierConfiguration getUnifierConfiguration()
@Nullable java.lang.String getVariant(): Get this language's variant, e.g. valencia (as in ca-ES-valencia) or null.
@Nullable Word2VecModel getWord2VecModel(java.io.File indexDir)
Tokenizer getWordTokenizer(): Get this language's word tokenizer implementation.
private boolean hasCountry()
int hashCode()
boolean hasNGramFalseFriendRule(Language motherTongue): Return true if language has ngram-based false friend rule returned by getRelevantLanguageModelCapableRules(java.util.ResourceBundle, org.languagetool.languagemodel.LanguageModel, org.languagetool.UserConfig, org.languagetool.Language, java.util.List<org.languagetool.Language>).
boolean hasVariant(): Whether this class has at least one subclass that implements variants of this language.
protected LanguageModel initLanguageModel(java.io.File indexDir, LanguageModel languageModel)
boolean isExternal(): For internal use only.
boolean isHiddenFromGui()
boolean isSpellcheckOnlyLanguage(): Whether this language supports spell checking only and no advanced grammar and style checking.
private boolean isTheDefaultVariant()
boolean isVariant(): Whether this is a country variant of another language, i.e. whether it doesn't directly extend Language, but a subclass of Language.
java.lang.String toString()
-
-
-
Field Detail
-
DEMO_DISAMBIGUATOR
private static final Disambiguator DEMO_DISAMBIGUATOR
-
DEMO_TAGGER
private static final Tagger DEMO_TAGGER
-
SENTENCE_TOKENIZER
private static final SentenceTokenizer SENTENCE_TOKENIZER
-
WORD_TOKENIZER
private static final WordTokenizer WORD_TOKENIZER
-
unifierConfig
private final UnifierConfiguration unifierConfig
-
disambiguationUnifierConfig
private final UnifierConfiguration disambiguationUnifierConfig
-
ignoredCharactersRegex
private final java.util.regex.Pattern ignoredCharactersRegex
-
patternRules
private java.util.List<AbstractPatternRule> patternRules
-
noLmWarningPrinted
private boolean noLmWarningPrinted
-
-
Method Detail
-
getShortCode
public abstract java.lang.String getShortCode()
Get this language's character code, e.g. en for English. For most languages this is a two-letter code according to ISO 639-1, but for those languages that don't have a two-letter code, a three-letter code according to ISO 639-2 is returned. The country parameter (e.g. "US"), if any, is not returned.
- Since:
- 3.6
-
getName
public abstract java.lang.String getName()
Get this language's name in English, e.g. English or German (Germany).
- Returns:
- language name
-
getCountries
public abstract java.lang.String[] getCountries()
Get this language's country options, e.g. US (as in en-US) or PL (as in pl-PL).
- Returns:
- String[] - array of country options for the language.
-
getMaintainers
@Nullable public abstract @Nullable Contributor[] getMaintainers()
Get the name(s) of the maintainer(s) for this language or null.
-
getRelevantRules
public abstract java.util.List<Rule> getRelevantRules(java.util.ResourceBundle messages, UserConfig userConfig, Language motherTongue, java.util.List<Language> altLanguages) throws java.io.IOException
Get the rule classes that should run for texts in this language.
- Throws:
- java.io.IOException
- Since:
- 4.3
-
getCommonWordsPath
public java.lang.String getCommonWordsPath()
A file with common words, either in the classpath or as a filename in the file system.
- Since:
- 4.5
-
getVariant
@Nullable public @Nullable java.lang.String getVariant()
Get this language's variant, e.g. valencia (as in ca-ES-valencia) or null. Attention: not to be confused with the "country" option.
- Returns:
- variant for the language or null
- Since:
- 2.3
-
getDefaultEnabledRulesForVariant
public java.util.List<java.lang.String> getDefaultEnabledRulesForVariant()
Get enabled rules different from the default ones for this language variant.
- Returns:
- enabled rules for the language variant.
- Since:
- 2.4
-
getDefaultDisabledRulesForVariant
public java.util.List<java.lang.String> getDefaultDisabledRulesForVariant()
Get disabled rules different from the default ones for this language variant.
- Returns:
- disabled rules for the language variant.
- Since:
- 2.4
-
getLanguageModel
@Nullable public @Nullable LanguageModel getLanguageModel(java.io.File indexDir) throws java.io.IOException
- Parameters:
- indexDir - directory with a '3grams' subdirectory which contains a Lucene index with 3-gram occurrence counts
- Returns:
- a LanguageModel or null if this language doesn't support one
- Throws:
- java.io.IOException
- Since:
- 2.7
-
initLanguageModel
protected LanguageModel initLanguageModel(java.io.File indexDir, LanguageModel languageModel)
-
getRelevantLanguageModelRules
public java.util.List<Rule> getRelevantLanguageModelRules(java.util.ResourceBundle messages, LanguageModel languageModel) throws java.io.IOException
Get a list of rules that require a LanguageModel. Returns an empty list for languages that don't have such rules.
- Throws:
- java.io.IOException
- Since:
- 2.7
-
getRelevantLanguageModelCapableRules
public java.util.List<Rule> getRelevantLanguageModelCapableRules(java.util.ResourceBundle messages, @Nullable LanguageModel languageModel, UserConfig userConfig, Language motherTongue, java.util.List<Language> altLanguages) throws java.io.IOException
Get a list of rules that can optionally use a LanguageModel. Returns an empty list for languages that don't have such rules.
- Parameters:
- languageModel - null if no language model is available
- Throws:
- java.io.IOException
- Since:
- 4.5
-
getWord2VecModel
@Nullable public @Nullable Word2VecModel getWord2VecModel(java.io.File indexDir) throws java.io.IOException
- Parameters:
- indexDir - directory with subdirectories like 'en', each containing dictionary.txt and final_embeddings.txt
- Returns:
- a Word2VecModel or null if this language doesn't support one
- Throws:
- java.io.IOException
- Since:
- 4.0
-
getRelevantWord2VecModelRules
public java.util.List<Rule> getRelevantWord2VecModelRules(java.util.ResourceBundle messages, Word2VecModel word2vecModel) throws java.io.IOException
Get a list of rules that require a Word2VecModel. Returns an empty list for languages that don't have such rules.
- Throws:
- java.io.IOException
- Since:
- 4.0
-
getRelevantNeuralNetworkModels
public java.util.List<Rule> getRelevantNeuralNetworkModels(java.util.ResourceBundle messages, java.io.File modelDir)
Get a list of rules that load trained neural networks. Returns an empty list for languages that don't have such rules.
- Since:
- 4.4
-
getRelevantRulesGlobalConfig
public java.util.List<Rule> getRelevantRulesGlobalConfig(java.util.ResourceBundle messages, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, java.util.List<Language> altLanguages) throws java.io.IOException
Get the rule classes that should run for texts in this language.
- Throws:
- java.io.IOException
- Since:
- 4.6
-
getLocale
public java.util.Locale getLocale()
Get this language's Java locale, not considering the country code.
-
getLocaleWithCountryAndVariant
public java.util.Locale getLocaleWithCountryAndVariant()
Get this language's Java locale, considering language code and country code (if any).
- Since:
- 2.1
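The difference between a locale without and with a country code can be sketched using the standard java.util.Locale.Builder API. The class LocaleSketch and its toLocale helper are hypothetical and only illustrate the idea. They are not the code used by getLocale() or getLocaleWithCountryAndVariant().

```java
import java.util.Locale;

// Hypothetical illustration, not part of LanguageTool.
class LocaleSketch {

  // Builds a Locale from a language code and an optional country code.
  // Pass null as the country to get a language-only locale.
  static Locale toLocale(String languageCode, String countryOrNull) {
    Locale.Builder builder = new Locale.Builder().setLanguage(languageCode);
    if (countryOrNull != null) {
      builder.setRegion(countryOrNull);
    }
    return builder.build();
  }

  public static void main(String[] args) {
    // Language only, comparable to what getLocale() is described to return.
    System.out.println(toLocale("en", null).toLanguageTag());
    // Language plus country, comparable to getLocaleWithCountryAndVariant().
    System.out.println(toLocale("en", "US").toLanguageTag());
  }
}
```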
-
getRuleFileNames
public java.util.List<java.lang.String> getRuleFileNames()
Get the location of the rule file(s) in a form like /org/languagetool/rules/de/grammar.xml, i.e. a path in the classpath. The files must exist or an exception will be thrown, unless the filename contains the string -test-.
-
getDefaultLanguageVariant
@Nullable public @Nullable Language getDefaultLanguageVariant()
Languages that have country variants need to override this to select their most common variant.
- Returns:
- default country variant or null
- Since:
- 1.8
-
getDisambiguator
public Disambiguator getDisambiguator()
Get this language's part-of-speech disambiguator implementation.
-
getTagger
public Tagger getTagger()
Get this language's part-of-speech tagger implementation. The tagger must not be null, but it can be a trivial pseudo-tagger that only assigns null tags.
-
getSentenceTokenizer
public SentenceTokenizer getSentenceTokenizer()
Get this language's sentence tokenizer implementation.
-
getWordTokenizer
public Tokenizer getWordTokenizer()
Get this language's word tokenizer implementation.
-
getChunker
@Nullable public @Nullable Chunker getChunker()
Get this language's chunker implementation or null.
- Since:
- 2.3
-
getPostDisambiguationChunker
@Nullable public @Nullable Chunker getPostDisambiguationChunker()
Get this language's chunker implementation or null.
- Since:
- 2.9
-
getSynthesizer
@Nullable public @Nullable Synthesizer getSynthesizer()
Get this language's part-of-speech synthesizer implementation or null.
-
getUnifier
public Unifier getUnifier()
Get this language's feature unifier.
- Returns:
- Feature unifier for analyzed tokens.
-
getDisambiguationUnifier
public Unifier getDisambiguationUnifier()
Get this language's feature unifier used for disambiguation. Note: it might be different from the normal rule unifier.
- Returns:
- Feature unifier for analyzed tokens.
-
getUnifierConfiguration
public UnifierConfiguration getUnifierConfiguration()
- Since:
- 2.3
-
getDisambiguationUnifierConfiguration
public UnifierConfiguration getDisambiguationUnifierConfiguration()
- Since:
- 2.3
-
getTranslatedName
public final java.lang.String getTranslatedName(java.util.ResourceBundle messages)
Get the name of the language translated to the current locale, if available. Otherwise, get the untranslated name.
-
getShortCodeWithCountryAndVariant
public final java.lang.String getShortCodeWithCountryAndVariant()
Get the short name of the language with country and variant (if any), if it is a single-country language. For generic language classes, get only a two- or three-character code.
- Since:
- 3.6
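One possible reading of the rule above is sketched here: the country (and variant, if any) is appended only when the language has exactly one country, otherwise the plain short code is returned. The class ShortCodeSketch and its parameters are hypothetical and take plain strings instead of a Language instance. The real method may differ in detail.

```java
// Hypothetical illustration, not part of LanguageTool.
class ShortCodeSketch {

  // shortCode: e.g. "en"; countries: the language's country options;
  // variantOrNull: e.g. "valencia", or null if there is no variant.
  static String shortCodeWithCountryAndVariant(String shortCode, String[] countries, String variantOrNull) {
    StringBuilder name = new StringBuilder(shortCode);
    if (countries.length == 1) {
      // Single-country language: append the country, then the variant if present.
      name.append('-').append(countries[0]);
      if (variantOrNull != null) {
        name.append('-').append(variantOrNull);
      }
    }
    // Generic language class (zero or several countries): short code only.
    return name.toString();
  }

  public static void main(String[] args) {
    System.out.println(shortCodeWithCountryAndVariant("en", new String[] {"US"}, null));
    System.out.println(shortCodeWithCountryAndVariant("ca", new String[] {"ES"}, "valencia"));
    System.out.println(shortCodeWithCountryAndVariant("en", new String[] {"US", "GB"}, null));
  }
}
```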
-
getPatternRules
protected java.util.List<AbstractPatternRule> getPatternRules() throws java.io.IOException
Get the pattern rules as defined in the files returned by getRuleFileNames().
- Throws:
- java.io.IOException
- Since:
- 2.7
-
toString
public final java.lang.String toString()
- Overrides:
- toString in class java.lang.Object
-
isVariant
public final boolean isVariant()
Whether this is a country variant of another language, i.e. whether it doesn't directly extend Language, but a subclass of Language.
- Since:
- 1.8
-
hasVariant
public final boolean hasVariant()
Whether this class has at least one subclass that implements variants of this language.
- Since:
- 1.8
-
isExternal
public boolean isExternal()
For internal use only. Overridden to return true for languages that have been loaded from an external file after startup.
-
equalsConsiderVariantsIfSpecified
public boolean equalsConsiderVariantsIfSpecified(Language otherLanguage)
Return true if this is the same language as the given one, considering country variants only if set for both languages. For example: en = en, en = en-GB, en-GB = en-GB, but en-US != en-GB.
- Since:
- 1.8
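The comparison rule in the example above can be sketched on plain language tags. This is an illustration under the assumption that a tag has the form "code" or "code-COUNTRY". The class VariantEqualitySketch is hypothetical and works on strings, not on Language objects.

```java
// Hypothetical illustration, not part of LanguageTool.
class VariantEqualitySketch {

  // Compares two tags such as "en" or "en-GB". The country part is only
  // taken into account when both tags specify one.
  static boolean equalsConsiderVariantsIfSpecified(String tagA, String tagB) {
    String[] a = tagA.split("-", 2);
    String[] b = tagB.split("-", 2);
    if (!a[0].equals(b[0])) {
      return false;  // different base language
    }
    boolean bothHaveCountry = a.length == 2 && b.length == 2;
    return !bothHaveCountry || a[1].equals(b[1]);
  }

  public static void main(String[] args) {
    System.out.println(equalsConsiderVariantsIfSpecified("en", "en"));
    System.out.println(equalsConsiderVariantsIfSpecified("en", "en-GB"));
    System.out.println(equalsConsiderVariantsIfSpecified("en-GB", "en-GB"));
    System.out.println(equalsConsiderVariantsIfSpecified("en-US", "en-GB"));
  }
}
```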
-
hasCountry
private boolean hasCountry()
-
getIgnoredCharactersRegex
public java.util.regex.Pattern getIgnoredCharactersRegex()
- Returns:
- compiled regular expression matching characters to ignore inside tokens
- Since:
- 2.9
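How such a pattern might be applied to a token can be sketched as follows. The choice of the soft hyphen (U+00AD) as the ignored character is an assumption made for this example, and the class IgnoredCharactersSketch is hypothetical. The pattern a given language actually returns may differ.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not part of LanguageTool.
class IgnoredCharactersSketch {

  // Assumption for this sketch: only the soft hyphen (U+00AD) is ignored.
  static final Pattern IGNORED_CHARACTERS = Pattern.compile("[\u00AD]");

  // Removes all ignored characters from a token before further processing.
  static String stripIgnoredCharacters(String token) {
    return IGNORED_CHARACTERS.matcher(token).replaceAll("");
  }

  public static void main(String[] args) {
    String token = "co\u00ADoperate";  // contains an invisible soft hyphen
    System.out.println(stripIgnoredCharacters(token));
  }
}
```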
-
getMaintainedState
public LanguageMaintainedState getMaintainedState()
Information about whether the support for this language in LanguageTool is actively maintained. If not, the user interface might show a warning.
- Since:
- 3.3
-
isHiddenFromGui
public boolean isHiddenFromGui()
-
isTheDefaultVariant
private boolean isTheDefaultVariant()
-
getPriorityForId
public int getPriorityForId(java.lang.String id)
Returns a priority for Rule or Category Id (default: 0). Positive integers have higher priority. Negative integers have lower priority.
- Since:
- 3.6
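The priority scheme can be sketched with a lookup that falls back to 0 and a sort that puts higher priorities first. The rule ids, the map, and the sorting step are all assumptions made for this example. How LanguageTool itself consumes the priority is not stated here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of LanguageTool.
class PrioritySketch {

  // Made-up rule ids: one raised above the default, one lowered below it.
  private static final Map<String, Integer> PRIORITIES = Map.of(
      "HYPOTHETICAL_IMPORTANT_RULE", 10,
      "HYPOTHETICAL_STYLE_RULE", -5);

  // Ids without an explicit entry get the default priority 0.
  static int getPriorityForId(String id) {
    return PRIORITIES.getOrDefault(id, 0);
  }

  // Returns a copy of the ids ordered from highest to lowest priority.
  static List<String> sortByPriority(List<String> ids) {
    List<String> sorted = new ArrayList<>(ids);
    sorted.sort(Comparator.comparingInt((String id) -> getPriorityForId(id)).reversed());
    return sorted;
  }

  public static void main(String[] args) {
    List<String> ids = List.of("HYPOTHETICAL_STYLE_RULE", "SOME_OTHER_RULE", "HYPOTHETICAL_IMPORTANT_RULE");
    System.out.println(sortByPriority(ids));
  }
}
```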
-
isSpellcheckOnlyLanguage
public boolean isSpellcheckOnlyLanguage()
Whether this language supports spell checking only and no advanced grammar and style checking.
- Since:
- 4.5
-
hasNGramFalseFriendRule
public boolean hasNGramFalseFriendRule(Language motherTongue)
Return true if language has ngram-based false friend rule returned by getRelevantLanguageModelCapableRules(java.util.ResourceBundle, org.languagetool.languagemodel.LanguageModel, org.languagetool.UserConfig, org.languagetool.Language, java.util.List<org.languagetool.Language>).
- Since:
- 4.6
-
equals
public boolean equals(java.lang.Object o)
Considers languages as equal if their language code, including the country and variant codes, are equal.
- Overrides:
- equals in class java.lang.Object
-
hashCode
public int hashCode()
- Overrides:
- hashCode in class java.lang.Object
-
-