Package org.languagetool.tagging
Class BaseTagger
java.lang.Object
    org.languagetool.tagging.BaseTagger
-
Field Summary
Fields (Modifier and Type, Field):
protected java.util.Locale conversionLocale
private morfologik.stemming.Dictionary dictionary
private java.lang.String dictionaryPath
private boolean tagLowercaseWithUppercase
protected WordTagger wordTagger
-
Constructor Summary
Constructors:
BaseTagger(java.lang.String filename)
BaseTagger(java.lang.String filename, java.util.Locale conversionLocale)
BaseTagger(java.lang.String filename, java.util.Locale locale, boolean tagLowercaseWithUppercase)
-
Method Summary
Methods (Modifier and Type, Method, Description):
protected @Nullable java.util.List<AnalyzedToken> additionalTags(java.lang.String word, WordTagger wordTagger)
    Allows additional tagging in some language-dependent circumstances.
private void addTokens(java.util.List<AnalyzedToken> taggedTokens, java.util.List<AnalyzedToken> l)
protected AnalyzedToken asAnalyzedToken(java.lang.String word, morfologik.stemming.WordData wd)
private AnalyzedToken asAnalyzedToken(java.lang.String word, TaggedWord taggedWord)
protected java.util.List<AnalyzedToken> asAnalyzedTokenList(java.lang.String word, java.util.List<morfologik.stemming.WordData> wdList)
protected java.util.List<AnalyzedToken> asAnalyzedTokenListForTaggedWords(java.lang.String word, java.util.List<TaggedWord> taggedWords)
AnalyzedTokenReadings createNullToken(java.lang.String token, int startPos)
    Create the AnalyzedToken used for whitespace and other non-words.
AnalyzedToken createToken(java.lang.String token, java.lang.String posTag)
    Create a token specific to the language of the implementing class.
protected java.util.List<AnalyzedToken> getAnalyzedTokens(java.lang.String word)
protected morfologik.stemming.Dictionary getDictionary()
java.lang.String getDictionaryPath()
abstract @Nullable java.lang.String getManualAdditionsFileName()
    Get the filename for manual additions, e.g., /en/added.txt, or null.
@Nullable java.lang.String getManualRemovalsFileName()
    Get the filename for manual removals, e.g., /en/removed.txt, or null.
protected WordTagger getWordTagger()
private WordTagger initWordTagger()
boolean overwriteWithManualTagger()
    If true, tags from the binary dictionary (*.dict) will be overwritten by manual tags from the plain text dictionary.
java.util.List<AnalyzedTokenReadings> tag(java.util.List<java.lang.String> sentenceTokens)
    Returns a list of AnalyzedTokens that assigns each term in the sentence some kind of part-of-speech information (not necessarily just one tag).
-
-
-
Field Detail
-
wordTagger
protected final WordTagger wordTagger
-
conversionLocale
protected final java.util.Locale conversionLocale
-
tagLowercaseWithUppercase
private final boolean tagLowercaseWithUppercase
-
dictionaryPath
private final java.lang.String dictionaryPath
-
dictionary
private final morfologik.stemming.Dictionary dictionary
-
-
Constructor Detail
-
BaseTagger
public BaseTagger(java.lang.String filename)
- Since:
- 2.9
-
BaseTagger
public BaseTagger(java.lang.String filename, java.util.Locale conversionLocale)
- Since:
- 2.9
-
BaseTagger
public BaseTagger(java.lang.String filename, java.util.Locale locale, boolean tagLowercaseWithUppercase)
- Since:
- 2.9
-
-
Method Detail
-
getManualAdditionsFileName
@Nullable public abstract @Nullable java.lang.String getManualAdditionsFileName()
Get the filename for manual additions, e.g., /en/added.txt, or null.
- Since:
- 2.8
-
getManualRemovalsFileName
@Nullable public @Nullable java.lang.String getManualRemovalsFileName()
Get the filename for manual removals, e.g., /en/removed.txt, or null.
- Since:
- 3.2
-
getDictionaryPath
public java.lang.String getDictionaryPath()
- Since:
- 2.9
-
overwriteWithManualTagger
public boolean overwriteWithManualTagger()
If true, tags from the binary dictionary (*.dict) will be overwritten by manual tags from the plain text dictionary.
- Since:
- 2.9
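The effect of this flag can be pictured with a small, self-contained sketch. It is not LanguageTool's implementation: the class ManualOverrideSketch, the method resolveTags, and the use of plain strings as tags are hypothetical names chosen for illustration. The sketch also assumes that, when the flag is false, manual tags are added to the dictionary tags, and that the dictionary tags are dropped only when the manual source actually has an entry for the word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ManualOverrideSketch {

    // Hypothetical helper: combines the tags of one word from two sources.
    public static List<String> resolveTags(List<String> dictionaryTags,
                                           List<String> manualTags,
                                           boolean overwriteWithManual) {
        List<String> result = new ArrayList<>(manualTags);
        // Assumption: dictionary tags are skipped only if the flag is set
        // and the manual source knows the word.
        boolean manualWins = overwriteWithManual && !manualTags.isEmpty();
        if (!manualWins) {
            result.addAll(dictionaryTags);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("NN");
        List<String> manual = Arrays.asList("VB");
        List<String> none = Collections.emptyList();
        System.out.println(resolveTags(dict, manual, false)); // [VB, NN]
        System.out.println(resolveTags(dict, manual, true));  // [VB]
        System.out.println(resolveTags(dict, none, true));    // [NN]
    }
}
```

Under these assumptions, a subclass that returns true from overwriteWithManualTagger() would let an entry in the plain text dictionary replace the readings from the binary dictionary instead of extending them.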
-
getWordTagger
protected WordTagger getWordTagger()
-
initWordTagger
private WordTagger initWordTagger()
-
getDictionary
protected morfologik.stemming.Dictionary getDictionary()
-
tag
public java.util.List<AnalyzedTokenReadings> tag(java.util.List<java.lang.String> sentenceTokens) throws java.io.IOException
Description copied from interface: Tagger
Returns a list of AnalyzedTokens that assigns each term in the sentence some kind of part-of-speech information (not necessarily just one tag).
Note that this method takes exactly one sentence. Its implementation may implement special cases for the first word of a sentence, which is usually written with an uppercase letter.
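One possible first-word special case is sketched below: if the sentence-initial word has no entry as written, it is looked up again in lowercase using a locale. This is a standalone illustration under stated assumptions, not the code of BaseTagger. The class FirstWordSketch, the in-memory LEXICON map, the tag strings, and the method tagSentence are all hypothetical, and the token list is assumed to contain whitespace tokens alongside words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class FirstWordSketch {

    // Hypothetical stand-in for a dictionary: word form -> POS tags.
    static final Map<String, List<String>> LEXICON = new HashMap<>();
    static {
        LEXICON.put("the", Arrays.asList("DT"));
        LEXICON.put("cat", Arrays.asList("NN"));
        LEXICON.put("sleeps", Arrays.asList("VBZ"));
    }

    // Returns one list of tags per input token, in order.
    public static List<List<String>> tagSentence(List<String> tokens, Locale locale) {
        List<List<String>> readings = new ArrayList<>();
        boolean firstWord = true;
        for (String token : tokens) {
            List<String> tags = LEXICON.getOrDefault(token, Collections.emptyList());
            // Assumed special case: retry the first word of the sentence in lowercase.
            if (tags.isEmpty() && firstWord) {
                tags = LEXICON.getOrDefault(token.toLowerCase(locale), Collections.emptyList());
            }
            readings.add(tags);
            // Whitespace tokens do not count as the first word.
            if (!token.trim().isEmpty()) {
                firstWord = false;
            }
        }
        return readings;
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList("The", " ", "cat", " ", "sleeps");
        System.out.println(tagSentence(sentence, Locale.ENGLISH));
    }
}
```

Passing the locale explicitly mirrors the role a field such as conversionLocale could play, since case conversion is locale-sensitive in some languages.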
-
getAnalyzedTokens
protected java.util.List<AnalyzedToken> getAnalyzedTokens(java.lang.String word)
-
asAnalyzedTokenList
protected java.util.List<AnalyzedToken> asAnalyzedTokenList(java.lang.String word, java.util.List<morfologik.stemming.WordData> wdList)
-
asAnalyzedTokenListForTaggedWords
protected java.util.List<AnalyzedToken> asAnalyzedTokenListForTaggedWords(java.lang.String word, java.util.List<TaggedWord> taggedWords)
-
asAnalyzedToken
protected AnalyzedToken asAnalyzedToken(java.lang.String word, morfologik.stemming.WordData wd)
-
asAnalyzedToken
private AnalyzedToken asAnalyzedToken(java.lang.String word, TaggedWord taggedWord)
-
addTokens
private void addTokens(java.util.List<AnalyzedToken> taggedTokens, java.util.List<AnalyzedToken> l)
-
createNullToken
public final AnalyzedTokenReadings createNullToken(java.lang.String token, int startPos)
Description copied from interface: Tagger
Create the AnalyzedToken used for whitespace and other non-words. Use null as the POS tag for this token.
- Specified by:
- createNullToken in interface Tagger
-
createToken
public AnalyzedToken createToken(java.lang.String token, java.lang.String posTag)
Description copied from interface: Tagger
Create a token specific to the language of the implementing class.
- Specified by:
- createToken in interface Tagger
-
additionalTags
@Nullable protected @Nullable java.util.List<AnalyzedToken> additionalTags(java.lang.String word, WordTagger wordTagger)
Allows additional tagging in some language-dependent circumstances.
- Parameters:
- word - The word to tag
- Returns:
- A list of analyzed tokens with additional tags, or null
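The general shape of such a hook, where a base class returns null by default and a language-specific subclass supplies extra tags, can be sketched with stand-in types. Everything here is hypothetical: AdditionalTagsSketch, SketchTagger, SuffixTagger, tagWord, and the suffix rule are invented for illustration, and the sketch assumes the hook is consulted only when the lexicon has no entry for the word. It does not reproduce how BaseTagger calls additionalTags.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class AdditionalTagsSketch {

    // Hypothetical stand-in for a tagger base class with an overridable hook.
    public static class SketchTagger {
        private final Map<String, List<String>> lexicon;

        public SketchTagger(Map<String, List<String>> lexicon) {
            this.lexicon = lexicon;
        }

        // Hook: extra tags for a word, or null if there are none.
        protected List<String> additionalTags(String word) {
            return null;
        }

        public List<String> tagWord(String word) {
            List<String> tags = new ArrayList<>(
                lexicon.getOrDefault(word, Collections.emptyList()));
            // Assumption: the hook is asked only for words the lexicon does not know.
            if (tags.isEmpty()) {
                List<String> extra = additionalTags(word);
                if (extra != null) {
                    tags.addAll(extra);
                }
            }
            return tags;
        }
    }

    // Hypothetical language-specific subclass: unknown words ending in "like"
    // receive an adjective tag.
    public static class SuffixTagger extends SketchTagger {
        public SuffixTagger(Map<String, List<String>> lexicon) {
            super(lexicon);
        }

        @Override
        protected List<String> additionalTags(String word) {
            if (word.endsWith("like")) {
                return Collections.singletonList("JJ");
            }
            return null;
        }
    }
}
```

Returning null rather than an empty list matches the @Nullable contract documented above, so a caller has to check for null before using the result.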
-
-