Package org.languagetool.languagemodel
Class LuceneSingleIndexLanguageModel
java.lang.Object
org.languagetool.languagemodel.BaseLanguageModel
org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
- All Implemented Interfaces:
AutoCloseable, LanguageModel
Information about ngram occurrences, taken from Lucene indexes (one index per ngram level).
This is not a real language model: it only returns occurrence counts
and performs no probability calculation, in particular not for ngrams
with 0 occurrences.
- Since:
- 3.2
-
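The idea of keeping one count store per ngram level and returning raw counts can be illustrated with a minimal, self-contained sketch. This is not the LanguageTool implementation: the class `NgramCountSketch`, its `add` method, and the in-memory maps that stand in for the Lucene indexes are all hypothetical. It only shows why a pure count lookup gives no probability estimate for an ngram that was never seen.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not the LanguageTool implementation: plain maps
// play the role of the Lucene indexes, one map per ngram level.
class NgramCountSketch {

  // ngram size (1, 2, 3, ...) -> (space-joined ngram -> occurrence count)
  private final Map<Integer, Map<String, Long>> countsByLevel = new HashMap<>();

  // Store a count in the map that belongs to the ngram's level.
  void add(List<String> tokens, long count) {
    countsByLevel
        .computeIfAbsent(tokens.size(), k -> new HashMap<>())
        .put(String.join(" ", tokens), count);
  }

  // Return the stored count, or 0 for an unseen ngram. No smoothing is
  // applied, so an unseen ngram gets no probability estimate at all.
  long getCount(List<String> tokens) {
    Map<String, Long> level = countsByLevel.get(tokens.size());
    if (level == null) {
      return 0;
    }
    return level.getOrDefault(String.join(" ", tokens), 0L);
  }

  public static void main(String[] args) {
    NgramCountSketch model = new NgramCountSketch();
    model.add(List.of("the"), 1000);
    model.add(List.of("the", "cat"), 12);
    System.out.println(model.getCount(List.of("the", "cat")));
    System.out.println(model.getCount(List.of("the", "dog")));
  }
}
```

A caller that needs probabilities has to add its own smoothing or backoff on top of such counts, which is presumably the role of the inherited `getPseudoProbability` methods.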
Nested Class Summary
Nested Classes
protected static class LuceneSingleIndexLanguageModel.LuceneSearcher
-
Field Summary
Fields
private static final Map<File, LuceneSingleIndexLanguageModel.LuceneSearcher> dirToSearcherMap
indexes
private final Map<Integer, LuceneSingleIndexLanguageModel.LuceneSearcher> luceneSearcherMap
private final long maxNgram
private final File topIndexDir
Fields inherited from interface org.languagetool.languagemodel.LanguageModel
GOOGLE_SENTENCE_END, GOOGLE_SENTENCE_START
-
Constructor Summary
Constructors
LuceneSingleIndexLanguageModel(int maxNgram)
LuceneSingleIndexLanguageModel(File topIndexDir)
-
Method Summary
private void addIndex
static void clearCaches
    Only used internally.
void close()
protected void doValidateDirectory(File topIndexDir)
getCachedLuceneSearcher(File indexDir)
long getCount
    Get the occurrence count for token.
long getCount
    Get the occurrence count for the given token sequence.
private long getCount(org.apache.lucene.index.Term term, LuceneSingleIndexLanguageModel.LuceneSearcher luceneSearcher)
getLuceneSearcher(int ngramSize)
long getTotalTokenCount()
toString()
static void validateDirectory(File topIndexDir)
    Throws a RuntimeException if the given directory does not seem to be a valid ngram top directory with sub directories such as 1grams.
Methods inherited from class org.languagetool.languagemodel.BaseLanguageModel
getPseudoProbability, getPseudoProbabilityStupidBackoff
-
Field Details
-
dirToSearcherMap
-
indexes
-
luceneSearcherMap
-
topIndexDir
-
maxNgram
private final long maxNgram
-
-
Constructor Details
-
LuceneSingleIndexLanguageModel
- Parameters:
topIndexDir - a directory that contains at least one sub directory called 3grams, which is a Lucene index with ngram occurrences as created by org.languagetool.dev.FrequencyIndexCreator.
-
LuceneSingleIndexLanguageModel
-
-
Method Details
-
validateDirectory
Throws a RuntimeException if the given directory does not seem to be a valid ngram top directory with sub directories such as 1grams.
- Since:
- 3.0
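The kind of check described here can be sketched with plain `java.io.File` calls. This is a minimal sketch under stated assumptions, not the library's code: the helper name `requireNgramTopDir` is hypothetical, and the rule it applies (the top directory must contain a `3grams` sub directory) is taken from the constructor's parameter description rather than from the actual validation logic.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class NgramDirCheckSketch {

  // Hypothetical helper: the name and the exact rule are assumptions,
  // not the LanguageTool implementation.
  static void requireNgramTopDir(File topIndexDir) {
    if (!topIndexDir.isDirectory()) {
      throw new RuntimeException("Not a directory: " + topIndexDir);
    }
    // Assumed rule: a '3grams' sub directory must exist.
    File trigramDir = new File(topIndexDir, "3grams");
    if (!trigramDir.isDirectory()) {
      throw new RuntimeException("No '3grams' sub directory in " + topIndexDir);
    }
  }

  public static void main(String[] args) throws IOException {
    File top = Files.createTempDirectory("ngram-index").toFile();
    try {
      requireNgramTopDir(top); // empty directory: expected to be rejected
    } catch (RuntimeException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
    new File(top, "3grams").mkdir();
    requireNgramTopDir(top); // passes once the sub directory exists
    System.out.println("Accepted: " + top);
  }
}
```

Failing early with an unchecked exception keeps a misconfigured index path from surfacing later as a confusing lookup error.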
-
clearCaches
Only used internally.
- Since:
- 3.2
-
doValidateDirectory
-
addIndex
-
getCount
Description copied from class: BaseLanguageModel
Get the occurrence count for the given token sequence.
- Specified by:
getCount in class BaseLanguageModel
-
getCount
Description copied from class: BaseLanguageModel
Get the occurrence count for token.
- Specified by:
getCount in class BaseLanguageModel
-
getTotalTokenCount
public long getTotalTokenCount()
- Specified by:
getTotalTokenCount in class BaseLanguageModel
-
getLuceneSearcher
-
getCachedLuceneSearcher
-
getCount
private long getCount(org.apache.lucene.index.Term term, LuceneSingleIndexLanguageModel.LuceneSearcher luceneSearcher)
-
close
public void close()
-
toString
-