Package org.languagetool.languagemodel
Class LuceneSingleIndexLanguageModel
- java.lang.Object
-
- org.languagetool.languagemodel.BaseLanguageModel
-
- org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
-
- All Implemented Interfaces:
java.lang.AutoCloseable, LanguageModel
public class LuceneSingleIndexLanguageModel extends BaseLanguageModel
Information about ngram occurrences, taken from Lucene indexes (one index per ngram level). This is not a real language model: it only returns occurrence counts and performs no probability calculation, in particular not for ngrams with 0 occurrences.
- Since:
- 3.2
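To illustrate the "counts only" caveat in the description above, here is a minimal in-memory sketch. It is not the LanguageTool implementation: the class `CountOnlyModelSketch`, its `put` and `relativeFrequency` methods, and the backing `HashMap` are all hypothetical stand-ins for the Lucene-backed lookups. It shows that a naive count/total ratio gives exactly 0 for any unseen ngram, which is why raw counts alone are not a usable probability estimate.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a count-only ngram store; not the LanguageTool class. */
final class CountOnlyModelSketch {

  // Space-joined ngram -> occurrence count (assumption: stands in for the Lucene indexes).
  private final Map<String, Long> counts = new HashMap<>();
  private long totalTokens = 0;

  /** Records an ngram with its count; unigram counts also add to the token total. */
  void put(List<String> tokens, long count) {
    counts.put(String.join(" ", tokens), count);
    if (tokens.size() == 1) {
      totalTokens += count;
    }
  }

  /** Returns the stored count, or 0 for an ngram that was never recorded. */
  long getCount(List<String> tokens) {
    return counts.getOrDefault(String.join(" ", tokens), 0L);
  }

  long getTotalTokenCount() {
    return totalTokens;
  }

  /** Naive relative frequency (count / total); unseen ngrams get exactly 0.0. */
  double relativeFrequency(List<String> tokens) {
    return totalTokens == 0 ? 0.0 : (double) getCount(tokens) / totalTokens;
  }

  public static void main(String[] args) {
    CountOnlyModelSketch model = new CountOnlyModelSketch();
    model.put(List.of("the"), 90);
    model.put(List.of("cat"), 10);
    model.put(List.of("the", "cat"), 4);
    System.out.println(model.getCount(List.of("the", "cat")));          // prints 4
    System.out.println(model.getCount(List.of("cat", "the")));          // prints 0
    System.out.println(model.relativeFrequency(List.of("cat", "the"))); // prints 0.0
  }
}
```

A caller that needs a non-zero score for unseen ngrams has to add smoothing or backoff on top of such counts; this sketch deliberately leaves that out.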
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
protected static class | LuceneSingleIndexLanguageModel.LuceneSearcher |
-
Field Summary
Fields
Modifier and Type | Field | Description
private static java.util.Map<java.io.File,LuceneSingleIndexLanguageModel.LuceneSearcher> | dirToSearcherMap |
private java.util.List<java.io.File> | indexes |
private java.util.Map<java.lang.Integer,LuceneSingleIndexLanguageModel.LuceneSearcher> | luceneSearcherMap |
private long | maxNgram |
private java.io.File | topIndexDir |
-
Fields inherited from interface org.languagetool.languagemodel.LanguageModel
GOOGLE_SENTENCE_END, GOOGLE_SENTENCE_START
-
-
Constructor Summary
Constructors
Constructor | Description
LuceneSingleIndexLanguageModel(int maxNgram) |
LuceneSingleIndexLanguageModel(java.io.File topIndexDir) |
-
Method Summary
Modifier and Type | Method | Description
private void | addIndex(java.io.File topIndexDir, int ngramSize) |
static void | clearCaches() | Only used internally.
void | close() |
protected void | doValidateDirectory(java.io.File topIndexDir) |
private LuceneSingleIndexLanguageModel.LuceneSearcher | getCachedLuceneSearcher(java.io.File indexDir) |
long | getCount(java.lang.String token1) | Get the occurrence count for token.
long | getCount(java.util.List<java.lang.String> tokens) | Get the occurrence count for the given token sequence.
private long | getCount(org.apache.lucene.index.Term term, LuceneSingleIndexLanguageModel.LuceneSearcher luceneSearcher) |
protected LuceneSingleIndexLanguageModel.LuceneSearcher | getLuceneSearcher(int ngramSize) |
long | getTotalTokenCount() |
java.lang.String | toString() |
static void | validateDirectory(java.io.File topIndexDir) | Throw RuntimeException if the given directory does not seem to be a valid ngram top directory with sub directories 1grams etc.
-
Methods inherited from class org.languagetool.languagemodel.BaseLanguageModel
getPseudoProbability, getPseudoProbabilityStupidBackoff
-
-
-
-
Field Detail
-
dirToSearcherMap
private static final java.util.Map<java.io.File,LuceneSingleIndexLanguageModel.LuceneSearcher> dirToSearcherMap
-
indexes
private final java.util.List<java.io.File> indexes
-
luceneSearcherMap
private final java.util.Map<java.lang.Integer,LuceneSingleIndexLanguageModel.LuceneSearcher> luceneSearcherMap
-
topIndexDir
private final java.io.File topIndexDir
-
maxNgram
private final long maxNgram
-
-
Constructor Detail
-
LuceneSingleIndexLanguageModel
public LuceneSingleIndexLanguageModel(java.io.File topIndexDir)
- Parameters:
topIndexDir - a directory that contains at least one sub directory called 3grams, which is a Lucene index with ngram occurrences as created by org.languagetool.dev.FrequencyIndexCreator.
-
LuceneSingleIndexLanguageModel
@Experimental public LuceneSingleIndexLanguageModel(int maxNgram)
-
-
Method Detail
-
validateDirectory
public static void validateDirectory(java.io.File topIndexDir)
Throw RuntimeException if the given directory does not seem to be a valid ngram top directory with sub directories 1grams etc.
- Since:
- 3.0
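The kind of check described for validateDirectory can be sketched with plain java.io.File calls. This is an assumption-laden illustration, not the actual method body: the class `NgramDirCheckSketch` and its methods are hypothetical, and requiring the `3grams` sub directory is taken from the constructor documentation on this page; the real method may test more or different things.

```java
import java.io.File;

/** Hypothetical directory check; the real validateDirectory may behave differently. */
final class NgramDirCheckSketch {

  /** Returns true if topIndexDir is a directory containing a sub directory named e.g. "3grams". */
  static boolean hasNgramDir(File topIndexDir, int ngramSize) {
    File sub = new File(topIndexDir, ngramSize + "grams");
    return topIndexDir.isDirectory() && sub.isDirectory();
  }

  /** Throws RuntimeException if the assumed minimum layout (a "3grams" sub directory) is missing. */
  static void validate(File topIndexDir) {
    if (!hasNgramDir(topIndexDir, 3)) {
      throw new RuntimeException(
          "Not a valid ngram top directory (no '3grams' sub directory): " + topIndexDir);
    }
  }
}
```

Failing early with an unchecked exception, as the documented method does, surfaces a misconfigured data directory at startup rather than at the first lookup.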
-
clearCaches
@Experimental public static void clearCaches()
Only used internally.
- Since:
- 3.2
-
doValidateDirectory
protected void doValidateDirectory(java.io.File topIndexDir)
-
addIndex
private void addIndex(java.io.File topIndexDir, int ngramSize)
-
getCount
public long getCount(java.util.List<java.lang.String> tokens)
Description copied from class: BaseLanguageModel
Get the occurrence count for the given token sequence.
- Specified by:
getCount in class BaseLanguageModel
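Because the class keeps one index per ngram level, a lookup for a token sequence presumably first picks the index that matches the sequence length. The sketch below shows that dispatch with nested maps in place of Lucene indexes. Everything in it is hypothetical: the class `PerLevelLookupSketch`, the space-joined key format, and the exception thrown for a missing level are assumptions, not documented behavior.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one lookup table per ngram size, chosen by the token list length. */
final class PerLevelLookupSketch {

  // ngram size -> (space-joined ngram -> count); stands in for one Lucene index per level.
  private final Map<Integer, Map<String, Long>> indexBySize = new HashMap<>();

  /** Stores a count in the table that matches the ngram's size. */
  void add(List<String> tokens, long count) {
    indexBySize.computeIfAbsent(tokens.size(), size -> new HashMap<>())
        .put(String.join(" ", tokens), count);
  }

  /** Looks up the count in the table for tokens.size(); unseen ngrams yield 0. */
  long getCount(List<String> tokens) {
    Map<String, Long> index = indexBySize.get(tokens.size());
    if (index == null) {
      // Assumption: a missing level is treated as a configuration error.
      throw new IllegalArgumentException("No index for ngram size " + tokens.size());
    }
    return index.getOrDefault(String.join(" ", tokens), 0L);
  }
}
```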
-
getCount
public long getCount(java.lang.String token1)
Description copied from class: BaseLanguageModel
Get the occurrence count for token.
- Specified by:
getCount in class BaseLanguageModel
-
getTotalTokenCount
public long getTotalTokenCount()
- Specified by:
getTotalTokenCount in class BaseLanguageModel
-
getLuceneSearcher
protected LuceneSingleIndexLanguageModel.LuceneSearcher getLuceneSearcher(int ngramSize)
-
getCachedLuceneSearcher
private LuceneSingleIndexLanguageModel.LuceneSearcher getCachedLuceneSearcher(java.io.File indexDir)
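The static dirToSearcherMap field together with getCachedLuceneSearcher and clearCaches suggests a per-directory cache of searchers shared across instances. The following is a minimal sketch of that pattern under stated assumptions: `SearcherCacheSketch` and its nested `Searcher` type are hypothetical, a `ConcurrentHashMap` is chosen here for thread safety, and the real clearCaches may also release the underlying index resources.

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-directory cache; not the LanguageTool implementation. */
final class SearcherCacheSketch {

  /** Stand-in for an object that is expensive to open, such as an index searcher. */
  static final class Searcher {
    final File dir;

    Searcher(File dir) {
      this.dir = dir;
    }
  }

  // Shared across all users of this class, keyed by index directory.
  private static final Map<File, Searcher> DIR_TO_SEARCHER = new ConcurrentHashMap<>();

  /** Returns the cached searcher for indexDir, creating it on first use. */
  static Searcher getCached(File indexDir) {
    return DIR_TO_SEARCHER.computeIfAbsent(indexDir, Searcher::new);
  }

  /** Drops all cached searchers; the next getCached call creates new ones. */
  static void clearCaches() {
    DIR_TO_SEARCHER.clear();
  }
}
```

Keying on the directory means that two model instances opened on the same path share one searcher instead of opening the index twice.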
-
getCount
private long getCount(org.apache.lucene.index.Term term, LuceneSingleIndexLanguageModel.LuceneSearcher luceneSearcher)
-
close
public void close()
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
-