Package org.languagetool.dev
Class HomophoneOccurrenceDumper
- java.lang.Object
  - org.languagetool.languagemodel.BaseLanguageModel
    - org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
      - org.languagetool.dev.HomophoneOccurrenceDumper
- All Implemented Interfaces:
java.lang.AutoCloseable, org.languagetool.languagemodel.LanguageModel
class HomophoneOccurrenceDumper
extends org.languagetool.languagemodel.LuceneSingleIndexLanguageModel

Dumps the occurrences of homophone 3-grams to STDOUT. This produces a more compact file of homophone occurrences, which is useful because searching the Lucene index for homophones and their contexts requires iterating over all terms and is therefore slow.
- Since:
- 2.8
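Why a full scan is needed can be illustrated with any sorted term dictionary. The sketch below is hypothetical and is not LanguageTool code: a `TreeMap` stands in for the Lucene index, and 3-grams are assumed to be stored as space-separated strings with the homophone in the middle position. A lookup by first word can jump to one contiguous key range, but a lookup by middle word has no usable prefix and must visit every term. The class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap stands in for the sorted Lucene term
// dictionary; terms are assumed to be space-separated 3-grams ("a b c").
class SortedTermScanSketch {

  // First word known: all matching terms form one contiguous key range.
  static List<String> byFirstWord(TreeMap<String, Long> terms, String first) {
    String prefix = first + " ";
    String upper = prefix + Character.MAX_VALUE;
    return new ArrayList<>(terms.subMap(prefix, true, upper, false).keySet());
  }

  // Middle word known (the assumed homophone case): there is no usable
  // prefix, so every term has to be visited.
  static List<String> byMiddleWord(TreeMap<String, Long> terms, String middle) {
    List<String> hits = new ArrayList<>();
    for (String term : terms.keySet()) {
      String[] parts = term.split(" ");
      if (parts.length == 3 && parts[1].equals(middle)) {
        hits.add(term);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    TreeMap<String, Long> terms = new TreeMap<>();
    terms.put("over there now", 12L);
    terms.put("over their heads", 7L);
    terms.put("is there a", 40L);
    terms.put("and their own", 25L);
    System.out.println(byFirstWord(terms, "over"));   // [over their heads, over there now]
    System.out.println(byMiddleWord(terms, "their")); // [and their own, over their heads]
  }
}
```

Dumping the middle-word matches once, as this class does, means the expensive scan is paid a single time instead of on every lookup.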
Field Summary
- private static int MIN_COUNT
Constructor Summary
- HomophoneOccurrenceDumper(java.io.File topIndexDir)
Method Summary
- private void dumpOccurrences(java.util.Set<java.lang.String> tokens)
- (package private) java.util.Map<java.lang.String,java.lang.Long> getContext(java.lang.String... tokens): Get the context (left and right words) for the given word(s).
- private org.apache.lucene.index.TermsEnum getIterator()
- long getTotalTokenCount()
- static void main(java.lang.String[] args)
- private void run(java.lang.String confusionSetPath)
Methods inherited from class org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
clearCaches, close, doValidateDirectory, getCount, getCount, getLuceneSearcher, toString, validateDirectory
Field Detail
MIN_COUNT
private static final int MIN_COUNT
- See Also:
- Constant Field Values
Method Detail
getContext
java.util.Map<java.lang.String,java.lang.Long> getContext(java.lang.String... tokens) throws java.io.IOException
Get the context (left and right words) for the given word(s). This is slow, as it needs to scan the whole index.
- Throws:
java.io.IOException
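A full-scan context lookup could look roughly like the sketch below. It is hypothetical: a plain map from "left middle right" 3-grams to counts stands in for iterating the Lucene `TermsEnum`, and the result is assumed to map each matching 3-gram to its count. The real key format of the returned map is not documented on this page.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a full-scan context lookup. A map of
// "left middle right" -> count stands in for the Lucene term iterator.
class ContextScanSketch {

  static Map<String, Long> getContext(Map<String, Long> ngramCounts, String... tokens) {
    Set<String> wanted = new HashSet<>(Arrays.asList(tokens));
    Map<String, Long> result = new LinkedHashMap<>();
    // Every entry is inspected once: the cost grows with the index size,
    // not with the number of requested tokens.
    for (Map.Entry<String, Long> entry : ngramCounts.entrySet()) {
      String[] parts = entry.getKey().split(" ");
      if (parts.length == 3 && wanted.contains(parts[1])) {
        result.put(entry.getKey(), entry.getValue()); // whole 3-gram kept as the context
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Long> counts = new TreeMap<>();
    counts.put("over there now", 12L);
    counts.put("over their heads", 7L);
    counts.put("to be sure", 90L);
    System.out.println(getContext(counts, "their", "there"));
    // {over their heads=7, over there now=12}
  }
}
```

Passing all homophones of one set as varargs lets a single scan serve the whole set, which matters when each scan touches every term.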
run
private void run(java.lang.String confusionSetPath) throws java.io.IOException
- Throws:
java.io.IOException
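The `run` method takes the path of a confusion set file. The sketch below shows one plausible way to turn such a file into sets of homophones, each of which could then be handed to `dumpOccurrences`. The line format is an assumption and is not documented here: words separated by `;`, an optional numeric field that is ignored, and `#` starting a comment. The lines themselves could be read with `java.nio.file.Files.readAllLines(Path)`; the parser takes a `List<String>` so it can be tried without a file.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the line format below is an assumption, not the
// documented format of the file passed to run(confusionSetPath).
//   their; there; 10    (words separated by ';', numeric field ignored)
//   # comment           ('#' starts a comment)
class ConfusionSetSketch {

  static List<Set<String>> parse(List<String> lines) {
    List<Set<String>> sets = new ArrayList<>();
    for (String raw : lines) {
      int hash = raw.indexOf('#');
      String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
      if (line.isEmpty()) {
        continue;
      }
      Set<String> words = new LinkedHashSet<>();
      for (String field : line.split(";")) {
        String word = field.trim();
        if (!word.isEmpty() && !word.matches("\\d+")) {
          words.add(word);
        }
      }
      if (words.size() > 1) { // a single word has nothing to be confused with
        sets.add(words);
      }
    }
    return sets;
  }

  public static void main(String[] args) {
    List<String> lines = List.of("# homophones", "their; there; 10", "", "to; too; two");
    for (Set<String> set : parse(lines)) {
      // In the real class each set would presumably be passed to dumpOccurrences(set).
      System.out.println(set);
    }
  }
}
```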
dumpOccurrences
private void dumpOccurrences(java.util.Set<java.lang.String> tokens) throws java.io.IOException
- Throws:
java.io.IOException
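Given a context map for one set of tokens, the dump step only needs to filter and print. In the sketch below, the `count<TAB>3-gram` output format and the threshold value are assumptions: the page declares a `MIN_COUNT` constant but does not state its value or confirm that it is used here, so the threshold is passed in as a hypothetical `minCount` parameter. Writing to a `PrintStream` keeps `System.out` as the default while allowing the output to be captured.

```java
import java.io.PrintStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a dump step: print each context whose count is at
// least minCount as "count<TAB>3-gram". Format and threshold are assumptions.
class DumpSketch {

  static void dump(Map<String, Long> context, long minCount, PrintStream out) {
    for (Map.Entry<String, Long> entry : context.entrySet()) {
      if (entry.getValue() >= minCount) { // rare 3-grams are dropped to keep the file compact
        out.println(entry.getValue() + "\t" + entry.getKey());
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Long> context = new LinkedHashMap<>();
    context.put("over their heads", 7L);
    context.put("over there now", 12L);
    context.put("their there their", 1L);
    dump(context, 5, System.out); // 5 is an arbitrary illustrative threshold
  }
}
```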
getIterator
private org.apache.lucene.index.TermsEnum getIterator() throws java.io.IOException
- Throws:
java.io.IOException
main
public static void main(java.lang.String[] args) throws java.io.IOException
- Throws:
java.io.IOException
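The constructor takes the top index directory and `run` takes the confusion set path, so `main` presumably needs both. The sketch below is hypothetical: the argument order and usage text are assumptions. The commented lines show how the real class could then be used with try-with-resources, which is possible because it implements `java.lang.AutoCloseable`.

```java
import java.io.File;

// Hypothetical sketch of the argument handling main(String[]) might do.
// Argument order (index directory, then confusion set file) is assumed.
class MainArgsSketch {

  static final class Args {
    final File topIndexDir;
    final String confusionSetPath;

    Args(File topIndexDir, String confusionSetPath) {
      this.topIndexDir = topIndexDir;
      this.confusionSetPath = confusionSetPath;
    }
  }

  // Returns null (after printing a usage line) when the arguments do not fit.
  static Args parse(String[] args) {
    if (args.length != 2) {
      System.err.println("Usage: HomophoneOccurrenceDumper <topIndexDir> <confusionSetFile>");
      return null;
    }
    return new Args(new File(args[0]), args[1]);
  }

  public static void main(String[] args) {
    Args parsed = parse(args);
    if (parsed == null) {
      return;
    }
    // With the real class, one would expect something like:
    //   try (HomophoneOccurrenceDumper dumper = new HomophoneOccurrenceDumper(parsed.topIndexDir)) {
    //     dumper.run(parsed.confusionSetPath);
    //   }
    System.out.println(parsed.topIndexDir + " " + parsed.confusionSetPath);
  }
}
```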
getTotalTokenCount
public long getTotalTokenCount()
- Overrides:
getTotalTokenCount in class org.languagetool.languagemodel.LuceneSingleIndexLanguageModel