Class EnglishChunker
java.lang.Object
org.languagetool.chunking.EnglishChunker
- All Implemented Interfaces:
org.languagetool.chunking.Chunker
OpenNLP-based chunker. Also uses the OpenNLP tokenizer and POS tagger and
maps the results to our own tokens (we have our own tokenizer) wherever a trivial mapping is possible.
- Since:
- 2.3
-
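The description above says the OpenNLP results are mapped to LanguageTool's own tokens only where that is trivially possible. A minimal sketch of one way such a mapping can work, assuming tokens from both tokenizers are matched by identical character offsets and anything else is left untagged. The class OffsetAlignmentSketch, its methods, and the example tokenizations are hypothetical; they use only the Java standard library and are not taken from EnglishChunker.

```java
import java.util.Arrays;

final class OffsetAlignmentSketch {

  // Finds each token in the sentence, left to right, and returns {start, end} offsets per token.
  static int[][] offsets(String sentence, String[] tokens) {
    int[][] result = new int[tokens.length][];
    int pos = 0;
    for (int i = 0; i < tokens.length; i++) {
      int start = sentence.indexOf(tokens[i], pos);
      if (start < 0) {
        throw new IllegalArgumentException("Token not found: " + tokens[i]);
      }
      pos = start + tokens[i].length();
      result[i] = new int[] {start, pos};
    }
    return result;
  }

  // Returns the index of the span with exactly these offsets, or -1 if there is none.
  static int findExact(int start, int end, int[][] spans) {
    for (int i = 0; i < spans.length; i++) {
      if (spans[i][0] == start && spans[i][1] == end) {
        return i;
      }
    }
    return -1;
  }

  // Copies each external tag to the own token with identical offsets; others stay null.
  static String[] alignTags(String sentence, String[] ownTokens,
                            String[] externalTokens, String[] externalTags) {
    int[][] own = offsets(sentence, ownTokens);
    int[][] ext = offsets(sentence, externalTokens);
    String[] tags = new String[ownTokens.length];
    for (int i = 0; i < ext.length; i++) {
      int match = findExact(ext[i][0], ext[i][1], own);
      if (match >= 0) {
        tags[match] = externalTags[i];
      }
    }
    return tags;
  }

  public static void main(String[] args) {
    String sentence = "I can't go.";
    String[] ownTokens = {"I", "can't", "go", "."};           // hypothetical own tokenization
    String[] externalTokens = {"I", "ca", "n't", "go", "."};  // hypothetical external tokenization
    String[] externalTags = {"B-NP", "B-VP", "I-VP", "I-VP", "O"};
    System.out.println(Arrays.toString(
        alignTags(sentence, ownTokens, externalTokens, externalTags)));
    // prints [B-NP, null, I-VP, O]
  }
}
```

In this sketch the token "can't" receives no tag because the two tokenizations split it differently, which is the kind of case a "trivial" mapping would skip.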
Field Summary
Fields
Modifier and Type                                      Field              Description
private static final String                            CHUNKER_MODEL
private static opennlp.tools.chunker.ChunkerModel      chunkerModel
private final EnglishChunkFilter                       chunkFilter
private static final String                            POS_TAGGER_MODEL
private static opennlp.tools.postag.POSModel           posModel
private static final String                            TOKENIZER_MODEL
private static opennlp.tools.tokenize.TokenizerModel   tokenModel         This needs to be static to save memory: as Language.LANGUAGES is static, any language created there is never released.
-
Constructor Summary
Constructors
Constructor         Description
EnglishChunker()
-
Method Summary
Modifier and Type                                          Method
void                                                       addChunkTags(List<org.languagetool.AnalyzedTokenReadings> tokenReadings)
private void                                               assignChunksToReadings(List<ChunkTaggedToken> chunkTaggedTokens)
private String[]                                           chunk
private @Nullable org.languagetool.AnalyzedTokenReadings   getAnalyzedTokenReadingsFor(int startPos, int endPos, List<org.languagetool.AnalyzedTokenReadings> tokenReadings)
private List<ChunkTaggedToken>                             getChunkTagsForReadings(List<org.languagetool.AnalyzedTokenReadings> tokenReadings)
private String                                             getSentence(List<org.languagetool.AnalyzedTokenReadings> sentenceTokens)
private List<ChunkTaggedToken>                             getTokensWithTokenReadings(List<org.languagetool.AnalyzedTokenReadings> tokenReadings, String[] tokens, String[] chunkTags)
private String[]                                           posTag
(package private) String[]                                 tokenize
-
Field Details
-
TOKENIZER_MODEL
-
POS_TAGGER_MODEL
-
CHUNKER_MODEL
-
tokenModel
private static volatile opennlp.tools.tokenize.TokenizerModel tokenModel
This needs to be static to save memory: as Language.LANGUAGES is static, any language created there is never released. As English has several variants, we would otherwise have as many posModels etc. as we have variants, which would be a huge waste of memory.
-
posModel
private static volatile opennlp.tools.postag.POSModel posModel
-
chunkerModel
private static volatile opennlp.tools.chunker.ChunkerModel chunkerModel
-
chunkFilter
-
-
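The note on tokenModel explains why the models are static: every English variant would otherwise hold its own copy. A minimal sketch of that idea, assuming a lazily initialized static volatile field guarded by double-checked locking. The class SharedModelSketch, the Model stand-in, and the locking scheme are assumptions made for illustration; this page does not say how EnglishChunker actually initializes its models.

```java
final class SharedModelSketch {

  // Hypothetical stand-in for a large, immutable model such as a tokenizer model.
  static final class Model {
    final String name;

    Model(String name) {
      this.name = name;
    }
  }

  // One shared instance per JVM, no matter how many SharedModelSketch objects exist.
  // volatile ensures other threads see the fully constructed model.
  private static volatile Model sharedModel;

  // Counts how often the (expensive) load actually ran; for demonstration only.
  static int loadCount = 0;

  SharedModelSketch() {
    // Double-checked locking: an assumption of this sketch, not a documented
    // detail of EnglishChunker.
    if (sharedModel == null) {
      synchronized (SharedModelSketch.class) {
        if (sharedModel == null) {
          loadCount++;
          sharedModel = new Model("tokenizer");
        }
      }
    }
  }

  // Returns the model shared by all instances.
  Model model() {
    return sharedModel;
  }
}
```

Creating one instance per language variant then loads the model once, and every instance returns the same object from model().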
Constructor Details
-
EnglishChunker
public EnglishChunker()
-
-
Method Details
-
addChunkTags
- Specified by:
addChunkTags in interface org.languagetool.chunking.Chunker
-
getChunkTagsForReadings
private List<ChunkTaggedToken> getChunkTagsForReadings(List<org.languagetool.AnalyzedTokenReadings> tokenReadings)
-
tokenize
-
posTag
-
chunk
-
getTokensWithTokenReadings
private List<ChunkTaggedToken> getTokensWithTokenReadings(List<org.languagetool.AnalyzedTokenReadings> tokenReadings, String[] tokens, String[] chunkTags)
-
assignChunksToReadings
-
getSentence
-
getAnalyzedTokenReadingsFor
@Nullable private org.languagetool.AnalyzedTokenReadings getAnalyzedTokenReadingsFor(int startPos, int endPos, List<org.languagetool.AnalyzedTokenReadings> tokenReadings)
-