Class MultiWordChunker2
java.lang.Object
org.languagetool.tagging.disambiguation.AbstractDisambiguator
org.languagetool.tagging.disambiguation.MultiWordChunker2
- All Implemented Interfaces:
Disambiguator
Multiword tagger-chunker.
Note: currently does not support:
- overlapping tagging (first matching multiword entry wins)
-
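The "first matching multiword entry wins" limitation can be illustrated with a small standalone sketch. It does not use the LanguageTool API: the class `FirstMatchWinsSketch`, its methods, and the plain `String[]` representation of tokens and entries are all hypothetical names chosen for illustration, and the real class may iterate over its entries in a different order.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, standalone illustration; not part of LanguageTool.
final class FirstMatchWinsSketch {

    /** Returns one label per token: the index of the matched entry, or -1 if none matched. */
    static int[] chunk(String[] tokens, List<String[]> entries) {
        int[] labels = new int[tokens.length];
        Arrays.fill(labels, -1);
        int i = 0;
        while (i < tokens.length) {
            int matched = -1;
            for (int e = 0; e < entries.size(); e++) {
                if (matchesAt(tokens, i, entries.get(e))) {
                    matched = e; // the first matching entry wins
                    break;
                }
            }
            if (matched >= 0) {
                int length = entries.get(matched).length;
                for (int k = 0; k < length; k++) {
                    labels[i + k] = matched;
                }
                i += length; // skip the consumed tokens, so matches never overlap
            } else {
                i++;
            }
        }
        return labels;
    }

    /** True if every word of the entry equals the token at the same offset from start. */
    static boolean matchesAt(String[] tokens, int start, String[] entry) {
        if (entry.length == 0 || start + entry.length > tokens.length) {
            return false;
        }
        for (int k = 0; k < entry.length; k++) {
            if (!tokens[start + k].equals(entry[k])) {
                return false;
            }
        }
        return true;
    }
}
```

With the entries "New York", "York City" and "City Hall" and the input "New York City Hall", the overlapping entry "York City" is never applied in this sketch, because "York" has already been consumed by the first match.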
Nested Class Summary
Nested Classes -
Field Summary
Fields -
Constructor Summary
Constructors
- MultiWordChunker2(String filename)
- MultiWordChunker2(String filename, boolean allowFirstCapitalized)
Method Summary
Methods (modifier and type, method, description)
- disambiguate(AnalyzedSentence input): Implements multiword POS tags, e.g., <ELLIPSIS> for ellipsis (...) start, and </ELLIPSIS> for ellipsis end.
- private MultiWordChunker2.MultiWordEntry findMultiwordEntry(AnalyzedTokenReadings[] inputTokens, int startingPosition, List<MultiWordChunker2.MultiWordEntry> multiwordItems)
- protected String formatPosTag(String posTag, int position, int multiwordLength): Override this method if you want to format the POS tag differently.
- private boolean isMatching(AnalyzedTokenReadings[] inputTokens, int startingPosition, MultiWordChunker2.MultiWordEntry multiWordEntry)
- private void lazyInit()
- loadWords(InputStream stream)
- protected boolean matches(String matchText, AnalyzedTokenReadings inputTokens)
- protected AnalyzedTokenReadings prepareNewReading(String tokens, String tok, AnalyzedTokenReadings token, String tag)
- private AnalyzedTokenReadings setAndAnnotate(AnalyzedTokenReadings oldReading, AnalyzedToken newReading)
- void setRemoveOtherReadings(boolean removeOtherReadings)
- void setWrapTag(boolean wrapTag)
Methods inherited from class org.languagetool.tagging.disambiguation.AbstractDisambiguator
preDisambiguate
-
Field Details
-
WRAP_TAG
- See Also:
-
filename
-
allowFirstCapitalized
private final boolean allowFirstCapitalized -
removeOtherReadings
private boolean removeOtherReadings -
tagFormat
-
tokenToPosTagMap
-
-
Constructor Details
-
MultiWordChunker2
- Parameters:
filename - text file with multiwords and tags
-
MultiWordChunker2
- Parameters:
filename - text file with multiwords and tags
allowFirstCapitalized - if set to true, the first word of the multiword can be capitalized
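As a rough illustration of what allowFirstCapitalized permits, the standalone sketch below compares the first word of a multiword entry against a token. The class `FirstCapitalizedSketch` and its method are hypothetical and do not come from LanguageTool; how the real class compares tokens (for example, locale handling) is not stated here and is assumed.

```java
// Hypothetical, standalone illustration; not part of LanguageTool.
final class FirstCapitalizedSketch {

    /**
     * True if the token equals the entry's first word, or, when
     * allowFirstCapitalized is set, equals it with an uppercased first letter.
     */
    static boolean matchesFirstWord(String entryWord, String token, boolean allowFirstCapitalized) {
        if (entryWord.equals(token)) {
            return true;
        }
        if (!allowFirstCapitalized || entryWord.isEmpty()) {
            return false;
        }
        String capitalized = Character.toUpperCase(entryWord.charAt(0)) + entryWord.substring(1);
        return capitalized.equals(token);
    }
}
```

Under these assumptions an entry starting with "ad" would match the token "Ad" only when the flag is true, and would never match "AD".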
-
-
Method Details
-
setRemoveOtherReadings
public void setRemoveOtherReadings(boolean removeOtherReadings)
- Parameters:
removeOtherReadings - if true and a multiword matches, other readings will be removed
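The effect of this flag can be pictured with a simplified model in which a token's readings are just a list of tag strings. This is only a sketch: `RemoveOtherReadingsSketch` and `applyMultiwordTag` are hypothetical names, and the real class works on AnalyzedTokenReadings rather than strings, so the order and structure of the resulting readings are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, standalone illustration; not part of LanguageTool.
final class RemoveOtherReadingsSketch {

    /** Returns the tags of a token after a multiword has matched on it. */
    static List<String> applyMultiwordTag(List<String> existingTags,
                                          String multiwordTag,
                                          boolean removeOtherReadings) {
        List<String> result = new ArrayList<>();
        if (!removeOtherReadings) {
            result.addAll(existingTags); // keep the readings the token already had
        }
        result.add(multiwordTag);        // the multiword reading is always present
        return result;
    }
}
```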
-
setWrapTag
public void setWrapTag(boolean wrapTag)
- Parameters:
wrapTag - if true, the tag will be wrapped with < and >
-
formatPosTag
Override this method if you want to format the POS tag differently.
- Parameters:
posTag - POS tag for the multiword
position - position of the token in the multiword
- Returns:
formatted POS tag for the multiword
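The override pattern might look like the sketch below. To stay self-contained it does not extend MultiWordChunker2: `BaseTagFormatter` is a hypothetical stand-in that only mirrors the formatPosTag signature, and `BioTagFormatter` is a made-up subclass producing "B-"/"I-" prefixes. It also assumes that position is zero-based, which this page does not state.

```java
// Hypothetical stand-in that mirrors only the formatPosTag signature.
class BaseTagFormatter {
    protected String formatPosTag(String posTag, int position, int multiwordLength) {
        return posTag; // default in this sketch: return the tag unchanged
    }
}

// Hypothetical subclass: marks the first token differently from the rest.
final class BioTagFormatter extends BaseTagFormatter {
    @Override
    protected String formatPosTag(String posTag, int position, int multiwordLength) {
        // Assumption: position 0 is the first token of the multiword.
        return (position == 0 ? "B-" : "I-") + posTag;
    }
}
```

In real code the subclass would extend MultiWordChunker2 instead and pass a filename to its constructor.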
-
lazyInit
private void lazyInit() -
disambiguate
Implements multiword POS tags, e.g., <ELLIPSIS> for ellipsis (...) start, and </ELLIPSIS> for ellipsis end.
- Parameters:
input - the tokens to be chunked
- Returns:
AnalyzedSentence with additional markers
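The start and end markers described above can be sketched as follows. This is a simplified model rather than the library's implementation: `MultiwordMarkerSketch` and `markers` are hypothetical names, and leaving the middle tokens unmarked (null) is an assumption, since this page only describes the start and end tags.

```java
// Hypothetical, standalone illustration; not part of LanguageTool.
final class MultiwordMarkerSketch {

    /**
     * Returns one marker per token of a matched multiword: an opening tag on
     * the first token, a closing tag on the last, and null in between.
     */
    static String[] markers(String tag, int multiwordLength) {
        if (multiwordLength < 2) {
            throw new IllegalArgumentException("a multiword needs at least two tokens");
        }
        String[] result = new String[multiwordLength];
        result[0] = "<" + tag + ">";                    // start marker, e.g. <ELLIPSIS>
        result[multiwordLength - 1] = "</" + tag + ">"; // end marker, e.g. </ELLIPSIS>
        return result;
    }
}
```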
-
findMultiwordEntry
private MultiWordChunker2.MultiWordEntry findMultiwordEntry(AnalyzedTokenReadings[] inputTokens, int startingPosition, List<MultiWordChunker2.MultiWordEntry> multiwordItems) -
isMatching
private boolean isMatching(AnalyzedTokenReadings[] inputTokens, int startingPosition, MultiWordChunker2.MultiWordEntry multiWordEntry) -
matches
-
prepareNewReading
protected AnalyzedTokenReadings prepareNewReading(String tokens, String tok, AnalyzedTokenReadings token, String tag) -
setAndAnnotate
private AnalyzedTokenReadings setAndAnnotate(AnalyzedTokenReadings oldReading, AnalyzedToken newReading) -
loadWords
-