Class MultiWordChunker
- java.lang.Object
  - org.languagetool.tagging.disambiguation.AbstractDisambiguator
    - org.languagetool.tagging.disambiguation.MultiWordChunker
- All Implemented Interfaces:
Disambiguator
public class MultiWordChunker extends AbstractDisambiguator
Multiword tagger-chunker.
-
Field Summary
Fields (Modifier and Type, Field):
private boolean allowFirstCapitalized
private java.lang.String filename
private java.util.Map<java.lang.String,java.lang.String> mFull
private java.util.Map<java.lang.String,java.lang.Integer> mStartNoSpace
private java.util.Map<java.lang.String,java.lang.Integer> mStartSpace
-
Constructor Summary
Constructors:
MultiWordChunker(java.lang.String filename)
MultiWordChunker(java.lang.String filename, boolean allowFirstCapitalized)
-
Method Summary
Methods (Modifier and Type, Method, Description):
AnalyzedSentence disambiguate(AnalyzedSentence input): Implements multiword POS tags, e.g., <ELLIPSIS> for ellipsis (...) start, and </ELLIPSIS> for ellipsis end.
private void lazyInit()
private java.util.List<java.lang.String> loadWords(java.io.InputStream stream)
private AnalyzedTokenReadings prepareNewReading(java.lang.String tokens, java.lang.String tok, AnalyzedTokenReadings token, boolean isLast)
private AnalyzedTokenReadings setAndAnnotate(AnalyzedTokenReadings oldReading, AnalyzedToken newReading)
Methods inherited from class org.languagetool.tagging.disambiguation.AbstractDisambiguator
preDisambiguate
-
Field Detail
-
filename
private final java.lang.String filename
-
allowFirstCapitalized
private final boolean allowFirstCapitalized
-
mStartSpace
private java.util.Map<java.lang.String,java.lang.Integer> mStartSpace
-
mStartNoSpace
private java.util.Map<java.lang.String,java.lang.Integer> mStartNoSpace
-
mFull
private java.util.Map<java.lang.String,java.lang.String> mFull
-
Constructor Detail
-
MultiWordChunker
public MultiWordChunker(java.lang.String filename)
- Parameters:
filename - text file with multiwords and tags
-
MultiWordChunker
public MultiWordChunker(java.lang.String filename, boolean allowFirstCapitalized)
- Parameters:
filename - text file with multiwords and tags
allowFirstCapitalized - if set to true, the first word of the multiword can be capitalized
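The page does not describe the layout of the multiword file. The following is a minimal sketch, assuming one entry per line with the multiword and its tag separated by a tab, and `#` marking comment lines. Both conventions are assumptions made for illustration. The class `MultiWordFileSketch` and its `parse` method are hypothetical and not part of LanguageTool. The result mirrors the shape of the `mFull` field (multiword mapped to tag).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MultiWordFileSketch {

    /**
     * Hypothetical parser: reads "multiword<TAB>tag" lines into a map.
     * The tab-separated layout and '#' comments are assumptions.
     */
    public static Map<String, String> parse(InputStream stream) {
        Map<String, String> full = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                // Skip blank lines and (assumed) comment lines.
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                String[] parts = line.split("\t");
                // Keep only well-formed "multiword<TAB>tag" entries.
                if (parts.length == 2) {
                    full.put(parts[0], parts[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return full;
    }

    public static void main(String[] args) {
        // Made-up sample content, not taken from any real resource file.
        String sample = "# sample entries\na priori\tRB\nNew York\tNNP\n";
        Map<String, String> full = parse(
                new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println(full);
    }
}
```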
-
Method Detail
-
lazyInit
private void lazyInit()
-
disambiguate
public final AnalyzedSentence disambiguate(AnalyzedSentence input)
Implements multiword POS tags, e.g., <ELLIPSIS> for the start of an ellipsis (...) and </ELLIPSIS> for its end.
- Parameters:
input - The tokens to be chunked.
- Returns:
AnalyzedSentence with additional markers.
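To illustrate the start/end marker idea described above, here is a minimal, self-contained sketch that does not use the LanguageTool classes. It works on plain strings instead of `AnalyzedSentence` and `AnalyzedTokenReadings`, handles only multiwords whose tokens are separated by spaces, and represents a tag by appending it to the token text. The class `MultiWordTagSketch` and its `tag` method are hypothetical names. This is not the actual implementation of `disambiguate`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiWordTagSketch {

    /**
     * Hypothetical helper: marks the first token of each known multiword
     * with <TAG> and the last token with </TAG>.
     */
    public static List<String> tag(List<String> tokens, Map<String, String> multiwords) {
        List<String> out = new ArrayList<>(tokens);
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder candidate = new StringBuilder();
            for (int end = start; end < tokens.size(); end++) {
                if (end > start) {
                    candidate.append(' ');
                }
                candidate.append(tokens.get(end));
                String tagName = multiwords.get(candidate.toString());
                // Only spans of two or more tokens count as multiwords here.
                if (tagName != null && end > start) {
                    out.set(start, tokens.get(start) + "/<" + tagName + ">");
                    out.set(end, tokens.get(end) + "/</" + tagName + ">");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> multiwords = new HashMap<>();
        multiwords.put("a priori", "RB"); // made-up dictionary entry
        List<String> tagged = tag(
                Arrays.asList("This", "is", "a", "priori", "true"), multiwords);
        System.out.println(tagged); // [This, is, a/<RB>, priori/</RB>, true]
    }
}
```

The sketch tries every start position against every possible end, which is quadratic in sentence length. The `mStartSpace` and `mStartNoSpace` fields (first word mapped to an integer) suggest the real class first checks whether a token can begin a multiword, but that reading is an inference from the field types, not something this page states.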
-
prepareNewReading
private AnalyzedTokenReadings prepareNewReading(java.lang.String tokens, java.lang.String tok, AnalyzedTokenReadings token, boolean isLast)
-
setAndAnnotate
private AnalyzedTokenReadings setAndAnnotate(AnalyzedTokenReadings oldReading, AnalyzedToken newReading)
-
loadWords
private java.util.List<java.lang.String> loadWords(java.io.InputStream stream)
-