Class PortugueseWordTokenizer
java.lang.Object
org.languagetool.tokenizers.WordTokenizer
org.languagetool.tokenizers.pt.PortugueseWordTokenizer
- All Implemented Interfaces:
org.languagetool.tokenizers.Tokenizer
public class PortugueseWordTokenizer
extends org.languagetool.tokenizers.WordTokenizer
Tokenizes a sentence into words. Punctuation and whitespace characters each get their own token.
- Since:
- 3.6
-
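The behaviour described above, where delimiters are kept as tokens rather than discarded, can be illustrated with a minimal sketch. It is not the LanguageTool implementation: the class name DelimiterTokenizerSketch and the DELIMITERS set are hypothetical, chosen only for this example, and the real tokenizer's character set is not shown on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative sketch only; not the LanguageTool implementation.
class DelimiterTokenizerSketch {

  // Hypothetical delimiter set, chosen for this example.
  private static final String DELIMITERS = " \t\n.,;:!?()\"";

  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    // returnDelims = true makes StringTokenizer return every delimiter
    // character as a one-character token instead of dropping it.
    StringTokenizer st = new StringTokenizer(text, DELIMITERS, true);
    while (st.hasMoreTokens()) {
      tokens.add(st.nextToken());
    }
    return tokens;
  }

  public static void main(String[] args) {
    // Print one token per line, in brackets so whitespace tokens are visible.
    for (String token : tokenize("Bom dia, mundo!")) {
      System.out.println("[" + token + "]");
    }
  }
}
```

Because whitespace and punctuation survive as tokens, concatenating the returned list reproduces the input string exactly, which is useful when token positions have to be mapped back to the original text.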
Field Summary
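Fields
Modifier and Type              Field
private static final Pattern   COLON_NUMBERS_PATTERN
private static final String    COLON_NUMBERS_REPL
private static final Pattern   DATE_PATTERN
private static final String    DATE_PATTERN_REPL
private static final Pattern   DECIMAL_COMMA_PATTERN
private static final String    DECIMAL_COMMA_REPL
private static final char      DECIMAL_COMMA_SUBST
private static final Pattern   DECIMAL_SPACE_PATTERN
private static final Pattern   DOTTED_NUMBERS_PATTERN
private static final String    DOTTED_NUMBERS_REPL
private static final Pattern   DOTTED_ORDINALS_PATTERN
private static final String    DOTTED_ORDINALS_REPL
private static final char      NON_BREAKING_COLON_SUBST
private static final char      NON_BREAKING_DOT_SUBST
private static final char      NON_BREAKING_SPACE_SUBST
private static final String    SPLIT_CHARS
-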
Constructor Summary
Constructors
PortugueseWordTokenizer()
-
Method Summary
Methods inherited from class org.languagetool.tokenizers.WordTokenizer
getProtocols, getTokenizingCharacters, isEMail, isUrl, joinEMails, joinEMailsAndUrls, joinUrls
-
Field Details
-
SPLIT_CHARS
private static final String SPLIT_CHARS
- See Also:
-
DECIMAL_COMMA_SUBST
private static final char DECIMAL_COMMA_SUBST
- See Also:
-
NON_BREAKING_SPACE_SUBST
private static final char NON_BREAKING_SPACE_SUBST
- See Also:
-
NON_BREAKING_DOT_SUBST
private static final char NON_BREAKING_DOT_SUBST
- See Also:
-
NON_BREAKING_COLON_SUBST
private static final char NON_BREAKING_COLON_SUBST
- See Also:
-
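DECIMAL_COMMA_PATTERN
private static final Pattern DECIMAL_COMMA_PATTERN
-
DECIMAL_COMMA_REPL
private static final String DECIMAL_COMMA_REPL
- See Also:
-
DECIMAL_SPACE_PATTERN
private static final Pattern DECIMAL_SPACE_PATTERN
-
DOTTED_NUMBERS_PATTERN
private static final Pattern DOTTED_NUMBERS_PATTERN
-
DOTTED_NUMBERS_REPL
private static final String DOTTED_NUMBERS_REPL
- See Also:
-
COLON_NUMBERS_PATTERN
private static final Pattern COLON_NUMBERS_PATTERN
-
COLON_NUMBERS_REPL
private static final String COLON_NUMBERS_REPL
- See Also:
-
DATE_PATTERN
private static final Pattern DATE_PATTERN
-
DATE_PATTERN_REPL
private static final String DATE_PATTERN_REPL
- See Also:
-
DOTTED_ORDINALS_PATTERN
private static final Pattern DOTTED_ORDINALS_PATTERN
-
DOTTED_ORDINALS_REPL
private static final String DOTTED_ORDINALS_REPL
- See Also: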
-
-
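The field names above pair a *_PATTERN with a *_REPL string and a *_SUBST character, which suggests a protect-then-restore scheme: a character that must not split a token (such as the comma in 3,50) is swapped for a placeholder before splitting and swapped back afterwards. The following sketch shows that general technique under stated assumptions. Every value in it is hypothetical, because the real constants are private and their values are not shown on this page; only the field names are borrowed for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// Illustrative sketch only; all constant values below are assumptions.
class PlaceholderTokenizerSketch {

  // Hypothetical placeholder taken from the Unicode private-use area,
  // so it is unlikely to occur in ordinary text.
  private static final char DECIMAL_COMMA_SUBST = '\uE001';

  // Hypothetical pattern: a comma with a digit on each side.
  private static final Pattern DECIMAL_COMMA_PATTERN = Pattern.compile("(\\d),(\\d)");

  // Replacement keeps both digits and puts the placeholder between them.
  private static final String DECIMAL_COMMA_REPL = "$1" + DECIMAL_COMMA_SUBST + "$2";

  // Hypothetical set of characters that split tokens.
  private static final String SPLIT_CHARS = " ,.;:!?";

  static List<String> tokenize(String text) {
    // Step 1: hide decimal commas so they are not treated as delimiters.
    String protectedText = DECIMAL_COMMA_PATTERN.matcher(text).replaceAll(DECIMAL_COMMA_REPL);

    // Step 2: split, returning each delimiter as its own token.
    List<String> tokens = new ArrayList<>();
    StringTokenizer st = new StringTokenizer(protectedText, SPLIT_CHARS, true);
    while (st.hasMoreTokens()) {
      // Step 3: put the original comma back inside each token.
      tokens.add(st.nextToken().replace(DECIMAL_COMMA_SUBST, ','));
    }
    return tokens;
  }

  public static void main(String[] args) {
    // Print one token per line, in brackets so whitespace tokens are visible.
    for (String token : tokenize("Custa 3,50 euros, certo?")) {
      System.out.println("[" + token + "]");
    }
  }
}
```

In this sketch the number 3,50 stays a single token while the comma after "euros" is still split off. The same scheme would extend to dotted numbers, times with colons, dates, and ordinals by adding one pattern, replacement, and placeholder per case.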
Constructor Details
-
PortugueseWordTokenizer
public PortugueseWordTokenizer()
-
-
Method Details
-
tokenize
-