Class CatalanWordTokenizer
java.lang.Object
org.languagetool.tokenizers.WordTokenizer
org.languagetool.tokenizers.ca.CatalanWordTokenizer
- All Implemented Interfaces:
org.languagetool.tokenizers.Tokenizer
public class CatalanWordTokenizer
extends org.languagetool.tokenizers.WordTokenizer
Tokenizes a sentence into words. Punctuation marks and whitespace characters each get their own token.
Special treatment for hyphens and apostrophes in Catalan.
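The behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use LanguageTool and is not the implementation of this class: the class name SimpleCatalanTokenizerSketch, the TOKEN pattern, and the choice to keep an elided form such as "l'" as its own token are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the LanguageTool implementation.
final class SimpleCatalanTokenizerSketch {

    // Assumed token shapes, tried in this order:
    //   1. an elided form: one or two letters plus an apostrophe, followed by a letter
    //   2. a word: letters/digits, optionally joined by a hyphen or a middle dot
    //   3. a single whitespace character
    //   4. any other single character (punctuation, symbols)
    private static final Pattern TOKEN = Pattern.compile(
        "\\p{L}{1,2}['\\u2019](?=\\p{L})"
            + "|[\\p{L}\\p{N}]+(?:[-\\u00B7]\\p{L}+)*"
            + "|\\s"
            + "|.",
        Pattern.DOTALL);

    // Splits the text into tokens; every character of the input ends up in exactly one token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one token per line, wrapped in brackets so whitespace tokens stay visible.
        for (String token : tokenize("L'home-llop, ara.")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

In this sketch the comma, the space, and the full stop each become a separate token, while the hyphenated word stays in one piece. The real class may split or join such forms differently.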
-
Field Summary
Fields
Modifier and Type, Field
private static final Pattern APOSTROF_RECTE
private static final Pattern APOSTROF_RECTE_1
private static final Pattern APOSTROF_RODO
private static final Pattern APOSTROF_RODO_1
private static final Pattern DECIMAL_COMMA
private static final Pattern DECIMAL_POINT
private static final String DICT_FILENAME
private static final Pattern ELA_GEMINADA
private static final Pattern ELA_GEMINADA_UPPERCASE
private static final Pattern HYPHENS
private static final int maxPatterns
private static final Pattern NEARBY_HYPHENS
private final Pattern[] patterns
private static final String PF
private static final Pattern SPACE_DIGITS
private static final Pattern SPACE_DIGITS0
private static final Pattern SPACE_DIGITS2
protected org.languagetool.rules.spelling.morfologik.MorfologikSpeller speller
-
Constructor Summary
Constructors
CatalanWordTokenizer()
-
Method Summary
Methods inherited from class org.languagetool.tokenizers.WordTokenizer
getProtocols, getTokenizingCharacters, isEMail, isUrl, joinEMails, joinEMailsAndUrls, joinUrls
-
Field Details
-
PF
-
maxPatterns
private static final int maxPatterns
-
patterns
-
DICT_FILENAME
-
speller
protected org.languagetool.rules.spelling.morfologik.MorfologikSpeller speller -
ELA_GEMINADA
-
ELA_GEMINADA_UPPERCASE
-
APOSTROF_RECTE
-
APOSTROF_RODO
-
APOSTROF_RECTE_1
-
APOSTROF_RODO_1
-
NEARBY_HYPHENS
-
HYPHENS
-
DECIMAL_POINT
-
DECIMAL_COMMA
-
SPACE_DIGITS0
-
SPACE_DIGITS
-
SPACE_DIGITS2
-
-
Constructor Details
-
CatalanWordTokenizer
public CatalanWordTokenizer()
-
-
Method Details
-
tokenize
- Specified by:
tokenize in interface org.languagetool.tokenizers.Tokenizer
- Overrides:
tokenize in class org.languagetool.tokenizers.WordTokenizer
- Parameters:
text - Text to tokenize
- Returns:
List of tokens. Note: a special string CA_APOS is used to replace apostrophes, and CA_HYPHEN to replace hyphens.
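One plausible reading of the note on CA_APOS and CA_HYPHEN is a protect, split, restore scheme: apostrophes and hyphens that belong to a word are swapped for placeholder strings so that a delimiter-based split leaves them alone, and the placeholders are turned back into the original characters afterwards. The sketch below shows that technique under stated assumptions. The placeholder values, the ELISION and WORD_HYPHEN patterns, the delimiter set, and the decision to restore the characters before returning are all hypothetical; the Javadoc confirms only the names CA_APOS and CA_HYPHEN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the LanguageTool implementation.
final class PlaceholderTokenizerSketch {

    // Assumed placeholder values; only the names appear in the Javadoc.
    static final String CA_APOS = "xxCA_APOSxx";
    static final String CA_HYPHEN = "xxCA_HYPHENxx";

    // Assumed pattern: a single-letter elided form (d', l', m', n', s', t') before a letter.
    private static final Pattern ELISION =
        Pattern.compile("\\b([dlmnstDLMNST])'(?=\\p{L})");
    // Assumed pattern: a hyphen with a letter on both sides.
    private static final Pattern WORD_HYPHEN =
        Pattern.compile("(?<=\\p{L})-(?=\\p{L})");
    // Assumed delimiter set; note that it contains both the apostrophe and the hyphen.
    private static final String DELIMITERS = " \t\n.,;:!?'-\"()";

    static List<String> tokenize(String text) {
        // Step 1: protect word-internal apostrophes and hyphens with placeholders.
        String guarded = ELISION.matcher(text).replaceAll("$1" + CA_APOS);
        guarded = WORD_HYPHEN.matcher(guarded).replaceAll(CA_HYPHEN);

        // Step 2: split on delimiters, keeping each delimiter as its own token.
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(guarded, DELIMITERS, true);
        while (st.hasMoreTokens()) {
            String raw = st.nextToken();
            // Step 3: restore the characters; an elided form becomes a separate token.
            int i = raw.indexOf(CA_APOS);
            if (i >= 0) {
                tokens.add(raw.substring(0, i).replace(CA_HYPHEN, "-") + "'");
                raw = raw.substring(i + CA_APOS.length());
            }
            tokens.add(raw.replace(CA_HYPHEN, "-"));
        }
        return tokens;
    }
}
```

With these assumptions, tokenize("l'home-llop, 'ara'") keeps "l'" and "home-llop" intact while the quoting apostrophes around "ara" are still split off as separate tokens, because only the protected occurrences escape the delimiter split.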
-
wordsToAdd
-