Package org.languagetool.tokenizers.ca
Class CatalanWordTokenizer
- java.lang.Object
  - org.languagetool.tokenizers.WordTokenizer
    - org.languagetool.tokenizers.ca.CatalanWordTokenizer
- All Implemented Interfaces:
org.languagetool.tokenizers.Tokenizer
public class CatalanWordTokenizer extends org.languagetool.tokenizers.WordTokenizer
Tokenizes a sentence into words. Punctuation and whitespace characters get their own tokens. Hyphens and apostrophes receive special treatment for Catalan.
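To illustrate what "punctuation and whitespace get their own tokens" means in practice, here is a minimal, self-contained sketch. It does not use LanguageTool at all: the class name DelimiterTokenSketch and its small delimiter set are hypothetical, and the real tokenizer may recognize a different set of delimiters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterTokenSketch {
    // Hypothetical delimiter set, chosen only for this illustration.
    private static final String DELIMITERS = " \t\n.,;:!?()\"";

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true makes every delimiter character a token of its own.
        StringTokenizer st = new StringTokenizer(text, DELIMITERS, true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Words, the comma, the spaces and the exclamation mark are separate tokens.
        System.out.println(tokenize("Bon dia, món!"));
    }
}
```

Because delimiters are kept as tokens, concatenating the returned list reproduces the input text exactly, which is what lets later processing map tokens back to character positions.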
-
Field Summary
Fields
Modifier and Type    Field    Description
private static java.util.regex.Pattern    APOSTROF_RECTE
private static java.util.regex.Pattern    APOSTROF_RECTE_1
private static java.util.regex.Pattern    APOSTROF_RODO
private static java.util.regex.Pattern    APOSTROF_RODO_1
private static java.util.regex.Pattern    DECIMAL_COMMA
private static java.util.regex.Pattern    DECIMAL_POINT
private static java.lang.String    DICT_FILENAME
private static java.util.regex.Pattern    ELA_GEMINADA
private static java.util.regex.Pattern    ELA_GEMINADA_UPPERCASE
private static java.util.regex.Pattern    HYPHENS
private static int    maxPatterns
private static java.util.regex.Pattern    NEARBY_HYPHENS
private java.util.regex.Pattern[]    patterns
private static java.lang.String    PF
private static java.util.regex.Pattern    SPACE_DIGITS
private static java.util.regex.Pattern    SPACE_DIGITS0
private static java.util.regex.Pattern    SPACE_DIGITS2
protected org.languagetool.rules.spelling.morfologik.MorfologikSpeller    speller
-
Constructor Summary
Constructors
Constructor    Description
CatalanWordTokenizer()
-
Method Summary
All Methods    Instance Methods    Concrete Methods
Modifier and Type    Method    Description
java.util.List<java.lang.String>    tokenize(java.lang.String text)
private java.util.List<java.lang.String>    wordsToAdd(java.lang.String s)
-
Field Detail
-
PF
private static final java.lang.String PF
- See Also:
- Constant Field Values
-
maxPatterns
private static final int maxPatterns
- See Also:
- Constant Field Values
-
patterns
private final java.util.regex.Pattern[] patterns
-
DICT_FILENAME
private static final java.lang.String DICT_FILENAME
- See Also:
- Constant Field Values
-
speller
protected org.languagetool.rules.spelling.morfologik.MorfologikSpeller speller
-
ELA_GEMINADA
private static final java.util.regex.Pattern ELA_GEMINADA
-
ELA_GEMINADA_UPPERCASE
private static final java.util.regex.Pattern ELA_GEMINADA_UPPERCASE
-
APOSTROF_RECTE
private static final java.util.regex.Pattern APOSTROF_RECTE
-
APOSTROF_RODO
private static final java.util.regex.Pattern APOSTROF_RODO
-
APOSTROF_RECTE_1
private static final java.util.regex.Pattern APOSTROF_RECTE_1
-
APOSTROF_RODO_1
private static final java.util.regex.Pattern APOSTROF_RODO_1
-
NEARBY_HYPHENS
private static final java.util.regex.Pattern NEARBY_HYPHENS
-
HYPHENS
private static final java.util.regex.Pattern HYPHENS
-
DECIMAL_POINT
private static final java.util.regex.Pattern DECIMAL_POINT
-
DECIMAL_COMMA
private static final java.util.regex.Pattern DECIMAL_COMMA
-
SPACE_DIGITS0
private static final java.util.regex.Pattern SPACE_DIGITS0
-
SPACE_DIGITS
private static final java.util.regex.Pattern SPACE_DIGITS
-
SPACE_DIGITS2
private static final java.util.regex.Pattern SPACE_DIGITS2
-
Method Detail
-
tokenize
public java.util.List<java.lang.String> tokenize(java.lang.String text)
- Specified by:
tokenize in interface org.languagetool.tokenizers.Tokenizer
- Overrides:
tokenize in class org.languagetool.tokenizers.WordTokenizer
- Parameters:
text - Text to tokenize
- Returns:
- List of tokens. Note: a special string CA_APOS is used to replace apostrophes, and CA_HYPHEN to replace hyphens.
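The note about CA_APOS and CA_HYPHEN suggests a placeholder technique: apostrophes and hyphens that belong to a word are swapped for marker strings before the text is split on delimiters, then swapped back. The sketch below shows that general idea only. The class PlaceholderTokenSketch, both regular expressions, the delimiter set and the resulting token boundaries are assumptions made for illustration; this page does not state the actual patterns or how the real class splits or joins such forms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

class PlaceholderTokenSketch {
    private static final String APOS = "CA_APOS";
    private static final String HYPHEN = "CA_HYPHEN";
    // Assumed pattern: an elided one-letter form (l', d', s', ...) followed by a letter.
    private static final Pattern ELIDED =
        Pattern.compile("\\b([dlmnstDLMNST])'(?=\\p{L})");
    // Assumed pattern: a hyphen with a letter on both sides.
    private static final Pattern INNER_HYPHEN =
        Pattern.compile("(?<=\\p{L})-(?=\\p{L})");
    // Assumed delimiter set; apostrophe and hyphen are delimiters unless protected.
    private static final String DELIMITERS = " \t\n.,;:!?'-";

    static List<String> tokenize(String text) {
        // Step 1: protect word-internal apostrophes and hyphens with placeholders.
        String marked = ELIDED.matcher(text).replaceAll("$1" + APOS);
        marked = INNER_HYPHEN.matcher(marked).replaceAll(HYPHEN);

        // Step 2: split on delimiters, keeping each delimiter as a token.
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(marked, DELIMITERS, true);
        while (st.hasMoreTokens()) {
            // Step 3: restore hyphens, then cut after each protected apostrophe.
            String rest = st.nextToken().replace(HYPHEN, "-");
            int i;
            while ((i = rest.indexOf(APOS)) >= 0) {
                tokens.add(rest.substring(0, i) + "'");
                rest = rest.substring(i + APOS.length());
            }
            tokens.add(rest);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("L'home diu: anem-hi!"));
    }
}
```

In this sketch the elided article stays attached to its apostrophe ("L'" and "home" become two tokens) and a hyphenated form is kept whole, while an unprotected apostrophe or hyphen becomes a token of its own. A limitation of the approach as sketched is that input text which already contains the literal marker strings would be misread.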
-
wordsToAdd
private java.util.List<java.lang.String> wordsToAdd(java.lang.String s)