Package com.kohlschutter.boilerpipe.util
Class UnicodeTokenizer
java.lang.Object
    com.kohlschutter.boilerpipe.util.UnicodeTokenizer

public class UnicodeTokenizer extends java.lang.Object

Tokenizes text according to Unicode word boundaries and strips off non-word characters.
Field Summary
Fields
Modifier and Type                          Field                    Description
private static java.util.regex.Pattern     PAT_NOT_WORD_BOUNDARY
private static java.util.regex.Pattern     PAT_WORD_BOUNDARY
Constructor Summary
Constructors
Constructor           Description
UnicodeTokenizer()
Method Summary
Modifier and Type            Method                                Description
static java.lang.String[]    tokenize(java.lang.CharSequence text)  Tokenizes the text and returns an array of tokens.
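
Example (illustrative only): the summary above says that tokenize takes a CharSequence and returns a String[] of tokens with non-word characters stripped. The page does not show the two private patterns, so the following is a minimal standalone sketch of that general idea, not the library's implementation. The class name TokenizeSketch, both patterns, and the exact stripping rule (split on whitespace, then trim non-letter, non-digit characters from each token's edges) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TokenizeSketch {
    // Hypothetical patterns; these are NOT the library's private
    // PAT_WORD_BOUNDARY / PAT_NOT_WORD_BOUNDARY fields, whose values
    // are not shown on this page.
    private static final Pattern WHITESPACE =
            Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);
    private static final Pattern EDGE_NON_WORD =
            Pattern.compile("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$");

    // Splits on whitespace, then strips leading and trailing characters
    // that are neither Unicode letters nor digits. Tokens that become
    // empty after stripping are dropped.
    public static String[] tokenize(CharSequence text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : WHITESPACE.split(text.toString().trim())) {
            String token = EDGE_NON_WORD.matcher(raw).replaceAll("");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Prints, one per line: Hello, world, It's, 42
        for (String token : tokenize("Hello, world! (It's 42.)")) {
            System.out.println(token);
        }
    }
}
```

With the library on the classpath, the corresponding call would be UnicodeTokenizer.tokenize(text), per the method summary above; its exact output for a given input may differ from this sketch.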