Class SimpleWordTokenizer


  • public class SimpleWordTokenizer
    extends java.lang.Object
    This is a small, fast word tokenizer. It behaves differently from the standard Java tokenizers: it only treats a string as a word if it is a clean word delimited by spaces. For example, "Flight" is a word, but "Flight()" is not.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static char[] BREAKERS  
      private static java.util.regex.Pattern NONBREAKERS  
    • Method Summary

      Modifier and Type Method Description
      private static int getStart(java.lang.String string)
      Goes through the list of BREAKERS and finds the closest one.
      private static boolean isBreaker(char c)
      Returns true if the given char is considered a breaker.
      static java.util.List<StringEntry> tokenize(java.lang.String line)
      Breaks the given line into multiple tokens.
      private static java.util.List<StringEntry> tokenize(java.lang.String line, int start)
      Internal implementation.
      static java.util.List<StringEntry> tokenize(java.lang.String line, java.lang.String find)
      Tokenizes the given line but returns only the tokens that match the parameter find.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • NONBREAKERS

        private static final java.util.regex.Pattern NONBREAKERS
      • BREAKERS

        private static final char[] BREAKERS
    • Constructor Detail

      • SimpleWordTokenizer

        public SimpleWordTokenizer()
    • Method Detail

      • tokenize

        public static java.util.List<StringEntry> tokenize(java.lang.String line)
        Breaks the given line into multiple tokens.
        Parameters:
        line - line to tokenize
        Returns:
        list of tokens
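The implementation is not shown in this documentation, so the following is only a sketch of the behavior described in the class overview, under stated assumptions: a "clear word" is taken to be a whitespace-delimited chunk made entirely of letters and digits, and the hypothetical record Entry stands in for StringEntry, whose fields are not documented here. It uses java.util.regex instead of the BREAKERS and NONBREAKERS fields, whose contents are unknown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: not the actual SimpleWordTokenizer implementation.
class TokenizeSketch {
    // Hypothetical stand-in for StringEntry: token text plus its start offset.
    record Entry(String text, int start) {}

    // Any run of non-whitespace characters is a candidate chunk.
    private static final Pattern CHUNK = Pattern.compile("\\S+");
    // Assumption: a "clear word" consists only of letters and digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    static List<Entry> tokenize(String line) {
        List<Entry> tokens = new ArrayList<>();
        Matcher m = CHUNK.matcher(line);
        while (m.find()) {
            // Keep the chunk only if it is entirely a word, so "Flight()" is dropped.
            if (WORD.matcher(m.group()).matches()) {
                tokens.add(new Entry(m.group(), m.start()));
            }
        }
        return tokens;
    }
}
```

Under these assumptions, tokenize("call Flight() then Flight") yields call (offset 0), then (offset 14) and Flight (offset 19); Flight() is skipped because it contains non-word characters.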
      • tokenize

        public static java.util.List<StringEntry> tokenize(java.lang.String line,
                                                           java.lang.String find)
        Tokenizes the given line but returns only the tokens that match the parameter find.
        Parameters:
        line - line to search in
        find - String to match
        Returns:
        list of matching tokens
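A sketch of the filtering overload, assuming that "match" means exact, case-sensitive equality with find; the documentation does not say whether matching is case-insensitive or pattern-based. Entry is a hypothetical stand-in for StringEntry, and the word rule (only letters and digits between whitespace) is an assumption. This version scans characters by hand rather than using regular expressions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: not the actual SimpleWordTokenizer implementation.
class FindSketch {
    // Hypothetical stand-in for StringEntry: token text plus its start offset.
    record Entry(String text, int start) {}

    // Assumed word rule: a whitespace-delimited chunk of letters and digits only.
    static List<Entry> tokenize(String line) {
        List<Entry> tokens = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            // Skip whitespace up to the start of the next chunk.
            while (i < line.length() && Character.isWhitespace(line.charAt(i))) i++;
            int start = i;
            boolean clean = true;
            // Consume the chunk, noting whether any character is not a letter or digit.
            while (i < line.length() && !Character.isWhitespace(line.charAt(i))) {
                if (!Character.isLetterOrDigit(line.charAt(i))) clean = false;
                i++;
            }
            if (i > start && clean) tokens.add(new Entry(line.substring(start, i), start));
        }
        return tokens;
    }

    // Assumption: "match" means exact, case-sensitive equality with find.
    static List<Entry> tokenize(String line, String find) {
        List<Entry> matches = new ArrayList<>();
        for (Entry e : tokenize(line)) {
            if (e.text().equals(find)) matches.add(e);
        }
        return matches;
    }
}
```

With these assumptions, tokenize("Flight Flight() flight Flight", "Flight") returns the two entries at offsets 0 and 23: "Flight()" fails the word rule and "flight" differs in case.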
      • tokenize

        private static java.util.List<StringEntry> tokenize(java.lang.String line,
                                                            int start)
        Internal implementation that tokenizes line beginning at the given start index.
      • getStart

        private static int getStart(java.lang.String string)
        Goes through the list of BREAKERS and finds the closest one.
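One plausible reading of "finds the closest one" is: return the index of the earliest occurrence of any breaker character in the string. The sketch below assumes that reading, a made-up BREAKERS set, and a return value of -1 when no breaker is present; none of these details are confirmed by the documentation.

```java
// Sketch only: BREAKERS contents and the -1 convention are assumptions.
class GetStartSketch {
    // Hypothetical breaker set; the real contents of BREAKERS are not documented.
    private static final char[] BREAKERS = {'(', ')', '.', ',', ';'};

    // Returns the index of the earliest breaker in string, or -1 if none occurs.
    static int getStart(String string) {
        int closest = -1;
        for (char b : BREAKERS) {
            int idx = string.indexOf(b);
            // Keep the smallest non-negative index seen so far.
            if (idx >= 0 && (closest < 0 || idx < closest)) {
                closest = idx;
            }
        }
        return closest;
    }
}
```

Under these assumptions, getStart("Flight(a, b)") returns 6, the position of '(', even though ',' also appears later.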
      • isBreaker

        private static boolean isBreaker(char c)
        Returns true if the given char is considered a breaker.
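A minimal sketch of isBreaker as a linear scan over a char array. The BREAKERS contents here are hypothetical, since the real set is not documented.

```java
// Sketch only: the BREAKERS contents are an assumption.
class IsBreakerSketch {
    // Hypothetical breaker set; the real contents of BREAKERS are not documented.
    private static final char[] BREAKERS = {'(', ')', '.', ',', ';'};

    // Linear scan: returns true as soon as c equals one of the breakers.
    static boolean isBreaker(char c) {
        for (char b : BREAKERS) {
            if (b == c) return true;
        }
        return false;
    }
}
```

For a handful of characters a linear scan is simple and fast; a much larger set could use a java.util.BitSet indexed by character instead.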