Class SimpleWordTokenizer

java.lang.Object
org.apache.maven.jxr.util.SimpleWordTokenizer

public class SimpleWordTokenizer extends Object
A small, fast word tokenizer whose behavior differs from the standard Java tokenizer. It treats a run of characters as a word only when the run is delimited by spaces. For example, "Flight" is a word, but "Flight()" is not.
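The word rule described above can be illustrated with a minimal standalone sketch. It is not the maven-jxr implementation: the class name WordRuleSketch, the method clearWords, and the choice of letters, digits, and underscores as the definition of a "clear word" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the maven-jxr implementation.
final class WordRuleSketch {

    // Returns the whitespace-delimited chunks of the line that consist only of
    // letters, digits, or underscores (an assumed definition of a "clear word").
    static List<String> clearWords(String line) {
        List<String> words = new ArrayList<>();
        for (String chunk : line.trim().split("\\s+")) {
            if (chunk.matches("\\w+")) {
                words.add(chunk);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // "Flight" is kept; "Flight()", "=" and ";" are rejected.
        System.out.println(clearWords("Flight plane = new Flight() ;"));
    }
}
```

Under these assumptions, a chunk such as "Flight()" is dropped whole rather than trimmed to "Flight", which matches the example given in the class description.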
  • Field Details

    • NONBREAKERS

      private static final Pattern NONBREAKERS
    • BREAKERS

      private static final char[] BREAKERS
  • Constructor Details

    • SimpleWordTokenizer

      public SimpleWordTokenizer()
  • Method Details

    • tokenize

      public static List<StringEntry> tokenize(String line)
      Breaks the given line into multiple tokens.
      Parameters:
      line - line to tokenize
      Returns:
      list of tokens
    • tokenize

      public static List<StringEntry> tokenize(String line, String find)
      Tokenize the given line but only return those tokens that match the parameter find.
      Parameters:
      line - line to search in
      find - String to match
      Returns:
      list of matching tokens
    • tokenize

      private static List<StringEntry> tokenize(String line, int start)
      Internal implementation that tokenizes the line beginning at the given start index.
    • getStart

      private static int getStart(String string)
      Go through the list of BREAKERS and find the closest one.
    • isBreaker

      private static boolean isBreaker(char c)
      Return true if the given char is considered a breaker.
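The two private helpers, getStart and isBreaker, can be pictured with the standalone sketch below. It is one plausible reading, not the library's code: the contents of the BREAKERS array are not listed on this page, so the set used here is hypothetical, and "closest" is assumed to mean the breaker nearest the start of the string. The names BreakerSketch and closestBreaker are also invented for illustration.

```java
// Illustrative sketch only; not the maven-jxr implementation.
final class BreakerSketch {

    // Hypothetical breaker set; the real BREAKERS contents are not shown in this page.
    private static final char[] BREAKERS = {' ', '(', ')', '[', ']', '{', '}'};

    // Returns true if the given char is in the (assumed) breaker set.
    static boolean isBreaker(char c) {
        for (char b : BREAKERS) {
            if (b == c) {
                return true;
            }
        }
        return false;
    }

    // Returns the index of the breaker nearest the start of the string,
    // or -1 if the string contains no breaker.
    static int closestBreaker(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (isBreaker(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(closestBreaker("Flight()"));
        System.out.println(closestBreaker("Flight"));
    }
}
```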