Class TerminatingBlocksFinder

java.lang.Object
com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
All Implemented Interfaces:
BoilerpipeFilter

public class TerminatingBlocksFinder extends Object implements BoilerpipeFilter
Finds blocks that potentially indicate the end of an article's text and marks them with DefaultLabels.INDICATES_END_OF_TEXT. This can be used in conjunction with a downstream IgnoreBlocksAfterContentFilter.
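The mark-then-truncate pattern described above can be illustrated with a small, self-contained sketch. It does not use the Boilerpipe classes: Block, markTerminatingBlocks, ignoreBlocksAfterEnd, and the trigger phrases are hypothetical stand-ins chosen for illustration, and the real filters may apply additional conditions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative stand-in types only; these are NOT the Boilerpipe classes.
class EndOfTextPipelineSketch {
    // Stand-in for the label named in the class description.
    static final String INDICATES_END_OF_TEXT = "INDICATES_END_OF_TEXT";

    // Hypothetical, minimal text block: text, a content flag, and labels.
    static final class Block {
        final String text;
        boolean isContent = true;
        final List<String> labels = new ArrayList<>();

        Block(String text) {
            this.text = text;
        }
    }

    // Stage 1 (stand-in for the finder): label blocks that look like
    // end-of-article markers. The phrases below are assumptions.
    static boolean markTerminatingBlocks(List<Block> blocks) {
        boolean changed = false;
        for (Block b : blocks) {
            String t = b.text.trim().toLowerCase(Locale.ROOT);
            if (t.startsWith("comments") || t.startsWith("post a comment")) {
                b.labels.add(INDICATES_END_OF_TEXT);
                changed = true;
            }
        }
        return changed;
    }

    // Stage 2 (stand-in for the downstream filter): once a labelled block
    // is seen, it and every later block stop counting as content.
    static boolean ignoreBlocksAfterEnd(List<Block> blocks) {
        boolean changed = false;
        boolean endSeen = false;
        for (Block b : blocks) {
            if (b.labels.contains(INDICATES_END_OF_TEXT)) {
                endSeen = true;
            }
            if (endSeen && b.isContent) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("Article body text."));
        blocks.add(new Block("Post a comment"));
        blocks.add(new Block("Footer links"));

        markTerminatingBlocks(blocks);
        ignoreBlocksAfterEnd(blocks);

        for (Block b : blocks) {
            System.out.println(b.isContent + "\t" + b.text);
        }
    }
}
```

Like process, both stages return true only when they changed something. Splitting detection from truncation keeps the labelling step reusable by other downstream filters.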
  • Field Details

  • Constructor Details

    • TerminatingBlocksFinder

      public TerminatingBlocksFinder()
  • Method Details

    • getInstance

      public static TerminatingBlocksFinder getInstance()
      Returns the singleton instance for TerminatingBlocksFinder.
    • process

      public boolean process(TextDocument doc) throws BoilerpipeProcessingException
      Description copied from interface: BoilerpipeFilter
      Processes the given document doc.
      Specified by:
      process in interface BoilerpipeFilter
      Parameters:
      doc - The TextDocument that is to be processed.
      Returns:
      true if changes have been made to the TextDocument.
      Throws:
      BoilerpipeProcessingException
    • startsWithNumber

      private static boolean startsWithNumber(String t, int len, String... str)
      Checks whether the given text t starts with a sequence of digits, followed by one of the given strings.
      Parameters:
      t - The text to examine
      len - The length of the text to examine
      str - Any strings that may follow the digits.
      Returns:
      true if the leading digits are followed by at least one of the given strings
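The documented contract of startsWithNumber can be sketched as follows. This is a reconstruction from the description above, not the library's source: it assumes that at least one leading digit is required, that only the ASCII digits 0 to 9 count, and that len bounds how far the digit scan may go. The class name StartsWithNumberSketch is hypothetical.

```java
// A sketch of the documented behaviour; not the library's actual source.
class StartsWithNumberSketch {

    // Assumption: only ASCII digits are treated as digits.
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns true if t begins with one or more digits (scanning at most
    // len characters) immediately followed by any of the given strings.
    static boolean startsWithNumber(String t, int len, String... str) {
        int j = 0;
        while (j < len && isDigit(t.charAt(j))) {
            j++;
        }
        if (j == 0) {
            return false; // assumption: no leading digits means no match
        }
        for (String s : str) {
            if (t.startsWith(s, j)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "12 comments";
        System.out.println(startsWithNumber(text, text.length(), " comments", " users responded in"));
    }
}
```

A call such as startsWithNumber("12 comments", 11, " comments") matches because the two digits are followed directly by " comments", whereas "comments" alone does not match under the at-least-one-digit assumption.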
    • isDigit

      private static boolean isDigit(char c)