Class CommonExtractors

java.lang.Object
com.kohlschutter.boilerpipe.extractors.CommonExtractors

public final class CommonExtractors extends Object
Provides quick access to common BoilerpipeExtractors.
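The "quick access" comes from the shape of the class. It is a `final` class whose `public static final` fields each hold one ready-made extractor, and its constructor is private, so callers use the shared instances directly instead of constructing their own. The minimal sketch below reproduces that pattern with hypothetical stand-in types (`TextExtractor`, `UpperCaseExtractor`, `IdentityExtractor`, `SharedExtractors`). These are not boilerpipe classes. Sharing one instance like this is only safe when the object keeps no per-call state, which holds for the stand-ins.

```java
import java.util.Locale;

// Hypothetical stand-in interface; NOT a boilerpipe type.
interface TextExtractor {
    String getText(String input);
}

// Stateless stand-in extractor: safe to share as a single instance.
final class UpperCaseExtractor implements TextExtractor {
    @Override
    public String getText(String input) {
        return input.toUpperCase(Locale.ROOT);
    }
}

// Stateless stand-in that returns its input unchanged.
final class IdentityExtractor implements TextExtractor {
    @Override
    public String getText(String input) {
        return input;
    }
}

// Same shape as CommonExtractors: final class, shared constants, private constructor.
final class SharedExtractors {
    public static final UpperCaseExtractor UPPER_CASE_EXTRACTOR = new UpperCaseExtractor();
    public static final IdentityExtractor IDENTITY_EXTRACTOR = new IdentityExtractor();

    private SharedExtractors() {
        // Not instantiable: the class only exists to hold the shared instances.
    }
}

class SharedExtractorsDemo {
    public static void main(String[] args) {
        // Callers reach the shared instances directly; no "new" is needed.
        System.out.println(SharedExtractors.UPPER_CASE_EXTRACTOR.getText("hello"));
        System.out.println(SharedExtractors.IDENTITY_EXTRACTOR.getText("hello"));
    }
}
```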
  • Field Details

    • ARTICLE_EXTRACTOR

      public static final ArticleExtractor ARTICLE_EXTRACTOR
      Works very well for most types of article-like HTML.
    • DEFAULT_EXTRACTOR

      public static final DefaultExtractor DEFAULT_EXTRACTOR
      Usually performs worse than ArticleExtractor, but is simpler and uses no heuristics.
    • LARGEST_CONTENT_EXTRACTOR

      public static final LargestContentExtractor LARGEST_CONTENT_EXTRACTOR
      Like DefaultExtractor, but keeps the largest text block only.
    • CANOLA_EXTRACTOR

      public static final CanolaExtractor CANOLA_EXTRACTOR
      Trained on the krdwrd Canola data, which uses a different definition of "boilerplate". It may be worth trying.
    • KEEP_EVERYTHING_EXTRACTOR

      public static final KeepEverythingExtractor KEEP_EVERYTHING_EXTRACTOR
      Dummy extractor that should return the input text. Use it to check whether a problem lies within a particular BoilerpipeExtractor or somewhere else.
  • Constructor Details

    • CommonExtractors

      private CommonExtractors()
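Two of the field descriptions above define an extractor by its output: LARGEST_CONTENT_EXTRACTOR keeps only the largest text block, and KEEP_EVERYTHING_EXTRACTOR returns all of the text. The sketch below illustrates both ideas on a deliberately simplified block model. It is not boilerpipe's implementation. `Block`, `BlockFilters`, `keepLargestContent`, `keepEverything`, and `join` are hypothetical names. "Largest" is assumed here to mean the block already marked as content that has the most whitespace-separated words. Comparing a real extractor's output against a keep-everything baseline in this way helps show whether text was lost during extraction or was never present in the input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified text block; NOT boilerpipe's TextBlock.
final class Block {
    final String text;
    final boolean content; // assumed to be set by an earlier classification step

    Block(String text, boolean content) {
        this.text = text;
        this.content = content;
    }

    // Counts whitespace-separated tokens as "words".
    int numWords() {
        String t = text.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }
}

final class BlockFilters {
    private BlockFilters() {}

    // Keep-everything baseline: every block is marked as content.
    static List<Block> keepEverything(List<Block> blocks) {
        List<Block> out = new ArrayList<>();
        for (Block b : blocks) {
            out.add(new Block(b.text, true));
        }
        return out;
    }

    // Largest-content: among blocks already marked as content, keep only the
    // one with the most words (the first one wins on ties); drop the rest.
    static List<Block> keepLargestContent(List<Block> blocks) {
        Block largest = null;
        for (Block b : blocks) {
            if (b.content && (largest == null || b.numWords() > largest.numWords())) {
                largest = b;
            }
        }
        List<Block> out = new ArrayList<>();
        if (largest != null) {
            out.add(largest);
        }
        return out;
    }

    // Joins the text of content blocks with newlines.
    static String join(List<Block> blocks) {
        StringBuilder sb = new StringBuilder();
        for (Block b : blocks) {
            if (b.content) {
                if (sb.length() > 0) {
                    sb.append('\n');
                }
                sb.append(b.text);
            }
        }
        return sb.toString();
    }
}

class BlockFilterDemo {
    public static void main(String[] args) {
        List<Block> blocks = List.of(
                new Block("Home | About | Contact", false),
                new Block("A short teaser paragraph.", true),
                new Block("The main article body has many more words than anything else on the page.", true),
                new Block("Copyright notice", false));
        // Only the longest content block survives.
        System.out.println(BlockFilters.join(BlockFilters.keepLargestContent(blocks)));
        // All four blocks are printed.
        System.out.println(BlockFilters.join(BlockFilters.keepEverything(blocks)));
    }
}
```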