Class CommonExtractors
java.lang.Object
com.kohlschutter.boilerpipe.extractors.CommonExtractors
Provides quick access to common BoilerpipeExtractors.
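The "quick access" idea is a constant-holder class: a few ready-made extractor instances exposed as static final fields, plus a private constructor so the holder itself is never instantiated. Below is a minimal sketch of that pattern. It assumes a hypothetical `TextExtractor` interface and a hypothetical `Extractors` holder that work on pre-split text blocks. These are not boilerpipe's types, and the two toy extractors only stand in for the real ones.

```java
import java.util.Comparator;
import java.util.List;

public class Main {
    /** Hypothetical extractor type: turns a page's pre-split text blocks into output text. */
    interface TextExtractor {
        String extract(List<String> blocks);
    }

    /** Hypothetical constant holder in the style of CommonExtractors. */
    static final class Extractors {
        /** Returns every block, one per line (a toy "keep everything" dummy). */
        static final TextExtractor KEEP_EVERYTHING = blocks -> String.join("\n", blocks);

        /** Returns only the block with the most characters, or "" if there are none. */
        static final TextExtractor LARGEST_BLOCK = blocks -> blocks.stream()
                .max(Comparator.comparingInt(String::length))
                .orElse("");

        // Private constructor: the holder only exposes constants and is never instantiated.
        private Extractors() {}
    }

    public static void main(String[] args) {
        List<String> blocks = List.of("Home | About", "The actual article body text.", "Copyright notice");
        // Callers reuse the shared instances instead of constructing extractors themselves.
        System.out.println(Extractors.KEEP_EVERYTHING.extract(blocks));
        System.out.println(Extractors.LARGEST_BLOCK.extract(blocks));
    }
}
```

The design choice is the same one the private `CommonExtractors()` constructor reflects. Callers pick a preconfigured instance by field name rather than building and configuring extractors on their own.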
Field Summary
Fields
Modifier and Type | Field | Description
static final ArticleExtractor | ARTICLE_EXTRACTOR | Works very well for most types of Article-like HTML.
static final CanolaExtractor | CANOLA_EXTRACTOR | Trained on krdwrd Canola (different definition of "boilerplate").
static final DefaultExtractor | DEFAULT_EXTRACTOR | Usually worse than ArticleExtractor, but simpler (no heuristics).
static final KeepEverythingExtractor | KEEP_EVERYTHING_EXTRACTOR | Dummy extractor; should return the input text.
static final LargestContentExtractor | LARGEST_CONTENT_EXTRACTOR | Like DefaultExtractor, but keeps only the largest text block.
Constructor Summary
Constructors
Method Summary
Field Details
ARTICLE_EXTRACTOR
Works very well for most types of Article-like HTML.
DEFAULT_EXTRACTOR
Usually worse than ArticleExtractor, but simpler (no heuristics).
LARGEST_CONTENT_EXTRACTOR
Like DefaultExtractor, but keeps only the largest text block.
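"Keeps the largest text block only" can be pictured as a post-processing step over blocks that an earlier pass has already labelled as content or boilerplate. The following is a toy sketch of that step, not boilerpipe's implementation. It assumes a hypothetical `Block` record with a content flag and measures "largest" by word count, which is itself an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    /** Hypothetical text block: its text plus a content/boilerplate label from an earlier pass. */
    record Block(String text, boolean isContent) {
        int numWords() {
            String t = text.trim();
            return t.isEmpty() ? 0 : t.split("\\s+").length;
        }
    }

    /** Keeps only the largest content block (by word count); every other block becomes boilerplate. */
    static List<Block> keepLargestBlock(List<Block> blocks) {
        Block largest = null;
        for (Block b : blocks) {
            if (b.isContent() && (largest == null || b.numWords() > largest.numWords())) {
                largest = b;
            }
        }
        List<Block> out = new ArrayList<>();
        for (Block b : blocks) {
            out.add(new Block(b.text(), b == largest)); // identity check: only the chosen block stays content
        }
        return out;
    }

    /** Joins the text of the blocks still marked as content, one per line. */
    static String contentText(List<Block> blocks) {
        StringBuilder sb = new StringBuilder();
        for (Block b : blocks) {
            if (b.isContent()) {
                if (sb.length() > 0) sb.append('\n');
                sb.append(b.text());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
                new Block("Related: five other stories", true),
                new Block("The main article paragraph has the most words of all blocks here.", true),
                new Block("Home About Contact", false));
        System.out.println(contentText(keepLargestBlock(blocks)));
    }
}
```

Blocks already judged to be boilerplate are never candidates. Only the surviving content blocks compete, so a large navigation block cannot win over a smaller article paragraph.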
CANOLA_EXTRACTOR
Trained on the krdwrd Canola data, which uses a different definition of "boilerplate". It may be worth a try.
KEEP_EVERYTHING_EXTRACTOR
Dummy extractor; should return the input text. Use it to check whether a problem lies within a particular BoilerpipeExtractor or somewhere else.
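The double-check amounts to running KEEP_EVERYTHING_EXTRACTOR and the extractor under suspicion on the same page, then looking for the text you expected to see. The following is a minimal sketch of that comparison. The `diagnose` helper and `Diagnosis` enum are hypothetical, and the two input strings stand in for the outputs of the two extractors.

```java
public class Main {
    /** Hypothetical outcome of comparing a keep-everything run with a filtering extractor's run. */
    enum Diagnosis { KEPT, DROPPED_BY_EXTRACTOR, NOT_IN_INPUT }

    /** Decides where an expected phrase went missing, if it did. */
    static Diagnosis diagnose(String keepEverythingText, String extractorText, String phrase) {
        if (!keepEverythingText.contains(phrase)) {
            // Even the dummy extractor lacks it, so the problem lies outside the extractor.
            return Diagnosis.NOT_IN_INPUT;
        }
        if (!extractorText.contains(phrase)) {
            // The text was available, but this extractor discarded it.
            return Diagnosis.DROPPED_BY_EXTRACTOR;
        }
        return Diagnosis.KEPT;
    }

    public static void main(String[] args) {
        String all = "Menu\nBreaking news: the bridge reopened.\nFooter";
        String article = "Breaking news: the bridge reopened.";
        System.out.println(diagnose(all, article, "bridge reopened")); // prints KEPT
        System.out.println(diagnose(all, article, "Footer"));          // prints DROPPED_BY_EXTRACTOR
        System.out.println(diagnose(all, article, "photo caption"));   // prints NOT_IN_INPUT
    }
}
```

The keep-everything output is checked first because it acts as the baseline. If a phrase is absent even there, no choice of extractor will recover it.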
Constructor Details
CommonExtractors
private CommonExtractors()