Package com.kohlschutter.boilerpipe.filters.simple
These BoilerpipeFilters are straightforward and probably not specific to English.
Class Summary

Class                            Description
BoilerplateBlockFilter           Removes TextBlocks which have explicitly been marked as "not content".
InvertedFilter                   Inverts the "isContent" flag for all TextBlocks.
LabelToBoilerplateFilter         Marks all blocks that contain a given label as "boilerplate".
LabelToContentFilter             Marks all blocks that contain a given label as "content".
MarkEverythingBoilerplateFilter  Marks all blocks as boilerplate.
MarkEverythingContentFilter      Marks all blocks as content.
MinClauseWordsFilter             Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5).
MinWordsFilter                   Keeps only those content blocks which contain at least k words.
SplitParagraphBlocksFilter       Splits TextBlocks at paragraph boundaries.
SurroundingToContentFilter       Marks blocks as "content" if their preceding and following blocks are both already marked "content", and the given TextBlockCondition is met.
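Most of these filters share one mechanic: every block carries an "isContent" flag, and a filter either flips that flag or drops blocks based on it. The self-contained Java sketch below illustrates that idea without using boilerpipe itself. Block, invert, minWords, labelToContent, surroundingToContent and removeBoilerplate are hypothetical stand-ins loosely modelled on InvertedFilter, MinWordsFilter, LabelToContentFilter, SurroundingToContentFilter and BoilerplateBlockFilter; a java.util.function.Predicate plays the role of a TextBlockCondition, and the boolean "did anything change?" return value is an assumption of this sketch, not a statement about the library's signatures.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical, simplified stand-ins for boilerpipe's TextBlock and simple filters.
// This is NOT the boilerpipe API; it only illustrates the flag-based mechanics.
public class SimpleFiltersSketch {

    // A text block with a word count, an "isContent" flag and a set of labels.
    static final class Block {
        final int numWords;
        boolean isContent;
        final Set<String> labels = new HashSet<>();

        Block(int numWords, boolean isContent, String... labels) {
            this.numWords = numWords;
            this.isContent = isContent;
            for (String label : labels) {
                this.labels.add(label);
            }
        }
    }

    // Cf. InvertedFilter: flips the content flag of every block.
    static boolean invert(List<Block> blocks) {
        for (Block b : blocks) {
            b.isContent = !b.isContent;
        }
        return !blocks.isEmpty();
    }

    // Cf. MinWordsFilter: content blocks with fewer than k words are marked as boilerplate.
    static boolean minWords(List<Block> blocks, int k) {
        boolean changed = false;
        for (Block b : blocks) {
            if (b.isContent && b.numWords < k) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    // Cf. LabelToContentFilter: non-content blocks carrying the label are marked as content.
    static boolean labelToContent(List<Block> blocks, String label) {
        boolean changed = false;
        for (Block b : blocks) {
            if (!b.isContent && b.labels.contains(label)) {
                b.isContent = true;
                changed = true;
            }
        }
        return changed;
    }

    // Cf. SurroundingToContentFilter: a non-content block between two content blocks
    // is marked as content if the condition (standing in for a TextBlockCondition) holds.
    static boolean surroundingToContent(List<Block> blocks, Predicate<Block> condition) {
        boolean changed = false;
        for (int i = 1; i + 1 < blocks.size(); i++) {
            Block prev = blocks.get(i - 1);
            Block cur = blocks.get(i);
            Block next = blocks.get(i + 1);
            if (!cur.isContent && prev.isContent && next.isContent && condition.test(cur)) {
                cur.isContent = true;
                changed = true;
            }
        }
        return changed;
    }

    // Cf. BoilerplateBlockFilter: removes every block not marked as content.
    static boolean removeBoilerplate(List<Block> blocks) {
        return blocks.removeIf(b -> !b.isContent);
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>(List.of(
                new Block(12, true),  // article paragraph
                new Block(2, false),  // short block between two paragraphs
                new Block(20, true),  // article paragraph
                new Block(3, true))); // short trailing block, e.g. "Share this article"
        minWords(blocks, 5);                               // demotes the 3-word block
        surroundingToContent(blocks, b -> b.numWords > 0); // promotes the middle block
        removeBoilerplate(blocks);                         // drops the trailing block
        System.out.println(blocks.size()); // prints 3
    }
}
```

In main, minWords demotes the short trailing block, surroundingToContent promotes the short block sandwiched between two paragraphs, and removeBoilerplate then drops the one block still marked as non-content. Because each step is small and single-purpose, the order of filters matters: running removeBoilerplate before surroundingToContent would discard the middle block before it could be rescued.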