Class KeepLargestBlockFilter
java.lang.Object
com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
All Implemented Interfaces:
BoilerpipeFilter
Keeps the largest TextBlock only (by the number of words). In case of more than one block
with the same number of words, the first block is chosen. All discarded blocks are marked
"not content" and flagged as DefaultLabels.MIGHT_BE_CONTENT.
Note that, by default, only TextBlocks marked as "content" are taken into consideration.
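The selection rule described above can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: the Block class, its mightBeContent flag (standing in for the DefaultLabels.MIGHT_BE_CONTENT label), and the keepLargest helper are hypothetical names introduced here. The sketch also assumes that nothing is modified when no block is marked as content, and it returns the kept block rather than a boolean.

```java
import java.util.ArrayList;
import java.util.List;

final class KeepLargestSketch {

    /** Hypothetical stand-in for a text block; NOT Boilerpipe's TextBlock. */
    static final class Block {
        final String text;
        final int numWords;
        boolean isContent;
        boolean mightBeContent; // stands in for the MIGHT_BE_CONTENT label

        Block(String text, boolean isContent) {
            this.text = text;
            String trimmed = text.trim();
            // Word count by whitespace splitting; an empty block has zero words.
            this.numWords = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            this.isContent = isContent;
        }
    }

    /**
     * Keeps the largest content block and discards all others.
     * Returns the kept block, or null if no block was marked as content
     * (in which case nothing is modified; this is an assumption of the sketch).
     */
    static Block keepLargest(List<Block> blocks) {
        Block largest = null;
        for (Block b : blocks) {
            // Only content blocks are candidates; the strict '>' means the
            // first block wins when several share the same word count.
            if (b.isContent && (largest == null || b.numWords > largest.numWords)) {
                largest = b;
            }
        }
        if (largest == null) {
            return null;
        }
        for (Block b : blocks) {
            if (b != largest) {
                b.isContent = false;     // marked "not content"
                b.mightBeContent = true; // flagged for possible later recovery
            }
        }
        return largest;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("Home About Contact", true));
        blocks.add(new Block("The quick brown fox jumps over the lazy dog", true));
        blocks.add(new Block("Nine words that tie with the second block here", true));
        blocks.add(new Block("A longer block that is not marked as content and is therefore ignored", false));

        Block kept = keepLargest(blocks);
        System.out.println("Kept: " + kept.text);
    }
}
```

In this sketch the second and third blocks tie on word count, so the second one is kept because it appears first; the fourth block is longer but is skipped because it was not marked as content beforehand.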
Field Summary
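Fields
Modifier and Type                     Field
private final boolean                 expandToSameLevelText
static final KeepLargestBlockFilter   INSTANCE
static final KeepLargestBlockFilter   INSTANCE_EXPAND_TO_SAME_TAGLEVEL
static final KeepLargestBlockFilter   INSTANCE_EXPAND_TO_SAME_TAGLEVEL_MIN_WORDS
private final int                     minWords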
Constructor Summary
Constructors
KeepLargestBlockFilter(boolean expandToSameLevelText, int minWords)
Method Summary
Modifier and Type   Method                      Description
boolean             process(TextDocument doc)   Processes the given document doc.
Field Details
INSTANCE
INSTANCE_EXPAND_TO_SAME_TAGLEVEL
INSTANCE_EXPAND_TO_SAME_TAGLEVEL_MIN_WORDS
expandToSameLevelText
private final boolean expandToSameLevelText
minWords
private final int minWords
Constructor Details
KeepLargestBlockFilter
public KeepLargestBlockFilter(boolean expandToSameLevelText, int minWords)
Method Details
process
Description copied from interface: BoilerpipeFilter
Processes the given document doc.
Specified by:
process in interface BoilerpipeFilter
Parameters:
doc - The TextDocument that is to be processed.
Returns:
true if changes have been made to the TextDocument.
Throws:
BoilerpipeProcessingException