Class KeepLargestBlockFilter
- java.lang.Object
  - com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter

All Implemented Interfaces:
BoilerpipeFilter
public final class KeepLargestBlockFilter extends java.lang.Object implements BoilerpipeFilter
Keeps the largest TextBlock only (by the number of words). In case of more than one block with the same number of words, the first block is chosen. All discarded blocks are marked "not content" and flagged as DefaultLabels.MIGHT_BE_CONTENT. Note that, by default, only TextBlocks marked as "content" are taken into consideration.
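The selection rule described above can be illustrated with a small standalone sketch. It does not use the boilerpipe classes and is not the library's implementation: SketchBlock, LargestBlockSketch, keepLargest and the mightBeContent flag are hypothetical names introduced here, word counts are taken by splitting on whitespace, and only blocks that were previously marked as content are treated as "discarded". These are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a text block; NOT the boilerpipe TextBlock class.
final class SketchBlock {
    final String text;
    final int numWords;
    boolean isContent;
    boolean mightBeContent; // models the MIGHT_BE_CONTENT label (assumption)

    SketchBlock(String text, boolean isContent) {
        this.text = text;
        String trimmed = text.trim();
        // Assumption: a "word" is a whitespace-separated token.
        this.numWords = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        this.isContent = isContent;
    }
}

final class LargestBlockSketch {
    /** Keeps only the largest content block; returns true if any block changed. */
    static boolean keepLargest(List<SketchBlock> blocks) {
        SketchBlock largest = null;
        for (SketchBlock b : blocks) {
            // Only blocks already marked as content are candidates.
            // The strict '>' means the first block wins a tie.
            if (b.isContent && (largest == null || b.numWords > largest.numWords)) {
                largest = b;
            }
        }
        boolean changed = false;
        for (SketchBlock b : blocks) {
            // Every other content block is discarded and flagged.
            if (b != largest && b.isContent) {
                b.isContent = false;
                b.mightBeContent = true;
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<SketchBlock> blocks = new ArrayList<>();
        blocks.add(new SketchBlock("short nav text", true));
        blocks.add(new SketchBlock("this is the main article body", true));
        boolean changed = keepLargest(blocks);
        for (SketchBlock b : blocks) {
            System.out.println(b.isContent + " | " + b.text);
        }
        System.out.println("changed = " + changed);
    }
}
```

In this sketch a block that was never marked as content is left untouched, which mirrors the note that only blocks marked as "content" are considered by default.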
-
-
Field Summary
Fields (Modifier and Type, Field):
- private boolean expandToSameLevelText
- static KeepLargestBlockFilter INSTANCE
- static KeepLargestBlockFilter INSTANCE_EXPAND_TO_SAME_TAGLEVEL
- static KeepLargestBlockFilter INSTANCE_EXPAND_TO_SAME_TAGLEVEL_MIN_WORDS
- private int minWords
-
Constructor Summary
Constructors:
- KeepLargestBlockFilter(boolean expandToSameLevelText, int minWords)
-
Method Summary
Methods (Modifier and Type, Method, Description):
- boolean process(TextDocument doc): Processes the given document doc.
-
-
-
Field Detail
-
INSTANCE
public static final KeepLargestBlockFilter INSTANCE
-
INSTANCE_EXPAND_TO_SAME_TAGLEVEL
public static final KeepLargestBlockFilter INSTANCE_EXPAND_TO_SAME_TAGLEVEL
-
INSTANCE_EXPAND_TO_SAME_TAGLEVEL_MIN_WORDS
public static final KeepLargestBlockFilter INSTANCE_EXPAND_TO_SAME_TAGLEVEL_MIN_WORDS
-
expandToSameLevelText
private final boolean expandToSameLevelText
-
minWords
private final int minWords
-
-
Method Detail
-
process
public boolean process(TextDocument doc) throws BoilerpipeProcessingException
Description copied from interface: BoilerpipeFilter
Processes the given document doc.
Specified by:
- process in interface BoilerpipeFilter
Parameters:
- doc - The TextDocument that is to be processed.
Returns:
- true if changes have been made to the TextDocument.
Throws:
- BoilerpipeProcessingException