Class KeepLargestFulltextBlockFilter
java.lang.Object
  com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
    com.kohlschutter.boilerpipe.filters.english.KeepLargestFulltextBlockFilter

All Implemented Interfaces:
BoilerpipeFilter
public final class KeepLargestFulltextBlockFilter extends HeuristicFilterBase implements BoilerpipeFilter
Keeps only the largest TextBlock (by the number of words). If more than one block has the same number of words, the first one is chosen. All discarded blocks are marked "not content" and flagged as DefaultLabels.MIGHT_BE_CONTENT.

As opposed to KeepLargestBlockFilter, the number of words is computed using HeuristicFilterBase.getNumFullTextWords(TextBlock), which only counts words that occur in text elements with at least 9 words and are thus believed to be full text.

NOTE: Without language-specific fine-tuning (i.e., when running the default instance), this filter may lead to suboptimal results. It is usually better to use KeepLargestBlockFilter instead, which works on plain word counts rather than on text densities.
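The selection rule above can be sketched as follows. This is a self-contained illustration, not boilerpipe's code: Block, Label, fullTextWords and keepLargestFulltext are hypothetical stand-ins for TextBlock, DefaultLabels, getNumFullTextWords and process. In particular, fullTextWords assumes that the full-text test is a threshold of 9 on the block's text density (words per wrapped line). It returns a block's word count when the block passes and 0 otherwise. Check HeuristicFilterBase for the exact rule.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class KeepLargestFulltextSketch {

    // Hypothetical stand-in for boilerpipe's DefaultLabels.
    enum Label { MIGHT_BE_CONTENT }

    // Hypothetical, minimal stand-in for boilerpipe's TextBlock.
    static final class Block {
        final String text;
        final int numWords;
        final int numWrappedLines;
        boolean isContent;
        final Set<Label> labels = EnumSet.noneOf(Label.class);

        Block(String text, int numWords, int numWrappedLines, boolean isContent) {
            this.text = text;
            this.numWords = numWords;
            this.numWrappedLines = numWrappedLines;
            this.isContent = isContent;
        }

        // Words per wrapped line; 0 for a block with no lines.
        float textDensity() {
            return numWrappedLines == 0 ? 0f : (float) numWords / numWrappedLines;
        }
    }

    // ASSUMPTION: a block counts as full text when its density is at least 9.
    static int fullTextWords(Block b) {
        return b.textDensity() >= 9f ? b.numWords : 0;
    }

    // Keeps only the block with the most full-text words; returns true if any block changed.
    static boolean keepLargestFulltext(List<Block> blocks) {
        if (blocks.isEmpty()) {
            return false;
        }
        Block largest = null;
        int max = -1;
        for (Block b : blocks) {
            int n = fullTextWords(b);
            if (n > max) { // strict '>' means the first block wins a tie
                max = n;
                largest = b;
            }
        }
        boolean changed = false;
        for (Block b : blocks) {
            if (b == largest) {
                changed |= !b.isContent;
                b.isContent = true;
            } else {
                changed |= b.isContent;
                b.isContent = false;
                // Set.add returns true only if the label was not already present.
                changed |= b.labels.add(Label.MIGHT_BE_CONTENT);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("Home | About | Contact", 5, 1, true));    // density 5
        blocks.add(new Block("First article paragraph", 40, 3, true));  // density ~13.3
        blocks.add(new Block("Second article paragraph", 40, 4, true)); // density 10
        blocks.add(new Block("Long reader comment", 60, 10, true));     // density 6
        System.out.println("changed = " + keepLargestFulltext(blocks));
        for (Block b : blocks) {
            System.out.println(b.text + " -> content=" + b.isContent + " labels=" + b.labels);
        }
    }
}
```

With the sample blocks in main, the 60-word comment loses despite having the most raw words, because its density falls below the assumed threshold. The two 40-word paragraphs tie, and the strict comparison keeps the first of them.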
Field Summary
Modifier and Type | Field
static KeepLargestFulltextBlockFilter | INSTANCE
Constructor Summary
Constructor
KeepLargestFulltextBlockFilter()
Method Summary
Modifier and Type | Method | Description
boolean | process(TextDocument doc) | Processes the given document doc.
Methods inherited from class com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
getNumFullTextWords, getNumFullTextWords
Field Detail
INSTANCE
public static final KeepLargestFulltextBlockFilter INSTANCE
Method Detail
process
public boolean process(TextDocument doc) throws BoilerpipeProcessingException
Description copied from interface: BoilerpipeFilter
Processes the given document doc.
Specified by:
process in interface BoilerpipeFilter
Parameters:
doc - The TextDocument that is to be processed.
Returns:
true if changes have been made to the TextDocument.
Throws:
BoilerpipeProcessingException
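The boolean returned by process reports whether the filter changed the document. A caller that runs several filters in sequence can combine these results, as in the hypothetical sketch below. Filter and runAll are illustrative names, not boilerpipe API. A StringBuilder stands in for the TextDocument, and the checked BoilerpipeProcessingException is left out to keep the example self-contained.

```java
import java.util.List;

public class FilterChainSketch {

    // Hypothetical stand-in for BoilerpipeFilter; the real process(TextDocument)
    // also declares BoilerpipeProcessingException, omitted here.
    interface Filter<D> {
        boolean process(D doc);
    }

    // Runs every filter in order; returns true if at least one reported a change.
    static <D> boolean runAll(D doc, List<Filter<D>> filters) {
        boolean changed = false;
        for (Filter<D> f : filters) {
            changed |= f.process(doc); // |= (not ||) so later filters still run
        }
        return changed;
    }

    public static void main(String[] args) {
        StringBuilder doc = new StringBuilder("  some text  ");
        // Trims surrounding whitespace; reports a change only if something was removed.
        Filter<StringBuilder> trim = d -> {
            String trimmed = d.toString().trim();
            boolean removed = trimmed.length() != d.length();
            d.setLength(0);
            d.append(trimmed);
            return removed;
        };
        Filter<StringBuilder> noOp = d -> false;
        List<Filter<StringBuilder>> chain = List.of(trim, noOp);
        System.out.println(runAll(doc, chain)); // true: whitespace was trimmed
        System.out.println(runAll(doc, chain)); // false: nothing left to change
    }
}
```

Using the non-short-circuiting |= keeps every filter in the chain running even after an earlier one has already reported a change.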