Class IgnoreBlocksAfterContentFilter
java.lang.Object
com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
All Implemented Interfaces:
BoilerpipeFilter
public final class IgnoreBlocksAfterContentFilter
extends HeuristicFilterBase
implements BoilerpipeFilter
Marks as "non-content" all blocks that occur after a block labeled
DefaultLabels.INDICATES_END_OF_TEXT. The label is ignored unless a minimum number of words
(default: 60) occur in content blocks before it. This filter can be used in conjunction
with an upstream TerminatingBlocksFinder.
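The marking rule described above fits in a short loop. The sketch below is a hypothetical, self-contained model, not boilerpipe's source: Block and IgnoreAfterEndOfTextSketch are invented stand-ins for TextBlock and TextDocument. Words are counted with a plain per-block count, while the real filter inherits getNumFullTextWords from HeuristicFilterBase, whose counting rule is not described here. The sketch also assumes that the labeled block keeps its current content flag.

```java
import java.util.List;

/**
 * Hypothetical model of the documented rule; not the boilerpipe implementation.
 */
final class IgnoreAfterEndOfTextSketch {

    /** Stand-in for a text block: word count, content flag, end-of-text label. */
    static final class Block {
        final int numWords;
        boolean isContent;
        final boolean endOfText; // stands in for DefaultLabels.INDICATES_END_OF_TEXT

        Block(int numWords, boolean isContent, boolean endOfText) {
            this.numWords = numWords;
            this.isContent = isContent;
            this.endOfText = endOfText;
        }
    }

    private IgnoreAfterEndOfTextSketch() {}

    /** Returns true if any block was changed from content to non-content. */
    static boolean process(List<Block> blocks, int minNumWords) {
        int contentWordsBefore = 0; // words seen in content blocks so far
        boolean ignoring = false;   // set once a qualifying end-of-text label is found
        boolean changed = false;
        for (Block b : blocks) {
            if (ignoring) {
                if (b.isContent) {
                    b.isContent = false; // block after the label: mark non-content
                    changed = true;
                }
                continue;
            }
            // The label counts only if enough content words precede this block.
            if (b.endOfText && contentWordsBefore >= minNumWords) {
                ignoring = true;
            }
            if (b.isContent) {
                contentWordsBefore += b.numWords;
            }
        }
        return changed;
    }
}
```

With a threshold of 60, a 70-word content block followed by a labeled block causes every later block to be marked non-content. If only 20 words precede the label, the label is ignored and the method returns false.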
Field Summary
Fields (Modifier and Type, Field):
static final IgnoreBlocksAfterContentFilter DEFAULT_INSTANCE
static final IgnoreBlocksAfterContentFilter INSTANCE_200
private final int minNumWords
Constructor Summary
Constructors:
IgnoreBlocksAfterContentFilter(int minNumWords)
Method Summary
Methods (Modifier and Type, Method, Description):
static IgnoreBlocksAfterContentFilter getDefaultInstance(): Returns the singleton instance for IgnoreBlocksAfterContentFilter.
boolean process(TextDocument doc): Processes the given document doc.
Methods inherited from class HeuristicFilterBase:
getNumFullTextWords, getNumFullTextWords
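Field Details
DEFAULT_INSTANCE
static final IgnoreBlocksAfterContentFilter DEFAULT_INSTANCE
INSTANCE_200
static final IgnoreBlocksAfterContentFilter INSTANCE_200
minNumWords
private final int minNumWords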
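Constructor Details
IgnoreBlocksAfterContentFilter
public IgnoreBlocksAfterContentFilter(int minNumWords)
Creates a filter that honors an end-of-text label only if at least minNumWords words occur in content blocks before it.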
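The constructor parameter and the two static fields suggest a common Java pattern: an immutable threshold plus shared, preconfigured instances returned by a static accessor. The stand-in class below is hypothetical. It assumes DEFAULT_INSTANCE corresponds to the documented default of 60 words and, going only by its name, that INSTANCE_200 uses 200 words.

```java
/**
 * Hypothetical stand-in mirroring the shape of the documented API
 * (constructor, preconfigured instances, default accessor). Not boilerpipe code.
 */
final class ThresholdFilterSketch {

    // Assumption: 60 is the documented default; 200 is inferred only from the name.
    static final ThresholdFilterSketch DEFAULT_INSTANCE = new ThresholdFilterSketch(60);
    static final ThresholdFilterSketch INSTANCE_200 = new ThresholdFilterSketch(200);

    private final int minNumWords; // immutable, so shared instances are safe to reuse

    ThresholdFilterSketch(int minNumWords) {
        this.minNumWords = minNumWords;
    }

    /** Analogous to getDefaultInstance(): always returns the same shared object. */
    static ThresholdFilterSketch getDefaultInstance() {
        return DEFAULT_INSTANCE;
    }

    /** True if an end-of-text label preceded by this many content words is honored. */
    boolean honorsEndOfText(int contentWordsBefore) {
        return contentWordsBefore >= minNumWords;
    }
}
```

In this sketch, a label preceded by 100 content words is honored by the default instance but ignored by the 200-word instance. A custom threshold only needs a new instance.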
Method Details
getDefaultInstance
static IgnoreBlocksAfterContentFilter getDefaultInstance()
Returns the singleton instance for IgnoreBlocksAfterContentFilter.
process
public boolean process(TextDocument doc) throws BoilerpipeProcessingException
Description copied from interface: BoilerpipeFilter
Processes the given document doc.
Specified by:
process in interface BoilerpipeFilter
Parameters:
doc - The TextDocument that is to be processed.
Returns:
true if changes have been made to the TextDocument.
Throws:
BoilerpipeProcessingException
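Because process reports whether it changed the document, several filters can run in sequence and their results can be combined. The class description also suggests running this filter after an upstream TerminatingBlocksFinder. The sketch below models that arrangement with hypothetical stand-ins. MARK_END_OF_TEXT is an invented upstream step that labels blocks starting with "Comments", which is not TerminatingBlocksFinder's actual rule set. Step and runAll are also invented and are not boilerpipe APIs.

```java
import java.util.List;

/**
 * Hypothetical two-step pipeline: an upstream step adds end-of-text labels and a
 * downstream step ignores blocks after a qualifying label. Not boilerpipe code.
 */
final class FilterChainSketch {

    static final class Block {
        final String text;
        boolean isContent = true;
        boolean endOfText = false;

        Block(String text) {
            this.text = text;
        }

        int numWords() {
            String t = text.trim();
            return t.isEmpty() ? 0 : t.split("\\s+").length;
        }
    }

    /** Same contract as process(TextDocument): returns true if the document changed. */
    interface Step {
        boolean process(List<Block> doc);
    }

    /** Hypothetical upstream marker: labels blocks whose text starts with "Comments". */
    static final Step MARK_END_OF_TEXT = doc -> {
        boolean changed = false;
        for (Block b : doc) {
            if (!b.endOfText && b.text.startsWith("Comments")) {
                b.endOfText = true;
                changed = true;
            }
        }
        return changed;
    };

    /** Downstream step: marks blocks after a qualifying label as non-content. */
    static Step ignoreAfterContent(int minNumWords) {
        return doc -> {
            int words = 0;
            boolean ignoring = false;
            boolean changed = false;
            for (Block b : doc) {
                if (ignoring) {
                    if (b.isContent) {
                        b.isContent = false;
                        changed = true;
                    }
                } else {
                    if (b.endOfText && words >= minNumWords) {
                        ignoring = true;
                    }
                    if (b.isContent) {
                        words += b.numWords();
                    }
                }
            }
            return changed;
        };
    }

    /** Runs every step in order; returns true if any step reported a change. */
    static boolean runAll(List<Block> doc, List<Step> steps) {
        boolean changed = false;
        for (Step s : steps) {
            changed |= s.process(doc); // non-short-circuit: every step still runs
        }
        return changed;
    }
}
```

Running the same chain a second time returns false. The label is already set and the trailing blocks are already non-content, so neither step finds anything to change.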