Class MinClauseWordsFilter
java.lang.Object
    com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
All Implemented Interfaces:
BoilerpipeFilter
public final class MinClauseWordsFilter extends java.lang.Object implements BoilerpipeFilter
Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5). NOTE: You might consider using the SplitParagraphBlocksFilter upstream.
See Also:
SplitParagraphBlocksFilter
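The clause rule described above can be illustrated with a small standalone sketch. It is not the Boilerpipe source: the class name ClauseRuleSketch, the method names, and both regular expressions are assumptions made for illustration, and the real PAT_CLAUSE_DELIMITER and PAT_WHITESPACE patterns may differ. The flag mirrors the acceptClausesWithoutDelimiter constructor argument, on the assumption that it controls whether trailing text with no closing delimiter counts as a clause.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative stand-in for the clause rule; not the Boilerpipe implementation. */
final class ClauseRuleSketch {
    // Assumption: a clause ends at a run of , . : ; ! ? followed by whitespace or end of text.
    private static final Pattern CLAUSE_DELIMITER = Pattern.compile("[,.:;!?]+(\\s+|$)");
    // Assumption: words are separated by runs of whitespace.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Counts whitespace-separated words in a fragment. */
    static int countWords(CharSequence fragment) {
        String trimmed = fragment.toString().trim();
        return trimmed.isEmpty() ? 0 : WHITESPACE.split(trimmed).length;
    }

    /** Returns true if any delimited clause in the text has at least minWords words. */
    static boolean hasLongClause(String text, int minWords, boolean acceptClausesWithoutDelimiter) {
        Matcher m = CLAUSE_DELIMITER.matcher(text);
        int start = 0;
        while (m.find()) {
            if (countWords(text.subSequence(start, m.start())) >= minWords) {
                return true;
            }
            start = m.end();
        }
        // The remaining text has no closing delimiter; count it only if the flag allows.
        return acceptClausesWithoutDelimiter
                && countWords(text.subSequence(start, text.length())) >= minWords;
    }

    public static void main(String[] args) {
        // Navigation-like text: every clause is a single word.
        System.out.println(hasLongClause("Home, About, Contact.", 5, false));
        // A full sentence: one clause of eight words.
        System.out.println(hasLongClause("The quick brown fox jumps over the dog.", 5, false));
    }
}
```

With k = 5, the first call reports no qualifying clause and the second reports one, which is the distinction the filter uses to separate link lists and labels from running text.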
-
Field Summary
Fields
Modifier and Type                  Field                            Description
private boolean                    acceptClausesWithoutDelimiter
static MinClauseWordsFilter        INSTANCE
private int                        minWords
private java.util.regex.Pattern    PAT_CLAUSE_DELIMITER
private java.util.regex.Pattern    PAT_WHITESPACE
-
Constructor Summary
Constructors
Constructor                                                                  Description
MinClauseWordsFilter(int minWords)
MinClauseWordsFilter(int minWords, boolean acceptClausesWithoutDelimiter)
-
Method Summary
Methods
Modifier and Type    Method                                   Description
private boolean      isClause(java.lang.CharSequence text)
boolean              process(TextDocument doc)                Processes the given document doc.
-
Field Detail
-
INSTANCE
public static final MinClauseWordsFilter INSTANCE
-
minWords
private int minWords
-
acceptClausesWithoutDelimiter
private final boolean acceptClausesWithoutDelimiter
-
PAT_CLAUSE_DELIMITER
private final java.util.regex.Pattern PAT_CLAUSE_DELIMITER
-
PAT_WHITESPACE
private final java.util.regex.Pattern PAT_WHITESPACE
-
Method Detail
-
process
public boolean process(TextDocument doc) throws BoilerpipeProcessingException
Description copied from interface: BoilerpipeFilter
Processes the given document doc.
Specified by:
process in interface BoilerpipeFilter
Parameters:
doc - The TextDocument that is to be processed.
Returns:
true if changes have been made to the TextDocument.
Throws:
BoilerpipeProcessingException
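The return contract of process can be sketched with stand-in types. Everything here is hypothetical: ProcessContractSketch and its Block class are not Boilerpipe classes, and the sketch assumes that a filter demotes a block by clearing a content flag instead of deleting it. The only behavior taken from the documentation above is that the method returns true if it changed the document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative stand-in for the process() contract; not the Boilerpipe implementation. */
final class ProcessContractSketch {
    /** Hypothetical stand-in for a text block with a content flag. */
    static final class Block {
        final String text;
        boolean content = true;

        Block(String text) {
            this.text = text;
        }
    }

    /**
     * Demotes every content block that fails the check.
     * Returns true only if at least one block was changed.
     */
    static boolean process(List<Block> blocks, Predicate<String> keep) {
        boolean changes = false;
        for (Block b : blocks) {
            if (b.content && !keep.test(b.text)) {
                b.content = false;
                changes = true;
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("Home"));
        blocks.add(new Block("A sentence with more than five words in it."));
        // Simplified check for the demo: at least five whitespace-separated words.
        Predicate<String> atLeastFiveWords = t -> t.trim().split("\\s+").length >= 5;
        System.out.println(process(blocks, atLeastFiveWords)); // true: "Home" was demoted
        System.out.println(process(blocks, atLeastFiveWords)); // false: nothing left to change
    }
}
```

Returning whether anything changed lets a caller that chains several filters tell whether a pass had any effect, for example to decide if a later stage needs to run again.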
-
isClause
private boolean isClause(java.lang.CharSequence text)