Class NumWordsRulesClassifier
java.lang.Object
com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
All Implemented Interfaces:
BoilerpipeFilter
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010), particularly using number of words per block and link density per block.
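The kind of rule set described above can be sketched as a small decision function over a block and its two neighbours. This is only an illustration: the `Block` type, the `isContent` helper, the use of neighbouring blocks, and all three thresholds are assumptions made for the example, not the rules or signatures shipped in the library.

```java
public class NumWordsRuleSketch {
    // Illustrative thresholds only; they are NOT taken from the library.
    static final double MAX_LINK_DENSITY = 0.33;
    static final int LONG_BLOCK_WORDS = 16;
    static final int NEIGHBOUR_WORDS = 15;

    /** Hypothetical block summary holding only the two features named in the description. */
    static final class Block {
        final int numWords;
        final double linkDensity;

        Block(int numWords, double linkDensity) {
            this.numWords = numWords;
            this.linkDensity = linkDensity;
        }
    }

    /** Decides content/not-content for curr, looking at its previous and next block. */
    static boolean isContent(Block prev, Block curr, Block next) {
        if (curr.linkDensity > MAX_LINK_DENSITY) {
            return false; // mostly link text: treat as navigation or other boilerplate
        }
        if (curr.numWords > LONG_BLOCK_WORDS) {
            return true; // long block with few links: treat as content
        }
        // Short block: keep it only if a neighbour looks like running text.
        return prev.numWords > NEIGHBOUR_WORDS || next.numWords > NEIGHBOUR_WORDS;
    }

    public static void main(String[] args) {
        Block empty = new Block(0, 0.0);   // sentinel for a missing neighbour
        Block nav = new Block(5, 0.9);
        Block para = new Block(60, 0.02);
        Block caption = new Block(8, 0.0);

        System.out.println(isContent(empty, nav, para));     // false
        System.out.println(isContent(nav, para, caption));   // true
        System.out.println(isContent(para, caption, empty)); // true
    }
}
```

Nested threshold checks like these are the shape a learned decision tree takes once it is written out by hand, which is why such a classifier needs no model file at runtime.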
Field Summary
Fields: INSTANCE
Constructor Summary
Constructors: NumWordsRulesClassifier()
Method Summary
Modifier and Type | Method | Description
protected boolean | classify |
static NumWordsRulesClassifier | getInstance() | Returns the singleton instance for NumWordsRulesClassifier.
boolean | process(TextDocument doc) | Processes the given document doc.
Field Details
INSTANCE
Constructor Details
NumWordsRulesClassifier
public NumWordsRulesClassifier()
Method Details
getInstance
Returns the singleton instance for NumWordsRulesClassifier.
process
Description copied from interface: BoilerpipeFilter
Processes the given document doc.
Specified by:
process in interface BoilerpipeFilter
Parameters:
doc - The TextDocument that is to be processed.
Returns:
true if changes have been made to the TextDocument.
Throws:
BoilerpipeProcessingException
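The contract documented for process (return true only when the document was modified) together with the singleton accessor can be illustrated with a self-contained sketch. Everything here is a hypothetical stand-in: `Doc`, `Filter`, `WordCountFilter` and the 10-word rule are invented for the example and do not reproduce the library's TextDocument, BoilerpipeFilter, or classification rules.

```java
public class FilterContractSketch {
    /** Hypothetical stand-in for a document: per-block word counts and content flags. */
    static final class Doc {
        final int[] numWords;
        final boolean[] isContent;

        Doc(int... numWords) {
            this.numWords = numWords;
            this.isContent = new boolean[numWords.length]; // all blocks start as not-content
        }
    }

    /** Hypothetical filter interface mirroring the documented contract. */
    interface Filter {
        /** Returns true if the document was changed by this call. */
        boolean process(Doc doc);
    }

    static final class WordCountFilter implements Filter {
        static final WordCountFilter INSTANCE = new WordCountFilter();

        /** Returns the shared instance; the filter is stateless, so one is enough. */
        static WordCountFilter getInstance() {
            return INSTANCE;
        }

        @Override
        public boolean process(Doc doc) {
            boolean changed = false;
            for (int i = 0; i < doc.numWords.length; i++) {
                boolean label = doc.numWords[i] > 10; // illustrative rule, not the library's
                if (doc.isContent[i] != label) {
                    doc.isContent[i] = label;
                    changed = true; // report a change only when a flag actually flips
                }
            }
            return changed;
        }
    }

    public static void main(String[] args) {
        Doc doc = new Doc(3, 40, 7);
        Filter filter = WordCountFilter.getInstance();
        System.out.println(filter.process(doc)); // true: the 40-word block became content
        System.out.println(filter.process(doc)); // false: a second pass changes nothing
    }
}
```

Returning a change flag lets a caller chain several filters and detect when a pass has reached a fixed point.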
classify