Class NumWordsRulesClassifier

java.lang.Object
com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
All Implemented Interfaces:
BoilerpipeFilter

public class NumWordsRulesClassifier extends Object implements BoilerpipeFilter
Classifies TextBlocks as content or not-content using rules derived with the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010). The rules rely mainly on two features of each block: its number of words and its link density.
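
To make the idea of a rule-based classifier over word count and link density more concrete, the following is a minimal, self-contained sketch. It does not use the boilerpipe API: the `Block` class, the `NumWordsRuleSketch` class, and every threshold value are assumptions chosen for illustration only, and the actual rules in this class may differ. The sketch assumes that each block is judged together with its previous and next neighbours, and that missing neighbours at the document edges are treated as empty blocks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NumWordsRuleSketch {

    /** Hypothetical stand-in for a text block: only the two features the rules use. */
    static final class Block {
        final int numWords;
        final double linkDensity; // fraction of words inside links, 0.0 to 1.0

        Block(int numWords, double linkDensity) {
            this.numWords = numWords;
            this.linkDensity = linkDensity;
        }
    }

    /** Placeholder neighbour used at the start and end of the document (an assumption). */
    private static final Block EMPTY = new Block(0, 0.0);

    /**
     * Decision-tree style rules. All thresholds are illustrative assumptions,
     * not values documented for the real classifier.
     */
    static boolean isContent(Block prev, Block curr, Block next) {
        // A block that is mostly links is treated as boilerplate.
        if (curr.linkDensity > 0.33) {
            return false;
        }
        if (prev.linkDensity <= 0.55) {
            // A long block is content on its own; a short one needs
            // support from a reasonably long neighbour.
            if (curr.numWords > 16) {
                return true;
            }
            if (next.numWords > 15) {
                return true;
            }
            return prev.numWords > 4;
        }
        // The previous block is link-heavy, so require more evidence.
        if (curr.numWords > 40) {
            return true;
        }
        return next.numWords > 17;
    }

    /** Labels every block in document order; true means "content". */
    static List<Boolean> classify(List<Block> blocks) {
        List<Boolean> labels = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) {
            Block prev = i > 0 ? blocks.get(i - 1) : EMPTY;
            Block next = i + 1 < blocks.size() ? blocks.get(i + 1) : EMPTY;
            labels.add(isContent(prev, blocks.get(i), next));
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Block> page = Arrays.asList(
                new Block(5, 1.0),   // navigation bar: few words, all links
                new Block(8, 0.0),   // headline
                new Block(60, 0.05), // article paragraph
                new Block(6, 0.8));  // footer links
        System.out.println(classify(page)); // prints [false, true, true, false]
    }
}
```

In this sketch the headline is kept even though it is short, because the block that follows it is long; an isolated short block with no substantial neighbours would be rejected.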