Uses of Package
com.kohlschutter.boilerpipe.filters.english
Packages that use com.kohlschutter.boilerpipe.filters.english

Package: Description
com.kohlschutter.boilerpipe.filters.english: These BoilerpipeFilters have only been tested on English text.
Classes in com.kohlschutter.boilerpipe.filters.english used by com.kohlschutter.boilerpipe.filters.english

Class: Description
DensityRulesClassifier: Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features", particularly using text densities and link densities.
HeuristicFilterBase: Base class for some heuristics that are used by boilerpipe filters.
IgnoreBlocksAfterContentFilter: Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT.
IgnoreBlocksAfterContentFromEndFilter: Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT, and after any content block.
KeepLargestFulltextBlockFilter: Keeps the largest TextBlock only (by the number of words).
MinFulltextWordsFilter: Keeps only those content blocks which contain at least k full-text words (measured by HeuristicFilterBase.getNumFullTextWords(TextBlock)).
NumWordsRulesClassifier: Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010), particularly using number of words per block and link density per block.
TerminatingBlocksFinder: Finds blocks which are potentially indicating the end of an article text and marks them with DefaultLabels.INDICATES_END_OF_TEXT.
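
The block-marking behaviour described for IgnoreBlocksAfterContentFilter, MinFulltextWordsFilter and KeepLargestFulltextBlockFilter can be illustrated with a small self-contained sketch. It does not use the boilerpipe API: the Block class, the method names and the plain whitespace word count are hypothetical stand-ins chosen for illustration. The handling of the end-of-text block itself is also an assumption, and the library's own implementation may differ (for example, it may apply word-count thresholds before honouring an end-of-text marker).

```java
import java.util.ArrayList;
import java.util.List;

public class FilterSketch {
    // Hypothetical stand-in for a text block; NOT boilerpipe's TextBlock.
    static final class Block {
        final String text;
        boolean isContent;
        final boolean indicatesEndOfText; // stand-in for the INDICATES_END_OF_TEXT label

        Block(String text, boolean isContent, boolean indicatesEndOfText) {
            this.text = text;
            this.isContent = isContent;
            this.indicatesEndOfText = indicatesEndOfText;
        }

        // Simplified word count: whitespace-separated tokens (an assumption).
        int numWords() {
            String t = text.trim();
            return t.isEmpty() ? 0 : t.split("\\s+").length;
        }
    }

    // Marks every block after the first end-of-text block as non-content.
    // The marked block itself is left unchanged here (an assumption).
    static boolean ignoreBlocksAfterEndOfText(List<Block> blocks) {
        boolean changed = false;
        boolean foundEnd = false;
        for (Block b : blocks) {
            if (foundEnd) {
                if (b.isContent) {
                    b.isContent = false;
                    changed = true;
                }
            } else if (b.indicatesEndOfText) {
                foundEnd = true;
            }
        }
        return changed;
    }

    // Keeps only content blocks with at least k words.
    static boolean minFulltextWords(List<Block> blocks, int k) {
        boolean changed = false;
        for (Block b : blocks) {
            if (b.isContent && b.numWords() < k) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    // Keeps only the content block with the most words.
    static boolean keepLargestBlock(List<Block> blocks) {
        Block largest = null;
        for (Block b : blocks) {
            if (b.isContent && (largest == null || b.numWords() > largest.numWords())) {
                largest = b;
            }
        }
        boolean changed = false;
        for (Block b : blocks) {
            if (b.isContent && b != largest) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    // Small made-up document used for the demo.
    static List<Block> sampleBlocks() {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("intro text here", true, false));
        blocks.add(new Block("a much longer article body with many words inside", true, false));
        blocks.add(new Block("Comments", false, true));
        blocks.add(new Block("user comment text", true, false));
        return blocks;
    }

    public static void main(String[] args) {
        List<Block> blocks = sampleBlocks();
        ignoreBlocksAfterEndOfText(blocks);
        keepLargestBlock(blocks);
        for (Block b : blocks) {
            System.out.println((b.isContent ? "content     " : "non-content ") + b.text);
        }
    }
}
```

Each method returns whether it changed any block, so the steps can be chained and the caller can tell whether a pass had any effect.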