Class DocumentTitleMatchClassifier
java.lang.Object
com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
All Implemented Interfaces:
BoilerpipeFilter
Marks TextBlocks which contain parts of the HTML <TITLE> tag, using some heuristics which are quite specific to the news domain.
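As an illustration of the idea described above, the following is a minimal, self-contained sketch and not the library's implementation. It assumes that title candidates are obtained by splitting the <TITLE> text on a few separators common on news sites and keeping the longest part, and that a block "matches" when its normalized text equals one of the candidates. The class name TitleMatchSketch, the helper names, and the separator list are all hypothetical; only the general shape of getLongestPart(String title, String pattern) is taken from this page.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: NOT the boilerpipe implementation.
final class TitleMatchSketch {

    // Splits the title on a separator regex and returns the longest trimmed part.
    // Returns null when the separator does not occur or no non-empty part remains.
    static String longestPart(String title, String pattern) {
        String[] parts = title.split(pattern);
        if (parts.length == 1) {
            return null; // nothing was split off
        }
        String longest = "";
        for (String part : parts) {
            String p = part.trim();
            if (p.length() > longest.length()) {
                longest = p;
            }
        }
        return longest.isEmpty() ? null : longest;
    }

    // Collects the full title plus the longest part for each assumed separator.
    static Set<String> potentialTitles(String title) {
        Set<String> candidates = new LinkedHashSet<>();
        String t = title.trim();
        if (t.isEmpty()) {
            return candidates;
        }
        candidates.add(t);
        // Assumed separators ("Headline | Site", "Headline - Site", "Site: Headline").
        for (String separator : new String[] {" \\| ", " - ", ": "}) {
            String part = longestPart(t, separator);
            if (part != null) {
                candidates.add(part);
            }
        }
        return candidates;
    }

    // Returns true if the block text equals one of the candidates, ignoring case
    // and surrounding whitespace.
    static boolean matchesTitle(String blockText, Set<String> candidates) {
        String normalized = blockText.trim().toLowerCase(Locale.ROOT);
        for (String candidate : candidates) {
            if (candidate.toLowerCase(Locale.ROOT).equals(normalized)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, a title such as "Storm hits coast | Example News" yields the candidates "Storm hits coast | Example News" and "Storm hits coast", so a block whose text is just the headline would be treated as a title match. A real filter would then label that block instead of returning a boolean.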
Field Summary
Fields
Constructor Summary
Constructors
Method Summary
Modifier and Type    Method    Description
private void    addPotentialTitles(Set<String> potentialTitles, String title, String pattern, int minWords)
private String    getLongestPart(String title, String pattern)
boolean    process(TextDocument doc)    Processes the given document doc.
Field Details
potentialTitles
PAT_REMOVE_CHARACTERS
Constructor Details
DocumentTitleMatchClassifier
Method Details
getPotentialTitles
addPotentialTitles
getLongestPart
process
Description copied from interface: BoilerpipeFilter
Processes the given document doc.
Specified by:
process in interface BoilerpipeFilter
Parameters:
doc - The TextDocument that is to be processed.
Returns:
true if changes have been made to the TextDocument.
Throws:
BoilerpipeProcessingException