Interface BoilerpipeExtractor
- All Superinterfaces:
BoilerpipeFilter
- All Known Implementing Classes:
ArticleExtractor, ArticleSentencesExtractor, CanolaExtractor, DefaultExtractor, ExtractorBase, KeepEverythingExtractor, KeepEverythingWithMinKWordsExtractor, LargestContentExtractor, NumWordsRulesExtractor
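Describes a complete filter pipeline. An extractor is itself a BoilerpipeFilter and adds getText methods that return the extracted text, either from HTML (given as a String, an InputSource or a Reader) or from an already-built TextDocument.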
Method Summary
Modifier and Type | Method | Description
String | getText(TextDocument doc) | Extracts text from the given TextDocument object.
String | getText(Reader r) | Extracts text from the HTML code available from the given Reader.
String | getText(String html) | Extracts text from the HTML code given as a String.
String | getText(InputSource is) | Extracts text from the HTML code available from the given InputSource.
Methods inherited from interface BoilerpipeFilter
process
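Since every extractor is also a BoilerpipeFilter, one way to picture "a complete filter pipeline" is a chain of filters run in order on one document, with getText returning whatever the chain left marked as content. The following is a minimal sketch under that assumption. `Block`, `Doc`, `Filter`, `Extractor`, `MinWordsFilter`, `Pipeline` and `PipelineDemo` are hypothetical stand-ins, not boilerpipe classes, and the word-count and "Copyright" rules are toys rather than the heuristics the listed extractors actually use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a text block: a piece of text plus a content flag.
final class Block {
    final String text;
    boolean content = true;
    Block(String text) { this.text = text; }
    int words() {
        String t = text.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }
}

// Hypothetical stand-in for TextDocument.
final class Doc {
    final List<Block> blocks = new ArrayList<>();
    Doc(String... texts) {
        for (String t : texts) blocks.add(new Block(t));
    }
    // Joins the text of every block still flagged as content.
    String content() {
        return blocks.stream().filter(b -> b.content).map(b -> b.text)
                .collect(Collectors.joining("\n"));
    }
}

// Same shape as BoilerpipeFilter.process: returns true if the document changed.
interface Filter {
    boolean process(Doc doc);
}

// Same shape as BoilerpipeExtractor: a filter that can also hand back text.
interface Extractor extends Filter {
    default String getText(Doc doc) {
        process(doc);
        return doc.content();
    }
}

// Toy rule: blocks with fewer than minWords words are flagged as boilerplate.
final class MinWordsFilter implements Filter {
    private final int minWords;
    MinWordsFilter(int minWords) { this.minWords = minWords; }
    @Override
    public boolean process(Doc doc) {
        boolean changed = false;
        for (Block b : doc.blocks) {
            if (b.content && b.words() < minWords) {
                b.content = false;
                changed = true;
            }
        }
        return changed;
    }
}

// The pipeline: runs every stage in order on the same document.
final class Pipeline implements Extractor {
    private final List<Filter> stages;
    Pipeline(List<Filter> stages) { this.stages = stages; }
    @Override
    public boolean process(Doc doc) {
        boolean changed = false;
        for (Filter f : stages) changed |= f.process(doc); // |= does not short-circuit
        return changed;
    }
}

class PipelineDemo {
    static String demo() {
        // Second stage as a lambda: drops blocks that start with "Copyright".
        Filter dropCopyright = d -> {
            boolean changed = false;
            for (Block b : d.blocks) {
                if (b.content && b.text.startsWith("Copyright")) {
                    b.content = false;
                    changed = true;
                }
            }
            return changed;
        };
        Extractor ex = new Pipeline(List.of(new MinWordsFilter(3), dropCopyright));
        return ex.getText(new Doc("Home About",
                "This paragraph is the actual article body.",
                "Copyright 2024 Example Corp."));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints: This paragraph is the actual article body.
    }
}
```

Only the middle block survives: the navigation line fails the word-count stage, and the footer passes it but is removed by the second stage.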
Method Details
getText
String getText(String html) throws BoilerpipeProcessingException
Extracts text from the HTML code given as a String.
- Parameters:
html - The HTML code as a String.
- Returns:
The extracted text.
- Throws:
BoilerpipeProcessingException
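The String overload is the convenient entry point when the HTML is already in memory. A common way to offer such an overload is to wrap the string in a `StringReader` and delegate to a stream-based variant. The sketch below assumes that design (real implementations may route through an InputSource instead) and also shows a caller handling the checked exception. `TextExtractor`, `TagStripper`, `ProcessingException` and `StringOverloadDemo` are hypothetical; `TagStripper` merely deletes markup and is not a boilerplate-detection algorithm.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for BoilerpipeProcessingException (a checked exception).
class ProcessingException extends Exception {
    ProcessingException(String msg, Throwable cause) { super(msg, cause); }
}

// Hypothetical extractor exposing only the Reader and String overloads.
interface TextExtractor {
    String getText(Reader r) throws ProcessingException;

    // Convenience overload: wrap the in-memory HTML in a Reader and delegate.
    default String getText(String html) throws ProcessingException {
        return getText(new StringReader(html));
    }
}

// Toy implementation: drops everything between '<' and '>'.
class TagStripper implements TextExtractor {
    @Override
    public String getText(Reader r) throws ProcessingException {
        StringBuilder out = new StringBuilder();
        boolean inTag = false;
        try {
            int c;
            while ((c = r.read()) != -1) {
                if (c == '<') inTag = true;
                else if (c == '>') inTag = false;
                else if (!inTag) out.append((char) c);
            }
        } catch (IOException e) {
            throw new ProcessingException("could not read HTML", e);
        }
        return out.toString().trim();
    }
}

class StringOverloadDemo {
    static String demo(String html) {
        TextExtractor ex = new TagStripper();
        try {
            return ex.getText(html);
        } catch (ProcessingException e) {
            // The checked exception must be handled; here we fall back to "".
            return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("<p>Hello <b>world</b></p>")); // prints: Hello world
    }
}
```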
getText
String getText(InputSource is) throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given InputSource.
- Parameters:
is - The InputSource containing the HTML.
- Returns:
The extracted text.
- Throws:
BoilerpipeProcessingException
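InputSource is, presumably, the SAX class `org.xml.sax.InputSource`, which can carry a character stream, a byte stream with an optional declared encoding, or a system ID. SAX gives the character stream priority, then the byte stream. Here is a minimal sketch of turning an InputSource into a `Reader` under that rule. The helper names in `InputSourceReaders` are hypothetical, system-ID-only sources are not handled, and the UTF-8 fallback is an assumption (a real SAX parser would try to detect the encoding instead).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

class InputSourceReaders {
    // Character stream first, then byte stream plus declared encoding, as in SAX.
    static Reader open(InputSource is) throws IOException {
        if (is.getCharacterStream() != null) {
            return is.getCharacterStream();
        }
        if (is.getByteStream() != null) {
            // Assumption: default to UTF-8 when no encoding is declared.
            Charset cs = is.getEncoding() != null
                    ? Charset.forName(is.getEncoding())
                    : StandardCharsets.UTF_8;
            return new InputStreamReader(is.getByteStream(), cs);
        }
        throw new IOException("system-ID-only InputSource is not handled in this sketch");
    }

    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    private static String readQuietly(InputSource src) {
        try (Reader r = open(src)) {
            return readAll(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Latin-1 bytes, decoded with the encoding declared on the InputSource.
    static String demoBytes() {
        byte[] latin1 = "<p>caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
        InputSource src = new InputSource(new ByteArrayInputStream(latin1));
        src.setEncoding("ISO-8859-1");
        return readQuietly(src);
    }

    // When both streams are set, the character stream wins.
    static String demoCharsWin() {
        InputSource src = new InputSource(new StringReader("<p>hi</p>"));
        src.setByteStream(new ByteArrayInputStream("ignored".getBytes(StandardCharsets.US_ASCII)));
        return readQuietly(src);
    }

    public static void main(String[] args) {
        System.out.println(demoBytes());
        System.out.println(demoCharsWin());
    }
}
```

Declaring the encoding on the InputSource is the main reason to prefer this overload over the Reader one when the HTML arrives as bytes whose charset is known from, for example, an HTTP header.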
getText
String getText(Reader r) throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given Reader.
- Parameters:
r - The Reader containing the HTML.
- Returns:
The extracted text.
- Throws:
BoilerpipeProcessingException
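A Reader delivers characters that have already been decoded, so with this overload the character encoding is fixed by whoever constructs the Reader. When the HTML starts out as bytes, it is worth naming the charset explicitly. This sketch (the `ReaderCharsetDemo` class is hypothetical) builds the kind of Reader a caller might pass to getText(Reader) and decodes the same UTF-8 bytes twice to show what a wrong choice does.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderCharsetDemo {
    // HTML encoded as UTF-8; the i-diaeresis (U+00EF) takes two bytes, 0xC3 0xAF.
    static final byte[] SAMPLE = "<p>na\u00efve</p>".getBytes(StandardCharsets.UTF_8);

    // Builds a Reader over the bytes with the given charset, then drains it.
    static String decodeSample(Charset cs) {
        try (Reader r = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(SAMPLE), cs))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeSample(StandardCharsets.UTF_8));      // correct text
        System.out.println(decodeSample(StandardCharsets.ISO_8859_1)); // garbled text
    }
}
```

With UTF-8 the text round-trips. With ISO-8859-1 the two bytes of "ï" become two separate characters, and an extractor handed that Reader would only ever see the garbled text.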
getText
String getText(TextDocument doc) throws BoilerpipeProcessingException
Extracts text from the given TextDocument object.
- Parameters:
doc - The TextDocument.
- Returns:
The extracted text.
- Throws:
BoilerpipeProcessingException
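Because an extractor is also a BoilerpipeFilter, getText(TextDocument) presumably runs process on the document you pass in. If, as the filter design suggests, that step changes the document in place, handing the same TextDocument to a second extractor gives it the first one's results rather than a clean slate. The sketch below illustrates the effect with hypothetical stand-ins (`FlagDoc`, `MinLengthExtractor`, `ReuseDemo`). A cautious approach with the real API is to build a fresh TextDocument for each extractor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for TextDocument: block texts plus mutable content flags.
final class FlagDoc {
    final List<String> texts;
    final boolean[] content;

    FlagDoc(List<String> texts) {
        this.texts = texts;
        this.content = new boolean[texts.size()];
        Arrays.fill(content, true); // every block starts out as content
    }

    // Joins the blocks that are still flagged as content.
    String contentText() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            if (content[i]) kept.add(texts.get(i));
        }
        return String.join("\n", kept);
    }
}

// Hypothetical extractor: process() only ever clears flags, in place,
// and getText() returns whatever is still flagged afterwards.
final class MinLengthExtractor {
    private final int minChars;

    MinLengthExtractor(int minChars) { this.minChars = minChars; }

    boolean process(FlagDoc doc) {
        boolean changed = false;
        for (int i = 0; i < doc.texts.size(); i++) {
            if (doc.content[i] && doc.texts.get(i).length() < minChars) {
                doc.content[i] = false;
                changed = true;
            }
        }
        return changed;
    }

    String getText(FlagDoc doc) {
        process(doc);
        return doc.contentText();
    }
}

class ReuseDemo {
    static final List<String> BLOCKS =
            List.of("Menu", "Short teaser", "A longer paragraph of article text.");

    // Lenient pass on a document that a strict pass has already modified.
    static String reusedDocument() {
        FlagDoc doc = new FlagDoc(BLOCKS);
        new MinLengthExtractor(20).getText(doc);
        return new MinLengthExtractor(5).getText(doc);
    }

    // Lenient pass on its own freshly built document.
    static String freshDocument() {
        return new MinLengthExtractor(5).getText(new FlagDoc(BLOCKS));
    }

    public static void main(String[] args) {
        System.out.println(reusedDocument()); // teaser is missing
        System.out.println(freshDocument());  // teaser is kept
    }
}
```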