Class ImageExtractor
java.lang.Object
com.kohlschutter.boilerpipe.sax.ImageExtractor
Extracts the images that are enclosed by extracted content.
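A toy model in Java can make "enclosed by extracted content" concrete. The page is reduced to text blocks flagged as content or boilerplate, and an image is kept only if its position falls inside a content block. The Block and Img records, the character-offset representation, and the enclosedImages method are all hypothetical. This illustrates the idea only and is not boilerpipe's implementation, which works on a processed TextDocument together with the original HTML.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT boilerpipe's algorithm: a page is a list of
// text blocks flagged as content or boilerplate, plus images at offsets.
final class EnclosedImagesSketch {

    // A text block covering the half-open range [start, end) of the page.
    record Block(int start, int end, boolean isContent) {}

    // An image found at a character offset in the original page.
    record Img(String src, int offset) {}

    // Keeps each image whose offset lies inside at least one content block.
    static List<Img> enclosedImages(List<Block> blocks, List<Img> images) {
        List<Img> result = new ArrayList<>();
        for (Img img : images) {
            for (Block b : blocks) {
                if (b.isContent() && img.offset() >= b.start() && img.offset() < b.end()) {
                    result.add(img);
                    break; // add each image at most once
                }
            }
        }
        return result;
    }
}
```

For example, with a boilerplate header block covering offsets 0 to 99 and a content block covering 100 to 499, an image at offset 10 is dropped and an image at offset 250 is kept.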
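Nested Class Summary
Modifier and Type | Class
private final class | (name not shown)
private static class | (name not shown)
One of these two classes is ImageExtractor.TagAction, the type used by the TA_IGNORABLE_ELEMENT and TAG_ACTIONS fields.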
Field Summary
Modifier and Type | Field
static final ImageExtractor | INSTANCE
private static final ImageExtractor.TagAction | TA_IGNORABLE_ELEMENT
private static Map<String, ImageExtractor.TagAction> | TAG_ACTIONS
Constructor Summary
Modifier | Constructor
private | ImageExtractor()
Method Summary
Modifier and Type | Method | Description
static ImageExtractor | getInstance() | Returns the singleton instance of ImageExtractor.
List<Image> | process(TextDocument doc, String origHTML) | Processes the given TextDocument and the original HTML text (as a String).
List<Image> | process(TextDocument doc, InputSource is) | Processes the given TextDocument and the original HTML text (as an InputSource).
List<Image> | process(URL url, BoilerpipeExtractor extractor) | Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
Field Details

INSTANCE

TA_IGNORABLE_ELEMENT

TAG_ACTIONS
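The field names suggest a dispatch table: TAG_ACTIONS maps tag names to ImageExtractor.TagAction objects, and TA_IGNORABLE_ELEMENT is one shared action. The sketch below shows that general technique under stated assumptions. The TagAction interface, the Walker class, the upper-case key convention, and the choice of STYLE and SCRIPT as ignorable tags are all hypothetical and are not taken from boilerpipe.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical tag-action table, guessed from the field names
// TA_IGNORABLE_ELEMENT and TAG_ACTIONS; this is not boilerpipe's code.
final class TagActionSketch {

    // Hypothetical callback run when a parser opens or closes a tag.
    interface TagAction {
        void start(Walker w);
        void end(Walker w);
    }

    // Shared action for elements whose text should be skipped.
    static final TagAction IGNORABLE = new TagAction() {
        public void start(Walker w) { w.ignoreDepth++; }
        public void end(Walker w) { w.ignoreDepth--; }
    };

    // Dispatch table keyed by upper-case tag name (assumed convention).
    static final Map<String, TagAction> ACTIONS = new HashMap<>();
    static {
        ACTIONS.put("STYLE", IGNORABLE);
        ACTIONS.put("SCRIPT", IGNORABLE);
    }

    // Minimal parser-callback state: nesting depth of ignored elements
    // and the text collected outside them.
    static final class Walker {
        int ignoreDepth = 0;
        final StringBuilder text = new StringBuilder();

        void startElement(String tag) {
            TagAction a = ACTIONS.get(tag.toUpperCase(Locale.ROOT));
            if (a != null) a.start(this);
        }

        void endElement(String tag) {
            TagAction a = ACTIONS.get(tag.toUpperCase(Locale.ROOT));
            if (a != null) a.end(this);
        }

        void characters(String s) {
            if (ignoreDepth == 0) text.append(s); // drop text inside ignored tags
        }
    }
}
```

Sharing one action object across many tags keeps the table small; tags without an entry simply fall through with no special handling.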
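Constructor Details

ImageExtractor
private ImageExtractor()
The constructor is private, so the class cannot be instantiated from outside; use getInstance() to obtain the shared instance.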
Method Details

getInstance
Returns the singleton instance of ImageExtractor.
Returns:
The singleton ImageExtractor instance.
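The INSTANCE field, the private constructor, and getInstance() together follow the eagerly initialized singleton pattern. A minimal stand-alone version, using a hypothetical class name and package-private members for brevity:

```java
// Hypothetical class mirroring the singleton shape documented for
// ImageExtractor: a static final INSTANCE, a private constructor,
// and a static getInstance() accessor.
final class SingletonSketch {
    // Created once, when the class is initialized.
    static final SingletonSketch INSTANCE = new SingletonSketch();

    // Private, so no other class can create additional instances.
    private SingletonSketch() {}

    // Always returns the same shared object.
    static SingletonSketch getInstance() {
        return INSTANCE;
    }
}
```

Because the constructor is private, code outside the class cannot call new ImageExtractor(); it obtains the shared object with ImageExtractor.getInstance() instead.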
process
public List<Image> process(TextDocument doc, String origHTML) throws BoilerpipeProcessingException
Processes the given TextDocument and the original HTML text (as a String).
Parameters:
doc - The processed TextDocument.
origHTML - The original HTML document.
Returns:
A List of enclosed Images.
Throws:
BoilerpipeProcessingException
process
public List<Image> process(TextDocument doc, InputSource is) throws BoilerpipeProcessingException
Processes the given TextDocument and the original HTML text (as an InputSource).
Parameters:
doc - The processed TextDocument.
is - The original HTML document, as an InputSource.
Returns:
A List of enclosed Images.
Throws:
BoilerpipeProcessingException
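The two overloads above take the same original HTML, once as a String and once as a SAX InputSource. A String can be wrapped as an InputSource with new InputSource(new StringReader(html)). The helper class and its method names below are hypothetical and not part of boilerpipe; the sketch only shows that the wrapped source carries the original text unchanged.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.xml.sax.InputSource;

// Hypothetical helpers relating an HTML String to the InputSource form
// of the same document; not part of boilerpipe.
final class HtmlSourceSketch {

    // Wraps HTML text in a SAX InputSource backed by a character stream.
    static InputSource fromString(String html) {
        return new InputSource(new StringReader(html));
    }

    // Reads the character stream back into a String. This consumes and
    // closes the underlying Reader, so the InputSource is single-use.
    static String readAll(InputSource source) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = source.getCharacterStream()) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

A caller that already holds the HTML as a String can use process(doc, origHTML) directly; the InputSource overload fits when the HTML arrives as a stream. Since a Reader-backed InputSource is consumed once read, a caller that needs the same HTML twice should create a fresh InputSource for each use.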
process
public List<Image> process(URL url, BoilerpipeExtractor extractor) throws IOException, BoilerpipeProcessingException, SAXException
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
Parameters:
url - The URL of the page to fetch.
extractor - The BoilerpipeExtractor used to process the retrieved HTML.
Returns:
A List of enclosed Images.
Throws:
IOException
BoilerpipeProcessingException
SAXException
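Going by its description, the URL overload chains several steps: fetch the page with HTMLFetcher, process it with the given BoilerpipeExtractor, and return the images enclosed by the extracted content. The generic sketch below models only that ordering with plain functions. The parse step (turning HTML into a document model such as a TextDocument), the parameter names, and the type parameters are assumptions; none of the functional parameters are boilerpipe APIs.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical pipeline sketch of "fetch, extract, collect images".
// U, H, D and I stand in for URL, HTML text, document model and image.
final class FetchAndExtractSketch {

    static <U, H, D, I> List<I> process(
            U url,
            Function<U, H> fetch,                    // download the page
            Function<H, D> parse,                    // build a document model (assumed step)
            Consumer<D> extractor,                   // mark content in the model, in place
            BiFunction<D, H, List<I>> collectImages) {
        H html = fetch.apply(url);
        D doc = parse.apply(html);
        extractor.accept(doc);
        // The original HTML is scanned again, now with the processed
        // document available to decide which images are enclosed.
        return collectImages.apply(doc, html);
    }
}
```

Passing the original HTML alongside the processed document mirrors the process(doc, origHTML) overload, which needs both inputs.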