Package com.kohlschutter.boilerpipe.sax
Class ImageExtractor
- java.lang.Object
  - com.kohlschutter.boilerpipe.sax.ImageExtractor
public final class ImageExtractor extends java.lang.Object
Extracts the images that are enclosed by extracted content.
Nested Class Summary
Nested Classes
Modifier and Type       Class                           Description
private class           ImageExtractor.Implementation
private static class    ImageExtractor.TagAction
Field Summary
Fields
Modifier and Type                                                          Field                   Description
static ImageExtractor                                                      INSTANCE
private static ImageExtractor.TagAction                                    TA_IGNORABLE_ELEMENT
private static java.util.Map<java.lang.String,ImageExtractor.TagAction>    TAG_ACTIONS
Constructor Summary
Constructors
Modifier    Constructor         Description
private     ImageExtractor()
Method Summary
Methods
Modifier and Type        Method                                                     Description
static ImageExtractor    getInstance()                                              Returns the singleton instance of ImageExtractor.
java.util.List<Image>    process(TextDocument doc, java.lang.String origHTML)       Processes the given TextDocument and the original HTML text (as a String).
java.util.List<Image>    process(TextDocument doc, org.xml.sax.InputSource is)      Processes the given TextDocument and the original HTML text (as an InputSource).
java.util.List<Image>    process(java.net.URL url, BoilerpipeExtractor extractor)   Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
Field Detail
INSTANCE
public static final ImageExtractor INSTANCE
TA_IGNORABLE_ELEMENT
private static final ImageExtractor.TagAction TA_IGNORABLE_ELEMENT
TAG_ACTIONS
private static java.util.Map<java.lang.String,ImageExtractor.TagAction> TAG_ACTIONS
Method Detail
getInstance
public static ImageExtractor getInstance()
Returns the singleton instance of ImageExtractor.
- Returns:
  The singleton instance of ImageExtractor.
process
public java.util.List<Image> process(TextDocument doc, java.lang.String origHTML) throws BoilerpipeProcessingException
Processes the given TextDocument and the original HTML text (as a String).
- Parameters:
  doc - The processed TextDocument.
  origHTML - The original HTML document.
- Returns:
  A List of enclosed Images
- Throws:
  BoilerpipeProcessingException
process
public java.util.List<Image> process(TextDocument doc, org.xml.sax.InputSource is) throws BoilerpipeProcessingException
Processes the given TextDocument and the original HTML text (as an InputSource).
- Parameters:
  doc - The processed TextDocument.
  is - The original HTML document.
- Returns:
  A List of enclosed Images
- Throws:
  BoilerpipeProcessingException
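This overload takes an org.xml.sax.InputSource rather than a String, so a caller holding HTML in memory has to wrap it first. The sketch below shows one way to do that with JDK classes only. The class and method names (HtmlInputSources, fromString, drain) are hypothetical and not part of boilerpipe, and it is an assumption that character-stream input is what a caller would normally want to pass here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of boilerpipe.
final class HtmlInputSources {
    private HtmlInputSources() {}

    /** Wraps in-memory HTML in an InputSource backed by a character stream. */
    static InputSource fromString(String html) {
        return new InputSource(new StringReader(html));
    }

    /** Reads the remaining characters of the source; this consumes its reader. */
    static String drain(InputSource source) throws IOException {
        Reader reader = source.getCharacterStream();
        if (reader == null) {
            throw new IllegalArgumentException("InputSource has no character stream");
        }
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        InputSource is = fromString("<p><img src=\"a.png\"></p>");
        System.out.println(drain(is));
    }
}
```

An InputSource built over a Reader can only be read once. If the same HTML has to be parsed a second time, build a fresh InputSource from the original String instead of reusing the consumed one.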
process
public java.util.List<Image> process(java.net.URL url, BoilerpipeExtractor extractor) throws java.io.IOException, BoilerpipeProcessingException, org.xml.sax.SAXException
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
- Parameters:
  url - The URL of the HTML document to fetch.
  extractor - The BoilerpipeExtractor used to process the retrieved HTML.
- Returns:
  A List of enclosed Images
- Throws:
  BoilerpipeProcessingException
  java.io.IOException
  org.xml.sax.SAXException