Package com.kohlschutter.boilerpipe.sax
Classes related to parsing and producing HTML from/to Boilerpipe TextDocuments.

Interface Summary

Interface        Description
InputSourceable  An InputSourceable can return an arbitrary number of new InputSources for a given document.
TagAction        Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
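
The InputSourceable description above matters because a SAX InputSource wraps a stream that is consumed by one parse. The following is a minimal sketch of that idea using only JDK classes; ReusableSource, StringHtmlSource and readAll are hypothetical names for illustration and are not the Boilerpipe types.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the "can return any number of new InputSources" idea.
// This is NOT the Boilerpipe InputSourceable interface.
interface ReusableSource {
    InputSource toInputSource();
}

// Keeps the document in memory and hands out a fresh InputSource on every call.
final class StringHtmlSource implements ReusableSource {
    private final String html;

    StringHtmlSource(String html) {
        this.html = html;
    }

    @Override
    public InputSource toInputSource() {
        // A new reader each time, so every caller can read from the start.
        return new InputSource(new StringReader(html));
    }

    // Helper for the sketch: drains the character stream of an InputSource.
    static String readAll(InputSource in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = in.getCharacterStream()) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

Each call to toInputSource() yields an independent source, so the same document can be parsed more than once, for example once for extraction and once for highlighting.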

Class Summary

Class                                  Description
BoilerpipeHTMLContentHandler           A simple SAX ContentHandler, used by BoilerpipeSAXInput.
BoilerpipeHTMLParser                   A simple SAX Parser, used by BoilerpipeSAXInput.
BoilerpipeSAXInput                     Parses an InputSource using SAX and returns a TextDocument.
CommonTagActions                       Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
CommonTagActions.BlockTagLabelAction   CommonTagActions for block-level elements, which triggers some LabelAction on the generated TextBlock.
CommonTagActions.Chained
CommonTagActions.InlineTagLabelAction
DefaultTagActionMap                    Default TagActions.
HTMLDocument                           An InputSourceable for HTMLFetcher.
HTMLFetcher                            A very simple HTTP/HTML fetcher, really just for demo purposes.
HTMLHighlighter                        Highlights text blocks in an HTML document that have been marked as "content" in the corresponding TextDocument.
HTMLHighlighter.TagAction
ImageExtractor                         Extracts the images that are enclosed by extracted content.
ImageExtractor.TagAction
MarkupTagAction                        Assigns labels for element CSS classes and ids to the corresponding TextBlock.
TagActionMap                           Base class for defining a set of TagActions that are to be used for the HTML parsing process.
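
The relationship between a content handler, a tag action, and a tag-action map can be illustrated with a small sketch built on the JDK's own SAX parser. Everything here (TagActionSketch, Action, Handler, extractBlocks) is a hypothetical name chosen for illustration, not the Boilerpipe API, and the JDK parser is an XML parser, so the sketch assumes well-formed XHTML input.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical names throughout; this is not the Boilerpipe API.
final class TagActionSketch {

    // What to do when a given element is encountered.
    enum Action { IGNORE, BLOCK }

    // SAX handler that looks up each tag in a tag-to-action map.
    static final class Handler extends DefaultHandler {
        private final Map<String, Action> actions;
        private final List<String> blocks = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private int ignoreDepth = 0;

        Handler(Map<String, Action> actions) {
            this.actions = actions;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            Action a = actions.get(qName.toLowerCase(Locale.ROOT));
            if (a == Action.IGNORE) {
                ignoreDepth++;          // suppress text until the matching end tag
            } else if (a == Action.BLOCK) {
                flush();                // a block-level tag starts a new text block
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            Action a = actions.get(qName.toLowerCase(Locale.ROOT));
            if (a == Action.IGNORE) {
                ignoreDepth--;
            } else if (a == Action.BLOCK) {
                flush();                // a block-level tag also ends the current block
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (ignoreDepth == 0) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endDocument() {
            flush();                    // keep any trailing text outside block tags
        }

        // Moves the buffered text into the block list, skipping whitespace-only runs.
        private void flush() {
            String s = text.toString().trim();
            if (!s.isEmpty()) {
                blocks.add(s);
            }
            text.setLength(0);
        }

        List<String> blocks() {
            return blocks;
        }
    }

    // Parses well-formed XHTML and returns the text blocks found.
    static List<String> extractBlocks(String xhtml) throws Exception {
        Map<String, Action> actions = new HashMap<>();
        actions.put("script", Action.IGNORE);
        actions.put("style", Action.IGNORE);
        actions.put("p", Action.BLOCK);
        actions.put("div", Action.BLOCK);

        Handler handler = new Handler(actions);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.blocks();
    }
}
```

Calling TagActionSketch.extractBlocks on `<html><body><p>Hello <b>world</b></p><script>var x = 1;</script><div>Second block</div></body></html>` yields two blocks, "Hello world" and "Second block". Inline tags such as `b` have no entry in the map, so their text stays in the enclosing block. Keeping the behaviour in a map rather than in the handler lets the same handler be reused with a different set of actions.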

Enum Summary

Enum                                Description
BoilerpipeHTMLContentHandler.Event