Package com.kohlschutter.boilerpipe.sax
Classes related to parsing and producing HTML from/to Boilerpipe TextDocuments.
Class descriptions:
A simple SAX ContentHandler, used by BoilerpipeSAXInput.
A simple SAX Parser, used by BoilerpipeSAXInput.
Parses an InputSource using SAX and returns a TextDocument.
Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
CommonTagActions for block-level elements, which triggers some LabelAction on the generated TextBlock.
Default TagActions.
An InputSourceable for HTMLFetcher.
A very simple HTTP/HTML fetcher, really just for demo purposes.
Highlights text blocks in an HTML document that have been marked as "content" in the corresponding TextDocument.
Extracts the images that are enclosed by extracted content.
An InputSourceable can return an arbitrary number of new InputSources for a given document.
Assigns labels for element CSS classes and ids to the corresponding TextBlock.
Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
Base class for defining a set of TagActions that are to be used for the HTML parsing process.
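The idea of performing an action whenever a particular tag occurs during SAX parsing can be illustrated with the JDK's own SAX classes. The following is a minimal sketch, not Boilerpipe's implementation: `TagActionSketch`, `SimpleTagAction`, and `DispatchingHandler` are hypothetical names introduced only for this example, and the input is assumed to be well-formed XHTML so that the stock JDK parser can read it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TagActionSketch {

    /** Hypothetical per-tag action; NOT Boilerpipe's TagAction interface. */
    interface SimpleTagAction {
        void onStart(String tagName, List<String> log);
    }

    /** SAX handler that dispatches each start tag to the action registered for it, if any. */
    static class DispatchingHandler extends DefaultHandler {
        private final Map<String, SimpleTagAction> actions;
        final List<String> log = new ArrayList<>();

        DispatchingHandler(Map<String, SimpleTagAction> actions) {
            this.actions = actions;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The default (non-namespace-aware) parser reports the tag name in qName.
            SimpleTagAction action = actions.get(qName.toLowerCase(Locale.ROOT));
            if (action != null) {
                action.onStart(qName, log);
            }
        }
    }

    /** Parses the given XHTML string and returns the log entries written by the actions. */
    static List<String> run(String xhtml) throws Exception {
        Map<String, SimpleTagAction> actions = new HashMap<>();
        actions.put("p", (tag, log) -> log.add("block:" + tag));
        actions.put("a", (tag, log) -> log.add("anchor:" + tag));

        DispatchingHandler handler = new DispatchingHandler(actions);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.log;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>Hi <a href=\"#\">link</a></p><div/></body></html>";
        System.out.println(run(xhtml)); // prints [block:p, anchor:a]
    }
}
```

Tags without a registered action (here `html`, `body`, and `div`) are simply ignored. Keeping the actions in a map keyed by tag name means the per-tag behavior can be changed without touching the handler itself.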