
A

A - Static variable in class org.cyberneko.html.HTMLElements
 
ABBR - Static variable in class org.cyberneko.html.HTMLElements
 
acceptClausesWithoutDelimiter - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
ACRONYM - Static variable in class org.cyberneko.html.HTMLElements
 
action - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
 
action - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
 
addElement(HTMLElements.Element) - Method in class org.cyberneko.html.HTMLElements.ElementList
Adds an element to list, resizing if necessary.
addLabel(String) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
Adds an arbitrary String label to this TextBlock.
addLabelAction(LabelAction) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
addLabels(String...) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
Adds a set of labels to this TextBlock.
addLabels(Set<String>) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
Adds a set of labels to this TextBlock.
addLabelsTo(TextBlock) - Method in class com.kohlschutter.boilerpipe.labels.LabelAction
 
addPotentialTitles(Set<String>, String, String, int) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
AddPrecedingLabelsFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
Adds the labels of the preceding block to the current block, optionally adding a prefix.
AddPrecedingLabelsFilter(String) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
Creates a new AddPrecedingLabelsFilter instance.
ADDRESS - Static variable in class org.cyberneko.html.HTMLElements
 
addTagAction(String, TagAction) - Method in class com.kohlschutter.boilerpipe.sax.TagActionMap
Adds a particular TagAction for a given tag.
addTextBlock(TextBlock) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
addTo(TextBlock) - Method in class com.kohlschutter.boilerpipe.labels.ConditionalLabelAction
 
addTo(TextBlock) - Method in class com.kohlschutter.boilerpipe.labels.LabelAction
 
addWhitespaceIfNecessary() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
afterEnd(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
 
afterEnd(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
 
afterStart(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
 
afterStart(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
 
alt - Variable in class com.kohlschutter.boilerpipe.document.Image
 
ANCHOR_TEXT_END - Static variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
ANCHOR_TEXT_START - Static variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
APPLET - Static variable in class org.cyberneko.html.HTMLElements
 
area - Variable in class com.kohlschutter.boilerpipe.document.Image
 
AREA - Static variable in class org.cyberneko.html.HTMLElements
 
ARTICLE_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
Works very well for most types of Article-like HTML.
ARTICLE_METADATA - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
ArticleExtractor - Class in com.kohlschutter.boilerpipe.extractors
A full-text extractor which is tuned towards news articles.
ArticleExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
 
ArticleMetadataFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
Tries to find TextBlocks that consist of "article metadata".
ArticleMetadataFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
 
ArticleSentencesExtractor - Class in com.kohlschutter.boilerpipe.extractors
A full-text extractor which is tuned towards extracting sentences from news articles.
ArticleSentencesExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
 
attributes - Variable in class org.cyberneko.html.HTMLTagBalancer.Info
The element attributes.
AUGMENTATIONS - Static variable in class org.cyberneko.html.HTMLTagBalancer
Include infoset augmentations.
augs_ - Variable in class org.cyberneko.html.HTMLTagBalancer.ElementEntry
 
avgNumWords() - Method in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
Returns the average number of words at block-level (= overall number of words divided by the number of blocks).
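As a rough illustration of the avgNumWords() description above, the following minimal Java sketch divides the overall number of words by the number of blocks. The class AvgWordsSketch, its list-of-counts input, and the choice to return 0 for an empty list are assumptions made for this example; they are not taken from TextDocumentStatistics.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; not the boilerpipe TextDocumentStatistics class.
final class AvgWordsSketch {

    // Overall number of words divided by the number of blocks.
    // Returning 0 for an empty list is an assumption of this sketch.
    static float avgNumWords(List<Integer> wordsPerBlock) {
        if (wordsPerBlock.isEmpty()) {
            return 0f;
        }
        int total = 0;
        for (int words : wordsPerBlock) {
            total += words;
        }
        return total / (float) wordsPerBlock.size();
    }

    public static void main(String[] args) {
        // Three blocks with 12, 3 and 45 words.
        System.out.println(avgNumWords(Arrays.asList(12, 3, 45)));
    }
}
```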

B

B - Static variable in class org.cyberneko.html.HTMLElements
 
BASE - Static variable in class org.cyberneko.html.HTMLElements
 
BASEFONT - Static variable in class org.cyberneko.html.HTMLElements
 
BDO - Static variable in class org.cyberneko.html.HTMLElements
 
beforeEnd(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
 
beforeEnd(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
 
beforeStart(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
 
beforeStart(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
 
BGSOUND - Static variable in class org.cyberneko.html.HTMLElements
 
BIG - Static variable in class org.cyberneko.html.HTMLElements
 
BLINK - Static variable in class org.cyberneko.html.HTMLElements
 
BLOCK - Static variable in class org.cyberneko.html.HTMLElements.Element
Block element.
BlockProximityFusion - Class in com.kohlschutter.boilerpipe.filters.heuristics
Fuses adjacent blocks if their distance (in blocks) does not exceed a certain limit.
BlockProximityFusion(int, boolean, boolean) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
Creates a new BlockProximityFusion instance.
BLOCKQUOTE - Static variable in class org.cyberneko.html.HTMLElements
 
BlockTagLabelAction(LabelAction) - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
 
blockTagLevel - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
BODY - Static variable in class org.cyberneko.html.HTMLElements
 
BoilerpipeDocumentSource - Interface in com.kohlschutter.boilerpipe
Something that can be represented as a TextDocument.
BoilerpipeExtractor - Interface in com.kohlschutter.boilerpipe
Describes a complete filter pipeline.
BoilerpipeFilter - Interface in com.kohlschutter.boilerpipe
A generic BoilerpipeFilter.
BoilerpipeHTMLContentHandler - Class in com.kohlschutter.boilerpipe.sax
A simple SAX ContentHandler, used by BoilerpipeSAXInput.
BoilerpipeHTMLContentHandler() - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
BoilerpipeHTMLContentHandler(TagActionMap) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
Constructs a BoilerpipeHTMLContentHandler using the given TagActionMap.
BoilerpipeHTMLContentHandler.Event - Enum in com.kohlschutter.boilerpipe.sax
 
BoilerpipeHTMLParser - Class in com.kohlschutter.boilerpipe.sax
A simple SAX Parser, used by BoilerpipeSAXInput.
BoilerpipeHTMLParser() - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
Constructs a BoilerpipeHTMLParser using a default HTML content handler.
BoilerpipeHTMLParser(boolean) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
 
BoilerpipeHTMLParser(BoilerpipeHTMLContentHandler) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
BoilerpipeInput - Interface in com.kohlschutter.boilerpipe
A source that returns TextDocuments.
BoilerpipeProcessingException - Exception in com.kohlschutter.boilerpipe
Exception for signaling failure in the processing pipeline.
BoilerpipeProcessingException() - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeProcessingException(String) - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeProcessingException(String, Throwable) - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeProcessingException(Throwable) - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeSAXInput - Class in com.kohlschutter.boilerpipe.sax
Parses an InputSource using SAX and returns a TextDocument.
BoilerpipeSAXInput(InputSource) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
Creates a new instance of BoilerpipeSAXInput for the given InputSource.
BoilerplateBlockFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Removes TextBlocks which have explicitly been marked as "not content".
BoilerplateBlockFilter(String) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
 
bounds - Variable in class org.cyberneko.html.HTMLElements.Element
The bounding element code.
BR - Static variable in class org.cyberneko.html.HTMLElements
 
BUTTON - Static variable in class org.cyberneko.html.HTMLElements
 
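The BlockProximityFusion entry in this section describes fusing adjacent blocks whose distance (in blocks) does not exceed a limit. A minimal sketch of that idea follows. The Block type, its offset field, and the fuse method are hypothetical names for this example, and the two boolean constructor parameters of the real class are not modelled because this index does not describe them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of distance-limited block fusion; not the boilerpipe implementation.
final class ProximityFusionSketch {

    // Assumed minimal block: its position in the original block sequence plus its text.
    static final class Block {
        final int offset;
        final String text;

        Block(int offset, String text) {
            this.offset = offset;
            this.text = text;
        }
    }

    // Merges each block into its predecessor when their offsets differ by at most maxDistance.
    static List<Block> fuse(List<Block> blocks, int maxDistance) {
        List<Block> out = new ArrayList<>();
        Block prev = null;
        for (Block b : blocks) {
            if (prev != null && b.offset - prev.offset <= maxDistance) {
                // Close enough: the merged block keeps the later offset.
                prev = new Block(b.offset, prev.text + "\n" + b.text);
                out.set(out.size() - 1, prev);
            } else {
                prev = b;
                out.add(b);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Block> fused = fuse(Arrays.asList(
                new Block(0, "a"), new Block(1, "b"),
                new Block(4, "c"), new Block(5, "d")), 1);
        // Blocks 0 and 1 are fused, as are blocks 4 and 5.
        System.out.println(fused.size());
    }
}
```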

C

callEndElement(QName, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Call document handler end element.
callStartElement(QName, XMLAttributes, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Call document handler start element.
CANOLA_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
Trained on krdwrd Canola (different definition of "boilerplate").
CanolaExtractor - Class in com.kohlschutter.boilerpipe.extractors
A full-text extractor trained on krdwrd Canola.
CanolaExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
 
CAPTION - Static variable in class org.cyberneko.html.HTMLElements
 
CENTER - Static variable in class org.cyberneko.html.HTMLElements
 
Chained(TagAction, TagAction) - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
 
changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
 
changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
 
changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
 
changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
changesTagLevel() - Method in interface com.kohlschutter.boilerpipe.sax.TagAction
 
characterElementIdx - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
characterElementIdx - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
characters(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
characters(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
characters(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
characters(XMLString, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Characters.
CHARACTERS - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
 
charset - Variable in class com.kohlschutter.boilerpipe.sax.HTMLDocument
 
CITE - Static variable in class org.cyberneko.html.HTMLElements
 
CLASSIFIER - Static variable in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
The actual classifier, exposed.
classify(TextBlock, TextBlock, TextBlock) - Method in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
 
classify(TextBlock, TextBlock, TextBlock) - Method in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
 
clone() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
clone() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
 
closes - Variable in class org.cyberneko.html.HTMLElements.Element
List of elements this element can close.
closes(short) - Method in class org.cyberneko.html.HTMLElements.Element
Returns true if this element can close the specified Element.
code - Variable in class org.cyberneko.html.HTMLElements.Element
The element code.
CODE - Static variable in class org.cyberneko.html.HTMLElements
 
COL - Static variable in class org.cyberneko.html.HTMLElements
 
COLGROUP - Static variable in class org.cyberneko.html.HTMLElements
 
com.kohlschutter.boilerpipe - package com.kohlschutter.boilerpipe
The Boilerpipe top-level package.
com.kohlschutter.boilerpipe.conditions - package com.kohlschutter.boilerpipe.conditions
 
com.kohlschutter.boilerpipe.demo - package com.kohlschutter.boilerpipe.demo
Just some simple demo code.
com.kohlschutter.boilerpipe.document - package com.kohlschutter.boilerpipe.document
The Boilerpipe document model.
com.kohlschutter.boilerpipe.estimators - package com.kohlschutter.boilerpipe.estimators
 
com.kohlschutter.boilerpipe.extractors - package com.kohlschutter.boilerpipe.extractors
Some standard extractors (i.e., completely piped BoilerpipeFilters)
com.kohlschutter.boilerpipe.filters.debug - package com.kohlschutter.boilerpipe.filters.debug
 
com.kohlschutter.boilerpipe.filters.english - package com.kohlschutter.boilerpipe.filters.english
These BoilerpipeFilters have only been tested on English text.
com.kohlschutter.boilerpipe.filters.heuristics - package com.kohlschutter.boilerpipe.filters.heuristics
These BoilerpipeFilters are pure heuristics.
com.kohlschutter.boilerpipe.filters.simple - package com.kohlschutter.boilerpipe.filters.simple
These BoilerpipeFilters are straightforward and probably not really specific to English.
com.kohlschutter.boilerpipe.labels - package com.kohlschutter.boilerpipe.labels
 
com.kohlschutter.boilerpipe.sax - package com.kohlschutter.boilerpipe.sax
Classes related to parsing and producing HTML from/to Boilerpipe TextDocuments.
com.kohlschutter.boilerpipe.util - package com.kohlschutter.boilerpipe.util
Some helper classes.
comment(XMLString, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Comment.
COMMENT - Static variable in class org.cyberneko.html.HTMLElements
 
CommonExtractors - Class in com.kohlschutter.boilerpipe.extractors
Provides quick access to common BoilerpipeExtractors.
CommonExtractors() - Constructor for class com.kohlschutter.boilerpipe.extractors.CommonExtractors
 
CommonTagActions - Class in com.kohlschutter.boilerpipe.sax
Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
CommonTagActions() - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions
 
CommonTagActions.BlockTagLabelAction - Class in com.kohlschutter.boilerpipe.sax
CommonTagActions for block-level elements, which triggers some LabelAction on the generated TextBlock.
CommonTagActions.Chained - Class in com.kohlschutter.boilerpipe.sax
 
CommonTagActions.InlineTagLabelAction - Class in com.kohlschutter.boilerpipe.sax
CommonTagActions for inline elements, which triggers some LabelAction on the generated TextBlock.
compareTo(Image) - Method in class com.kohlschutter.boilerpipe.document.Image
 
cond - Variable in class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
 
condition - Variable in class com.kohlschutter.boilerpipe.labels.ConditionalLabelAction
 
ConditionalLabelAction - Class in com.kohlschutter.boilerpipe.labels
Adds labels to a TextBlock if the given criteria are met.
ConditionalLabelAction(TextBlockCondition, String...) - Constructor for class com.kohlschutter.boilerpipe.labels.ConditionalLabelAction
 
consumeBufferedEndElements() - Method in class org.cyberneko.html.HTMLTagBalancer
Consumes elements that have been buffered, such as </body></html>, which are first consumed at the end of the document.
consumeEarlyTextIfNeeded() - Method in class org.cyberneko.html.HTMLTagBalancer
 
containedTextElements - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
CONTAINER - Static variable in class org.cyberneko.html.HTMLElements.Element
Container element.
contentBitSet - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
contentBitSet - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
ContentFusion - Class in com.kohlschutter.boilerpipe.filters.heuristics
Merges two blocks using some heuristics.
ContentFusion() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ContentFusion
Creates a new ContentFusion instance.
contentHandler - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
 
contentOnly - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
createQName(String) - Method in class org.cyberneko.html.HTMLTagBalancer
 
currentContainedTextElements - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
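The ConditionalLabelAction entry in this section describes adding labels to a TextBlock only if given criteria are met. The sketch below shows one way such a guard could look. It uses java.util.function.Predicate in place of TextBlockCondition, and the Block type and addIf method are hypothetical names for this example rather than boilerpipe API.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration of a conditional label action; not the boilerpipe implementation.
final class ConditionalLabelSketch {

    // Assumed minimal block: text plus a mutable set of String labels.
    static final class Block {
        final String text;
        final Set<String> labels = new HashSet<>();

        Block(String text) {
            this.text = text;
        }
    }

    // Adds all labels to the block if the condition holds; returns true if they were added.
    static boolean addIf(Block block, Predicate<Block> condition, String... labels) {
        if (!condition.test(block)) {
            return false;
        }
        for (String label : labels) {
            block.labels.add(label);
        }
        return true;
    }

    public static void main(String[] args) {
        Block block = new Block("Short");
        // The label name "SHORT" is made up for this example.
        addIf(block, b -> b.text.length() < 10, "SHORT");
        System.out.println(block.labels.contains("SHORT"));
    }
}
```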

D

data - Variable in class com.kohlschutter.boilerpipe.sax.HTMLDocument
 
data - Variable in class org.cyberneko.html.HTMLElements.ElementList
The data in the list.
data - Variable in class org.cyberneko.html.HTMLTagBalancer.InfoStack
The stack data.
DD - Static variable in class org.cyberneko.html.HTMLElements
 
debugString() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
Returns detailed debugging information about the contained TextBlocks.
DEFAULT_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
Usually worse than ArticleExtractor, but simpler/no heuristics.
DEFAULT_INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
DEFAULT_INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
 
DefaultExtractor - Class in com.kohlschutter.boilerpipe.extractors
A quite generic full-text extractor.
DefaultExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
 
DefaultLabels - Class in com.kohlschutter.boilerpipe.labels
Some pre-defined labels which can be used in conjunction with TextBlock.addLabel(String) and TextBlock.hasLabel(String).
DefaultLabels() - Constructor for class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
DefaultTagActionMap - Class in com.kohlschutter.boilerpipe.sax
Default TagActions.
DefaultTagActionMap() - Constructor for class com.kohlschutter.boilerpipe.sax.DefaultTagActionMap
 
DEL - Static variable in class org.cyberneko.html.HTMLElements
 
DensityRulesClassifier - Class in com.kohlschutter.boilerpipe.filters.english
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features", particularly using text densities and link densities.
DensityRulesClassifier() - Constructor for class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
 
DFN - Static variable in class org.cyberneko.html.HTMLElements
 
DIR - Static variable in class org.cyberneko.html.HTMLElements
 
DIV - Static variable in class org.cyberneko.html.HTMLElements
 
DL - Static variable in class org.cyberneko.html.HTMLElements
 
doctypeDecl(String, String, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Doctype declaration.
DOCUMENT_FRAGMENT - Static variable in class org.cyberneko.html.HTMLTagBalancer
Document fragment balancing only.
DOCUMENT_FRAGMENT_DEPRECATED - Static variable in class org.cyberneko.html.HTMLTagBalancer
Document fragment balancing only (deprecated).
DocumentTitleMatchClassifier - Class in com.kohlschutter.boilerpipe.filters.heuristics
Marks TextBlocks which contain parts of the HTML <TITLE> tag, using some heuristics which are quite specific to the news domain.
DocumentTitleMatchClassifier(String) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
DT - Static variable in class org.cyberneko.html.HTMLElements
 

E

element - Variable in class org.cyberneko.html.HTMLTagBalancer.Info
The element.
Element(short, String, int, short[], short[]) - Constructor for class org.cyberneko.html.HTMLElements.Element
Constructs an element object.
Element(short, String, int, short[], short, short[]) - Constructor for class org.cyberneko.html.HTMLElements.Element
Constructs an element object.
Element(short, String, int, short, short[]) - Constructor for class org.cyberneko.html.HTMLElements.Element
Constructs an element object.
Element(short, String, int, short, short, short[]) - Constructor for class org.cyberneko.html.HTMLElements.Element
Constructs an element object.
ElementEntry(QName, Augmentations) - Constructor for class org.cyberneko.html.HTMLTagBalancer.ElementEntry
 
ElementList() - Constructor for class org.cyberneko.html.HTMLElements.ElementList
 
ELEMENTS - Static variable in class org.cyberneko.html.HTMLElements
Element information as a contiguous list.
ELEMENTS_ARRAY - Static variable in class org.cyberneko.html.HTMLElements
Element information organized by first letter.
EM - Static variable in class org.cyberneko.html.HTMLElements
 
EMBED - Static variable in class org.cyberneko.html.HTMLElements
 
EMPTY - Static variable in class org.cyberneko.html.HTMLElements.Element
Empty element.
EMPTY_BITSET - Static variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
EMPTY_END - Static variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
EMPTY_START - Static variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
emptyAttributes() - Method in class org.cyberneko.html.HTMLTagBalancer
Returns a set of empty attributes.
emptyElement(QName, XMLAttributes, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Empty element.
end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
 
end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
 
end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
 
end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
end(BoilerpipeHTMLContentHandler, String, String) - Method in interface com.kohlschutter.boilerpipe.sax.TagAction
 
END_TAG - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
 
endCDATA(Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
End CDATA section.
endDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
endDocument() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
endDocument() - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
endDocument(Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
End document.
endElement(String, String, String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
endElement(String, String, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
endElement(String, String, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
endElement(QName, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
End element.
endElementsBuffer_ - Variable in class org.cyberneko.html.HTMLTagBalancer
 
endGeneralEntity(String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
End entity.
endPrefixMapping(String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
endPrefixMapping(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
endPrefixMapping(String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
endPrefixMapping(String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
End prefix mapping.
equalLabels(Set<String>, Set<String>) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
 
equals(Object) - Method in class org.cyberneko.html.HTMLElements.Element
Returns true if the objects are equal.
ERROR_REPORTER - Static variable in class org.cyberneko.html.HTMLTagBalancer
Error reporter.
Event() - Constructor for enum com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
 
ExpandTitleToContentFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
Marks all TextBlocks "content" which are between the headline and the part that has already been marked content, if they are marked DefaultLabels.MIGHT_BE_CONTENT.
ExpandTitleToContentFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
 
expandToSameLevelText - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
ExtractorBase - Class in com.kohlschutter.boilerpipe.extractors
The base class of Extractors.
ExtractorBase() - Constructor for class com.kohlschutter.boilerpipe.extractors.ExtractorBase
 
extraStyleSheet - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
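The ExpandTitleToContentFilter entry in this section describes marking as content every block between the headline and the already-marked content, provided the block is marked MIGHT_BE_CONTENT. A minimal sketch of that rule follows. The Block type with its three boolean flags and the expand method are assumptions for this example, and edge cases such as multiple titles may be handled differently by the real filter.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of expanding content from the title; not the boilerpipe implementation.
final class ExpandTitleSketch {

    // Assumed minimal block carrying only the three flags this rule needs.
    static final class Block {
        final boolean title;
        final boolean mightBeContent;
        boolean content;

        Block(boolean title, boolean content, boolean mightBeContent) {
            this.title = title;
            this.content = content;
            this.mightBeContent = mightBeContent;
        }
    }

    // Returns true if at least one block was newly marked as content.
    static boolean expand(List<Block> blocks) {
        int titleIdx = -1;
        int contentStart = -1;
        for (int i = 0; i < blocks.size(); i++) {
            Block b = blocks.get(i);
            if (titleIdx == -1) {
                if (b.title) {
                    titleIdx = i;
                }
            } else if (b.content) {
                // First block after the title that is already content.
                contentStart = i;
                break;
            }
        }
        if (titleIdx == -1 || contentStart == -1) {
            return false;
        }
        boolean changed = false;
        for (int i = titleIdx; i < contentStart; i++) {
            Block b = blocks.get(i);
            if (b.mightBeContent && !b.content) {
                b.content = true;
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block(true, false, false),   // headline
                new Block(false, false, true),   // might be content
                new Block(false, false, false),  // not a candidate
                new Block(false, true, false));  // already content
        System.out.println(expand(blocks));
    }
}
```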

F

fAugmentations - Variable in class org.cyberneko.html.HTMLTagBalancer
Include infoset augmentations.
fDocumentFragment - Variable in class org.cyberneko.html.HTMLTagBalancer
Document fragment balancing only.
fDocumentHandler - Variable in class org.cyberneko.html.HTMLTagBalancer
The document handler.
fDocumentSource - Variable in class org.cyberneko.html.HTMLTagBalancer
The document source.
fElementStack - Variable in class org.cyberneko.html.HTMLTagBalancer
The element stack.
fEmptyAttrs - Variable in class org.cyberneko.html.HTMLTagBalancer
Empty attributes.
fErrorReporter - Variable in class org.cyberneko.html.HTMLTagBalancer
Error reporter.
fetch(URL) - Static method in class com.kohlschutter.boilerpipe.sax.HTMLFetcher
Fetches the document at the given URL, using URLConnection.
FIELDSET - Static variable in class org.cyberneko.html.HTMLElements
 
fIgnoreOutsideContent - Variable in class org.cyberneko.html.HTMLTagBalancer
Ignore outside content.
filter - Variable in class com.kohlschutter.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
 
fInfosetAugs - Variable in class org.cyberneko.html.HTMLTagBalancer
Augmentations.
fInlineStack - Variable in class org.cyberneko.html.HTMLTagBalancer
The inline stack.
flags - Variable in class org.cyberneko.html.HTMLElements.Element
Informational flags.
flush - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
flushBlock() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
fNamesAttrs - Variable in class org.cyberneko.html.HTMLTagBalancer
Modify HTML attribute names.
fNamesElems - Variable in class org.cyberneko.html.HTMLTagBalancer
Modify HTML element names.
fNamespaces - Variable in class org.cyberneko.html.HTMLTagBalancer
Namespaces.
FONT - Static variable in class org.cyberneko.html.HTMLElements
 
fontSizeStack - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
fOpenedForm - Variable in class org.cyberneko.html.HTMLTagBalancer
True if a form is in the stack (allow to discard opening of nested forms)
forcedEndElement_ - Variable in class org.cyberneko.html.HTMLTagBalancer
 
forcedStartElement_ - Variable in class org.cyberneko.html.HTMLTagBalancer
 
forceStartBody() - Method in class org.cyberneko.html.HTMLTagBalancer
Generates a missing <body> (which creates a missing <head> when needed).
forceStartElement(QName, XMLAttributes, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Forces an element start, taking care to set the information that allows startElement to "see" that the element has been forced.
FORM - Static variable in class org.cyberneko.html.HTMLElements
 
fQName - Variable in class org.cyberneko.html.HTMLTagBalancer
A qualified name.
FRAGMENT_CONTEXT_STACK - Static variable in class org.cyberneko.html.HTMLTagBalancer
EXPERIMENTAL: may change in next release
Name of the property holding the stack of elements in which context a document fragment should be parsed.
fragmentContextStack_ - Variable in class org.cyberneko.html.HTMLTagBalancer
Stack of elements determining the context in which a document fragment should be parsed
fragmentContextStackSize_ - Variable in class org.cyberneko.html.HTMLTagBalancer
 
FRAME - Static variable in class org.cyberneko.html.HTMLElements
 
FRAMESET - Static variable in class org.cyberneko.html.HTMLElements
 
fReportErrors - Variable in class org.cyberneko.html.HTMLTagBalancer
Report errors.
fSeenAnything - Variable in class org.cyberneko.html.HTMLTagBalancer
True if seen anything.
fSeenBodyElement - Variable in class org.cyberneko.html.HTMLTagBalancer
True if seen <body> element.
fSeenDoctype - Variable in class org.cyberneko.html.HTMLTagBalancer
True if the doctype declaration has been seen.
fSeenHeadElement - Variable in class org.cyberneko.html.HTMLTagBalancer
True if seen <head> element.
fSeenRootElement - Variable in class org.cyberneko.html.HTMLTagBalancer
True if root element has been seen.
fSeenRootElementEnd - Variable in class org.cyberneko.html.HTMLTagBalancer
True if seen the end of the document element.

G

getAlt() - Method in class com.kohlschutter.boilerpipe.document.Image
 
getAncestorLabels() - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
getArea() - Method in class com.kohlschutter.boilerpipe.document.Image
Returns the image's area (specified by width * height), or -1 if width/height weren't both specified or could not be parsed.
getCharset() - Method in class com.kohlschutter.boilerpipe.sax.HTMLDocument
 
getContainedTextElements() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
Returns the containedTextElements BitSet, or null.
getContent() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
Returns the TextDocument's content.
getData() - Method in class com.kohlschutter.boilerpipe.sax.HTMLDocument
 
getDefaultInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
Returns the singleton instance for IgnoreBlocksAfterContentFilter.
getDefaultInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
 
getDocumentHandler() - Method in class org.cyberneko.html.HTMLTagBalancer
Returns the document handler.
getDocumentSource() - Method in class org.cyberneko.html.HTMLTagBalancer
Returns the document source.
getElement(short) - Static method in class org.cyberneko.html.HTMLElements
Returns the element information for the specified element code.
getElement(String) - Static method in class org.cyberneko.html.HTMLElements
Returns the element information for the specified element name.
getElement(String, HTMLElements.Element) - Static method in class org.cyberneko.html.HTMLElements
Returns the element information for the specified element name.
getElement(QName) - Method in class org.cyberneko.html.HTMLTagBalancer
Returns an HTML element.
getElementDepth(HTMLElements.Element) - Method in class org.cyberneko.html.HTMLTagBalancer
Returns the depth of the open tag associated with the specified element name or -1 if no matching element is found.
getExtraStyleSheet() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Returns the extra stylesheet definition that will be inserted in the HEAD element.
getFeatureDefault(String) - Method in class org.cyberneko.html.HTMLTagBalancer
Returns the default state for a feature.
getHeight() - Method in class com.kohlschutter.boilerpipe.document.Image
 
getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
Returns the singleton instance for ArticleExtractor.
getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
Returns the singleton instance for ArticleSentencesExtractor.
getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
Returns the singleton instance for CanolaExtractor.
getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
Returns the singleton instance for DefaultExtractor.
getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
Returns the singleton instance for LargestContentExtractor.
getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
Returns the singleton instance for NumWordsRulesExtractor.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
Returns the default instance for PrintDebugFilter, which dumps debug information to System.out
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
Returns the singleton instance for DensityRulesClassifier.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
Returns the singleton instance for NumWordsRulesClassifier.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
Returns the singleton instance for TerminatingBlocksFinder.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
Returns the singleton instance for ExpandTitleToContentFilter.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
Returns the singleton instance for SimpleBlockFusionProcessor.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
Returns the singleton instance for TrailingHeadlineToBoilerplateFilter.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
Returns the singleton instance for BoilerplateBlockFilter.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
Returns the singleton instance for SplitParagraphBlocksFilter.
getInstance() - Static method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
Returns the singleton instance of ImageExtractor.
getLabels() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
Returns the labels associated with this TextBlock, or null if no such labels exist.
getLinkDensity() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getLongestPart(String, String) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
getNamesValue(String) - Static method in class org.cyberneko.html.HTMLTagBalancer
Converts HTML names string value to constant value.
getNumFullTextWords(TextBlock) - Static method in class com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
 
getNumFullTextWords(TextBlock, float) - Static method in class com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
 
getNumWords() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getNumWords() - Method in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
Returns the overall number of words in all blocks.
getNumWordsInAnchorText() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getOffsetBlocksEnd() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getOffsetBlocksStart() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getParentDepth(HTMLElements.Element[], short) - Method in class org.cyberneko.html.HTMLTagBalancer
Returns the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.
getPostHighlight() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Returns the string that will be inserted after any highlighted HTML block.
getPotentialTitles() - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
getPreHighlight() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Returns the string that will be inserted before any highlighted HTML block.
getPropertyDefault(String) - Method in class org.cyberneko.html.HTMLTagBalancer
Returns the default state for a property.
getRecognizedFeatures() - Method in class org.cyberneko.html.HTMLTagBalancer
Returns recognized features.
getRecognizedProperties() - Method in class org.cyberneko.html.HTMLTagBalancer
Returns recognized properties.
getSrc() - Method in class com.kohlschutter.boilerpipe.document.Image
 
getTagLevel() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getTagWhitelist() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
getText() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getText(boolean, boolean) - Method in class com.kohlschutter.boilerpipe.document.TextDocument
Returns the TextDocument's content, non-content or both
getText(TextDocument) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
Extracts text from the given TextDocument object.
getText(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
Extracts text from the given TextDocument object.
getText(Reader) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
Extracts text from the HTML code available from the given Reader.
getText(Reader) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code available from the given Reader.
getText(String) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
Extracts text from the HTML code given as a String.
getText(String) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code given as a String.
getText(URL) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code available from the given URL.
getText(InputSource) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
Extracts text from the HTML code available from the given InputSource.
getText(InputSource) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code available from the given InputSource.
getTextBlocks() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
Returns the TextBlocks of this document.
getTextBlocks() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
getTextDensity() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getTextDocument() - Method in interface com.kohlschutter.boilerpipe.BoilerpipeInput
Returns a TextDocument; how it is obtained is up to the implementation.
getTextDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
Retrieves the TextDocument using a default HTML parser.
getTextDocument(BoilerpipeHTMLParser) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
Retrieves the TextDocument using the given HTML parser.
getTitle() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
Returns the "main" title for this document, or null if no such title has been set.
getTitle() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
getWidth() - Method in class com.kohlschutter.boilerpipe.document.Image
 

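The area rule given for Image in this index (width * height, or -1 if the two values weren't both specified or could not be parsed) can be illustrated with a minimal, self-contained sketch. ImageAreaSketch and its area method are hypothetical names. The String parameters are an assumption suggested by the Image(String, String, String, String) constructor listed under I. This is not the library's implementation.

```java
public class ImageAreaSketch {
    /**
     * Hypothetical helper: returns width * height, or -1 if either value
     * is missing or cannot be parsed as an integer.
     */
    public static int area(String width, String height) {
        if (width == null || height == null) {
            return -1; // at least one dimension was not specified
        }
        try {
            return Integer.parseInt(width.trim()) * Integer.parseInt(height.trim());
        } catch (NumberFormatException e) {
            return -1; // e.g. "100%" is not a plain integer
        }
    }

    public static void main(String[] args) {
        System.out.println(area("640", "480"));  // prints 307200
        System.out.println(area("100%", "480")); // prints -1
        System.out.println(area(null, "480"));   // prints -1
    }
}
```

Using -1 as a sentinel lets callers sort or filter images by area without a separate "unknown size" check, as long as they treat negative values as "size not available".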
H

H1 - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
H1 - Static variable in class org.cyberneko.html.HTMLElements
 
H2 - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
H2 - Static variable in class org.cyberneko.html.HTMLElements
 
H3 - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
H3 - Static variable in class org.cyberneko.html.HTMLElements
 
H4 - Static variable in class org.cyberneko.html.HTMLElements
 
H5 - Static variable in class org.cyberneko.html.HTMLElements
 
H6 - Static variable in class org.cyberneko.html.HTMLElements
 
hashCode() - Method in class org.cyberneko.html.HTMLElements.Element
Returns a hash code for this object.
hasLabel(String) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
Checks whether this TextBlock has the given label.
HEAD - Static variable in class org.cyberneko.html.HTMLElements
 
HEADING - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
height - Variable in class com.kohlschutter.boilerpipe.document.Image
 
HeuristicFilterBase - Class in com.kohlschutter.boilerpipe.filters.english
Base class for some heuristics that are used by boilerpipe filters.
HeuristicFilterBase() - Constructor for class com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
 
hl - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
HR - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
HR - Static variable in class org.cyberneko.html.HTMLElements
 
html - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
HTML - Static variable in class org.cyberneko.html.HTMLElements
 
HTMLDocument - Class in com.kohlschutter.boilerpipe.sax
HTMLDocument(byte[], Charset) - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLDocument
 
HTMLDocument(String) - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLDocument
 
HTMLElements - Class in org.cyberneko.html
Collection of HTML element information.
HTMLElements() - Constructor for class org.cyberneko.html.HTMLElements
 
HTMLElements.Element - Class in org.cyberneko.html
Element information.
HTMLElements.ElementList - Class in org.cyberneko.html
Unsynchronized list of elements.
HTMLFetcher - Class in com.kohlschutter.boilerpipe.sax
A very simple HTTP/HTML fetcher, really just for demo purposes.
HTMLFetcher() - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLFetcher
 
HTMLHighlightDemo - Class in com.kohlschutter.boilerpipe.demo
Demonstrates how to use Boilerpipe to get the main content, highlighted as HTML.
HTMLHighlightDemo() - Constructor for class com.kohlschutter.boilerpipe.demo.HTMLHighlightDemo
 
HTMLHighlighter - Class in com.kohlschutter.boilerpipe.sax
Highlights text blocks in an HTML document that have been marked as "content" in the corresponding TextDocument.
HTMLHighlighter(boolean) - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
HTMLHighlighter.Implementation - Class in com.kohlschutter.boilerpipe.sax
 
HTMLHighlighter.TagAction - Class in com.kohlschutter.boilerpipe.sax
 
HTMLTagBalancer - Class in org.cyberneko.html
 
HTMLTagBalancer() - Constructor for class org.cyberneko.html.HTMLTagBalancer
 
HTMLTagBalancer.ElementEntry - Class in org.cyberneko.html
Structure to hold information about an element placed in a buffer to be consumed later.
HTMLTagBalancer.Info - Class in org.cyberneko.html
Element info for each start element.
HTMLTagBalancer.InfoStack - Class in org.cyberneko.html
Unsynchronized stack of element information.

I

I - Static variable in class org.cyberneko.html.HTMLElements
 
IFRAME - Static variable in class org.cyberneko.html.HTMLElements
 
ignorableWhitespace(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
ignorableWhitespace(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
ignorableWhitespace(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
ignorableWhitespace(XMLString, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Ignorable whitespace.
IGNORE_OUTSIDE_CONTENT - Static variable in class org.cyberneko.html.HTMLTagBalancer
Ignore outside content.
IgnoreBlocksAfterContentFilter - Class in com.kohlschutter.boilerpipe.filters.english
Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT.
IgnoreBlocksAfterContentFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
IgnoreBlocksAfterContentFromEndFilter - Class in com.kohlschutter.boilerpipe.filters.english
Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT, and after any content block.
IgnoreBlocksAfterContentFromEndFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
 
ILAYER - Static variable in class org.cyberneko.html.HTMLElements
 
Image - Class in com.kohlschutter.boilerpipe.document
Represents an Image resource that is contained in the document.
Image(String, String, String, String) - Constructor for class com.kohlschutter.boilerpipe.document.Image
 
ImageExtractor - Class in com.kohlschutter.boilerpipe.sax
Extracts the images that are enclosed by extracted content.
ImageExtractor() - Constructor for class com.kohlschutter.boilerpipe.sax.ImageExtractor
 
ImageExtractor.Implementation - Class in com.kohlschutter.boilerpipe.sax
 
ImageExtractor.TagAction - Class in com.kohlschutter.boilerpipe.sax
 
ImageExtractorDemo - Class in com.kohlschutter.boilerpipe.demo
Demonstrates how to use Boilerpipe to get the images within the main content.
ImageExtractorDemo() - Constructor for class com.kohlschutter.boilerpipe.demo.ImageExtractorDemo
 
IMG - Static variable in class org.cyberneko.html.HTMLElements
 
Implementation() - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
Implementation() - Constructor for class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
inAnchor - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
inAnchorText - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
inBody - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
INDICATES_END_OF_TEXT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
Info(HTMLElements.Element, QName) - Constructor for class org.cyberneko.html.HTMLTagBalancer.Info
Creates an element information object.
Info(HTMLElements.Element, QName, XMLAttributes) - Constructor for class org.cyberneko.html.HTMLTagBalancer.Info
Creates an element information object.
InfoStack() - Constructor for class org.cyberneko.html.HTMLTagBalancer.InfoStack
 
inHighlight - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
inIgnorableElement - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
inIgnorableElement - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
inIgnorableElement - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
initDensities() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
INLINE - Static variable in class org.cyberneko.html.HTMLElements.Element
Inline element.
InlineTagLabelAction(LabelAction) - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
 
INPUT - Static variable in class org.cyberneko.html.HTMLElements
 
InputSourceable - Interface in com.kohlschutter.boilerpipe.sax
An InputSourceable can return an arbitrary number of new InputSources for a given document.
INS - Static variable in class org.cyberneko.html.HTMLElements
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.estimators.SimpleEstimator
The singleton instance of SimpleEstimator.
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
The default instance of PrintDebugFilter, which dumps debug information to System.out.
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ContentFusion
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.LargeBlockSameTagLevelToContentFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ListAtEndFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.InvertedFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingBoilerplateFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingContentFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.sax.DefaultTagActionMap
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor
 
INSTANCE_200 - Static variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
INSTANCE_EXPAND_TO_SAME_TAGLEVEL - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
INSTANCE_EXPAND_TO_SAME_TAGLEVEL_MIN_WORDS - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
INSTANCE_KEEP_TITLE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
 
INSTANCE_PRE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
 
INSTANCE_STRICTLY_NOT_CONTENT - Static variable in class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
 
INSTANCE_TEXT - Static variable in class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
 
InvertedFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Inverts the "isContent" flag for all TextBlocks.
InvertedFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.InvertedFilter
 
is - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
 
isBlock() - Method in class org.cyberneko.html.HTMLElements.Element
Returns true if this element is a block element.
isBlockLevel - Variable in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
isClause(CharSequence) - Method in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
isContainer() - Method in class org.cyberneko.html.HTMLElements.Element
Returns true if this element is a container element.
isContent - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
isContent() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
isDigit(char) - Static method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
 
isEmpty() - Method in class org.cyberneko.html.HTMLElements.Element
Returns true if this element is an empty element.
ISINDEX - Static variable in class org.cyberneko.html.HTMLElements
 
isInline() - Method in class org.cyberneko.html.HTMLElements.Element
Returns true if this element is an inline element.
isLowQuality(TextDocumentStatistics, TextDocumentStatistics) - Method in class com.kohlschutter.boilerpipe.estimators.SimpleEstimator
Given the statistics of the document before and after applying the BoilerpipeExtractor, can we regard the extraction quality (too) low? Works well with DefaultExtractor, ArticleExtractor and others.
isOutputHighlightOnly() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
If true, only HTML enclosed within highlighted content will be returned
isParent(HTMLElements.Element) - Method in class org.cyberneko.html.HTMLElements.Element
Indicates if the provided element is an accepted parent of current element
isSpecial() - Method in class org.cyberneko.html.HTMLElements.Element
Returns true if this element is special -- if its content should be parsed ignoring markup.
isWord(String) - Static method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 

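The IgnoreBlocksAfterContentFilter entry in this section describes marking blocks as "non-content" once a block labeled DefaultLabels.INDICATES_END_OF_TEXT has been seen. The following is a minimal sketch of that idea, not the library's code. It models blocks as parallel boolean arrays instead of TextBlock objects, and IgnoreAfterEndSketch and ignoreAfterEnd are hypothetical names. It also assumes that the labeled block itself is cleared, and it ignores the int constructor argument listed for the real class.

```java
import java.util.Arrays;

public class IgnoreAfterEndSketch {
    /**
     * isContent[i]: current content flag of block i.
     * endOfText[i]: true if block i carries the end-of-text label.
     * Returns a copy of the content flags in which the first labeled block
     * and every block after it are marked non-content (an assumption).
     */
    public static boolean[] ignoreAfterEnd(boolean[] isContent, boolean[] endOfText) {
        boolean[] result = Arrays.copyOf(isContent, isContent.length);
        boolean seenEnd = false;
        for (int i = 0; i < result.length; i++) {
            if (endOfText[i]) {
                seenEnd = true;
            }
            if (seenEnd) {
                result[i] = false;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] content = {true, true, true, true};
        boolean[] end = {false, false, true, false};
        // prints [true, true, false, false]
        System.out.println(Arrays.toString(ignoreAfterEnd(content, end)));
    }
}
```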
K

KBD - Static variable in class org.cyberneko.html.HTMLElements
 
KEEP_EVERYTHING_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
Dummy Extractor; should return the input text.
KeepEverythingExtractor - Class in com.kohlschutter.boilerpipe.extractors
Marks everything as content.
KeepEverythingExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor
 
KeepEverythingWithMinKWordsExtractor - Class in com.kohlschutter.boilerpipe.extractors
A full-text extractor which marks everything as content, except blocks with fewer than k words.
KeepEverythingWithMinKWordsExtractor(int) - Constructor for class com.kohlschutter.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
 
KeepLargestBlockFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
Keeps the largest TextBlock only (by the number of words).
KeepLargestBlockFilter(boolean, int) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
KeepLargestFulltextBlockFilter - Class in com.kohlschutter.boilerpipe.filters.english
Keeps the largest TextBlock only (by the number of words).
KeepLargestFulltextBlockFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
 
KEYGEN - Static variable in class org.cyberneko.html.HTMLElements
 

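The KeepLargestBlockFilter entry in this section ("keeps the largest TextBlock only, by the number of words") can be sketched as follows. This is an illustration, not the library's implementation. KeepLargestSketch and keepLargest are hypothetical names, and blocks are reduced to a word count and a content flag. Two choices here are assumptions: only blocks already marked as content are candidates, and ties go to the first block.

```java
import java.util.Arrays;

public class KeepLargestSketch {
    /**
     * Returns new content flags in which only the content block with the
     * most words stays marked as content. If no block is content, all flags
     * are false.
     */
    public static boolean[] keepLargest(int[] numWords, boolean[] isContent) {
        int largest = -1;
        int maxWords = -1;
        for (int i = 0; i < numWords.length; i++) {
            // Strict '>' keeps the first block on ties (an assumption).
            if (isContent[i] && numWords[i] > maxWords) {
                largest = i;
                maxWords = numWords[i];
            }
        }
        boolean[] result = new boolean[numWords.length];
        if (largest >= 0) {
            result[largest] = true;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] words = {5, 40, 12, 40};
        boolean[] content = {true, true, true, true};
        // prints [false, true, false, false]
        System.out.println(Arrays.toString(keepLargest(words, content)));
    }
}
```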
L

LABEL - Static variable in class org.cyberneko.html.HTMLElements
 
LabelAction - Class in com.kohlschutter.boilerpipe.labels
Helps add labels to TextBlocks.
LabelAction(String...) - Constructor for class com.kohlschutter.boilerpipe.labels.LabelAction
 
LabelFusion - Class in com.kohlschutter.boilerpipe.filters.heuristics
Fuses adjacent blocks if their labels are equal.
LabelFusion() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
Creates a new LabelFusion instance.
labelPrefix - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
 
labels - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
labels - Variable in class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
 
labels - Variable in class com.kohlschutter.boilerpipe.filters.simple.LabelToContentFilter
 
labels - Variable in class com.kohlschutter.boilerpipe.labels.LabelAction
 
labelStack - Variable in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
labelStacks - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
LabelToBoilerplateFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Marks all blocks that contain a given label as "boilerplate".
LabelToBoilerplateFilter(String...) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
 
LabelToContentFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Marks all blocks that contain a given label as "content".
LabelToContentFilter(String...) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.LabelToContentFilter
 
labelToKeep - Variable in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
 
LargeBlockSameTagLevelToContentFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
Marks all blocks as content that (1) are on the same tag level as very likely main content (usually the level of the largest block) and (2) have a significant number of words, currently at least 100.
LargeBlockSameTagLevelToContentFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.LargeBlockSameTagLevelToContentFilter
 
LARGEST_CONTENT_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
Like DefaultExtractor, but keeps the largest text block only.
LargestContentExtractor - Class in com.kohlschutter.boilerpipe.extractors
A full-text extractor which extracts the largest text component of a page.
LargestContentExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
 
lastEndTag - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
lastEvent - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
lastStartTag - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
LAYER - Static variable in class org.cyberneko.html.HTMLElements
 
LEGEND - Static variable in class org.cyberneko.html.HTMLElements
 
LI - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
LI - Static variable in class org.cyberneko.html.HTMLElements
 
LINK - Static variable in class org.cyberneko.html.HTMLElements
 
linkDensity - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
linksBuffer - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
linksHighlight - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
ListAtEndFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
Marks nested list-item blocks after the end of the main content.
ListAtEndFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ListAtEndFilter
 
LISTING - Static variable in class org.cyberneko.html.HTMLElements
 
lostText_ - Variable in class org.cyberneko.html.HTMLTagBalancer
 

M

main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.HTMLHighlightDemo
 
main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.ImageExtractorDemo
 
main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.Oneliner
 
main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.UsingSAX
 
MAP - Static variable in class org.cyberneko.html.HTMLElements
 
MarkEverythingBoilerplateFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Marks all blocks as boilerplate.
MarkEverythingBoilerplateFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingBoilerplateFilter
 
MarkEverythingContentFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Marks all blocks as content.
MarkEverythingContentFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingContentFilter
 
MARKUP_PREFIX - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
markupLabelsOnly(Set<String>) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
 
MarkupTagAction - Class in com.kohlschutter.boilerpipe.sax
Assigns labels for element CSS classes and ids to the corresponding TextBlock.
MarkupTagAction(boolean) - Constructor for class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
MARQUEE - Static variable in class org.cyberneko.html.HTMLElements
 
MAX_DISTANCE_1 - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
MAX_DISTANCE_1_CONTENT_ONLY - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
MAX_DISTANCE_1_CONTENT_ONLY_SAME_TAGLEVEL - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
MAX_DISTANCE_1_SAME_TAGLEVEL - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
maxBlocksDistance - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
meetsCondition(TextBlock) - Method in interface com.kohlschutter.boilerpipe.conditions.TextBlockCondition
Returns true iff the given TextBlock tb meets the defined condition.
MENU - Static variable in class org.cyberneko.html.HTMLElements
 
mergeNext(TextBlock) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
META - Static variable in class org.cyberneko.html.HTMLElements
 
MIGHT_BE_CONTENT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
MinClauseWordsFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5).
MinClauseWordsFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
MinClauseWordsFilter(int, boolean) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
MinFulltextWordsFilter - Class in com.kohlschutter.boilerpipe.filters.english
Keeps only those content blocks which contain at least k full-text words (measured by HeuristicFilterBase.getNumFullTextWords(TextBlock)).
MinFulltextWordsFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
 
minNumWords - Variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
minWords - Variable in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
 
minWords - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
minWords - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
minWords - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinWordsFilter
 
MinWordsFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Keeps only those content blocks which contain at least k words.
MinWordsFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MinWordsFilter
 
modifyName(String, short) - Static method in class org.cyberneko.html.HTMLTagBalancer
Modifies the given name based on the specified mode.
MULTICOL - Static variable in class org.cyberneko.html.HTMLElements
 

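The MinWordsFilter entry in this section ("keeps only those content blocks which contain at least k words") can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code. MinWordsSketch and minWords are hypothetical names, and blocks are reduced to a word count and a content flag instead of TextBlock objects.

```java
import java.util.Arrays;

public class MinWordsSketch {
    /**
     * Returns new content flags: a block stays content only if it was
     * content before and has at least k words. Non-content blocks are
     * never promoted to content.
     */
    public static boolean[] minWords(int[] numWords, boolean[] isContent, int k) {
        boolean[] result = new boolean[numWords.length];
        for (int i = 0; i < numWords.length; i++) {
            result[i] = isContent[i] && numWords[i] >= k;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] words = {3, 25, 10, 80};
        boolean[] content = {true, true, false, true};
        // prints [false, true, false, true]
        System.out.println(Arrays.toString(minWords(words, content, 10)));
    }
}
```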
N

name - Variable in class org.cyberneko.html.HTMLElements.Element
The element name.
name_ - Variable in class org.cyberneko.html.HTMLTagBalancer.ElementEntry
 
NAMES_ATTRS - Static variable in class org.cyberneko.html.HTMLTagBalancer
Modify HTML attribute names: { "upper", "lower", "default" }.
NAMES_ELEMS - Static variable in class org.cyberneko.html.HTMLTagBalancer
Modify HTML element names: { "upper", "lower", "default" }.
NAMES_LOWERCASE - Static variable in class org.cyberneko.html.HTMLTagBalancer
Lowercase HTML names.
NAMES_MATCH - Static variable in class org.cyberneko.html.HTMLTagBalancer
Match HTML element names.
NAMES_NO_CHANGE - Static variable in class org.cyberneko.html.HTMLTagBalancer
Don't modify HTML names.
NAMES_UPPERCASE - Static variable in class org.cyberneko.html.HTMLTagBalancer
Uppercase HTML names.
NAMESPACES - Static variable in class org.cyberneko.html.HTMLTagBalancer
Namespaces.
nestable - Variable in class org.cyberneko.html.HTMLElements.Element
If set to true, this element may not be nested (example: "A").
newExtractingInstance() - Static method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Creates a new HTMLHighlighter, which is set-up to return only the extracted HTML text, including enclosed markup.
newHighlightingInstance() - Static method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Creates a new HTMLHighlighter, which is set-up to return the full HTML text, with the extracted text portion highlighted.
NEXTID - Static variable in class org.cyberneko.html.HTMLElements
 
NO_SUCH_ELEMENT - Static variable in class org.cyberneko.html.HTMLElements
No such element.
NOBR - Static variable in class org.cyberneko.html.HTMLElements
 
NOEMBED - Static variable in class org.cyberneko.html.HTMLElements
 
NOFRAMES - Static variable in class org.cyberneko.html.HTMLElements
 
NOLAYER - Static variable in class org.cyberneko.html.HTMLElements
 
NOSCRIPT - Static variable in class org.cyberneko.html.HTMLElements
 
notifyDiscardedEndElement(QName, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Notifies the tagBalancingListener (if any) of an ignored end element.
notifyDiscardedStartElement(QName, XMLAttributes, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Notifies the tagBalancingListener (if any) of an ignored start element.
nullTrim(String) - Static method in class com.kohlschutter.boilerpipe.document.Image
 
numBlocks - Variable in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
 
numFullTextWords - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
numWords - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
numWords - Variable in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
 
numWordsInAnchorText - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
numWordsInWrappedLines - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
NumWordsRulesClassifier - Class in com.kohlschutter.boilerpipe.filters.english
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010), particularly using number of words per block and link density per block.
NumWordsRulesClassifier() - Constructor for class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
 
NumWordsRulesExtractor - Class in com.kohlschutter.boilerpipe.extractors
A quite generic full-text extractor solely based upon the number of words per block (the current, the previous and the next block).
NumWordsRulesExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
 
numWrappedLines - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 

O

OBJECT - Static variable in class org.cyberneko.html.HTMLElements
 
offsetBlocks - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
offsetBlocksEnd - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
offsetBlocksStart - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
OL - Static variable in class org.cyberneko.html.HTMLElements
 
Oneliner - Class in com.kohlschutter.boilerpipe.demo
Demonstrates how to use Boilerpipe to get the main content as plain text.
Oneliner() - Constructor for class com.kohlschutter.boilerpipe.demo.Oneliner
 
OPTGROUP - Static variable in class org.cyberneko.html.HTMLElements
 
OPTION - Static variable in class org.cyberneko.html.HTMLElements
 
org.cyberneko.html - package org.cyberneko.html
 
out - Variable in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
 
outputHighlightOnly - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 

P

P - Static variable in class org.cyberneko.html.HTMLElements
 
PARAM - Static variable in class org.cyberneko.html.HTMLElements
 
parent - Variable in class org.cyberneko.html.HTMLElements.Element
Parent elements.
parentCodes - Variable in class org.cyberneko.html.HTMLElements.Element
Parent elements.
PAT_CHARSET - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLFetcher
 
PAT_CLAUSE_DELIMITER - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
PAT_FONT_SIZE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
 
PAT_NOT_WORD_BOUNDARY - Static variable in class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
 
PAT_NUM - Static variable in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
PAT_REMOVE_CHARACTERS - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
PAT_SUPER_TAG - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
PAT_TAG_NO_TEXT - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
PAT_VALID_WORD_CHARACTER - Static variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
PAT_WHITESPACE - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
PAT_WORD_BOUNDARY - Static variable in class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
 
PATTERNS_SHORT - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
 
peek() - Method in class org.cyberneko.html.HTMLTagBalancer.InfoStack
Peeks at the top of the stack.
PLAINTEXT - Static variable in class org.cyberneko.html.HTMLElements
 
pop() - Method in class org.cyberneko.html.HTMLTagBalancer.InfoStack
Pops the top item off of the stack.
postHighlight - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
potentialTitles - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
PRE - Static variable in class org.cyberneko.html.HTMLElements
 
preHighlight - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
PrintDebugFilter - Class in com.kohlschutter.boilerpipe.filters.debug
Prints debug information about the current state of the TextDocument.
PrintDebugFilter(PrintWriter) - Constructor for class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
Creates a new instance of PrintDebugFilter.
process(TextDocument) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeFilter
Processes the given document doc.
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ContentFusion
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LargeBlockSameTagLevelToContentFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ListAtEndFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.InvertedFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.LabelToContentFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingBoilerplateFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingContentFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MinWordsFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
 
process(TextDocument, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Processes the given TextDocument and the original HTML text (as a String).
process(TextDocument, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
Processes the given TextDocument and the original HTML text (as a String).
process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Processes the given TextDocument and the original HTML text (as an InputSource).
process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
Processes the given TextDocument and the original HTML text (as an InputSource).
process(URL, BoilerpipeExtractor) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
process(URL, BoilerpipeExtractor) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
processingInstruction(String, String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
processingInstruction(String, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
processingInstruction(String, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
processingInstruction(String, XMLString, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Processing instruction.
push(HTMLTagBalancer.Info) - Method in class org.cyberneko.html.HTMLTagBalancer.InfoStack
Pushes element information onto the stack.
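
The process(TextDocument) entries above all implement the same BoilerpipeFilter operation on a document. The following is a minimal, self-contained sketch of that pattern, not the library's code. Block and Filter are hypothetical stand-ins for TextBlock and BoilerpipeFilter, the boolean "something changed" return value is an assumption, and the min-words rule only mirrors the MinWordsFilter description ("at least k words").

```java
import java.util.ArrayList;
import java.util.List;

public class FilterChainSketch {
    /** Hypothetical stand-in for a text block: its text plus a content flag. */
    static final class Block {
        final String text;
        boolean isContent;

        Block(String text, boolean isContent) {
            this.text = text;
            this.isContent = isContent;
        }

        /** Counts whitespace-separated tokens; an empty block has zero words. */
        int numWords() {
            String t = text.trim();
            return t.isEmpty() ? 0 : t.split("\\s+").length;
        }
    }

    /** Hypothetical filter contract: returns true if any block was changed (assumed). */
    interface Filter {
        boolean process(List<Block> doc);
    }

    /** Demotes content blocks with fewer than k words, in the spirit of MinWordsFilter. */
    static Filter minWords(int k) {
        return doc -> {
            boolean changed = false;
            for (Block b : doc) {
                if (b.isContent && b.numWords() < k) {
                    b.isContent = false;
                    changed = true;
                }
            }
            return changed;
        };
    }

    /** Runs the filters in order and reports whether any of them changed the document. */
    static boolean runAll(List<Block> doc, List<Filter> filters) {
        boolean changed = false;
        for (Filter f : filters) {
            changed |= f.process(doc);
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> doc = new ArrayList<>();
        doc.add(new Block("Home | About | Contact", true));
        doc.add(new Block("This paragraph is long enough to count as real article text.", true));
        boolean changed = runAll(doc, List.of(minWords(6)));
        for (Block b : doc) {
            System.out.println(b.isContent + "\t" + b.text);
        }
        System.out.println("changed=" + changed);
    }
}
```

Chaining small filters this way keeps each rule independently testable; the real extractors listed in this index may combine their filters differently.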

Q

Q - Static variable in class org.cyberneko.html.HTMLElements
 
qname - Variable in class org.cyberneko.html.HTMLTagBalancer.Info
The element qualified name.

R

RB - Static variable in class org.cyberneko.html.HTMLElements
 
RBC - Static variable in class org.cyberneko.html.HTMLElements
 
RECOGNIZED_FEATURES - Static variable in class org.cyberneko.html.HTMLTagBalancer
Recognized features.
RECOGNIZED_FEATURES_DEFAULTS - Static variable in class org.cyberneko.html.HTMLTagBalancer
Recognized features defaults.
RECOGNIZED_PROPERTIES - Static variable in class org.cyberneko.html.HTMLTagBalancer
Recognized properties.
RECOGNIZED_PROPERTIES_DEFAULTS - Static variable in class org.cyberneko.html.HTMLTagBalancer
Recognized properties defaults.
recycle() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
Recycles this instance.
removeLabel(String) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
REPORT_ERRORS - Static variable in class org.cyberneko.html.HTMLTagBalancer
Report errors.
reset(XMLComponentManager) - Method in class org.cyberneko.html.HTMLTagBalancer
Resets the component.
RP - Static variable in class org.cyberneko.html.HTMLElements
 
RT - Static variable in class org.cyberneko.html.HTMLElements
 
RTC - Static variable in class org.cyberneko.html.HTMLElements
 
RUBY - Static variable in class org.cyberneko.html.HTMLElements
 

S

S - Static variable in class org.cyberneko.html.HTMLElements
 
sameTagLevelOnly - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
SAMP - Static variable in class org.cyberneko.html.HTMLElements
 
sbLastWasWhitespace - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
SCRIPT - Static variable in class org.cyberneko.html.HTMLElements
 
SELECT - Static variable in class org.cyberneko.html.HTMLElements
 
serialVersionUID - Static variable in exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
 
serialVersionUID - Static variable in class com.kohlschutter.boilerpipe.sax.DefaultTagActionMap
 
serialVersionUID - Static variable in class com.kohlschutter.boilerpipe.sax.TagActionMap
 
setContentHandler(BoilerpipeHTMLContentHandler) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
 
setContentHandler(ContentHandler) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
 
setDocumentHandler(XMLDocumentHandler) - Method in class org.cyberneko.html.HTMLTagBalancer
Sets the document handler.
setDocumentLocator(Locator) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
setDocumentLocator(Locator) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
setDocumentLocator(Locator) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
setDocumentSource(XMLDocumentSource) - Method in class org.cyberneko.html.HTMLTagBalancer
Sets the document source.
setExtraStyleSheet(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Sets the extra stylesheet definition that will be inserted in the HEAD element.
setFeature(String, boolean) - Method in class org.cyberneko.html.HTMLTagBalancer
Sets a feature.
setIsContent(boolean) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
setOutputHighlightOnly(boolean) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document.
setPostHighlight(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Sets the string that will be inserted after any highlighted HTML block.
setPreHighlight(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Sets the string that will be inserted prior to any highlighted HTML block.
setProperty(String, Object) - Method in class org.cyberneko.html.HTMLTagBalancer
Sets a property.
setTagAction(String, TagAction) - Method in class com.kohlschutter.boilerpipe.sax.TagActionMap
Sets a particular TagAction for a given tag.
setTagBalancingListener(HTMLTagBalancingListener) - Method in class org.cyberneko.html.HTMLTagBalancer
 
setTagLevel(int) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
setTagWhitelist(Map<String, Set<String>>) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
setTitle(String) - Method in class com.kohlschutter.boilerpipe.document.TextDocument
Updates the "main" title for this document.
setTitle(String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
SimpleBlockFusionProcessor - Class in com.kohlschutter.boilerpipe.filters.heuristics
Merges two subsequent blocks if their text densities are equal.
SimpleBlockFusionProcessor() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
 
SimpleEstimator - Class in com.kohlschutter.boilerpipe.estimators
Estimates the "goodness" of a BoilerpipeExtractor on a given document.
SimpleEstimator() - Constructor for class com.kohlschutter.boilerpipe.estimators.SimpleEstimator
 
size - Variable in class org.cyberneko.html.HTMLElements.ElementList
The size of the list.
skippedEntity(String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
skippedEntity(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
skippedEntity(String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
SMALL - Static variable in class org.cyberneko.html.HTMLElements
 
SOUND - Static variable in class org.cyberneko.html.HTMLElements
 
SPACER - Static variable in class org.cyberneko.html.HTMLElements
 
SPAN - Static variable in class org.cyberneko.html.HTMLElements
 
SPECIAL - Static variable in class org.cyberneko.html.HTMLElements.Element
Special element.
SplitParagraphBlocksFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Splits TextBlocks at paragraph boundaries.
SplitParagraphBlocksFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
 
src - Variable in class com.kohlschutter.boilerpipe.document.Image
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in interface com.kohlschutter.boilerpipe.sax.TagAction
 
START_TAG - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
 
startCDATA(Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Start CDATA section.
startDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
startDocument() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
startDocument() - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
startDocument(XMLLocator, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Start document.
startDocument(XMLLocator, String, NamespaceContext, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Start document.
startElement(String, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
startElement(String, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
startElement(String, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
startElement(QName, XMLAttributes, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Start element.
startGeneralEntity(String, XMLResourceIdentifier, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Start entity.
startPrefixMapping(String, String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
startPrefixMapping(String, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
startPrefixMapping(String, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
startPrefixMapping(String, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Start prefix mapping.
startsWithNumber(String, int, String...) - Static method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
Checks whether the given text t starts with a sequence of digits, followed by one of the given strings.
STRICTLY_NOT_CONTENT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
STRIKE - Static variable in class org.cyberneko.html.HTMLElements
 
STRONG - Static variable in class org.cyberneko.html.HTMLElements
 
STYLE - Static variable in class org.cyberneko.html.HTMLElements
 
SUB - Static variable in class org.cyberneko.html.HTMLElements
 
SUP - Static variable in class org.cyberneko.html.HTMLElements
 
SurroundingToContentFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Marks blocks as "content" if their preceding and following blocks are both already marked "content", and the given TextBlockCondition is met.
SurroundingToContentFilter(TextBlockCondition) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
 
SYNTHESIZED_ITEM - Static variable in class org.cyberneko.html.HTMLTagBalancer
Synthesized event info item.
synthesizedAugs() - Method in class org.cyberneko.html.HTMLTagBalancer
Returns an augmentations object with a synthesized item added.
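
The startsWithNumber(String, int, String...) entry above describes a check for leading digits followed by one of several strings. A rough sketch of such a check is shown below; it is not taken from the library source. Treating the int parameter as an upper bound on how many characters are scanned for digits is an assumption, and the class and parameter names are hypothetical.

```java
public class StartsWithNumberSketch {
    /**
     * Returns true if t begins with one or more digits (scanning at most len
     * characters, an assumed meaning of the int parameter) and the text right
     * after those digits starts with one of the given suffixes.
     */
    static boolean startsWithNumber(String t, int len, String... suffixes) {
        int limit = Math.min(len, t.length());
        int j = 0;
        while (j < limit && Character.isDigit(t.charAt(j))) {
            j++;
        }
        if (j == 0) {
            return false; // no leading digits at all
        }
        for (String s : suffixes) {
            if (t.startsWith(s, j)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Leading digits followed by a known suffix: prints true
        System.out.println(startsWithNumber("12 comments", 11, " comments", " users responded in"));
        // No leading digits: prints false
        System.out.println(startsWithNumber("comments (12)", 13, " comments"));
    }
}
```

A check of this shape is useful for spotting lines such as comment counters that tend to follow the end of an article body.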

T

t1 - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
 
t2 - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
 
TA_ANCHOR_TEXT - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Marks this tag as "anchor" (this should usually only be set for the <A> tag).
TA_BLOCK_LEVEL - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Explicitly marks this tag as a simple "block-level" element, which always generates whitespace.
TA_BODY - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Marks this tag as the body element (this should usually only be set for the <BODY> tag).
TA_FONT - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Special TagAction for the <FONT> tag, which keeps track of the absolute and relative font size.
TA_HEAD - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
TA_IGNORABLE_ELEMENT - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Marks this tag as "ignorable", i.e.
TA_IGNORABLE_ELEMENT - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
TA_IGNORABLE_ELEMENT - Static variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor
 
TA_INLINE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Deprecated.
TA_INLINE_NO_WHITESPACE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Marks this tag as a simple "inline" element, which generates neither whitespace nor a new block.
TA_INLINE_WHITESPACE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Marks this tag as a simple "inline" element, which generates whitespace, but no new block.
TABLE - Static variable in class org.cyberneko.html.HTMLElements
 
TAG_ACTIONS - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
TAG_ACTIONS - Static variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor
 
TagAction - Interface in com.kohlschutter.boilerpipe.sax
Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
TagAction() - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
 
TagAction() - Constructor for class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
 
TagActionMap - Class in com.kohlschutter.boilerpipe.sax
Base class for defining a set of TagActions that are to be used for the HTML parsing process.
TagActionMap() - Constructor for class com.kohlschutter.boilerpipe.sax.TagActionMap
 
tagActions - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
tagBalancingListener - Variable in class org.cyberneko.html.HTMLTagBalancer
 
tagLevel - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
tagLevel - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
tagWhitelist - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
TBODY - Static variable in class org.cyberneko.html.HTMLElements
 
TD - Static variable in class org.cyberneko.html.HTMLElements
 
TerminatingBlocksFinder - Class in com.kohlschutter.boilerpipe.filters.english
Finds blocks which are potentially indicating the end of an article text and marks them with DefaultLabels.INDICATES_END_OF_TEXT.
TerminatingBlocksFinder() - Constructor for class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
 
text - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
TEXTAREA - Static variable in class org.cyberneko.html.HTMLElements
 
TextBlock - Class in com.kohlschutter.boilerpipe.document
Describes a block of text.
TextBlock(String) - Constructor for class com.kohlschutter.boilerpipe.document.TextBlock
 
TextBlock(String, BitSet, int, int, int, int, int) - Constructor for class com.kohlschutter.boilerpipe.document.TextBlock
 
TextBlockCondition - Interface in com.kohlschutter.boilerpipe.conditions
Evaluates whether a given TextBlock meets a certain condition.
textBlocks - Variable in class com.kohlschutter.boilerpipe.document.TextDocument
 
textBlocks - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
textBuffer - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
textDecl(String, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Text declaration.
textDensity - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
TextDocument - Class in com.kohlschutter.boilerpipe.document
A text document, consisting of one or more TextBlocks.
TextDocument(String, List<TextBlock>) - Constructor for class com.kohlschutter.boilerpipe.document.TextDocument
Creates a new TextDocument with given TextBlocks and given title.
TextDocument(List<TextBlock>) - Constructor for class com.kohlschutter.boilerpipe.document.TextDocument
Creates a new TextDocument with given TextBlocks, and no title.
TextDocumentStatistics - Class in com.kohlschutter.boilerpipe.document
Provides shallow statistics on a given TextDocument.
TextDocumentStatistics(TextDocument, boolean) - Constructor for class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
Computes statistics on a given TextDocument.
textElementIdx - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
TFOOT - Static variable in class org.cyberneko.html.HTMLElements
 
TH - Static variable in class org.cyberneko.html.HTMLElements
 
THEAD - Static variable in class org.cyberneko.html.HTMLElements
 
title - Variable in class com.kohlschutter.boilerpipe.document.TextDocument
 
title - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
TITLE - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
TITLE - Static variable in class org.cyberneko.html.HTMLElements
 
toInputSource() - Method in class com.kohlschutter.boilerpipe.sax.HTMLDocument
 
toInputSource() - Method in interface com.kohlschutter.boilerpipe.sax.InputSourceable
 
tokenBuffer - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
tokenize(CharSequence) - Static method in class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
Tokenizes the text and returns an array of tokens.
top - Variable in class org.cyberneko.html.HTMLTagBalancer.InfoStack
The top of the stack.
toString() - Method in class com.kohlschutter.boilerpipe.document.Image
 
toString() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
toString() - Method in class com.kohlschutter.boilerpipe.labels.LabelAction
 
toString() - Method in class org.cyberneko.html.HTMLElements.Element
Provides a simple representation to make debugging easier.
toString() - Method in class org.cyberneko.html.HTMLTagBalancer.Info
Simple representation to make debugging easier.
toString() - Method in class org.cyberneko.html.HTMLTagBalancer.InfoStack
Simple representation to make debugging easier.
toTextDocument() - Method in interface com.kohlschutter.boilerpipe.BoilerpipeDocumentSource
 
toTextDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
Returns a TextDocument containing the extracted TextBlocks.
toTextDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
Returns a TextDocument containing the extracted TextBlocks.
TR - Static variable in class org.cyberneko.html.HTMLElements
 
TrailingHeadlineToBoilerplateFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
Marks trailing headlines (TextBlocks that have the label DefaultLabels.HEADING) as boilerplate.
TrailingHeadlineToBoilerplateFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
 
TT - Static variable in class org.cyberneko.html.HTMLElements
 

U

U - Static variable in class org.cyberneko.html.HTMLElements
 
UL - Static variable in class org.cyberneko.html.HTMLElements
 
UnicodeTokenizer - Class in com.kohlschutter.boilerpipe.util
Tokenizes text according to Unicode word boundaries and strips off non-word characters.
UnicodeTokenizer() - Constructor for class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
 
UNKNOWN - Static variable in class org.cyberneko.html.HTMLElements
 
UsingSAX - Class in com.kohlschutter.boilerpipe.demo
Demonstrates how to use Boilerpipe when working with InputSources.
UsingSAX() - Constructor for class com.kohlschutter.boilerpipe.demo.UsingSAX
 

V

valueOf(String) - Static method in enum com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
Returns the enum constant of this type with the specified name.
values() - Static method in enum com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
Returns an array containing the constants of this enum type, in the order they are declared.
VAR - Static variable in class org.cyberneko.html.HTMLElements
 
VERY_LIKELY_CONTENT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 

W

WBR - Static variable in class org.cyberneko.html.HTMLElements
 
WHITESPACE - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
 
width - Variable in class com.kohlschutter.boilerpipe.document.Image
 

X

XML - Static variable in class org.cyberneko.html.HTMLElements
 
xmlDecl(String, String, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
XML declaration.
xmlEncode(String) - Static method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
XMP - Static variable in class org.cyberneko.html.HTMLElements
 