A B C D E F G H I K L M N O P Q R S T U V W X
All Classes All Packages
A
- A - Static variable in class org.cyberneko.html.HTMLElements
- ABBR - Static variable in class org.cyberneko.html.HTMLElements
- acceptClausesWithoutDelimiter - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- ACRONYM - Static variable in class org.cyberneko.html.HTMLElements
- action - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
- action - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
- addElement(HTMLElements.Element) - Method in class org.cyberneko.html.HTMLElements.ElementList
-
Adds an element to list, resizing if necessary.
- addLabel(String) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
-
Adds an arbitrary String label to this TextBlock.
- addLabelAction(LabelAction) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- addLabels(String...) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
-
Adds a set of labels to this TextBlock.
- addLabels(Set<String>) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
-
Adds a set of labels to this TextBlock.
- addLabelsTo(TextBlock) - Method in class com.kohlschutter.boilerpipe.labels.LabelAction
- addPotentialTitles(Set<String>, String, String, int) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
- AddPrecedingLabelsFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Adds the labels of the preceding block to the current block, optionally adding a prefix.
- AddPrecedingLabelsFilter(String) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
-
Creates a new AddPrecedingLabelsFilter instance.
- ADDRESS - Static variable in class org.cyberneko.html.HTMLElements
- addTagAction(String, TagAction) - Method in class com.kohlschutter.boilerpipe.sax.TagActionMap
-
Adds a particular TagAction for a given tag.
- addTextBlock(TextBlock) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- addTo(TextBlock) - Method in class com.kohlschutter.boilerpipe.labels.ConditionalLabelAction
- addTo(TextBlock) - Method in class com.kohlschutter.boilerpipe.labels.LabelAction
- addWhitespaceIfNecessary() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- afterEnd(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
- afterEnd(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
- afterStart(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
- afterStart(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
- alt - Variable in class com.kohlschutter.boilerpipe.document.Image
- ANCHOR_TEXT_END - Static variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- ANCHOR_TEXT_START - Static variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- APPLET - Static variable in class org.cyberneko.html.HTMLElements
- area - Variable in class com.kohlschutter.boilerpipe.document.Image
- AREA - Static variable in class org.cyberneko.html.HTMLElements
- ARTICLE_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
-
Works very well for most types of Article-like HTML.
- ARTICLE_METADATA - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- ArticleExtractor - Class in com.kohlschutter.boilerpipe.extractors
-
A full-text extractor which is tuned towards news articles.
- ArticleExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
- ArticleMetadataFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Tries to find TextBlocks that consist of "article metadata".
- ArticleMetadataFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
- ArticleSentencesExtractor - Class in com.kohlschutter.boilerpipe.extractors
-
A full-text extractor which is tuned towards extracting sentences from news articles.
- ArticleSentencesExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
- attributes - Variable in class org.cyberneko.html.HTMLTagBalancer.Info
-
The element attributes.
- AUGMENTATIONS - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Include infoset augmentations.
- augs_ - Variable in class org.cyberneko.html.HTMLTagBalancer.ElementEntry
- avgNumWords() - Method in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
-
Returns the average number of words at block-level (= overall number of words divided by the number of blocks).
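The averaging rule described for avgNumWords() can be illustrated with a small standalone sketch. The class name AvgNumWordsSketch, the list-of-counts input, and the choice to return 0 for an empty document are assumptions made for illustration; this is not the TextDocumentStatistics implementation.

```java
import java.util.List;

// Hypothetical illustration, not the library's TextDocumentStatistics class.
final class AvgNumWordsSketch {

    /**
     * Average number of words per block: the overall number of words
     * divided by the number of blocks. Returning 0 for an empty list
     * is an assumption of this sketch.
     */
    static float avgNumWords(List<Integer> wordsPerBlock) {
        if (wordsPerBlock.isEmpty()) {
            return 0f;
        }
        int totalWords = 0;
        for (int words : wordsPerBlock) {
            totalWords += words;
        }
        return totalWords / (float) wordsPerBlock.size();
    }

    public static void main(String[] args) {
        // Three blocks with 10, 20 and 30 words: 60 words / 3 blocks.
        System.out.println(avgNumWords(List.of(10, 20, 30))); // prints 20.0
    }
}
```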
B
- B - Static variable in class org.cyberneko.html.HTMLElements
- BASE - Static variable in class org.cyberneko.html.HTMLElements
- BASEFONT - Static variable in class org.cyberneko.html.HTMLElements
- BDO - Static variable in class org.cyberneko.html.HTMLElements
- beforeEnd(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
- beforeEnd(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
- beforeStart(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
- beforeStart(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
- BGSOUND - Static variable in class org.cyberneko.html.HTMLElements
- BIG - Static variable in class org.cyberneko.html.HTMLElements
- BLINK - Static variable in class org.cyberneko.html.HTMLElements
- BLOCK - Static variable in class org.cyberneko.html.HTMLElements.Element
-
Block element.
- BlockProximityFusion - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Fuses adjacent blocks if their distance (in blocks) does not exceed a certain limit.
- BlockProximityFusion(int, boolean, boolean) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
-
Creates a new BlockProximityFusion instance.
- BLOCKQUOTE - Static variable in class org.cyberneko.html.HTMLElements
- BlockTagLabelAction(LabelAction) - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
- blockTagLevel - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- BODY - Static variable in class org.cyberneko.html.HTMLElements
- BoilerpipeDocumentSource - Interface in com.kohlschutter.boilerpipe
-
Something that can be represented as a TextDocument.
- BoilerpipeExtractor - Interface in com.kohlschutter.boilerpipe
-
Describes a complete filter pipeline.
- BoilerpipeFilter - Interface in com.kohlschutter.boilerpipe
-
A generic BoilerpipeFilter.
- BoilerpipeHTMLContentHandler - Class in com.kohlschutter.boilerpipe.sax
-
A simple SAX ContentHandler, used by BoilerpipeSAXInput.
- BoilerpipeHTMLContentHandler() - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
-
Constructs a BoilerpipeHTMLContentHandler using the DefaultTagActionMap.
- BoilerpipeHTMLContentHandler(TagActionMap) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
-
Constructs a BoilerpipeHTMLContentHandler using the given TagActionMap.
- BoilerpipeHTMLContentHandler.Event - Enum in com.kohlschutter.boilerpipe.sax
- BoilerpipeHTMLParser - Class in com.kohlschutter.boilerpipe.sax
-
A simple SAX Parser, used by BoilerpipeSAXInput.
- BoilerpipeHTMLParser() - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
-
Constructs a BoilerpipeHTMLParser using a default HTML content handler.
- BoilerpipeHTMLParser(boolean) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
- BoilerpipeHTMLParser(BoilerpipeHTMLContentHandler) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
-
Constructs a BoilerpipeHTMLParser using the given BoilerpipeHTMLContentHandler.
- BoilerpipeInput - Interface in com.kohlschutter.boilerpipe
-
A source that returns TextDocuments.
- BoilerpipeProcessingException - Exception in com.kohlschutter.boilerpipe
-
Exception for signaling failure in the processing pipeline.
- BoilerpipeProcessingException() - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
- BoilerpipeProcessingException(String) - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
- BoilerpipeProcessingException(String, Throwable) - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
- BoilerpipeProcessingException(Throwable) - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
- BoilerpipeSAXInput - Class in com.kohlschutter.boilerpipe.sax
-
Parses an InputSource using SAX and returns a TextDocument.
- BoilerpipeSAXInput(InputSource) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
-
Creates a new instance of BoilerpipeSAXInput for the given InputSource.
- BoilerplateBlockFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Removes TextBlocks which have explicitly been marked as "not content".
- BoilerplateBlockFilter(String) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
- bounds - Variable in class org.cyberneko.html.HTMLElements.Element
-
The bounding element code.
- BR - Static variable in class org.cyberneko.html.HTMLElements
- BUTTON - Static variable in class org.cyberneko.html.HTMLElements
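The idea behind the BlockProximityFusion entry above (fusing adjacent blocks whose distance, counted in blocks, does not exceed a limit) can be sketched in isolation. Everything here is hypothetical: the class name ProximityFusionSketch, the representation of blocks as integer offsets, and the grouping result are illustrative assumptions, not the filter's actual code or its constructor semantics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of distance-based fusion; not the library's filter.
final class ProximityFusionSketch {

    /**
     * Groups block offsets (assumed sorted ascending) so that consecutive
     * offsets land in the same group when their distance is at most
     * maxDistance. Each resulting group stands for one fused block.
     */
    static List<List<Integer>> fuse(List<Integer> offsets, int maxDistance) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = null;
        int previous = 0;
        for (int offset : offsets) {
            // Start a new group for the first block or when the gap is too large.
            if (current == null || offset - previous > maxDistance) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(offset);
            previous = offset;
        }
        return groups;
    }

    public static void main(String[] args) {
        // With a limit of 1, only directly neighbouring blocks are fused.
        System.out.println(fuse(List.of(0, 1, 3, 7, 8), 1)); // prints [[0, 1], [3], [7, 8]]
    }
}
```

Raising the limit in this sketch to 2 would also merge offset 3 into the first group, since its distance to offset 1 is exactly 2.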
C
- callEndElement(QName, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Call document handler end element.
- callStartElement(QName, XMLAttributes, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Call document handler start element.
- CANOLA_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
-
Trained on krdwrd Canola (different definition of "boilerplate").
- CanolaExtractor - Class in com.kohlschutter.boilerpipe.extractors
- CanolaExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
- CAPTION - Static variable in class org.cyberneko.html.HTMLElements
- CENTER - Static variable in class org.cyberneko.html.HTMLElements
- Chained(TagAction, TagAction) - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
- changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
- changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
- changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
- changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- changesTagLevel() - Method in interface com.kohlschutter.boilerpipe.sax.TagAction
- characterElementIdx - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- characterElementIdx - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- characters(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- characters(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- characters(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- characters(XMLString, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Characters.
- CHARACTERS - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
- charset - Variable in class com.kohlschutter.boilerpipe.sax.HTMLDocument
- CITE - Static variable in class org.cyberneko.html.HTMLElements
- CLASSIFIER - Static variable in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
-
The actual classifier, exposed.
- classify(TextBlock, TextBlock, TextBlock) - Method in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
- classify(TextBlock, TextBlock, TextBlock) - Method in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
- clone() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- clone() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
- closes - Variable in class org.cyberneko.html.HTMLElements.Element
-
List of elements this element can close.
- closes(short) - Method in class org.cyberneko.html.HTMLElements.Element
-
Returns true if this element can close the specified Element.
- code - Variable in class org.cyberneko.html.HTMLElements.Element
-
The element code.
- CODE - Static variable in class org.cyberneko.html.HTMLElements
- COL - Static variable in class org.cyberneko.html.HTMLElements
- COLGROUP - Static variable in class org.cyberneko.html.HTMLElements
- com.kohlschutter.boilerpipe - package com.kohlschutter.boilerpipe
-
The Boilerpipe top-level package.
- com.kohlschutter.boilerpipe.conditions - package com.kohlschutter.boilerpipe.conditions
- com.kohlschutter.boilerpipe.demo - package com.kohlschutter.boilerpipe.demo
-
Just some simple demo code.
- com.kohlschutter.boilerpipe.document - package com.kohlschutter.boilerpipe.document
-
The Boilerpipe document model.
- com.kohlschutter.boilerpipe.estimators - package com.kohlschutter.boilerpipe.estimators
- com.kohlschutter.boilerpipe.extractors - package com.kohlschutter.boilerpipe.extractors
-
Some standard extractors (i.e., completely piped BoilerpipeFilters)
- com.kohlschutter.boilerpipe.filters.debug - package com.kohlschutter.boilerpipe.filters.debug
- com.kohlschutter.boilerpipe.filters.english - package com.kohlschutter.boilerpipe.filters.english
-
These BoilerpipeFilters have only been tested on English text.
- com.kohlschutter.boilerpipe.filters.heuristics - package com.kohlschutter.boilerpipe.filters.heuristics
-
These BoilerpipeFilters are pure heuristics.
- com.kohlschutter.boilerpipe.filters.simple - package com.kohlschutter.boilerpipe.filters.simple
-
These BoilerpipeFilters are straightforward and probably not really specific to English.
- com.kohlschutter.boilerpipe.labels - package com.kohlschutter.boilerpipe.labels
- com.kohlschutter.boilerpipe.sax - package com.kohlschutter.boilerpipe.sax
-
Classes related to parsing and producing HTML from/to Boilerpipe TextDocuments.
- com.kohlschutter.boilerpipe.util - package com.kohlschutter.boilerpipe.util
-
Some helper classes.
- comment(XMLString, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Comment.
- COMMENT - Static variable in class org.cyberneko.html.HTMLElements
- CommonExtractors - Class in com.kohlschutter.boilerpipe.extractors
-
Provides quick access to common BoilerpipeExtractors.
- CommonExtractors() - Constructor for class com.kohlschutter.boilerpipe.extractors.CommonExtractors
- CommonTagActions - Class in com.kohlschutter.boilerpipe.sax
-
Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
- CommonTagActions() - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions
- CommonTagActions.BlockTagLabelAction - Class in com.kohlschutter.boilerpipe.sax
-
CommonTagActions for block-level elements, which triggers some LabelAction on the generated TextBlock.
- CommonTagActions.Chained - Class in com.kohlschutter.boilerpipe.sax
- CommonTagActions.InlineTagLabelAction - Class in com.kohlschutter.boilerpipe.sax
- compareTo(Image) - Method in class com.kohlschutter.boilerpipe.document.Image
- cond - Variable in class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
- condition - Variable in class com.kohlschutter.boilerpipe.labels.ConditionalLabelAction
- ConditionalLabelAction - Class in com.kohlschutter.boilerpipe.labels
-
Adds labels to a TextBlock if the given criteria are met.
- ConditionalLabelAction(TextBlockCondition, String...) - Constructor for class com.kohlschutter.boilerpipe.labels.ConditionalLabelAction
- consumeBufferedEndElements() - Method in class org.cyberneko.html.HTMLTagBalancer
-
Consumes buffered end elements, which are first consumed at the end of the document.
- consumeEarlyTextIfNeeded() - Method in class org.cyberneko.html.HTMLTagBalancer
- containedTextElements - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- CONTAINER - Static variable in class org.cyberneko.html.HTMLElements.Element
-
Container element.
- contentBitSet - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- contentBitSet - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- ContentFusion - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Merges two blocks using some heuristics.
- ContentFusion() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ContentFusion
-
Creates a new ContentFusion instance.
- contentHandler - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
- contentOnly - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- createQName(String) - Method in class org.cyberneko.html.HTMLTagBalancer
- currentContainedTextElements - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
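The closes field and closes(short) method listed above for HTMLElements.Element describe a membership check: an element holds a list of element codes it can close and reports whether a given code is in that list. A minimal sketch under stated assumptions follows; the class ElementClosesSketch and the numeric codes are invented for illustration and do not reproduce the NekoHTML class or its real element codes.

```java
// Hypothetical illustration; not org.cyberneko.html.HTMLElements.Element.
final class ElementClosesSketch {

    /** The element code (arbitrary values in this sketch). */
    final short code;

    /** Codes of the elements this element can close; may be null. */
    final short[] closes;

    ElementClosesSketch(short code, short[] closes) {
        this.code = code;
        this.closes = closes;
    }

    /** Returns true if this element can close the element with the given code. */
    boolean closes(short tag) {
        if (closes != null) {
            for (short candidate : closes) {
                if (candidate == tag) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up codes, chosen only for this example.
        final short P = 1, DIV = 2, SPAN = 3;
        ElementClosesSketch div = new ElementClosesSketch(DIV, new short[] {P});
        System.out.println(div.closes(P));    // prints true
        System.out.println(div.closes(SPAN)); // prints false
    }
}
```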
D
- data - Variable in class com.kohlschutter.boilerpipe.sax.HTMLDocument
- data - Variable in class org.cyberneko.html.HTMLElements.ElementList
-
The data in the list.
- data - Variable in class org.cyberneko.html.HTMLTagBalancer.InfoStack
-
The stack data.
- DD - Static variable in class org.cyberneko.html.HTMLElements
- debugString() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
-
Returns detailed debugging information about the contained TextBlocks.
- DEFAULT_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
-
Usually worse than ArticleExtractor, but simpler/no heuristics.
- DEFAULT_INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
- DEFAULT_INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
- DefaultExtractor - Class in com.kohlschutter.boilerpipe.extractors
-
A quite generic full-text extractor.
- DefaultExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
- DefaultLabels - Class in com.kohlschutter.boilerpipe.labels
-
Some pre-defined labels which can be used in conjunction with TextBlock.addLabel(String) and TextBlock.hasLabel(String).
- DefaultLabels() - Constructor for class com.kohlschutter.boilerpipe.labels.DefaultLabels
- DefaultTagActionMap - Class in com.kohlschutter.boilerpipe.sax
-
Default TagActions.
- DefaultTagActionMap() - Constructor for class com.kohlschutter.boilerpipe.sax.DefaultTagActionMap
- DEL - Static variable in class org.cyberneko.html.HTMLElements
- DensityRulesClassifier - Class in com.kohlschutter.boilerpipe.filters.english
-
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features", particularly using text densities and link densities.
- DensityRulesClassifier() - Constructor for class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
- DFN - Static variable in class org.cyberneko.html.HTMLElements
- DIR - Static variable in class org.cyberneko.html.HTMLElements
- DIV - Static variable in class org.cyberneko.html.HTMLElements
- DL - Static variable in class org.cyberneko.html.HTMLElements
- doctypeDecl(String, String, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Doctype declaration.
- DOCUMENT_FRAGMENT - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Document fragment balancing only.
- DOCUMENT_FRAGMENT_DEPRECATED - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Document fragment balancing only (deprecated).
- DocumentTitleMatchClassifier - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Marks TextBlocks which contain parts of the HTML <TITLE> tag, using some heuristics which are quite specific to the news domain.
- DocumentTitleMatchClassifier(String) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
- DT - Static variable in class org.cyberneko.html.HTMLElements
E
- element - Variable in class org.cyberneko.html.HTMLTagBalancer.Info
-
The element.
- Element(short, String, int, short[], short[]) - Constructor for class org.cyberneko.html.HTMLElements.Element
-
Constructs an element object.
- Element(short, String, int, short[], short, short[]) - Constructor for class org.cyberneko.html.HTMLElements.Element
-
Constructs an element object.
- Element(short, String, int, short, short[]) - Constructor for class org.cyberneko.html.HTMLElements.Element
-
Constructs an element object.
- Element(short, String, int, short, short, short[]) - Constructor for class org.cyberneko.html.HTMLElements.Element
-
Constructs an element object.
- ElementEntry(QName, Augmentations) - Constructor for class org.cyberneko.html.HTMLTagBalancer.ElementEntry
- ElementList() - Constructor for class org.cyberneko.html.HTMLElements.ElementList
- ELEMENTS - Static variable in class org.cyberneko.html.HTMLElements
-
Element information as a contiguous list.
- ELEMENTS_ARRAY - Static variable in class org.cyberneko.html.HTMLElements
-
Element information organized by first letter.
- EM - Static variable in class org.cyberneko.html.HTMLElements
- EMBED - Static variable in class org.cyberneko.html.HTMLElements
- EMPTY - Static variable in class org.cyberneko.html.HTMLElements.Element
-
Empty element.
- EMPTY_BITSET - Static variable in class com.kohlschutter.boilerpipe.document.TextBlock
- EMPTY_END - Static variable in class com.kohlschutter.boilerpipe.document.TextBlock
- EMPTY_START - Static variable in class com.kohlschutter.boilerpipe.document.TextBlock
- emptyAttributes() - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns a set of empty attributes.
- emptyElement(QName, XMLAttributes, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Empty element.
- end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
- end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
- end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
- end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- end(BoilerpipeHTMLContentHandler, String, String) - Method in interface com.kohlschutter.boilerpipe.sax.TagAction
- END_TAG - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
- endCDATA(Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
End CDATA section.
- endDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- endDocument() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- endDocument() - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- endDocument(Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
End document.
- endElement(String, String, String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- endElement(String, String, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- endElement(String, String, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- endElement(QName, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
End element.
- endElementsBuffer_ - Variable in class org.cyberneko.html.HTMLTagBalancer
- endGeneralEntity(String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
End entity.
- endPrefixMapping(String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- endPrefixMapping(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- endPrefixMapping(String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- endPrefixMapping(String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
End prefix mapping.
- equalLabels(Set<String>, Set<String>) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
- equals(Object) - Method in class org.cyberneko.html.HTMLElements.Element
-
Returns true if the objects are equal.
- ERROR_REPORTER - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Error reporter.
- Event() - Constructor for enum com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
- ExpandTitleToContentFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Marks all TextBlocks "content" which are between the headline and the part that has already been marked content, if they are marked DefaultLabels.MIGHT_BE_CONTENT.
- ExpandTitleToContentFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
- expandToSameLevelText - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- ExtractorBase - Class in com.kohlschutter.boilerpipe.extractors
-
The base class of Extractors.
- ExtractorBase() - Constructor for class com.kohlschutter.boilerpipe.extractors.ExtractorBase
- extraStyleSheet - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
F
- fAugmentations - Variable in class org.cyberneko.html.HTMLTagBalancer
-
Include infoset augmentations.
- fDocumentFragment - Variable in class org.cyberneko.html.HTMLTagBalancer
-
Document fragment balancing only.
- fDocumentHandler - Variable in class org.cyberneko.html.HTMLTagBalancer
-
The document handler.
- fDocumentSource - Variable in class org.cyberneko.html.HTMLTagBalancer
-
The document source.
- fElementStack - Variable in class org.cyberneko.html.HTMLTagBalancer
-
The element stack.
- fEmptyAttrs - Variable in class org.cyberneko.html.HTMLTagBalancer
-
Empty attributes.
- fErrorReporter - Variable in class org.cyberneko.html.HTMLTagBalancer
-
Error reporter.
- fetch(URL) - Static method in class com.kohlschutter.boilerpipe.sax.HTMLFetcher
-
Fetches the document at the given URL, using URLConnection.
- FIELDSET - Static variable in class org.cyberneko.html.HTMLElements
- fIgnoreOutsideContent - Variable in class org.cyberneko.html.HTMLTagBalancer
-
Ignore outside content.
- filter - Variable in class com.kohlschutter.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
- fInfosetAugs - Variable in class org.cyberneko.html.HTMLTagBalancer
-
Augmentations.
- fInlineStack - Variable in class org.cyberneko.html.HTMLTagBalancer
-
The inline stack.
- flags - Variable in class org.cyberneko.html.HTMLElements.Element
-
Informational flags.
- flush - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- flushBlock() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- fNamesAttrs - Variable in class org.cyberneko.html.HTMLTagBalancer
-
Modify HTML attribute names.
- fNamesElems - Variable in class org.cyberneko.html.HTMLTagBalancer
-
Modify HTML element names.
- fNamespaces - Variable in class org.cyberneko.html.HTMLTagBalancer
-
Namespaces.
- FONT - Static variable in class org.cyberneko.html.HTMLElements
- fontSizeStack - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- fOpenedForm - Variable in class org.cyberneko.html.HTMLTagBalancer
-
True if a form is in the stack (allows the opening of nested forms to be discarded).
- forcedEndElement_ - Variable in class org.cyberneko.html.HTMLTagBalancer
- forcedStartElement_ - Variable in class org.cyberneko.html.HTMLTagBalancer
- forceStartBody() - Method in class org.cyberneko.html.HTMLTagBalancer
-
Generates a missing <body> (which creates a missing <head> when needed).
- forceStartElement(QName, XMLAttributes, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Forces an element start, taking care to set the information that allows startElement to "see" that the element has been forced.
- FORM - Static variable in class org.cyberneko.html.HTMLElements
- fQName - Variable in class org.cyberneko.html.HTMLTagBalancer
-
A qualified name.
- FRAGMENT_CONTEXT_STACK - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
EXPERIMENTAL: may change in next release
Name of the property holding the stack of elements in which context a document fragment should be parsed.
- fragmentContextStack_ - Variable in class org.cyberneko.html.HTMLTagBalancer
-
Stack of elements determining the context in which a document fragment should be parsed
- fragmentContextStackSize_ - Variable in class org.cyberneko.html.HTMLTagBalancer
- FRAME - Static variable in class org.cyberneko.html.HTMLElements
- FRAMESET - Static variable in class org.cyberneko.html.HTMLElements
- fReportErrors - Variable in class org.cyberneko.html.HTMLTagBalancer
-
Report errors.
- fSeenAnything - Variable in class org.cyberneko.html.HTMLTagBalancer
-
True if seen anything.
- fSeenBodyElement - Variable in class org.cyberneko.html.HTMLTagBalancer
-
True if seen <body> element.
- fSeenDoctype - Variable in class org.cyberneko.html.HTMLTagBalancer
-
True if the doctype has been seen.
- fSeenHeadElement - Variable in class org.cyberneko.html.HTMLTagBalancer
-
True if seen <head> element.
- fSeenRootElement - Variable in class org.cyberneko.html.HTMLTagBalancer
-
True if root element has been seen.
- fSeenRootElementEnd - Variable in class org.cyberneko.html.HTMLTagBalancer
-
True if seen the end of the document element.
G
- getAlt() - Method in class com.kohlschutter.boilerpipe.document.Image
- getAncestorLabels() - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- getArea() - Method in class com.kohlschutter.boilerpipe.document.Image
-
Returns the image's area (specified by width * height), or -1 if width/height weren't both specified or could not be parsed.
- getCharset() - Method in class com.kohlschutter.boilerpipe.sax.HTMLDocument
- getContainedTextElements() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
-
Returns the containedTextElements BitSet, or null.
- getContent() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
-
Returns the TextDocument's content.
- getData() - Method in class com.kohlschutter.boilerpipe.sax.HTMLDocument
- getDefaultInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
-
Returns the singleton instance for IgnoreBlocksAfterContentFilter.
- getDefaultInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
- getDocumentHandler() - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns the document handler.
- getDocumentSource() - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns the document source.
- getElement(short) - Static method in class org.cyberneko.html.HTMLElements
-
Returns the element information for the specified element code.
- getElement(String) - Static method in class org.cyberneko.html.HTMLElements
-
Returns the element information for the specified element name.
- getElement(String, HTMLElements.Element) - Static method in class org.cyberneko.html.HTMLElements
-
Returns the element information for the specified element name.
- getElement(QName) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns an HTML element.
- getElementDepth(HTMLElements.Element) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns the depth of the open tag associated with the specified element name or -1 if no matching element is found.
- getExtraStyleSheet() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Returns the extra stylesheet definition that will be inserted in the HEAD element.
- getFeatureDefault(String) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns the default state for a feature.
- getHeight() - Method in class com.kohlschutter.boilerpipe.document.Image
- getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
-
Returns the singleton instance for ArticleExtractor.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
-
Returns the singleton instance for ArticleSentencesExtractor.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
-
Returns the singleton instance for CanolaExtractor.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
-
Returns the singleton instance for DefaultExtractor.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
-
Returns the singleton instance for LargestContentExtractor.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
-
Returns the singleton instance for NumWordsRulesExtractor.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
-
Returns the default instance for PrintDebugFilter, which dumps debug information to System.out.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
-
Returns the singleton instance for DensityRulesClassifier.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
-
Returns the singleton instance for NumWordsRulesClassifier.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
-
Returns the singleton instance for TerminatingBlocksFinder.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
-
Returns the singleton instance for ExpandTitleToContentFilter.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
-
Returns the singleton instance for SimpleBlockFusionProcessor.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
-
Returns the singleton instance for TrailingHeadlineToBoilerplateFilter.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
-
Returns the singleton instance for BoilerplateBlockFilter.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
-
Returns the singleton instance for SplitParagraphBlocksFilter.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
-
Returns the singleton instance of ImageExtractor.
- getLabels() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
-
Returns the labels associated with this TextBlock, or null if no such labels exist.
- getLinkDensity() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getLongestPart(String, String) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
- getNamesValue(String) - Static method in class org.cyberneko.html.HTMLTagBalancer
-
Converts HTML names string value to constant value.
- getNumFullTextWords(TextBlock) - Static method in class com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
- getNumFullTextWords(TextBlock, float) - Static method in class com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
- getNumWords() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getNumWords() - Method in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
-
Returns the overall number of words in all blocks.
- getNumWordsInAnchorText() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getOffsetBlocksEnd() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getOffsetBlocksStart() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getParentDepth(HTMLElements.Element[], short) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.
- getPostHighlight() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Returns the string that will be inserted after any highlighted HTML block.
- getPotentialTitles() - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
- getPreHighlight() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Returns the string that will be inserted before any highlighted HTML block.
- getPropertyDefault(String) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns the default state for a property.
- getRecognizedFeatures() - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns recognized features.
- getRecognizedProperties() - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns recognized properties.
- getSrc() - Method in class com.kohlschutter.boilerpipe.document.Image
- getTagLevel() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getTagWhitelist() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- getText() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getText(boolean, boolean) - Method in class com.kohlschutter.boilerpipe.document.TextDocument
-
Returns the TextDocument's content, non-content or both.
- getText(TextDocument) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
-
Extracts text from the given TextDocument object.
- getText(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
-
Extracts text from the given TextDocument object.
- getText(Reader) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
-
Extracts text from the HTML code available from the given Reader.
- getText(Reader) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code available from the given Reader.
- getText(String) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
-
Extracts text from the HTML code given as a String.
- getText(String) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code given as a String.
- getText(URL) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code available from the given URL.
- getText(InputSource) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
-
Extracts text from the HTML code available from the given InputSource.
- getText(InputSource) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code available from the given InputSource.
- getTextBlocks() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
-
Returns the TextBlocks of this document.
- getTextBlocks() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- getTextDensity() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getTextDocument() - Method in interface com.kohlschutter.boilerpipe.BoilerpipeInput
-
Returns (somehow) a TextDocument.
- getTextDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
-
Retrieves the TextDocument using a default HTML parser.
- getTextDocument(BoilerpipeHTMLParser) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
-
Retrieves the TextDocument using the given HTML parser.
- getTitle() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
-
Returns the "main" title for this document, or null if no such title has been set.
- getTitle() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- getWidth() - Method in class com.kohlschutter.boilerpipe.document.Image
H
- H1 - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- H1 - Static variable in class org.cyberneko.html.HTMLElements
- H2 - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- H2 - Static variable in class org.cyberneko.html.HTMLElements
- H3 - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- H3 - Static variable in class org.cyberneko.html.HTMLElements
- H4 - Static variable in class org.cyberneko.html.HTMLElements
- H5 - Static variable in class org.cyberneko.html.HTMLElements
- H6 - Static variable in class org.cyberneko.html.HTMLElements
- hashCode() - Method in class org.cyberneko.html.HTMLElements.Element
-
Returns a hash code for this object.
- hasLabel(String) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
-
Checks whether this TextBlock has the given label.
- HEAD - Static variable in class org.cyberneko.html.HTMLElements
- HEADING - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- height - Variable in class com.kohlschutter.boilerpipe.document.Image
- HeuristicFilterBase - Class in com.kohlschutter.boilerpipe.filters.english
-
Base class for some heuristics that are used by boilerpipe filters.
- HeuristicFilterBase() - Constructor for class com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
- hl - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- HR - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- HR - Static variable in class org.cyberneko.html.HTMLElements
- html - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- HTML - Static variable in class org.cyberneko.html.HTMLElements
- HTMLDocument - Class in com.kohlschutter.boilerpipe.sax
-
An InputSourceable for HTMLFetcher.
- HTMLDocument(byte[], Charset) - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLDocument
- HTMLDocument(String) - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLDocument
- HTMLElements - Class in org.cyberneko.html
-
Collection of HTML element information.
- HTMLElements() - Constructor for class org.cyberneko.html.HTMLElements
- HTMLElements.Element - Class in org.cyberneko.html
-
Element information.
- HTMLElements.ElementList - Class in org.cyberneko.html
-
Unsynchronized list of elements.
- HTMLFetcher - Class in com.kohlschutter.boilerpipe.sax
-
A very simple HTTP/HTML fetcher, really just for demo purposes.
- HTMLFetcher() - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLFetcher
- HTMLHighlightDemo - Class in com.kohlschutter.boilerpipe.demo
-
Demonstrates how to use Boilerpipe to get the main content, highlighted as HTML.
- HTMLHighlightDemo() - Constructor for class com.kohlschutter.boilerpipe.demo.HTMLHighlightDemo
- HTMLHighlighter - Class in com.kohlschutter.boilerpipe.sax
-
Highlights text blocks in an HTML document that have been marked as "content" in the corresponding TextDocument.
- HTMLHighlighter(boolean) - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- HTMLHighlighter.Implementation - Class in com.kohlschutter.boilerpipe.sax
- HTMLHighlighter.TagAction - Class in com.kohlschutter.boilerpipe.sax
- HTMLTagBalancer - Class in org.cyberneko.html
- HTMLTagBalancer() - Constructor for class org.cyberneko.html.HTMLTagBalancer
- HTMLTagBalancer.ElementEntry - Class in org.cyberneko.html
-
Structure to hold information about an element placed in buffer to be consumed later.
- HTMLTagBalancer.Info - Class in org.cyberneko.html
-
Element info for each start element.
- HTMLTagBalancer.InfoStack - Class in org.cyberneko.html
-
Unsynchronized stack of element information.
I
- I - Static variable in class org.cyberneko.html.HTMLElements
- IFRAME - Static variable in class org.cyberneko.html.HTMLElements
- ignorableWhitespace(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- ignorableWhitespace(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- ignorableWhitespace(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- ignorableWhitespace(XMLString, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Ignorable whitespace.
- IGNORE_OUTSIDE_CONTENT - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Ignore outside content.
- IgnoreBlocksAfterContentFilter - Class in com.kohlschutter.boilerpipe.filters.english
-
Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT.
- IgnoreBlocksAfterContentFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
- IgnoreBlocksAfterContentFromEndFilter - Class in com.kohlschutter.boilerpipe.filters.english
-
Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT, and after any content block.
- IgnoreBlocksAfterContentFromEndFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
- ILAYER - Static variable in class org.cyberneko.html.HTMLElements
- Image - Class in com.kohlschutter.boilerpipe.document
-
Represents an Image resource that is contained in the document.
- Image(String, String, String, String) - Constructor for class com.kohlschutter.boilerpipe.document.Image
- ImageExtractor - Class in com.kohlschutter.boilerpipe.sax
-
Extracts the images that are enclosed by extracted content.
- ImageExtractor() - Constructor for class com.kohlschutter.boilerpipe.sax.ImageExtractor
- ImageExtractor.Implementation - Class in com.kohlschutter.boilerpipe.sax
- ImageExtractor.TagAction - Class in com.kohlschutter.boilerpipe.sax
- ImageExtractorDemo - Class in com.kohlschutter.boilerpipe.demo
-
Demonstrates how to use Boilerpipe to get the images within the main content.
- ImageExtractorDemo() - Constructor for class com.kohlschutter.boilerpipe.demo.ImageExtractorDemo
- IMG - Static variable in class org.cyberneko.html.HTMLElements
- Implementation() - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- Implementation() - Constructor for class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- inAnchor - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- inAnchorText - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- inBody - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- INDICATES_END_OF_TEXT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- Info(HTMLElements.Element, QName) - Constructor for class org.cyberneko.html.HTMLTagBalancer.Info
-
Creates an element information object.
- Info(HTMLElements.Element, QName, XMLAttributes) - Constructor for class org.cyberneko.html.HTMLTagBalancer.Info
-
Creates an element information object.
- InfoStack() - Constructor for class org.cyberneko.html.HTMLTagBalancer.InfoStack
- inHighlight - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- inIgnorableElement - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- inIgnorableElement - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- inIgnorableElement - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- initDensities() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- INLINE - Static variable in class org.cyberneko.html.HTMLElements.Element
-
Inline element.
- InlineTagLabelAction(LabelAction) - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
- INPUT - Static variable in class org.cyberneko.html.HTMLElements
- InputSourceable - Interface in com.kohlschutter.boilerpipe.sax
-
An InputSourceable can return an arbitrary number of new InputSources for a given document.
- INS - Static variable in class org.cyberneko.html.HTMLElements
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.estimators.SimpleEstimator
-
Returns the singleton instance of SimpleEstimator.
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
-
Returns the default instance for PrintDebugFilter, which dumps debug information to System.out.
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ContentFusion
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.LargeBlockSameTagLevelToContentFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ListAtEndFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.InvertedFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingBoilerplateFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingContentFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.sax.DefaultTagActionMap
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor
- INSTANCE_200 - Static variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
- INSTANCE_EXPAND_TO_SAME_TAGLEVEL - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- INSTANCE_EXPAND_TO_SAME_TAGLEVEL_MIN_WORDS - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- INSTANCE_KEEP_TITLE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
- INSTANCE_PRE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
- INSTANCE_STRICTLY_NOT_CONTENT - Static variable in class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
- INSTANCE_TEXT - Static variable in class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
- InvertedFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Inverts the "isContent" flag for all TextBlocks.
- InvertedFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.InvertedFilter
- is - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
- isBlock() - Method in class org.cyberneko.html.HTMLElements.Element
-
Returns true if this element is a block element.
- isBlockLevel - Variable in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- isClause(CharSequence) - Method in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- isContainer() - Method in class org.cyberneko.html.HTMLElements.Element
-
Returns true if this element is a container element.
- isContent - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- isContent() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- isDigit(char) - Static method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
- isEmpty() - Method in class org.cyberneko.html.HTMLElements.Element
-
Returns true if this element is an empty element.
- ISINDEX - Static variable in class org.cyberneko.html.HTMLElements
- isInline() - Method in class org.cyberneko.html.HTMLElements.Element
-
Returns true if this element is an inline element.
- isLowQuality(TextDocumentStatistics, TextDocumentStatistics) - Method in class com.kohlschutter.boilerpipe.estimators.SimpleEstimator
-
Given the statistics of the document before and after applying the BoilerpipeExtractor, can we regard the extraction quality (too) low? Works well with DefaultExtractor, ArticleExtractor and others.
- isOutputHighlightOnly() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
If true, only HTML enclosed within highlighted content will be returned.
- isParent(HTMLElements.Element) - Method in class org.cyberneko.html.HTMLElements.Element
-
Indicates if the provided element is an accepted parent of the current element.
- isSpecial() - Method in class org.cyberneko.html.HTMLElements.Element
-
Returns true if this element is special -- if its content should be parsed ignoring markup.
- isWord(String) - Static method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
K
- KBD - Static variable in class org.cyberneko.html.HTMLElements
- KEEP_EVERYTHING_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
-
Dummy Extractor; should return the input text.
- KeepEverythingExtractor - Class in com.kohlschutter.boilerpipe.extractors
-
Marks everything as content.
- KeepEverythingExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor
- KeepEverythingWithMinKWordsExtractor - Class in com.kohlschutter.boilerpipe.extractors
-
A full-text extractor which extracts the largest text component of a page.
- KeepEverythingWithMinKWordsExtractor(int) - Constructor for class com.kohlschutter.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
- KeepLargestBlockFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Keeps the largest TextBlock only (by the number of words).
- KeepLargestBlockFilter(boolean, int) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- KeepLargestFulltextBlockFilter - Class in com.kohlschutter.boilerpipe.filters.english
-
Keeps the largest TextBlock only (by the number of words).
- KeepLargestFulltextBlockFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
- KEYGEN - Static variable in class org.cyberneko.html.HTMLElements
L
- LABEL - Static variable in class org.cyberneko.html.HTMLElements
- LabelAction - Class in com.kohlschutter.boilerpipe.labels
-
Helps add labels to TextBlocks.
- LabelAction(String...) - Constructor for class com.kohlschutter.boilerpipe.labels.LabelAction
- LabelFusion - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Fuses adjacent blocks if their labels are equal.
- LabelFusion() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
-
Creates a new LabelFusion instance.
- labelPrefix - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
- labels - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- labels - Variable in class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
- labels - Variable in class com.kohlschutter.boilerpipe.filters.simple.LabelToContentFilter
- labels - Variable in class com.kohlschutter.boilerpipe.labels.LabelAction
- labelStack - Variable in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- labelStacks - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- LabelToBoilerplateFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Marks all blocks that contain a given label as "boilerplate".
- LabelToBoilerplateFilter(String...) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
- LabelToContentFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Marks all blocks that contain a given label as "content".
- LabelToContentFilter(String...) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.LabelToContentFilter
- labelToKeep - Variable in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
- LargeBlockSameTagLevelToContentFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Marks all blocks as content that are on the same tag-level as very likely main content (usually the level of the largest block) and have a significant number of words (currently at least 100).
- LargeBlockSameTagLevelToContentFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.LargeBlockSameTagLevelToContentFilter
- LARGEST_CONTENT_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
-
Like DefaultExtractor, but keeps the largest text block only.
- LargestContentExtractor - Class in com.kohlschutter.boilerpipe.extractors
-
A full-text extractor which extracts the largest text component of a page.
- LargestContentExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
- lastEndTag - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- lastEvent - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- lastStartTag - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- LAYER - Static variable in class org.cyberneko.html.HTMLElements
- LEGEND - Static variable in class org.cyberneko.html.HTMLElements
- LI - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- LI - Static variable in class org.cyberneko.html.HTMLElements
- LINK - Static variable in class org.cyberneko.html.HTMLElements
- linkDensity - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- linksBuffer - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- linksHighlight - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- ListAtEndFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Marks nested list-item blocks after the end of the main content.
- ListAtEndFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ListAtEndFilter
- LISTING - Static variable in class org.cyberneko.html.HTMLElements
- lostText_ - Variable in class org.cyberneko.html.HTMLTagBalancer
M
- main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.HTMLHighlightDemo
- main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.ImageExtractorDemo
- main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.Oneliner
- main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.UsingSAX
- MAP - Static variable in class org.cyberneko.html.HTMLElements
- MarkEverythingBoilerplateFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Marks all blocks as boilerplate.
- MarkEverythingBoilerplateFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingBoilerplateFilter
- MarkEverythingContentFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Marks all blocks as content.
- MarkEverythingContentFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingContentFilter
- MARKUP_PREFIX - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- markupLabelsOnly(Set<String>) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
- MarkupTagAction - Class in com.kohlschutter.boilerpipe.sax
-
Assigns labels for element CSS classes and ids to the corresponding TextBlock.
- MarkupTagAction(boolean) - Constructor for class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- MARQUEE - Static variable in class org.cyberneko.html.HTMLElements
- MAX_DISTANCE_1 - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- MAX_DISTANCE_1_CONTENT_ONLY - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- MAX_DISTANCE_1_CONTENT_ONLY_SAME_TAGLEVEL - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- MAX_DISTANCE_1_SAME_TAGLEVEL - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- maxBlocksDistance - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- meetsCondition(TextBlock) - Method in interface com.kohlschutter.boilerpipe.conditions.TextBlockCondition
-
Returns true iff the given TextBlock tb meets the defined condition.
- MENU - Static variable in class org.cyberneko.html.HTMLElements
- mergeNext(TextBlock) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- META - Static variable in class org.cyberneko.html.HTMLElements
- MIGHT_BE_CONTENT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- MinClauseWordsFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5).
- MinClauseWordsFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- MinClauseWordsFilter(int, boolean) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- MinFulltextWordsFilter - Class in com.kohlschutter.boilerpipe.filters.english
-
Keeps only those content blocks which contain at least k full-text words (measured by HeuristicFilterBase.getNumFullTextWords(TextBlock)).
- MinFulltextWordsFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
- minNumWords - Variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
- minWords - Variable in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
- minWords - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- minWords - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- minWords - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinWordsFilter
- MinWordsFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Keeps only those content blocks which contain at least k words.
- MinWordsFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MinWordsFilter
- modifyName(String, short) - Static method in class org.cyberneko.html.HTMLTagBalancer
-
Modifies the given name based on the specified mode.
- MULTICOL - Static variable in class org.cyberneko.html.HTMLElements
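The MinWordsFilter entry in this section is described as keeping only those content blocks which contain at least k words. The following is a minimal sketch of that idea under stated assumptions, not the library's implementation: MinWordsSketch, its nested Block class, and the whitespace-based word count are hypothetical stand-ins and are not part of Boilerpipe.

```java
import java.util.ArrayList;
import java.util.List;

final class MinWordsSketch {

    // Hypothetical stand-in for a text block; NOT the Boilerpipe TextBlock class.
    static final class Block {
        final String text;
        boolean isContent;

        Block(String text, boolean isContent) {
            this.text = text;
            this.isContent = isContent;
        }

        // Counts whitespace-separated tokens (an assumption about what counts as a word).
        int numWords() {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    // Demotes content blocks with fewer than minWords words to non-content.
    // Returns true if at least one block was changed.
    static boolean apply(List<Block> blocks, int minWords) {
        boolean changed = false;
        for (Block b : blocks) {
            if (b.isContent && b.numWords() < minWords) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("Read more", true));
        blocks.add(new Block("This paragraph clearly has more than five words", true));
        apply(blocks, 5);
        for (Block b : blocks) {
            System.out.println(b.isContent + "\t" + b.text);
        }
    }
}
```

The sketch mutates the blocks in place and reports whether anything changed, which is one plausible contract for a filter of this kind; the actual return semantics of process(TextDocument) are not stated in this index.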
N
- name - Variable in class org.cyberneko.html.HTMLElements.Element
-
The element name.
- name_ - Variable in class org.cyberneko.html.HTMLTagBalancer.ElementEntry
- NAMES_ATTRS - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Modify HTML attribute names: { "upper", "lower", "default" }.
- NAMES_ELEMS - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Modify HTML element names: { "upper", "lower", "default" }.
- NAMES_LOWERCASE - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Lowercase HTML names.
- NAMES_MATCH - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Match HTML element names.
- NAMES_NO_CHANGE - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Don't modify HTML names.
- NAMES_UPPERCASE - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Uppercase HTML names.
- NAMESPACES - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Namespaces.
- nestable - Variable in class org.cyberneko.html.HTMLElements.Element
-
If set to true, then this element may not be nested, example: "A"
- newExtractingInstance() - Static method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Creates a new HTMLHighlighter, which is set up to return only the extracted HTML text, including enclosed markup.
- newHighlightingInstance() - Static method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Creates a new HTMLHighlighter, which is set up to return the full HTML text, with the extracted text portion highlighted.
- NEXTID - Static variable in class org.cyberneko.html.HTMLElements
- NO_SUCH_ELEMENT - Static variable in class org.cyberneko.html.HTMLElements
-
No such element.
- NOBR - Static variable in class org.cyberneko.html.HTMLElements
- NOEMBED - Static variable in class org.cyberneko.html.HTMLElements
- NOFRAMES - Static variable in class org.cyberneko.html.HTMLElements
- NOLAYER - Static variable in class org.cyberneko.html.HTMLElements
- NOSCRIPT - Static variable in class org.cyberneko.html.HTMLElements
- notifyDiscardedEndElement(QName, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Notifies the tagBalancingListener (if any) of an ignored end element
- notifyDiscardedStartElement(QName, XMLAttributes, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Notifies the tagBalancingListener (if any) of an ignored start element
- nullTrim(String) - Static method in class com.kohlschutter.boilerpipe.document.Image
- numBlocks - Variable in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
- numFullTextWords - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- numWords - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- numWords - Variable in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
- numWordsInAnchorText - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- numWordsInWrappedLines - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- NumWordsRulesClassifier - Class in com.kohlschutter.boilerpipe.filters.english
-
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010), particularly using number of words per block and link density per block.
- NumWordsRulesClassifier() - Constructor for class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
- NumWordsRulesExtractor - Class in com.kohlschutter.boilerpipe.extractors
-
A quite generic full-text extractor solely based upon the number of words per block (the current, the previous and the next block).
- NumWordsRulesExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
- numWrappedLines - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
O
- OBJECT - Static variable in class org.cyberneko.html.HTMLElements
- offsetBlocks - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- offsetBlocksEnd - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- offsetBlocksStart - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- OL - Static variable in class org.cyberneko.html.HTMLElements
- Oneliner - Class in com.kohlschutter.boilerpipe.demo
-
Demonstrates how to use Boilerpipe to get the main content as plain text.
- Oneliner() - Constructor for class com.kohlschutter.boilerpipe.demo.Oneliner
- OPTGROUP - Static variable in class org.cyberneko.html.HTMLElements
- OPTION - Static variable in class org.cyberneko.html.HTMLElements
- org.cyberneko.html - package org.cyberneko.html
- out - Variable in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
- outputHighlightOnly - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
P
- P - Static variable in class org.cyberneko.html.HTMLElements
- PARAM - Static variable in class org.cyberneko.html.HTMLElements
- parent - Variable in class org.cyberneko.html.HTMLElements.Element
-
Parent elements.
- parentCodes - Variable in class org.cyberneko.html.HTMLElements.Element
-
Parent elements.
- PAT_CHARSET - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLFetcher
- PAT_CLAUSE_DELIMITER - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- PAT_FONT_SIZE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
- PAT_NOT_WORD_BOUNDARY - Static variable in class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
- PAT_NUM - Static variable in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- PAT_REMOVE_CHARACTERS - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
- PAT_SUPER_TAG - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- PAT_TAG_NO_TEXT - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- PAT_VALID_WORD_CHARACTER - Static variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- PAT_WHITESPACE - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- PAT_WORD_BOUNDARY - Static variable in class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
- PATTERNS_SHORT - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
- peek() - Method in class org.cyberneko.html.HTMLTagBalancer.InfoStack
-
Peeks at the top of the stack.
- PLAINTEXT - Static variable in class org.cyberneko.html.HTMLElements
- pop() - Method in class org.cyberneko.html.HTMLTagBalancer.InfoStack
-
Pops the top item off of the stack.
- postHighlight - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- potentialTitles - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
- PRE - Static variable in class org.cyberneko.html.HTMLElements
- preHighlight - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- PrintDebugFilter - Class in com.kohlschutter.boilerpipe.filters.debug
-
Prints debug information about the current state of the TextDocument.
- PrintDebugFilter(PrintWriter) - Constructor for class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
-
Creates a new instance of PrintDebugFilter.
- process(TextDocument) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeFilter
-
Processes the given document doc.
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ContentFusion
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LargeBlockSameTagLevelToContentFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ListAtEndFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.InvertedFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.LabelToContentFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingBoilerplateFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingContentFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MinWordsFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
- process(TextDocument, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Processes the given TextDocument and the original HTML text (as a String).
- process(TextDocument, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
-
Processes the given TextDocument and the original HTML text (as a String).
- process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Processes the given TextDocument and the original HTML text (as an InputSource).
- process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
-
Processes the given TextDocument and the original HTML text (as an InputSource).
- process(URL, BoilerpipeExtractor) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
- process(URL, BoilerpipeExtractor) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
-
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
- processingInstruction(String, String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- processingInstruction(String, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- processingInstruction(String, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- processingInstruction(String, XMLString, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Processing instruction.
- push(HTMLTagBalancer.Info) - Method in class org.cyberneko.html.HTMLTagBalancer.InfoStack
-
Pushes element information onto the stack.
Q
- Q - Static variable in class org.cyberneko.html.HTMLElements
- qname - Variable in class org.cyberneko.html.HTMLTagBalancer.Info
-
The element qualified name.
R
- RB - Static variable in class org.cyberneko.html.HTMLElements
- RBC - Static variable in class org.cyberneko.html.HTMLElements
- RECOGNIZED_FEATURES - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Recognized features.
- RECOGNIZED_FEATURES_DEFAULTS - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Recognized features defaults.
- RECOGNIZED_PROPERTIES - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Recognized properties.
- RECOGNIZED_PROPERTIES_DEFAULTS - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Recognized properties defaults.
- recycle() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
-
Recycles this instance.
- removeLabel(String) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- REPORT_ERRORS - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Report errors.
- reset(XMLComponentManager) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Resets the component.
- RP - Static variable in class org.cyberneko.html.HTMLElements
- RT - Static variable in class org.cyberneko.html.HTMLElements
- RTC - Static variable in class org.cyberneko.html.HTMLElements
- RUBY - Static variable in class org.cyberneko.html.HTMLElements
S
- S - Static variable in class org.cyberneko.html.HTMLElements
- sameTagLevelOnly - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- SAMP - Static variable in class org.cyberneko.html.HTMLElements
- sbLastWasWhitespace - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- SCRIPT - Static variable in class org.cyberneko.html.HTMLElements
- SELECT - Static variable in class org.cyberneko.html.HTMLElements
- serialVersionUID - Static variable in exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
- serialVersionUID - Static variable in class com.kohlschutter.boilerpipe.sax.DefaultTagActionMap
- serialVersionUID - Static variable in class com.kohlschutter.boilerpipe.sax.TagActionMap
- setContentHandler(BoilerpipeHTMLContentHandler) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
- setContentHandler(ContentHandler) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
- setDocumentHandler(XMLDocumentHandler) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Sets the document handler.
- setDocumentLocator(Locator) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- setDocumentLocator(Locator) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- setDocumentLocator(Locator) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- setDocumentSource(XMLDocumentSource) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Sets the document source.
- setExtraStyleSheet(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Sets the extra stylesheet definition that will be inserted in the HEAD element.
- setFeature(String, boolean) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Sets a feature.
- setIsContent(boolean) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- setOutputHighlightOnly(boolean) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document.
- setPostHighlight(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Sets the string that will be inserted after any highlighted HTML block.
- setPreHighlight(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Sets the string that will be inserted prior to any highlighted HTML block.
- setProperty(String, Object) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Sets a property.
- setTagAction(String, TagAction) - Method in class com.kohlschutter.boilerpipe.sax.TagActionMap
-
Sets a particular TagAction for a given tag.
- setTagBalancingListener(HTMLTagBalancingListener) - Method in class org.cyberneko.html.HTMLTagBalancer
- setTagLevel(int) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- setTagWhitelist(Map<String, Set<String>>) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- setTitle(String) - Method in class com.kohlschutter.boilerpipe.document.TextDocument
-
Updates the "main" title for this document.
- setTitle(String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- SimpleBlockFusionProcessor - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Merges two subsequent blocks if their text densities are equal.
- SimpleBlockFusionProcessor() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
- SimpleEstimator - Class in com.kohlschutter.boilerpipe.estimators
-
Estimates the "goodness" of a BoilerpipeExtractor on a given document.
- SimpleEstimator() - Constructor for class com.kohlschutter.boilerpipe.estimators.SimpleEstimator
- size - Variable in class org.cyberneko.html.HTMLElements.ElementList
-
The size of the list.
- skippedEntity(String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- skippedEntity(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- skippedEntity(String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- SMALL - Static variable in class org.cyberneko.html.HTMLElements
- SOUND - Static variable in class org.cyberneko.html.HTMLElements
- SPACER - Static variable in class org.cyberneko.html.HTMLElements
- SPAN - Static variable in class org.cyberneko.html.HTMLElements
- SPECIAL - Static variable in class org.cyberneko.html.HTMLElements.Element
-
Special element.
- SplitParagraphBlocksFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Splits TextBlocks at paragraph boundaries.
- SplitParagraphBlocksFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
- src - Variable in class com.kohlschutter.boilerpipe.document.Image
- start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
- start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
- start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
- start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in interface com.kohlschutter.boilerpipe.sax.TagAction
- START_TAG - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
- startCDATA(Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Start CDATA section.
- startDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- startDocument() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- startDocument() - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- startDocument(XMLLocator, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Start document.
- startDocument(XMLLocator, String, NamespaceContext, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Start document.
- startElement(String, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- startElement(String, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- startElement(String, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- startElement(QName, XMLAttributes, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Start element.
- startGeneralEntity(String, XMLResourceIdentifier, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Start entity.
- startPrefixMapping(String, String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- startPrefixMapping(String, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- startPrefixMapping(String, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- startPrefixMapping(String, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Start prefix mapping.
- startsWithNumber(String, int, String...) - Static method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
-
Checks whether the given text t starts with a sequence of digits, followed by one of the given strings.
- STRICTLY_NOT_CONTENT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- STRIKE - Static variable in class org.cyberneko.html.HTMLElements
- STRONG - Static variable in class org.cyberneko.html.HTMLElements
- STYLE - Static variable in class org.cyberneko.html.HTMLElements
- SUB - Static variable in class org.cyberneko.html.HTMLElements
- SUP - Static variable in class org.cyberneko.html.HTMLElements
- SurroundingToContentFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Marks blocks as "content" if their preceding and following blocks are both already marked "content", and the given TextBlockCondition is met.
- SurroundingToContentFilter(TextBlockCondition) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
- SYNTHESIZED_ITEM - Static variable in class org.cyberneko.html.HTMLTagBalancer
-
Synthesized event info item.
- synthesizedAugs() - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns an augmentations object with a synthesized item added.
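The startsWithNumber entry in this section is described as checking whether a text starts with a sequence of digits followed by one of the given strings. A minimal sketch of that check follows, assuming a simplified, hypothetical signature: the indexed method also takes an int parameter whose meaning is not described here, so it is omitted, and StartsWithNumberSketch is not a Boilerpipe class.

```java
final class StartsWithNumberSketch {

    // Returns true if text begins with one or more ASCII digits that are
    // immediately followed by one of the given suffixes.
    // Hypothetical simplified signature; not the library method.
    static boolean startsWithNumber(String text, String... suffixes) {
        int i = 0;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        if (i == 0) {
            return false; // no leading digits at all
        }
        for (String suffix : suffixes) {
            // startsWith(prefix, offset) compares at position i without copying.
            if (text.startsWith(suffix, i)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(startsWithNumber("12 comments", " comments", " users"));
        System.out.println(startsWithNumber("comments 12", " comments"));
    }
}
```

A check like this is a plausible building block for spotting lines such as comment counters that tend to follow an article body, which fits the purpose given for TerminatingBlocksFinder elsewhere in this index.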
T
- t1 - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
- t2 - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
- TA_ANCHOR_TEXT - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Marks this tag as "anchor" (this should usually only be set for the <A> tag).
- TA_BLOCK_LEVEL - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Explicitly marks this tag as a simple "block-level" element, which always generates whitespace.
- TA_BODY - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Marks this tag as the body element (this should usually only be set for the <BODY> tag).
- TA_FONT - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Special TagAction for the <FONT> tag, which keeps track of the absolute and relative font size.
- TA_HEAD - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- TA_IGNORABLE_ELEMENT - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Marks this tag as "ignorable", i.e.
- TA_IGNORABLE_ELEMENT - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- TA_IGNORABLE_ELEMENT - Static variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor
- TA_INLINE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Deprecated. Use CommonTagActions.TA_INLINE_WHITESPACE instead.
- TA_INLINE_NO_WHITESPACE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Marks this tag as a simple "inline" element, which generates neither whitespace nor a new block.
- TA_INLINE_WHITESPACE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Marks this tag as a simple "inline" element, which generates whitespace, but no new block.
- TABLE - Static variable in class org.cyberneko.html.HTMLElements
- TAG_ACTIONS - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- TAG_ACTIONS - Static variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor
- TagAction - Interface in com.kohlschutter.boilerpipe.sax
-
Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
- TagAction() - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
- TagAction() - Constructor for class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
- TagActionMap - Class in com.kohlschutter.boilerpipe.sax
-
Base class for defining a set of TagActions that are to be used for the HTML parsing process.
- TagActionMap() - Constructor for class com.kohlschutter.boilerpipe.sax.TagActionMap
- tagActions - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- tagBalancingListener - Variable in class org.cyberneko.html.HTMLTagBalancer
- tagLevel - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- tagLevel - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- tagWhitelist - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- TBODY - Static variable in class org.cyberneko.html.HTMLElements
- TD - Static variable in class org.cyberneko.html.HTMLElements
- TerminatingBlocksFinder - Class in com.kohlschutter.boilerpipe.filters.english
-
Finds blocks which potentially indicate the end of an article text and marks them with DefaultLabels.INDICATES_END_OF_TEXT.
- TerminatingBlocksFinder() - Constructor for class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
- text - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- TEXTAREA - Static variable in class org.cyberneko.html.HTMLElements
- TextBlock - Class in com.kohlschutter.boilerpipe.document
-
Describes a block of text.
- TextBlock(String) - Constructor for class com.kohlschutter.boilerpipe.document.TextBlock
- TextBlock(String, BitSet, int, int, int, int, int) - Constructor for class com.kohlschutter.boilerpipe.document.TextBlock
- TextBlockCondition - Interface in com.kohlschutter.boilerpipe.conditions
-
Evaluates whether a given TextBlock meets a certain condition.
- textBlocks - Variable in class com.kohlschutter.boilerpipe.document.TextDocument
- textBlocks - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- textBuffer - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- textDecl(String, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Text declaration.
- textDensity - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- TextDocument - Class in com.kohlschutter.boilerpipe.document
-
A text document, consisting of one or more TextBlocks.
- TextDocument(String, List<TextBlock>) - Constructor for class com.kohlschutter.boilerpipe.document.TextDocument
-
Creates a new TextDocument with given TextBlocks and given title.
- TextDocument(List<TextBlock>) - Constructor for class com.kohlschutter.boilerpipe.document.TextDocument
-
Creates a new TextDocument with given TextBlocks, and no title.
- TextDocumentStatistics - Class in com.kohlschutter.boilerpipe.document
-
Provides shallow statistics on a given TextDocument.
- TextDocumentStatistics(TextDocument, boolean) - Constructor for class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
-
Computes statistics on a given TextDocument.
- textElementIdx - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- TFOOT - Static variable in class org.cyberneko.html.HTMLElements
- TH - Static variable in class org.cyberneko.html.HTMLElements
- THEAD - Static variable in class org.cyberneko.html.HTMLElements
- title - Variable in class com.kohlschutter.boilerpipe.document.TextDocument
- title - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- TITLE - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- TITLE - Static variable in class org.cyberneko.html.HTMLElements
- toInputSource() - Method in class com.kohlschutter.boilerpipe.sax.HTMLDocument
- toInputSource() - Method in interface com.kohlschutter.boilerpipe.sax.InputSourceable
- tokenBuffer - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- tokenize(CharSequence) - Static method in class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
-
Tokenizes the text and returns an array of tokens.
- top - Variable in class org.cyberneko.html.HTMLTagBalancer.InfoStack
-
The top of the stack.
- toString() - Method in class com.kohlschutter.boilerpipe.document.Image
- toString() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- toString() - Method in class com.kohlschutter.boilerpipe.labels.LabelAction
- toString() - Method in class org.cyberneko.html.HTMLElements.Element
-
Provides a simple representation to make debugging easier
- toString() - Method in class org.cyberneko.html.HTMLTagBalancer.Info
-
Simple representation to make debugging easier
- toString() - Method in class org.cyberneko.html.HTMLTagBalancer.InfoStack
-
Simple representation to make debugging easier
- toTextDocument() - Method in interface com.kohlschutter.boilerpipe.BoilerpipeDocumentSource
- toTextDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
-
Returns a TextDocument containing the extracted TextBlocks.
- toTextDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
-
Returns a TextDocument containing the extracted TextBlocks.
- TR - Static variable in class org.cyberneko.html.HTMLElements
- TrailingHeadlineToBoilerplateFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Marks trailing headlines (TextBlocks that have the label DefaultLabels.HEADING) as boilerplate.
- TrailingHeadlineToBoilerplateFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
- TT - Static variable in class org.cyberneko.html.HTMLElements
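The TrailingHeadlineToBoilerplateFilter entry above describes a simple rule: headline blocks at the end of a document are reclassified as boilerplate. The sketch below is one plausible reading of that rule, not the library's code. The `Block` class, the `HEADING` constant, and the `demoteTrailingHeadlines` method are hypothetical stand-ins for TextBlock, DefaultLabels.HEADING, and the filter's processing method.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TrailingHeadlineSketch {
    // Hypothetical label value; the real constant lives in DefaultLabels.
    static final String HEADING = "HEADING";

    // Hypothetical, minimal stand-in for a text block with labels and a content flag.
    static final class Block {
        final String text;
        final Set<String> labels = new HashSet<>();
        boolean content = true;

        Block(String text, String... labels) {
            this.text = text;
            for (String label : labels) {
                this.labels.add(label);
            }
        }
    }

    /**
     * Walks the blocks from the end. Content blocks labeled HEADING are marked
     * as boilerplate until the first content block without that label is found.
     * Blocks already marked as boilerplate are skipped.
     * Returns true if any block was changed.
     */
    static boolean demoteTrailingHeadlines(List<Block> blocks) {
        boolean changed = false;
        for (int i = blocks.size() - 1; i >= 0; i--) {
            Block block = blocks.get(i);
            if (!block.content) {
                continue;
            }
            if (!block.labels.contains(HEADING)) {
                break;
            }
            block.content = false;
            changed = true;
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("Article headline", HEADING));
        blocks.add(new Block("Body paragraph."));
        blocks.add(new Block("Related stories", HEADING));
        demoteTrailingHeadlines(blocks);
        for (Block block : blocks) {
            System.out.println(block.content + "\t" + block.text);
        }
    }
}
```

Stopping at the first non-heading content block keeps headlines that introduce real content, such as the article's own title, untouched.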
U
- U - Static variable in class org.cyberneko.html.HTMLElements
- UL - Static variable in class org.cyberneko.html.HTMLElements
- UnicodeTokenizer - Class in com.kohlschutter.boilerpipe.util
-
Tokenizes text according to Unicode word boundaries and strips off non-word characters.
- UnicodeTokenizer() - Constructor for class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
- UNKNOWN - Static variable in class org.cyberneko.html.HTMLElements
- UsingSAX - Class in com.kohlschutter.boilerpipe.demo
-
Demonstrates how to use Boilerpipe when working with InputSources.
- UsingSAX() - Constructor for class com.kohlschutter.boilerpipe.demo.UsingSAX
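The UnicodeTokenizer entry above describes splitting text at Unicode word boundaries and discarding non-word characters. As a rough illustration of that idea only, the sketch below uses the JDK's `java.text.BreakIterator`. This is a swapped-in technique, not Boilerpipe's implementation, and the class name `WordBoundarySketch` is hypothetical. Only the `tokenize(CharSequence)` shape is taken from the index entry.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBoundarySketch {
    /**
     * Splits the text at word boundaries and keeps only the segments that
     * contain at least one letter or digit, so punctuation and whitespace
     * segments are dropped.
     */
    public static String[] tokenize(CharSequence text) {
        String s = text.toString();
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(s);

        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = s.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment);
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String token : tokenize("Hello, world!")) {
            System.out.println(token);
        }
    }
}
```

The real UnicodeTokenizer may treat edge cases such as apostrophes, hyphens, or digits differently, so results from this sketch should not be assumed to match it token for token.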
V
- valueOf(String) - Static method in enum com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
-
Returns an array containing the constants of this enum type, in the order they are declared.
- VAR - Static variable in class org.cyberneko.html.HTMLElements
- VERY_LIKELY_CONTENT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
W
- WBR - Static variable in class org.cyberneko.html.HTMLElements
- WHITESPACE - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
- width - Variable in class com.kohlschutter.boilerpipe.document.Image
X
- XML - Static variable in class org.cyberneko.html.HTMLElements
- xmlDecl(String, String, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
XML declaration.
- xmlEncode(String) - Static method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- XMP - Static variable in class org.cyberneko.html.HTMLElements