Uses of Class com.kohlschutter.boilerpipe.document.TextDocument
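Packages that use TextDocument
com.kohlschutter.boilerpipe: The Boilerpipe top-level package.
com.kohlschutter.boilerpipe.document: The Boilerpipe document model.
com.kohlschutter.boilerpipe.extractors: Some standard extractors (i.e., complete pipelines of BoilerpipeFilters).
com.kohlschutter.boilerpipe.filters.debug
com.kohlschutter.boilerpipe.filters.english: These BoilerpipeFilters have only been tested on English text.
com.kohlschutter.boilerpipe.filters.heuristics: These BoilerpipeFilters are pure heuristics.
com.kohlschutter.boilerpipe.filters.simple: These BoilerpipeFilters are straightforward and probably not specific to English.
com.kohlschutter.boilerpipe.sax: Classes related to parsing and producing HTML from/to Boilerpipe TextDocuments.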
Uses of TextDocument in com.kohlschutter.boilerpipe
Methods in com.kohlschutter.boilerpipe that return TextDocument
TextDocument BoilerpipeInput.getTextDocument(): Returns (somehow) a TextDocument.
TextDocument BoilerpipeDocumentSource.toTextDocument()

Methods in com.kohlschutter.boilerpipe with parameters of type TextDocument
BoilerpipeExtractor.getText(TextDocument doc): Extracts text from the given TextDocument object.
boolean BoilerpipeFilter.process(TextDocument doc): Processes the given document doc.
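Because every filter exposes the same process(TextDocument doc) method returning a boolean, filters can be run one after another over the same document. The sketch below models that idea with hypothetical stand-in types: a list of Block objects plays the role of a TextDocument, and Filter, runChain, and both example filters are illustrative names, not boilerpipe API. It assumes the boolean means "this filter changed the document".

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-ins; these are NOT the boilerpipe classes.
class FilterChainSketch {
    // A block of text with a content/boilerplate label; starts out labeled as content.
    static final class Block {
        final String text;
        boolean isContent = true;

        Block(String text) { this.text = text; }

        int numWords() {
            String t = text.trim();
            return t.isEmpty() ? 0 : t.split("\\s+").length;
        }
    }

    // Same shape as BoilerpipeFilter.process(TextDocument); true is assumed to mean "doc changed".
    interface Filter {
        boolean process(List<Block> doc);
    }

    // Hypothetical filter: single-word blocks become boilerplate.
    static final Filter DROP_ONE_WORD_BLOCKS = doc -> {
        boolean changed = false;
        for (Block b : doc) {
            if (b.isContent && b.numWords() <= 1) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    };

    // Hypothetical filter: blocks not ending in '.', '!' or '?' become boilerplate.
    static final Filter DROP_UNPUNCTUATED_BLOCKS = doc -> {
        boolean changed = false;
        for (Block b : doc) {
            String t = b.text.trim();
            boolean sentenceLike = !t.isEmpty() && ".!?".indexOf(t.charAt(t.length() - 1)) >= 0;
            if (b.isContent && !sentenceLike) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    };

    // Runs every filter in order; |= does not short-circuit, so each filter always runs.
    static boolean runChain(List<Block> doc, List<Filter> filters) {
        boolean changed = false;
        for (Filter f : filters) {
            changed |= f.process(doc);
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> doc = new ArrayList<>(List.of(
                new Block("Menu"),
                new Block("Share on social media"),
                new Block("The article text goes here.")));
        List<Filter> chain = List.of(DROP_ONE_WORD_BLOCKS, DROP_UNPUNCTUATED_BLOCKS);
        System.out.println(runChain(doc, chain)); // true: two blocks were relabeled
        System.out.println(runChain(doc, chain)); // false: nothing left to change
    }
}
```

Running the chain a second time reports no change, which is one way a caller could detect that a chain has reached a fixed point.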
Uses of TextDocument in com.kohlschutter.boilerpipe.document
Methods in com.kohlschutter.boilerpipe.document that return TextDocument

Constructors in com.kohlschutter.boilerpipe.document with parameters of type TextDocument
TextDocumentStatistics(TextDocument doc, boolean contentOnly): Computes statistics on a given TextDocument.
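To illustrate what the contentOnly flag of TextDocumentStatistics changes, here is a simplified, hypothetical analogue. The Block and Stats types, their fields, and the avgNumWords() method are assumptions made for this sketch, not a description of the real class.

```java
import java.util.List;

// Hypothetical, simplified analogue of TextDocumentStatistics; not the boilerpipe class.
class StatsSketch {
    // Minimal block stand-in: a word count plus a content label.
    static final class Block {
        final int numWords;
        final boolean isContent;

        Block(int numWords, boolean isContent) {
            this.numWords = numWords;
            this.isContent = isContent;
        }
    }

    static final class Stats {
        final int numBlocks;
        final int numWords;

        // With contentOnly = true, only blocks labeled as content are counted.
        Stats(List<Block> blocks, boolean contentOnly) {
            int nb = 0;
            int nw = 0;
            for (Block b : blocks) {
                if (contentOnly && !b.isContent) {
                    continue;
                }
                nb++;
                nw += b.numWords;
            }
            this.numBlocks = nb;
            this.numWords = nw;
        }

        // Average number of words per counted block; 0 if no block was counted.
        double avgNumWords() {
            return numBlocks == 0 ? 0.0 : (double) numWords / numBlocks;
        }
    }

    public static void main(String[] args) {
        List<Block> doc = List.of(new Block(2, false), new Block(40, true), new Block(20, true));
        System.out.println(new Stats(doc, false).avgNumWords()); // 62 words over 3 blocks
        System.out.println(new Stats(doc, true).avgNumWords());  // 60 words over 2 blocks = 30.0
    }
}
```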
Uses of TextDocument in com.kohlschutter.boilerpipe.extractors
Methods in com.kohlschutter.boilerpipe.extractors with parameters of type TextDocument
ExtractorBase.getText(TextDocument doc): Extracts text from the given TextDocument object.
boolean ArticleExtractor.process(TextDocument doc)
boolean ArticleSentencesExtractor.process(TextDocument doc)
boolean CanolaExtractor.process(TextDocument doc)
boolean DefaultExtractor.process(TextDocument doc)
boolean KeepEverythingExtractor.process(TextDocument doc)
boolean KeepEverythingWithMinKWordsExtractor.process(TextDocument doc)
boolean LargestContentExtractor.process(TextDocument doc)
boolean NumWordsRulesExtractor.process(TextDocument doc)
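The listing suggests a split of responsibilities: ExtractorBase provides getText(TextDocument doc), while each concrete extractor overrides process(TextDocument doc). A minimal template-method sketch of that layout follows, using hypothetical stand-in types. The two subclasses only imitate what the names KeepEverythingExtractor and KeepEverythingWithMinKWordsExtractor suggest; they are not the library's implementations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: a base class supplies getText(), subclasses supply process().
class ExtractorSketch {
    static final class Block {
        final String text;
        boolean isContent = false;

        Block(String text) { this.text = text; }

        int numWords() {
            String t = text.trim();
            return t.isEmpty() ? 0 : t.split("\\s+").length;
        }
    }

    // Builds a hypothetical document (a list of blocks) from plain strings.
    static List<Block> blocksOf(String... texts) {
        List<Block> blocks = new ArrayList<>();
        for (String t : texts) {
            blocks.add(new Block(t));
        }
        return blocks;
    }

    // Stand-in for ExtractorBase: getText() labels blocks via process(), then joins the content.
    abstract static class Extractor {
        abstract boolean process(List<Block> doc);

        final String getText(List<Block> doc) {
            process(doc);
            return doc.stream()
                    .filter(b -> b.isContent)
                    .map(b -> b.text)
                    .collect(Collectors.joining("\n"));
        }
    }

    // Hypothetical analogue of KeepEverythingExtractor: every block becomes content.
    static final class KeepEverything extends Extractor {
        @Override
        boolean process(List<Block> doc) {
            boolean changed = false;
            for (Block b : doc) {
                if (!b.isContent) {
                    b.isContent = true;
                    changed = true;
                }
            }
            return changed;
        }
    }

    // Hypothetical analogue of KeepEverythingWithMinKWordsExtractor: content = blocks with >= k words.
    static final class KeepEverythingWithMinKWords extends Extractor {
        private final int k;

        KeepEverythingWithMinKWords(int k) { this.k = k; }

        @Override
        boolean process(List<Block> doc) {
            boolean changed = false;
            for (Block b : doc) {
                boolean keep = b.numWords() >= k;
                if (b.isContent != keep) {
                    b.isContent = keep;
                    changed = true;
                }
            }
            return changed;
        }
    }

    public static void main(String[] args) {
        System.out.println(new KeepEverything().getText(blocksOf("Menu", "Body text here")));
        System.out.println(new KeepEverythingWithMinKWords(3).getText(blocksOf("Menu", "Body text here")));
    }
}
```

With this layout, adding a new extractor only means writing another process() method; text collection stays in one place.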
Uses of TextDocument in com.kohlschutter.boilerpipe.filters.debug
Methods in com.kohlschutter.boilerpipe.filters.debug with parameters of type TextDocument
Uses of TextDocument in com.kohlschutter.boilerpipe.filters.english
Methods in com.kohlschutter.boilerpipe.filters.english with parameters of type TextDocument
boolean DensityRulesClassifier.process(TextDocument doc)
boolean IgnoreBlocksAfterContentFilter.process(TextDocument doc)
boolean IgnoreBlocksAfterContentFromEndFilter.process(TextDocument doc)
boolean KeepLargestFulltextBlockFilter.process(TextDocument doc)
boolean MinFulltextWordsFilter.process(TextDocument doc)
boolean NumWordsRulesClassifier.process(TextDocument doc)
boolean TerminatingBlocksFinder.process(TextDocument doc)
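Filters that share a TextDocument can cooperate: one pass labels blocks, and a later pass acts on those labels. The hypothetical two-pass sketch below is loosely modeled on the names TerminatingBlocksFinder and IgnoreBlocksAfterContentFilter. The first pass labels blocks that look like the end of an article, and the second demotes everything from that point on. The marker phrases, the endOfText flag, and the exact rules are assumptions, not the library's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical two-pass sketch: label "end of article" blocks, then demote everything from there on.
class EndOfTextSketch {
    static final class Block {
        final String text;
        boolean isContent = true;
        boolean endOfText = false; // stand-in for an "indicates end of text" label

        Block(String text) { this.text = text; }
    }

    // Assumed example marker phrases; not the library's actual list.
    static final List<String> END_MARKERS = List.of("comments", "all rights reserved");

    // Pass 1: label blocks whose text starts with one of the marker phrases.
    static boolean findTerminatingBlocks(List<Block> doc) {
        boolean changed = false;
        for (Block b : doc) {
            String t = b.text.trim().toLowerCase(Locale.ROOT);
            boolean hit = END_MARKERS.stream().anyMatch(t::startsWith);
            if (hit && !b.endOfText) {
                b.endOfText = true;
                changed = true;
            }
        }
        return changed;
    }

    // Pass 2: from the first labeled block onward, every block becomes boilerplate.
    static boolean ignoreBlocksAfterEnd(List<Block> doc) {
        boolean changed = false;
        boolean pastEnd = false;
        for (Block b : doc) {
            pastEnd |= b.endOfText;
            if (pastEnd && b.isContent) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> doc = new ArrayList<>();
        for (String s : new String[] {"Article body.", "More body.", "Comments (12)", "Nice post!"}) {
            doc.add(new Block(s));
        }
        findTerminatingBlocks(doc);
        ignoreBlocksAfterEnd(doc);
        for (Block b : doc) {
            System.out.println((b.isContent ? "content     " : "boilerplate ") + b.text);
        }
    }
}
```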
Uses of TextDocument in com.kohlschutter.boilerpipe.filters.heuristics
Methods in com.kohlschutter.boilerpipe.filters.heuristics with parameters of type TextDocument
boolean AddPrecedingLabelsFilter.process(TextDocument doc)
boolean ArticleMetadataFilter.process(TextDocument doc)
boolean BlockProximityFusion.process(TextDocument doc)
boolean ContentFusion.process(TextDocument doc)
boolean DocumentTitleMatchClassifier.process(TextDocument doc)
boolean ExpandTitleToContentFilter.process(TextDocument doc)
boolean KeepLargestBlockFilter.process(TextDocument doc)
boolean LabelFusion.process(TextDocument doc)
boolean LargeBlockSameTagLevelToContentFilter.process(TextDocument doc)
boolean ListAtEndFilter.process(TextDocument doc)
boolean SimpleBlockFusionProcessor.process(TextDocument doc)
boolean TrailingHeadlineToBoilerplateFilter.process(TextDocument doc)
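The name KeepLargestBlockFilter suggests a simple heuristic: keep only the largest block as content. The minimal sketch below implements that rule under stated assumptions: size is measured in words, only blocks already labeled as content are candidates, and ties go to the first block. The Block type and keepLargest method are hypothetical, and the real filter may differ in these details.

```java
import java.util.List;

// Hypothetical sketch in the spirit of KeepLargestBlockFilter; not the library's code.
class KeepLargestSketch {
    static final class Block {
        final int numWords;
        boolean isContent;

        Block(int numWords, boolean isContent) {
            this.numWords = numWords;
            this.isContent = isContent;
        }
    }

    // Keeps only the content block with the most words (the first one wins ties);
    // every other content block is relabeled as boilerplate.
    static boolean keepLargest(List<Block> doc) {
        Block largest = null;
        for (Block b : doc) {
            if (b.isContent && (largest == null || b.numWords > largest.numWords)) {
                largest = b;
            }
        }
        if (largest == null) {
            return false; // no content blocks, nothing to change
        }
        boolean changed = false;
        for (Block b : doc) {
            if (b != largest && b.isContent) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> doc = List.of(
                new Block(10, true), new Block(50, true), new Block(50, true), new Block(80, false));
        System.out.println(keepLargest(doc)); // true
        for (Block b : doc) {
            System.out.println(b.numWords + " words, content=" + b.isContent);
        }
    }
}
```

Note that the 80-word block stays boilerplate here because, under this sketch's assumption, only blocks already labeled as content compete.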
Uses of TextDocument in com.kohlschutter.boilerpipe.filters.simple
Methods in com.kohlschutter.boilerpipe.filters.simple with parameters of type TextDocument
boolean BoilerplateBlockFilter.process(TextDocument doc)
boolean InvertedFilter.process(TextDocument doc)
boolean LabelToBoilerplateFilter.process(TextDocument doc)
boolean LabelToContentFilter.process(TextDocument doc)
boolean MarkEverythingBoilerplateFilter.process(TextDocument doc)
boolean MarkEverythingContentFilter.process(TextDocument doc)
boolean MinClauseWordsFilter.process(TextDocument doc)
boolean MinWordsFilter.process(TextDocument doc)
boolean SplitParagraphBlocksFilter.process(TextDocument doc)
boolean SurroundingToContentFilter.process(TextDocument doc)
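Several of the simple filters are named after the rule they apply. The sketch below shows one plausible reading of two of them, MinWordsFilter and InvertedFilter, on a hypothetical block type. The parameter k and the method names minWords and invert are illustrative assumptions, not the library's signatures.

```java
import java.util.List;

// Hypothetical sketches in the spirit of MinWordsFilter and InvertedFilter; not the library's code.
class SimpleFiltersSketch {
    static final class Block {
        final int numWords;
        boolean isContent;

        Block(int numWords, boolean isContent) {
            this.numWords = numWords;
            this.isContent = isContent;
        }
    }

    // Content blocks with fewer than k words are relabeled as boilerplate.
    static boolean minWords(List<Block> doc, int k) {
        boolean changed = false;
        for (Block b : doc) {
            if (b.isContent && b.numWords < k) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    // Flips the content label of every block; any non-empty document is changed.
    static boolean invert(List<Block> doc) {
        for (Block b : doc) {
            b.isContent = !b.isContent;
        }
        return !doc.isEmpty();
    }

    public static void main(String[] args) {
        List<Block> doc = List.of(new Block(3, true), new Block(12, true), new Block(5, false));
        minWords(doc, 5);
        invert(doc);
        for (Block b : doc) {
            System.out.println(b.numWords + " words, content=" + b.isContent);
        }
    }
}
```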
Uses of TextDocument in com.kohlschutter.boilerpipe.sax
Methods in com.kohlschutter.boilerpipe.sax that return TextDocument
TextDocument BoilerpipeSAXInput.getTextDocument(): Retrieves the TextDocument using a default HTML parser.
TextDocument BoilerpipeSAXInput.getTextDocument(BoilerpipeHTMLParser parser): Retrieves the TextDocument using the given HTML parser.
TextDocument BoilerpipeHTMLContentHandler.toTextDocument(): Returns a TextDocument containing the extracted TextBlocks.
TextDocument BoilerpipeHTMLParser.toTextDocument(): Returns a TextDocument containing the extracted TextBlocks.

Methods in com.kohlschutter.boilerpipe.sax with parameters of type TextDocument
(package private) void HTMLHighlighter.Implementation.process(TextDocument doc, InputSource is)
HTMLHighlighter.process(TextDocument doc, String origHTML): Processes the given TextDocument and the original HTML text (as a String).
HTMLHighlighter.process(TextDocument doc, InputSource is): Processes the given TextDocument and the original HTML text (as an InputSource).
(package private) void ImageExtractor.Implementation.process(TextDocument doc, InputSource is)
ImageExtractor.process(TextDocument doc, String origHTML): Processes the given TextDocument and the original HTML text (as a String).
ImageExtractor.process(TextDocument doc, InputSource is): Processes the given TextDocument and the original HTML text (as an InputSource).