Package org.apache.pdfbox.tools
Class FilteredText2Markdown
java.lang.Object
org.apache.pdfbox.contentstream.PDFStreamEngine
org.apache.pdfbox.text.LegacyPDFStreamEngine
org.apache.pdfbox.text.PDFTextStripper
org.apache.pdfbox.tools.PDFText2Markdown
org.apache.pdfbox.tools.FilteredText2Markdown
PDFText2Markdown that only processes glyphs that have angle 0.
-
Field Summary
Fields inherited from class org.apache.pdfbox.text.PDFTextStripper
charactersByArticle, document, LINE_SEPARATOR, output -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionprotected voidThis will process a TextPosition object and add the text to the list of characters on a page.Methods inherited from class org.apache.pdfbox.tools.PDFText2Markdown
endArticle, startArticle, writeParagraphEnd, writeString, writeStringMethods inherited from class org.apache.pdfbox.text.PDFTextStripper
beginMarkedContentSequence, endDocument, endMarkedContentSequence, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIgnoreContentStreamSpaceGlyphs, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIgnoreContentStreamSpaceGlyphs, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePage, writePageEnd, writePageStart, writeParagraphSeparator, writeParagraphStart, writeText, writeWordSeparatorMethods inherited from class org.apache.pdfbox.text.LegacyPDFStreamEngine
computeFontHeight, showGlyphMethods inherited from class org.apache.pdfbox.contentstream.PDFStreamEngine
addOperator, applyTextAdjustment, beginText, decreaseLevel, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, isShouldProcessColorOperators, markedContentPoint, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
-
Constructor Details
-
FilteredText2Markdown
FilteredText2Markdown()
-
-
Method Details
-
processTextPosition
Description copied from class:PDFTextStripperThis will process a TextPosition object and add the text to the list of characters on a page. It takes care of overlapping text.- Overrides:
processTextPositionin classPDFTextStripper- Parameters:
text- The text to process.
-