Package org.apache.pdfbox.examples.util
Class PDFHighlighter
- java.lang.Object
  - org.apache.pdfbox.contentstream.PDFStreamEngine
    - org.apache.pdfbox.text.LegacyPDFStreamEngine
      - org.apache.pdfbox.text.PDFTextStripper
        - org.apache.pdfbox.examples.util.PDFHighlighter
public class PDFHighlighter extends PDFTextStripper
Highlighting of words in a PDF document with an XML file.
See Also:
- Adobe Highlight File Format
Field Summary
Fields (Modifier and Type, Field):
- private static java.nio.charset.Charset ENCODING
- private java.io.Writer highlighterOutput
- private java.lang.String[] searchedWords
- private java.io.ByteArrayOutputStream textOS
- private java.io.Writer textWriter
Fields inherited from class org.apache.pdfbox.text.PDFTextStripper
charactersByArticle, document, LINE_SEPARATOR, output
Constructor Summary
Constructors (Constructor, Description):
- PDFHighlighter(): Default constructor.
Method Summary
Methods (Modifier and Type, Method, Description):
- protected void endPage(PDPage pdPage): End a page.
- void generateXMLHighlight(PDDocument pdDocument, java.lang.String[] sWords, java.io.Writer xmlOutput): Generate an XML highlight string based on the PDF.
- void generateXMLHighlight(PDDocument pdDocument, java.lang.String highlightWord, java.io.Writer xmlOutput): Generate an XML highlight string based on the PDF.
- static void main(java.lang.String[] args): Command line application.
- private static void usage()
Methods inherited from class org.apache.pdfbox.text.PDFTextStripper
beginMarkedContentSequence, endArticle, endDocument, endMarkedContentSequence, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIgnoreContentStreamSpaceGlyphs, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, processTextPosition, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIgnoreContentStreamSpaceGlyphs, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePage, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparator
Methods inherited from class org.apache.pdfbox.text.LegacyPDFStreamEngine
computeFontHeight, showGlyph
Methods inherited from class org.apache.pdfbox.contentstream.PDFStreamEngine
addOperator, applyTextAdjustment, beginText, decreaseLevel, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, isShouldProcessColorOperators, markedContentPoint, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
Method Detail
generateXMLHighlight
public void generateXMLHighlight(PDDocument pdDocument, java.lang.String highlightWord, java.io.Writer xmlOutput) throws java.io.IOException
Generate an XML highlight string based on the PDF.
Parameters:
pdDocument - The PDF to find words in.
highlightWord - The word to search for.
xmlOutput - The writer that receives the resulting XML output.
Throws:
java.io.IOException - If there is an error reading from the PDF, or writing to the XML.
generateXMLHighlight
public void generateXMLHighlight(PDDocument pdDocument, java.lang.String[] sWords, java.io.Writer xmlOutput) throws java.io.IOException
Generate an XML highlight string based on the PDF.
Parameters:
pdDocument - The PDF to find words in.
sWords - The words to search for.
xmlOutput - The writer that receives the resulting XML output.
Throws:
java.io.IOException - If there is an error reading from the PDF, or writing to the XML.
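Both overloads write to a caller-supplied java.io.Writer, so the result can be captured in memory as easily as in a file. The sketch below does not call PDFBox. It uses a hypothetical stand-in, HighlightSketch.generate, that takes already-extracted page text in place of a PDDocument, to show one way a single-word overload could delegate to an array overload and how a StringWriter collects the output. The XML envelope and the loc attributes (pg, pos, len) are assumptions modeled on the Adobe Highlight File Format, not output confirmed by this page, and treating each word as a literal, case-insensitive match is likewise a choice made for the example.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HighlightSketch {

    // Hypothetical single-word convenience overload: wraps the word in an array.
    public static void generate(String[] pages, String word, Writer xmlOutput)
            throws IOException {
        generate(pages, new String[] { word }, xmlOutput);
    }

    // Hypothetical multi-word variant; pages[i] stands in for the text of page i.
    public static void generate(String[] pages, String[] words, Writer xmlOutput)
            throws IOException {
        // Assumed envelope, modeled on the Adobe Highlight File Format.
        xmlOutput.write("<XML>\n<Body units=characters version=2>\n<Highlight>\n");
        for (int pg = 0; pg < pages.length; pg++) {
            for (String word : words) {
                // Pattern.quote treats the word literally (an assumption of this sketch).
                Matcher m = Pattern
                        .compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE)
                        .matcher(pages[pg]);
                while (m.find()) {
                    xmlOutput.write("<loc pg=" + pg
                            + " pos=" + m.start()
                            + " len=" + (m.end() - m.start()) + ">\n");
                }
            }
        }
        xmlOutput.write("</Highlight>\n</Body>\n</XML>\n");
        xmlOutput.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        generate(new String[] { "Hello PDF", "pdf again" }, "pdf", out);
        System.out.print(out);
    }
}
```

With the two sample pages above, the sketch emits one loc entry per page: pg=0 pos=6 len=3 and pg=1 pos=0 len=3.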
endPage
protected void endPage(PDPage pdPage) throws java.io.IOException
End a page. Default implementation is to do nothing. Subclasses may provide additional information.
Overrides:
endPage in class PDFTextStripper
Parameters:
pdPage - The page that has just been processed.
Throws:
java.io.IOException - If there is any error writing to the stream.
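The "default does nothing, subclasses may add behavior" contract is a template-method hook. The following is a minimal sketch of that pattern using only the standard library. PageWalker and PageSummarizer are hypothetical stand-ins, not PDFBox classes, and the 1-based page counter is an assumption made here to mirror the inherited getCurrentPageNo. The subclass consumes the text buffered for the page when the hook fires, then resets the buffer so offsets on the next page start from zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EndPageHookSketch {

    // Hypothetical stand-in for a text stripper that walks pages and calls a hook.
    public static class PageWalker {
        protected final StringBuilder pageText = new StringBuilder();
        private int currentPageNo = 0; // assumed 1-based once a page is being processed

        public int getCurrentPageNo() {
            return currentPageNo;
        }

        // Hook: the default implementation does nothing.
        protected void endPage() {
        }

        public final void walk(List<String> pages) {
            for (String text : pages) {
                currentPageNo++;
                pageText.append(text); // stands in for text extraction
                endPage();             // called after the page has been processed
            }
        }
    }

    // Subclass that uses the buffered page text when each page ends.
    public static class PageSummarizer extends PageWalker {
        public final List<String> summaries = new ArrayList<>();

        @Override
        protected void endPage() {
            summaries.add("pg=" + (getCurrentPageNo() - 1)
                    + " chars=" + pageText.length());
            pageText.setLength(0); // reset so the next page starts at offset 0
        }
    }

    public static void main(String[] args) {
        PageSummarizer s = new PageSummarizer();
        s.walk(Arrays.asList("first page", "second"));
        System.out.println(s.summaries); // prints [pg=0 chars=10, pg=1 chars=6]
    }
}
```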
main
public static void main(java.lang.String[] args) throws java.io.IOException
Command line application.
Parameters:
args - The command line arguments to the application.
Throws:
java.io.IOException - If there is an error generating the highlight file.
usage
private static void usage()