Class PdfTextExtractor
java.lang.Object
com.aowagie.text.pdf.parser.PdfTextExtractor
-
Field Summary
Fields
Modifier and Type | Field | Description
private final SimpleTextExtractingPdfContentStreamProcessor | extractionProcessor | The processor that will extract the text.
private final PdfReader | reader | The PdfReader that holds the PDF file.
-
Constructor Summary
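Constructors
Constructor | Description
PdfTextExtractor(PdfReader reader) | Creates a new Text Extractor object.
-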
Method Summary
Modifier and Type | Method | Description
private byte[] | getContentBytesForPage(int pageNum) | Gets the content stream of a page.
String | getTextFromPage(int page) | Gets the text from a page.
-
Field Details
-
reader
The PdfReader that holds the PDF file. -
extractionProcessor
The processor that will extract the text.
-
-
Constructor Details
-
PdfTextExtractor
Creates a new Text Extractor object.
Parameters:
reader - the reader with the PDF
-
-
Method Details
-
getContentBytesForPage
Gets the content stream of a page.
Parameters:
pageNum - the number of the page whose content stream is wanted
Returns:
a byte array with the content stream of the page
Throws:
IOException
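To make "content stream" concrete: a page's content stream is a sequence of PDF operators, and text appears as string operands of text-showing operators such as Tj. The sketch below is a toy illustration, not the library's processor. It decodes a hand-written sample stream and collects the literal strings shown with Tj. The names ContentStreamSketch and showTextOperands and the sample bytes are hypothetical, and the sketch ignores escaped parentheses, hex strings, TJ arrays, and font encodings.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only; this is not how the library's extraction processor works.
class ContentStreamSketch {

    // Matches a literal string operand followed by the Tj (show text) operator.
    // Assumption: no escaped or nested parentheses inside the string.
    private static final Pattern SHOW_TEXT = Pattern.compile("\\(([^()]*)\\)\\s*Tj");

    static String showTextOperands(byte[] contentStream) {
        // ISO-8859-1 maps every byte to exactly one char, so decoding loses nothing.
        String ops = new String(contentStream, StandardCharsets.ISO_8859_1);
        StringBuilder text = new StringBuilder();
        Matcher m = SHOW_TEXT.matcher(ops);
        while (m.find()) {
            text.append(m.group(1));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Hand-written sample: begin text, set font, position, show two strings, end text.
        byte[] sample = "BT /F1 12 Tf 72 712 Td (Hello) Tj (, world) Tj ET"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(showTextOperands(sample));
    }
}
```

Real content streams are usually compressed and use font-specific encodings, which is why extraction is delegated to a dedicated processor rather than a pattern match like this one.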
-
getTextFromPage
Gets the text from a page.
Parameters:
page - the number of the page to extract text from
Returns:
a String with the content as plain text (without PDF syntax)
Throws:
IOException
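A typical use of getTextFromPage is to loop over every page and concatenate the results. The sketch below shows that loop against a stand-in interface so that it compiles without the library. PageTextSource, AllPagesSketch, and extractAll are hypothetical names, and 1-based page numbering is an assumption that this reference does not state. With the real classes, the source would presumably be a PdfTextExtractor built from a PdfReader, with the page count taken from that reader.

```java
import java.io.IOException;

// Hypothetical stand-in that mirrors the shape of PdfTextExtractor.getTextFromPage(int).
interface PageTextSource {
    String getTextFromPage(int page) throws IOException;
}

class AllPagesSketch {

    // Concatenates the text of pages 1..pageCount, ending each page with a newline.
    // Assumption: pages are numbered starting at 1.
    static String extractAll(PageTextSource source, int pageCount) throws IOException {
        StringBuilder all = new StringBuilder();
        for (int page = 1; page <= pageCount; page++) {
            all.append(source.getTextFromPage(page)).append('\n');
        }
        return all.toString();
    }

    public static void main(String[] args) throws IOException {
        // A fake source standing in for a real extractor.
        PageTextSource fake = page -> "text of page " + page;
        System.out.print(extractAll(fake, 3));
    }
}
```

Because getTextFromPage declares IOException, the loop propagates it to the caller instead of swallowing it; a caller that prefers to skip unreadable pages could catch the exception inside the loop.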
-