Package com.aowagie.text.pdf.parser
Class PdfTextExtractor
- java.lang.Object
  - com.aowagie.text.pdf.parser.PdfTextExtractor
class PdfTextExtractor extends java.lang.Object
Extracts text from a PDF file.
- Since:
- 2.1.4
Field Summary
Fields
Modifier and Type | Field | Description
private SimpleTextExtractingPdfContentStreamProcessor | extractionProcessor | The processor that will extract the text.
private PdfReader | reader | The PdfReader that holds the PDF file.
-
Constructor Summary
Constructors
Constructor | Description
PdfTextExtractor(PdfReader reader) | Creates a new Text Extractor object.
-
Method Summary
Methods
Modifier and Type | Method | Description
private byte[] | getContentBytesForPage(int pageNum) | Gets the content stream of a page.
java.lang.String | getTextFromPage(int page) | Gets the text from a page.
Field Detail
-
reader
private final PdfReader reader
The PdfReader that holds the PDF file.
-
extractionProcessor
private final SimpleTextExtractingPdfContentStreamProcessor extractionProcessor
The processor that will extract the text.
-
-
Constructor Detail
-
PdfTextExtractor
public PdfTextExtractor(PdfReader reader)
Creates a new Text Extractor object.
- Parameters:
reader - the reader with the PDF
-
-
Method Detail
-
getContentBytesForPage
private byte[] getContentBytesForPage(int pageNum) throws java.io.IOException
Gets the content stream of a page.
- Parameters:
pageNum - the number of the page whose content stream you want
- Returns:
- a byte array with the content stream of a page
- Throws:
java.io.IOException
-
getTextFromPage
public java.lang.String getTextFromPage(int page) throws java.io.IOException
Gets the text from a page.
- Parameters:
page - the number of the page to extract text from
- Returns:
- a String with the content as plain text (without PDF syntax)
- Throws:
java.io.IOException
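Example (illustrative sketch, not part of the library): getTextFromPage(int) works on one page at a time, so extracting a whole document means calling it once per page. The helper below shows that loop against a hypothetical PageTextSource interface that only mirrors the signature of getTextFromPage(int). The names PageTextCollector, PageTextSource and collectText are assumptions made for this example, as is the 1-based page numbering, which follows the usual iText convention and is not stated on this page.

```java
import java.io.IOException;

public class PageTextCollector {

    /**
     * Hypothetical stand-in that mirrors the signature of
     * PdfTextExtractor.getTextFromPage(int). Not part of the library.
     */
    @FunctionalInterface
    public interface PageTextSource {
        String getTextFromPage(int page) throws IOException;
    }

    /**
     * Calls the source once per page and joins the results with newlines.
     * Assumes pages are numbered from 1 to pageCount (an assumption based
     * on the usual iText convention).
     */
    public static String collectText(PageTextSource source, int pageCount) throws IOException {
        if (pageCount < 0) {
            throw new IllegalArgumentException("pageCount must not be negative");
        }
        StringBuilder text = new StringBuilder();
        for (int page = 1; page <= pageCount; page++) {
            if (page > 1) {
                text.append('\n'); // separate pages with a line break
            }
            text.append(source.getTextFromPage(page));
        }
        return text.toString();
    }
}
```

With the real class, a call along the lines of collectText(extractor::getTextFromPage, pageCount) should fit, where pageCount comes from the PdfReader. Because PdfTextExtractor is declared without the public modifier, such a caller would have to live in the com.aowagie.text.pdf.parser package.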