Package com.itextpdf.text.pdf.parser
Class PdfTextExtractor
- java.lang.Object
  - com.itextpdf.text.pdf.parser.PdfTextExtractor
public final class PdfTextExtractor extends java.lang.Object
Extracts text from a PDF file.
Since:
- 2.1.4
Constructor Summary
Constructors (Modifier, Constructor, Description)
- private PdfTextExtractor(): This class only contains static methods.
Method Summary
Methods (Modifier and Type, Method, Description)
- static java.lang.String getTextFromPage(PdfReader reader, int pageNumber): Extract text from a specified page using the default strategy.
- static java.lang.String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy): Extract text from a specified page using an extraction strategy.
- static java.lang.String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy, java.util.Map<java.lang.String,ContentOperator> additionalContentOperators): Extract text from a specified page using an extraction strategy.
Method Detail
getTextFromPage
public static java.lang.String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy, java.util.Map<java.lang.String,ContentOperator> additionalContentOperators) throws java.io.IOException
Extract text from a specified page using an extraction strategy. Also allows registration of custom ContentOperators.
Parameters:
- reader - the reader to extract text from
- pageNumber - the page to extract text from
- strategy - the strategy to use for extracting text
- additionalContentOperators - an optional map of custom ContentOperators for rendering instructions
Returns:
- the extracted text
Throws:
- java.io.IOException - if any operation fails while reading from the provided PdfReader
getTextFromPage
public static java.lang.String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy) throws java.io.IOException
Extract text from a specified page using an extraction strategy.
Parameters:
- reader - the reader to extract text from
- pageNumber - the page to extract text from
- strategy - the strategy to use for extracting text
Returns:
- the extracted text
Throws:
- java.io.IOException - if any operation fails while reading from the provided PdfReader
Since:
- 5.0.2
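A caller that extracts several pages with an explicit strategy may want a fresh strategy object for each page. The sketch below assumes that a strategy instance can carry accumulated state between calls; this page does not say so. To stay runnable without iText on the classpath, the extraction call is abstracted behind a hypothetical StrategyExtractor interface, and PerPageStrategyExtraction and extractPages are illustrative names rather than part of the library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class PerPageStrategyExtraction {

    /**
     * Hypothetical stand-in for a call such as
     * PdfTextExtractor.getTextFromPage(reader, pageNumber, strategy).
     */
    @FunctionalInterface
    interface StrategyExtractor<S> {
        String extract(int pageNumber, S strategy) throws IOException;
    }

    /**
     * Extracts pages 1..pageCount, asking the factory for a new strategy
     * before every page so that no state leaks from one page to the next.
     * Checked IOExceptions are rethrown unchecked with the page number attached.
     */
    static <S> List<String> extractPages(int pageCount,
                                         Supplier<S> strategyFactory,
                                         StrategyExtractor<S> extractor) {
        List<String> pages = new ArrayList<>();
        // Assumption: page numbers start at 1, as in iText's PdfReader.
        for (int page = 1; page <= pageCount; page++) {
            S strategy = strategyFactory.get(); // fresh instance per page
            try {
                pages.add(extractor.extract(page, strategy));
            } catch (IOException e) {
                throw new UncheckedIOException("Failed to extract page " + page, e);
            }
        }
        return pages;
    }

    // With iText 5 on the classpath, a call might look like this (untested sketch):
    //   extractPages(reader.getNumberOfPages(),
    //                SimpleTextExtractionStrategy::new,
    //                (page, s) -> PdfTextExtractor.getTextFromPage(reader, page, s));
}
```

Returning one string per page keeps page boundaries visible to the caller, which can then join or post-process the pages as needed.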
getTextFromPage
public static java.lang.String getTextFromPage(PdfReader reader, int pageNumber) throws java.io.IOException
Extract text from a specified page using the default strategy.
Note: the default strategy is subject to change. If using a specific strategy is important, use getTextFromPage(PdfReader, int, TextExtractionStrategy).
Parameters:
- reader - the reader to extract text from
- pageNumber - the page to extract text from
Returns:
- the extracted text
Throws:
- java.io.IOException - if any operation fails while reading from the provided PdfReader
Since:
- 5.0.2
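Because this method handles one page per call, extracting a whole document means looping over the page numbers. The following is a minimal sketch of that loop, not the library's own code. It assumes page numbers start at 1 (as in iText's PdfReader), and it uses a hypothetical PageTextSource interface in place of the real call so that the example compiles without iText; AllPagesText and extractAll are illustrative names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

final class AllPagesText {

    /**
     * Hypothetical stand-in for a call such as
     * PdfTextExtractor.getTextFromPage(reader, pageNumber).
     */
    @FunctionalInterface
    interface PageTextSource {
        String textOf(int pageNumber) throws IOException;
    }

    /**
     * Concatenates the text of pages 1..pageCount, ending each page's text
     * with a newline. Checked IOExceptions are rethrown unchecked with the
     * failing page number in the message.
     */
    static String extractAll(int pageCount, PageTextSource source) {
        StringBuilder text = new StringBuilder();
        for (int page = 1; page <= pageCount; page++) {
            try {
                text.append(source.textOf(page));
            } catch (IOException e) {
                throw new UncheckedIOException("Failed to extract page " + page, e);
            }
            text.append('\n');
        }
        return text.toString();
    }

    // With iText 5 on the classpath, a call might look like this (untested sketch):
    //   extractAll(reader.getNumberOfPages(),
    //              page -> PdfTextExtractor.getTextFromPage(reader, page));
}
```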