Package org.apache.pdfbox.pdfparser
Class PDFParser
java.lang.Object
org.apache.pdfbox.pdfparser.BaseParser
org.apache.pdfbox.pdfparser.COSParser
org.apache.pdfbox.pdfparser.PDFParser
- All Implemented Interfaces:
ICOSParser
- Direct Known Subclasses:
PreflightParser
-
Field Summary
Fields inherited from class org.apache.pdfbox.pdfparser.COSParser
EOF_MARKER, fileLen, initialParseDone, OBJ_MARKER, securityHandler, SYSPROP_EOFLOOKUPRANGE, xrefTrailerResolver
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, MAX_LENGTH_LONG, N, O, R, S, source, STREAM_STRING, T
-
Constructor Summary
Constructors
Constructor - Description
PDFParser(RandomAccessRead source) - Constructor.
PDFParser(RandomAccessRead source, String decryptionPassword) - Constructor.
PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore, String alias) - Constructor.
PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore, String alias, RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction) - Constructor.
-
Method Summary
Modifier and Type, Method - Description
protected PDDocument createDocument() - Create the resulting document.
protected void initialParse() - The initial parse will first parse only the trailer, the startxref entry and all xref tables to have a pointer (offset) to all the PDF's objects.
static PDDocument load(File file) - Deprecated. Use Loader.loadPDF(File) instead.
static PDDocument load(File file, String password) - Deprecated. Use Loader.loadPDF(File, String) instead.
PDDocument parse() - This will parse the stream and populate the PDDocument object.
PDDocument parse(boolean lenient) - This will parse the stream and populate the PDDocument object.
Methods inherited from class org.apache.pdfbox.pdfparser.COSParser
checkPages, createRandomAccessReadView, dereferenceCOSObject, getAccessPermission, getEncryption, isLenient, isString, lastIndexOf, parseCOSStream, parseFDFHeader, parseObjectDynamically, parseObjectStreamObject, parsePDFHeader, parseXrefTable, prepareDecryption, resetTrailerResolver, retrieveTrailer, setEOFLookupRange, setLenient
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
getObjectKey, isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOF, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSString, parseDirObject, readExpectedChar, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipLinebreak, skipSpaces, skipWhiteSpaces
-
Field Details
-
LOG
private static final org.apache.commons.logging.Log LOG
-
-
Constructor Details
-
PDFParser
public PDFParser(RandomAccessRead source) throws IOException
Constructor. Unrestricted main memory will be used for buffering PDF streams.
- Parameters:
source - source representing the PDF.
- Throws:
IOException - If something went wrong.
-
PDFParser
public PDFParser(RandomAccessRead source, String decryptionPassword) throws IOException
Constructor. Unrestricted main memory will be used for buffering PDF streams.
- Parameters:
source - input representing the PDF.
decryptionPassword - password to be used for decryption.
- Throws:
IOException - If something went wrong.
-
PDFParser
public PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore, String alias) throws IOException
Constructor. Unrestricted main memory will be used for buffering PDF streams.
- Parameters:
source - input representing the PDF.
decryptionPassword - password to be used for decryption.
keyStore - key store to be used for decryption when using public key security.
alias - alias to be used for decryption when using public key security.
- Throws:
IOException - If something went wrong.
-
PDFParser
public PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore, String alias, RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction) throws IOException
Constructor.
- Parameters:
source - input representing the PDF.
decryptionPassword - password to be used for decryption.
keyStore - key store to be used for decryption when using public key security.
alias - alias to be used for decryption when using public key security.
streamCacheCreateFunction - a function to create an instance of the stream cache.
- Throws:
IOException - If something went wrong.
-
-
Method Details
-
initialParse
The initial parse first reads only the trailer, the startxref entry and all xref tables, so that a pointer (offset) to each of the PDF's objects is available. It can handle linearized PDFs, which have an xref at the end pointing to an xref at the beginning of the file. Finally, the root object is parsed.
- Throws:
InvalidPasswordException - If the password is incorrect.
IOException - If something went wrong.
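To make the first step of the initial parse more concrete, the following is a minimal sketch of how the offset after the last `startxref` keyword could be located. It is not PDFBox's implementation: the class `StartXrefSketch` and the method `findStartXref` are hypothetical names, and the sketch scans the whole buffer rather than a limited range at the end of the file.

```java
import java.nio.charset.StandardCharsets;

public class StartXrefSketch {

    /**
     * Returns the number that follows the last "startxref" keyword,
     * or -1 if the keyword or the number is missing.
     * Hypothetical helper for illustration only, not part of PDFBox.
     */
    static long findStartXref(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char,
        // so string indices are identical to byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }
        int pos = keyword + "startxref".length();
        // Skip the line break (and any other whitespace) after the keyword.
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        int start = pos;
        // Collect the ASCII digits of the offset.
        while (pos < text.length() && text.charAt(pos) >= '0' && text.charAt(pos) <= '9') {
            pos++;
        }
        if (start == pos) {
            return -1;
        }
        return Long.parseLong(text.substring(start, pos));
    }

    public static void main(String[] args) {
        byte[] tail = "trailer\n<< /Size 4 /Root 1 0 R >>\nstartxref\n1234\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findStartXref(tail)); // prints 1234
    }
}
```

The returned number is the byte offset of the most recent cross-reference section. A real parser would seek to it, read the xref table or stream there, and follow any earlier sections it links to.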
-
parse
This will parse the stream and populate the PDDocument object. This will close the keystore stream when it is done parsing. Lenient mode is active by default.
- Returns:
the populated PDDocument
- Throws:
InvalidPasswordException - If the password is incorrect.
IOException - If there is an error reading from the stream or corrupt data is found.
-
parse
This will parse the stream and populate the PDDocument object. This will close the keystore stream when it is done parsing.
- Parameters:
lenient - activate leniency if set to true
- Returns:
the populated PDDocument
- Throws:
InvalidPasswordException - If the password is incorrect.
IOException - If there is an error reading from the stream or corrupt data is found.
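As a usage illustration, the sketch below builds a one-page PDF in memory, wraps the bytes in a RandomAccessRead, and calls parse(boolean). It assumes PDFBox 3.x is on the classpath. `RandomAccessReadBuffer` (from org.apache.pdfbox.io) is not described on this page, and the class `ParseSketch` with its helper methods is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.io.RandomAccessReadBuffer;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class ParseSketch {

    /** Builds a minimal one-page PDF in memory so the example needs no input file. */
    static byte[] onePagePdf() throws IOException {
        try (PDDocument doc = new PDDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            doc.addPage(new PDPage());
            doc.save(out);
            return out.toByteArray();
        }
    }

    /** Parses the given bytes with the requested leniency and returns the page count. */
    static int countPages(byte[] pdf, boolean lenient) throws IOException {
        PDFParser parser = new PDFParser(new RandomAccessReadBuffer(pdf));
        // The returned PDDocument must be closed by the caller.
        try (PDDocument doc = parser.parse(lenient)) {
            return doc.getNumberOfPages();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countPages(onePagePdf(), true));
    }
}
```

Passing `false` is useful when malformed files should be rejected with an IOException rather than repaired on a best-effort basis.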
-
createDocument
Create the resulting document. May be overridden if the parser uses another class as the document.
- Returns:
the resulting document
- Throws:
IOException - if the method is called before parsing the document
-
load
Deprecated. Use Loader.loadPDF(File) instead.
Parses a PDF. Unrestricted main memory will be used for buffering PDF streams.
- Parameters:
file - file to be loaded
- Returns:
loaded document
- Throws:
InvalidPasswordException - If the file required a non-empty password.
IOException - in case of a file reading or parsing error
-
load
Deprecated. Use Loader.loadPDF(File, String) instead.
Parses a PDF. Unrestricted main memory will be used for buffering PDF streams.
- Parameters:
file - file to be loaded
password - password to be used for decryption
- Returns:
loaded document
- Throws:
InvalidPasswordException - If the password is incorrect.
IOException - in case of a file reading or parsing error
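Since both load methods are deprecated, new code would normally call the replacement named in the deprecation notes. The following is a minimal sketch of that migration, assuming PDFBox 3.x on the classpath. The class `LoaderMigrationSketch` and the method `pageCount` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;

public class LoaderMigrationSketch {

    /**
     * Opens the file through Loader.loadPDF(File, String) instead of the
     * deprecated PDFParser.load(File, String) and returns the page count.
     */
    static int pageCount(File file, String password) throws IOException {
        // An incorrect password surfaces as InvalidPasswordException,
        // which is a subclass of IOException.
        try (PDDocument doc = Loader.loadPDF(file, password)) {
            return doc.getNumberOfPages();
        }
    }
}
```

For unencrypted files, `Loader.loadPDF(File)` can be used in the same way without a password argument.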
-