Class PreflightParser
java.lang.Object
org.apache.pdfbox.pdfparser.BaseParser
org.apache.pdfbox.pdfparser.COSParser
org.apache.pdfbox.pdfparser.PDFParser
org.apache.pdfbox.preflight.parser.PreflightParser
- All Implemented Interfaces:
ICOSParser
-
Field Summary
Fields (Modifier and Type, Field, Description)
private PreflightConfiguration config
private static final Charset ENCODING: Defines a one-byte encoding that has no specific encoding in the UTF-8 charset.
private Format format
private PreflightDocument preflightDocument
private ValidationResult validationResult
Fields inherited from class org.apache.pdfbox.pdfparser.COSParser
EOF_MARKER, fileLen, initialParseDone, OBJ_MARKER, securityHandler, SYSPROP_EOFLOOKUPRANGE, xrefTrailerResolver
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, N, O, R, S, source, STREAM_STRING, T
-
Constructor Summary
Constructors (Constructor, Description)
PreflightParser(File file): Constructor.
PreflightParser(String filename): Constructor.
-
Method Summary
Methods (Modifier and Type, Method, Description)
private void addValidationError: Add a validation error to the ValidationResult.
private void checkEndstreamKeyWord(COSDictionary dic, long startOffset): 'endstream' must be preceded by an EOL.
private void checkPdfHeader(): Check that the PDF header matches the rules of the PDF/A specification.
private long checkStreamKeyWord(): 'stream' must be followed by <CR><LF> or only <LF>.
protected PDDocument createDocument: Create the resulting document.
protected void initialParse: The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the pdf's objects.
protected int lastIndexOf(char[] pattern, byte[] buf, int endOff): Searches last appearance of pattern within buffer.
private boolean nextIsEOL
PDDocument parse(): This will parse the stream and populate the PDDocument object.
parse(Format format): Parse the given file and check if it is a conforming file according to the given format.
parse(Format format, PreflightConfiguration config): Parse the given file and check if it is a conforming file according to the given format.
protected COSArray parseCOSArray: This will parse a PDF array object.
protected COSName parseCOSName: This will parse a PDF name from the stream.
protected COSStream parseCOSStream(COSDictionary dic): Wraps COSParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary) to check the rules on the 'stream' and 'endstream' keywords.
protected COSString parseCOSString(): Check that the hexadecimal string contains only an even number of hexadecimal characters.
protected COSBase parseDirObject(): Calls BaseParser.parseDirObject() and checks the limit range for Float, Integer and the number of Dictionary entries.
private COSBase parseFileObject(Long offsetOrObjstmObNr, COSObjectKey objKey)
protected COSBase parseObjectDynamically(COSObjectKey objKey, boolean requireExistingNotCompressedObj): Parse the object for the given object key.
protected boolean parseXrefTable(long startByteOffset): Same as COSParser.parseXrefTable(long), with additional checks: EOL mandatory after the 'xref' keyword; cross reference subsection header uses a single white space as separator; and so on.
protected boolean resetTrailerResolver(): Indicates whether the xref trailer resolver should be reset or not.
static ValidationResult validate: Load and validate the given file.
Methods inherited from class org.apache.pdfbox.pdfparser.COSParser
checkPages, createRandomAccessReadView, dereferenceCOSObject, getAccessPermission, getEncryption, isLenient, isString, parseFDFHeader, parseObjectStreamObject, parsePDFHeader, prepareDecryption, retrieveTrailer, setEOFLookupRange, setLenient
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
getObjectKey, isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOF, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseCOSDictionary, readExpectedChar, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipSpaces, skipWhiteSpaces
-
Field Details
-
ENCODING
Defines a one-byte encoding that has no specific encoding in the UTF-8 charset. Avoids unexpected errors when the encoding is Cp5816.
-
format
-
config
-
preflightDocument
-
validationResult
-
-
Constructor Details
-
PreflightParser
Constructor.
- Parameters:
file-
- Throws:
IOException- if there is a reading error.
-
PreflightParser
Constructor.
- Parameters:
filename-
- Throws:
IOException- if there is a reading error.
-
-
Method Details
-
addValidationError
Add a validation error to the ValidationResult.
- Parameters:
error- the validation error to be added
-
parse
Description copied from class: PDFParser
This will parse the stream and populate the PDDocument object. This will close the keystore stream when it is done parsing. Lenient mode is active by default.
- Overrides:
parse in class PDFParser
- Returns:
- the populated PDDocument
- Throws:
IOException- If there is an error reading from the stream or corrupt data is found.
-
parse
Parse the given file and check if it is a conforming file according to the given format.
- Parameters:
format- format that the document should follow (default Format.PDF_A1B)
- Throws:
IOException
-
parse
Parse the given file and check if it is a conforming file according to the given format.
- Parameters:
format- format that the document should follow (default Format.PDF_A1B)
config- Configuration bean that will be used by the PreflightDocument. If null, the format is used to determine the default configuration.
- Throws:
IOException
-
createDocument
Description copied from class: PDFParser
Create the resulting document. May be overridden if the parser uses another class as document.
- Overrides:
createDocument in class PDFParser
- Returns:
- the resulting document
- Throws:
IOException- if the method is called before parsing the document
-
initialParse
Description copied from class: PDFParser
The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the pdf's objects. It can handle linearized pdfs, which will have an xref at the end pointing to an xref at the beginning of the file. Last the root object is parsed.
- Overrides:
initialParse in class PDFParser
- Throws:
IOException- If something went wrong.
-
resetTrailerResolver
protected boolean resetTrailerResolver()
Description copied from class: COSParser
Indicates whether the xref trailer resolver should be reset or not. Should be overridden if the xref trailer resolver is needed after the initial parsing.
- Overrides:
resetTrailerResolver in class COSParser
- Returns:
- true if the xref trailer resolver should be reset
-
checkPdfHeader
private void checkPdfHeader()
Check that the PDF header matches the rules of the PDF/A specification. The first line (offset 0) must be a comment with the PDF version (version 1.0 does not conform to the PDF/A specification). The second line must be a comment with at least 4 bytes greater than 0x80.
-
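The two header rules described for checkPdfHeader() can be illustrated with a small standalone sketch. It is not the PDFBox implementation: the class name PdfHeaderCheckSketch, its method names, and the accepted version range 1.1 to 1.9 are assumptions made for illustration, and the "greater than 0x80" comparison is taken literally from the description above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of PDFBox: checks the two header lines of a PDF. */
final class PdfHeaderCheckSketch {

    /** Returns true if the first two lines satisfy the header rules described above. */
    static boolean headerLooksConformant(byte[] data) {
        int firstEnd = indexOfEol(data, 0);
        if (firstEnd < 0) {
            return false;
        }
        String firstLine = new String(data, 0, firstEnd, StandardCharsets.ISO_8859_1);
        // Rule 1: a version comment at offset 0; 1.0 is rejected.
        // The upper bound of the minor version is an assumption of this sketch.
        if (!firstLine.matches("%PDF-1\\.[1-9]")) {
            return false;
        }
        int secondStart = skipEol(data, firstEnd);
        // Rule 2: the second line must be a comment...
        if (secondStart >= data.length || data[secondStart] != '%') {
            return false;
        }
        int secondEnd = indexOfEol(data, secondStart);
        if (secondEnd < 0) {
            secondEnd = data.length;
        }
        // ...containing at least 4 bytes greater than 0x80.
        int highBytes = 0;
        for (int i = secondStart + 1; i < secondEnd; i++) {
            if ((data[i] & 0xFF) > 0x80) {
                highBytes++;
            }
        }
        return highBytes >= 4;
    }

    /** Index of the first CR or LF at or after 'from', or -1 if there is none. */
    private static int indexOfEol(byte[] data, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == '\r' || data[i] == '\n') {
                return i;
            }
        }
        return -1;
    }

    /** Skips one EOL marker (CR, LF, or CR LF) starting at 'pos'. */
    private static int skipEol(byte[] data, int pos) {
        if (pos < data.length && data[pos] == '\r') {
            pos++;
        }
        if (pos < data.length && data[pos] == '\n') {
            pos++;
        }
        return pos;
    }
}
```

The sketch works on a byte array so that the high bytes of the second line are compared as unsigned values rather than as decoded characters.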
parseXrefTable
Same as COSParser.parseXrefTable(long), with additional checks:
- EOL mandatory after the 'xref' keyword
- Cross reference subsection header uses a single white space as separator
- and so on
- Overrides:
parseXrefTable in class COSParser
- Parameters:
startByteOffset- the offset to start at
- Returns:
- false on parsing error
- Throws:
IOException- If an IO error occurs.
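As a rough illustration of the first two additional checks listed for parseXrefTable, the sketch below tests the opening lines of a cross reference section given as text. It is a simplified stand-in, not the parser's code: the class XrefSyntaxSketch and its methods are hypothetical, and working on a String instead of the parser's byte source is an assumption made to keep the example short.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of PDFBox: strict syntax checks for the start of an xref section. */
final class XrefSyntaxSketch {

    // Subsection header: first object number, exactly one space, number of entries.
    private static final Pattern SUBSECTION_HEADER = Pattern.compile("\\d+ \\d+");

    /** True if the line consists of the keyword only, i.e. an EOL follows 'xref' directly. */
    static boolean isStrictXrefKeywordLine(String line) {
        return "xref".equals(line);
    }

    /** True if the subsection header uses a single space as separator. */
    static boolean isStrictSubsectionHeader(String line) {
        return SUBSECTION_HEADER.matcher(line).matches();
    }

    /** Checks the first two lines of a cross reference section. */
    static boolean startsWithStrictXref(String section) {
        // Split on CR LF, CR, or LF; only the first two lines matter here.
        String[] lines = section.split("\\r\\n|\\r|\\n", 3);
        return lines.length >= 2
                && isStrictXrefKeywordLine(lines[0])
                && isStrictSubsectionHeader(lines[1]);
    }
}
```

For example, a section starting with `xref` on its own line followed by `0 6` passes, while `xref 0 6` on one line or `0  6` with two spaces does not.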
-
parseCOSStream
Wraps COSParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary) to check the rules on the 'stream' and 'endstream' keywords. See checkStreamKeyWord() and checkEndstreamKeyWord(org.apache.pdfbox.cos.COSDictionary, long).
- Overrides:
parseCOSStream in class COSParser
- Parameters:
dic- dictionary that goes with this stream.
- Returns:
- parsed pdf stream.
- Throws:
IOException- if an error occurred reading the stream, like problems with reading length attribute, stream does not end with 'endstream' after data read, stream too short etc.
-
checkStreamKeyWord
'stream' must be followed by <CR><LF> or only <LF>
- Throws:
IOException
-
checkEndstreamKeyWord
'endstream' must be preceded by an EOL
- Throws:
IOException
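The keyword rules stated for checkStreamKeyWord and checkEndstreamKeyWord can be expressed as two byte-level predicates. This is only a sketch under assumptions: StreamKeywordSketch and its method names are hypothetical, the keyword offsets are assumed to be known already, and a single CR or LF directly before 'endstream' is treated as a valid EOL.

```java
/** Hypothetical helper, not part of PDFBox: byte-level checks around the stream keywords. */
final class StreamKeywordSketch {

    /** 'stream' starting at keywordOffset must be followed by CR LF or by a lone LF. */
    static boolean streamFollowedByValidEol(byte[] data, int keywordOffset) {
        int next = keywordOffset + "stream".length();
        if (next >= data.length) {
            return false;
        }
        if (data[next] == '\n') {
            return true;
        }
        // A lone CR is not accepted; CR must be followed by LF.
        return data[next] == '\r' && next + 1 < data.length && data[next + 1] == '\n';
    }

    /** 'endstream' starting at keywordOffset must be preceded by an EOL character. */
    static boolean endstreamPrecededByEol(byte[] data, int keywordOffset) {
        if (keywordOffset <= 0 || keywordOffset > data.length) {
            return false;
        }
        byte previous = data[keywordOffset - 1];
        return previous == '\r' || previous == '\n';
    }
}
```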
-
nextIsEOL
- Throws:
IOException
-
parseCOSArray
Description copied from class: BaseParser
This will parse a PDF array object.
- Overrides:
parseCOSArray in class BaseParser
- Returns:
- The parsed PDF array.
- Throws:
IOException- If there is an error parsing the stream.
-
parseCOSName
Description copied from class: BaseParser
This will parse a PDF name from the stream.
- Overrides:
parseCOSName in class BaseParser
- Returns:
- The parsed PDF name.
- Throws:
IOException- If there is an error reading from the stream.
-
parseCOSString
Check that the hexadecimal string contains only an even number of hexadecimal characters. Once this is done, reset the offset to the beginning of the string and call BaseParser.parseCOSString().
- Overrides:
parseCOSString in class BaseParser
- Returns:
- The parsed PDF string.
- Throws:
IOException- If there is an error reading from the stream.
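The even-count rule for hexadecimal strings can be sketched as a plain function over the text between '<' and '>'. The class HexStringCheckSketch is hypothetical, and ignoring PDF white-space characters inside the string is an assumption of this sketch rather than something stated above.

```java
/** Hypothetical helper, not part of PDFBox: validates the body of a hexadecimal string. */
final class HexStringCheckSketch {

    /**
     * Takes the characters between '<' and '>' and returns true if they are
     * only hex digits (plus white space) and the number of hex digits is even.
     */
    static boolean hasEvenHexDigitCount(String body) {
        int digits = 0;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (isHexDigit(c)) {
                digits++;
            } else if (!isPdfWhitespace(c)) {
                return false; // any other character is invalid in a hex string
            }
        }
        return digits % 2 == 0;
    }

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // White-space characters assumed for this sketch: NUL, TAB, LF, FF, CR, SPACE.
    private static boolean isPdfWhitespace(char c) {
        return c == 0x00 || c == 0x09 || c == 0x0A || c == 0x0C || c == 0x0D || c == 0x20;
    }
}
```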
-
parseDirObject
Calls BaseParser.parseDirObject() and checks the limit range for Float, Integer and the number of Dictionary entries.
- Overrides:
parseDirObject in class BaseParser
- Returns:
- The parsed object.
- Throws:
IOException- if there is an error during parsing.
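A range check of the kind parseDirObject describes could look like the sketch below. The concrete limits are assumptions of this example (values commonly cited as PDF 1.4 implementation limits), not figures taken from this page, and the class ImplementationLimitsSketch is hypothetical.

```java
/** Hypothetical helper, not part of PDFBox: range checks for parsed values. */
final class ImplementationLimitsSketch {

    // Assumed limits, labeled as such: they are not stated in the documentation above.
    static final long MAX_INTEGER = 2_147_483_647L;
    static final long MIN_INTEGER = -2_147_483_648L;
    static final double MAX_REAL_MAGNITUDE = 32_767.0;
    static final int MAX_DICTIONARY_ENTRIES = 4_095;

    /** True if an integer value lies inside the assumed 32-bit range. */
    static boolean integerInRange(long value) {
        return value >= MIN_INTEGER && value <= MAX_INTEGER;
    }

    /** True if a real value does not exceed the assumed magnitude. */
    static boolean realInRange(double value) {
        return Math.abs(value) <= MAX_REAL_MAGNITUDE;
    }

    /** True if a dictionary with this many entries is within the assumed limit. */
    static boolean dictionarySizeInRange(int entryCount) {
        return entryCount >= 0 && entryCount <= MAX_DICTIONARY_ENTRIES;
    }
}
```

Keeping the limits as named constants makes it easy to swap in the values required by the target format.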
-
parseObjectDynamically
protected COSBase parseObjectDynamically(COSObjectKey objKey, boolean requireExistingNotCompressedObj) throws IOException
Description copied from class: COSParser
Parse the object for the given object key.
- Overrides:
parseObjectDynamically in class COSParser
- Parameters:
objKey- key of object to be parsed
requireExistingNotCompressedObj- if true the object to be parsed must be defined in xref (comment: null objects may be missing from xref) and it must not be a compressed object within object stream (this is used to circumvent being stuck in a loop in a malicious PDF)
- Returns:
- the parsed object (which is also added to document object)
- Throws:
IOException- If an IO error occurs.
-
parseFileObject
- Throws:
IOException
-
lastIndexOf
protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
Description copied from class: COSParser
Searches last appearance of pattern within buffer. Lookup before _lastOff and goes back until 0.
- Overrides:
lastIndexOf in class COSParser
- Parameters:
pattern- pattern to search for
buf- buffer to search pattern in
endOff- offset (exclusive) where lookup starts at
- Returns:
- start offset of pattern within buffer or -1 if pattern could not be found
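The backward search described for lastIndexOf can be sketched as a naive scan from the end offset towards 0. This is an illustrative reimplementation with the same parameter names, not the code of COSParser or PreflightParser; the class LastIndexOfSketch is hypothetical, and clamping endOff to the buffer length is an assumption of the sketch.

```java
/** Hypothetical helper, not part of PDFBox: naive backward pattern search in a byte buffer. */
final class LastIndexOfSketch {

    /**
     * Returns the start offset of the last occurrence of pattern that ends
     * before endOff (exclusive), or -1 if there is no such occurrence.
     */
    static int lastIndexOf(char[] pattern, byte[] buf, int endOff) {
        int end = Math.min(endOff, buf.length);
        // Try every candidate start position, moving backwards to 0.
        for (int start = end - pattern.length; start >= 0; start--) {
            boolean match = true;
            for (int i = 0; i < pattern.length; i++) {
                // Compare bytes as unsigned values against the pattern characters.
                if ((buf[start + i] & 0xFF) != pattern[i]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return start;
            }
        }
        return -1;
    }
}
```

A typical use is locating the last `%%EOF` marker in a buffer read from the end of a file, then lowering endOff to search for an earlier occurrence.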
-
validate
Load and validate the given file. Returns the validation result and closes the read pdf.
- Parameters:
file- the file to be read and validated
- Returns:
- the validation result
- Throws:
IOException- in case of a file reading or parsing error
-