Class PreflightParser

All Implemented Interfaces:
ICOSParser

public class PreflightParser extends PDFParser
  • Field Details

    • ENCODING

      private static final Charset ENCODING
      Defines a one-byte encoding, so that bytes are not decoded through UTF-8 or another multi-byte charset. This avoids an unexpected error when the encoding is Cp5816.
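The field above does not say which one-byte charset it uses. The sketch below assumes ISO-8859-1 purely for illustration, and shows why a single-byte charset is safer than UTF-8 when raw PDF bytes (for example the high-bit bytes of a PDF/A header comment) are turned into a String and back. The class and method names are hypothetical; this is not the PDFBox code.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Illustrative only: compares a single-byte charset with UTF-8 for raw PDF bytes.
// ISO-8859-1 is an assumption; the ENCODING field does not name its charset.
final class OneByteEncodingDemo {

    // Decodes raw bytes to a String and encodes them back with the same charset.
    static byte[] roundTrip(byte[] raw, Charset charset) {
        String text = new String(raw, charset);
        return text.getBytes(charset);
    }

    // True if every byte survives the round trip unchanged. With ISO-8859-1 each
    // byte 0x00-0xFF maps to exactly one char; the UTF-8 decoder replaces
    // malformed sequences (such as stray bytes >= 0x80) with U+FFFD, losing data.
    static boolean preservesBytes(byte[] raw, Charset charset) {
        return Arrays.equals(raw, roundTrip(raw, charset));
    }
}
```

Because one char corresponds to one byte, string indexes also stay equal to byte offsets, which matters for a parser that reports error positions.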
    • format

      private Format format
    • config

      private PreflightConfiguration config
    • preflightDocument

      private PreflightDocument preflightDocument
    • validationResult

      private ValidationResult validationResult
  • Constructor Details

    • PreflightParser

      public PreflightParser(File file) throws IOException
      Constructor.
      Parameters:
      file - the PDF file to be parsed
      Throws:
      IOException - if there is a reading error.
    • PreflightParser

      public PreflightParser(RandomAccessRead rar) throws IOException
      Constructor.
      Parameters:
      rar - input source
      Throws:
      IOException - if there is a reading error.
    • PreflightParser

      public PreflightParser(String filename) throws IOException
      Constructor.
      Parameters:
      filename - the path of the PDF file to be parsed
      Throws:
      IOException - if there is a reading error.
  • Method Details

    • addValidationError

      private void addValidationError(ValidationResult.ValidationError error)
      Add a validation error to the ValidationResult.
      Parameters:
      error - the validation error to be added
    • parse

      public PDDocument parse() throws IOException
      Description copied from class: PDFParser
      This will parse the stream and populate the PDDocument object. This will close the keystore stream when it is done parsing. Lenient mode is active by default.
      Overrides:
      parse in class PDFParser
      Returns:
      the populated PDDocument
      Throws:
      IOException - If there is an error reading from the stream or corrupt data is found.
    • parse

      public PDDocument parse(Format format) throws IOException
      Parses the given file and checks whether it is a conforming file according to the given format.
      Parameters:
      format - format that the document should follow (default Format.PDF_A1B)
      Returns:
      the parsed document.
      Throws:
      IOException
    • parse

      public PDDocument parse(Format format, PreflightConfiguration config) throws IOException
      Parses the given file and checks whether it is a conforming file according to the given format.
      Parameters:
      format - format that the document should follow (default Format.PDF_A1B)
      config - Configuration bean that will be used by the PreflightDocument. If null the format is used to determine the default configuration.
      Returns:
      the parsed document.
      Throws:
      IOException
    • createDocument

      protected PDDocument createDocument() throws IOException
      Description copied from class: PDFParser
      Create the resulting document. May be overridden if the parser uses another class as the document.
      Overrides:
      createDocument in class PDFParser
      Returns:
      the resulting document
      Throws:
      IOException - if the method is called before parsing the document
    • initialParse

      protected void initialParse() throws IOException
      Description copied from class: PDFParser
      The initial parse first parses only the trailer, the startxref value and all xref tables, to obtain a pointer (offset) to every object in the PDF. It can handle linearized PDFs, which have an xref at the end pointing to an xref at the beginning of the file. Finally, the root object is parsed.
      Overrides:
      initialParse in class PDFParser
      Throws:
      IOException - If something went wrong.
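The first step of such an initial parse is locating the last "startxref" keyword near the end of the file and reading the byte offset of the most recent cross-reference section that follows it. The sketch below models only that step, under the assumption that the whole file fits in a byte array; it ignores linearization and xref streams, and the class name is hypothetical rather than part of PDFBox.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch: find the last "startxref" and read the offset after it.
// Not the PDFBox implementation; error handling is deliberately minimal.
final class StartXrefLocator {

    private static final String KEYWORD = "startxref";

    // Returns the xref offset, or -1 if no usable "startxref" entry is found.
    static long findStartXref(byte[] pdf) {
        // ISO-8859-1 keeps one char per byte, so string indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        // The last occurrence wins: incremental updates append newer trailers.
        int keyword = text.lastIndexOf(KEYWORD);
        if (keyword < 0) {
            return -1;
        }
        int i = keyword + KEYWORD.length();
        // Skip the white space (normally an EOL) between keyword and number.
        while (i < text.length() && isPdfWhiteSpace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        if (start == i) {
            return -1; // keyword present but no number follows
        }
        return Long.parseLong(text.substring(start, i));
    }

    // The six PDF white-space characters: SP, HT, CR, LF, FF and NUL.
    private static boolean isPdfWhiteSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f' || c == '\0';
    }
}
```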
    • resetTrailerResolver

      protected boolean resetTrailerResolver()
      Description copied from class: COSParser
      Indicates whether the xref trailer resolver should be reset or not. Should be overridden if the xref trailer resolver is needed after the initial parsing.
      Overrides:
      resetTrailerResolver in class COSParser
      Returns:
      true if the xref trailer resolver should be reset
    • checkPdfHeader

      private void checkPdfHeader()
      Checks that the PDF header matches the rules of the PDF/A specification. The first line (offset 0) must be a comment containing the PDF version (version 1.0 does not conform to the PDF/A specification). The second line must be a comment with at least 4 bytes whose values are 0x80 or higher (that is, greater than 127).
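The two header rules can be expressed as a small byte-level check. This is a minimal sketch, not the PDFBox implementation: it assumes the accepted first line is exactly "%PDF-1.n" with n from 1 to 9 (so 1.0 is rejected, as described above), and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch of the two PDF/A header rules; not PDFBox code.
final class PdfAHeaderCheck {

    static boolean isConformingHeader(byte[] file) {
        int firstEol = indexOfEol(file, 0);
        if (firstEol < 0) {
            return false;
        }
        String first = new String(file, 0, firstEol, StandardCharsets.ISO_8859_1);
        // Rule 1 (assumed pattern): "%PDF-1.n" at offset 0 with n in 1..9, so 1.0 fails.
        if (!first.matches("%PDF-1\\.[1-9]")) {
            return false;
        }
        int secondStart = skipEol(file, firstEol);
        // Rule 2: '%' followed by at least four bytes with values >= 0x80.
        if (secondStart + 5 > file.length || file[secondStart] != '%') {
            return false;
        }
        for (int i = secondStart + 1; i <= secondStart + 4; i++) {
            if ((file[i] & 0xFF) < 0x80) {
                return false;
            }
        }
        return true;
    }

    // Position of the first CR or LF at or after 'from', or -1 if there is none.
    private static int indexOfEol(byte[] b, int from) {
        for (int i = from; i < b.length; i++) {
            if (b[i] == '\r' || b[i] == '\n') {
                return i;
            }
        }
        return -1;
    }

    // Skips exactly one EOL marker: CR LF, CR or LF.
    private static int skipEol(byte[] b, int eol) {
        if (b[eol] == '\r' && eol + 1 < b.length && b[eol + 1] == '\n') {
            return eol + 2;
        }
        return eol + 1;
    }
}
```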
    • parseXrefTable

      protected boolean parseXrefTable(long startByteOffset) throws IOException
      Same as COSParser.parseXrefTable(long), with additional checks: an EOL is mandatory after the 'xref' keyword, the cross-reference subsection header must use a single white space as separator, and so on.
      Overrides:
      parseXrefTable in class COSParser
      Parameters:
      startByteOffset - the offset to start at
      Returns:
      false on parsing error
      Throws:
      IOException - If an IO error occurs.
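Two of the additional controls listed above can be sketched on their own: the 'xref' keyword followed directly by an EOL, and a subsection header made of two integers separated by a single white space. This is a hedged illustration, not PDFBox's method; it assumes the separator is a SPACE (0x20), and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative sketch of two extra xref syntax checks; not PDFBox code.
final class XrefSyntaxCheck {

    private static final byte[] XREF = {'x', 'r', 'e', 'f'};

    // "<first object number> <entry count>" with exactly one separator.
    // Assumption: the single white-space separator is a SPACE (0x20).
    private static final Pattern SUBSECTION_HEADER = Pattern.compile("\\d+ \\d+");

    // True if 'xref' starts at 'offset' and is immediately followed by CR or LF.
    static boolean xrefKeywordFollowedByEol(byte[] data, int offset) {
        if (offset < 0 || offset + XREF.length >= data.length) {
            return false;
        }
        for (int i = 0; i < XREF.length; i++) {
            if (data[offset + i] != XREF[i]) {
                return false;
            }
        }
        byte next = data[offset + XREF.length];
        return next == '\r' || next == '\n';
    }

    // The whole line must match; extra or doubled white space is rejected.
    static boolean isValidSubsectionHeader(String line) {
        return SUBSECTION_HEADER.matcher(line).matches();
    }
}
```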
    • parseCOSStream

      protected COSStream parseCOSStream(COSDictionary dic) throws IOException
      Overrides:
      parseCOSStream in class COSParser
      Parameters:
      dic - dictionary that goes with this stream.
      Returns:
      parsed pdf stream.
      Throws:
      IOException - if an error occurred reading the stream, like problems with reading length attribute, stream does not end with 'endstream' after data read, stream too short etc.
    • checkStreamKeyWord

      private long checkStreamKeyWord() throws IOException
      The 'stream' keyword must be followed by either <CR><LF> or a single <LF>.
      Throws:
      IOException
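The rule on the 'stream' keyword means a lone <CR> (or any other byte) after it is a violation. A minimal sketch, assuming the caller already knows the offset just past the keyword; the helper name is hypothetical and it does not reproduce the return value of checkStreamKeyWord.

```java
// Illustrative sketch: after 'stream' only CR LF or a single LF is allowed.
// Hypothetical helper, not PDFBox code.
final class StreamKeywordCheck {

    // Returns the offset where the stream data begins, or -1 on a violation.
    static int dataStartAfterStreamKeyword(byte[] data, int keywordEnd) {
        if (keywordEnd < data.length && data[keywordEnd] == '\n') {
            return keywordEnd + 1; // single LF
        }
        if (keywordEnd + 1 < data.length && data[keywordEnd] == '\r' && data[keywordEnd + 1] == '\n') {
            return keywordEnd + 2; // CR LF
        }
        return -1; // lone CR, white space or data directly after the keyword
    }
}
```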
    • checkEndstreamKeyWord

      private void checkEndstreamKeyWord(COSDictionary dic, long startOffset) throws IOException
      'endstream' must be preceded by an EOL
      Throws:
      IOException
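The complementary rule for 'endstream' only needs the byte immediately before the keyword: every EOL form (CR, LF or CR LF) ends in CR or LF. The sketch below is a hypothetical helper; the real method also receives the stream dictionary and start offset, whose use is not modeled here.

```java
// Illustrative sketch: 'endstream' must be preceded by an EOL marker.
// Hypothetical helper, not PDFBox code; the dictionary argument is not modeled.
final class EndstreamKeywordCheck {

    // True if the byte just before 'endstream' at endstreamOffset is CR or LF.
    static boolean precededByEol(byte[] data, int endstreamOffset) {
        if (endstreamOffset <= 0 || endstreamOffset > data.length) {
            return false;
        }
        byte prev = data[endstreamOffset - 1];
        return prev == '\n' || prev == '\r';
    }
}
```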
    • nextIsEOL

      private boolean nextIsEOL() throws IOException
      Throws:
      IOException
    • parseCOSArray

      protected COSArray parseCOSArray() throws IOException
      Description copied from class: BaseParser
      This will parse a PDF array object.
      Overrides:
      parseCOSArray in class BaseParser
      Returns:
      The parsed PDF array.
      Throws:
      IOException - If there is an error parsing the stream.
    • parseCOSName

      protected COSName parseCOSName() throws IOException
      Description copied from class: BaseParser
      This will parse a PDF name from the stream.
      Overrides:
      parseCOSName in class BaseParser
      Returns:
      The parsed PDF name.
      Throws:
      IOException - If there is an error reading from the stream.
    • parseCOSString

      protected COSString parseCOSString() throws IOException
      Checks that a hexadecimal string contains only hexadecimal characters, and an even number of them. Once this is done, resets the offset to the beginning of the string and calls BaseParser.parseCOSString().
      Overrides:
      parseCOSString in class BaseParser
      Returns:
      The parsed PDF string.
      Throws:
      IOException - If there is an error reading from the stream.
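The hexadecimal string check can be illustrated on an already-extracted token. This sketch assumes white space between the angle brackets is ignored and only the hex digits are counted; the class name is hypothetical and it works on a String rather than on the parser's input stream.

```java
// Illustrative sketch of the PDF/A hexadecimal string rule; not PDFBox code.
// Assumption: white space inside <...> is skipped and not counted.
final class HexStringCheck {

    static boolean isValidHexString(String token) {
        if (token.length() < 2 || token.charAt(0) != '<' || token.charAt(token.length() - 1) != '>') {
            return false;
        }
        int digits = 0;
        for (int i = 1; i < token.length() - 1; i++) {
            char c = token.charAt(i);
            if (isHexDigit(c)) {
                digits++;
            } else if (!isPdfWhiteSpace(c)) {
                return false; // any other character is forbidden
            }
        }
        return digits % 2 == 0; // an odd count would need an implicit trailing 0
    }

    // ASCII-only test; Character.digit would also accept non-ASCII digits.
    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    private static boolean isPdfWhiteSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f' || c == '\0';
    }
}
```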
    • parseDirObject

      protected COSBase parseDirObject() throws IOException
      Calls BaseParser.parseDirObject(), then checks the range limits for Float and Integer values and the number of dictionary entries.
      Overrides:
      parseDirObject in class BaseParser
      Returns:
      The parsed object.
      Throws:
      IOException - if there is an error during parsing.
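The range checks can be pictured as simple predicates applied to each parsed object. The limits below are an assumption taken from the PDF 1.4 implementation limits that PDF/A-1 refers to (32-bit signed integers, real values with magnitude up to 32767, at most 4095 dictionary entries); PDFBox's own constants may differ, and all names are hypothetical.

```java
// Illustrative sketch of implementation-limit checks; not PDFBox code.
// Limits assumed from the PDF 1.4 implementation limits used by PDF/A-1.
final class ImplementationLimits {

    static final long MIN_INTEGER = Integer.MIN_VALUE; // -2^31
    static final long MAX_INTEGER = Integer.MAX_VALUE; // 2^31 - 1
    static final double MAX_REAL = 32767.0;            // largest real magnitude
    static final int MAX_DICT_ENTRIES = 4095;          // entries per dictionary

    static boolean integerInRange(long value) {
        return value >= MIN_INTEGER && value <= MAX_INTEGER;
    }

    // NaN fails this test because every comparison with NaN is false.
    static boolean realInRange(double value) {
        return Math.abs(value) <= MAX_REAL;
    }

    static boolean dictionarySizeInRange(int entries) {
        return entries <= MAX_DICT_ENTRIES;
    }
}
```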
    • parseObjectDynamically

      protected COSBase parseObjectDynamically(COSObjectKey objKey, boolean requireExistingNotCompressedObj) throws IOException
      Description copied from class: COSParser
      Parse the object for the given object key.
      Overrides:
      parseObjectDynamically in class COSParser
      Parameters:
      objKey - key of object to be parsed
      requireExistingNotCompressedObj - if true the object to be parsed must be defined in xref (comment: null objects may be missing from xref) and it must not be a compressed object within object stream (this is used to circumvent being stuck in a loop in a malicious PDF)
      Returns:
      the parsed object (which is also added to document object)
      Throws:
      IOException - If an IO error occurs.
    • parseFileObject

      private COSBase parseFileObject(Long offsetOrObjstmObNr, COSObjectKey objKey) throws IOException
      Throws:
      IOException
    • lastIndexOf

      protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
      Description copied from class: COSParser
      Searches for the last occurrence of pattern within the buffer. The lookup starts before endOff and goes back to offset 0.
      Overrides:
      lastIndexOf in class COSParser
      Parameters:
      pattern - pattern to search for
      buf - buffer to search pattern in
      endOff - offset (exclusive) where lookup starts at
      Returns:
      start offset of pattern within buffer or -1 if pattern could not be found
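A straightforward backward search that follows the contract documented above (the match must lie entirely before the exclusive endOff, and the result is the start offset or -1) could look like this. It is a naive O(n·m) sketch, not the PDFBox implementation, and it assumes the pattern consists of single-byte characters such as "%%EOF" or "startxref".

```java
// Illustrative backward search; not PDFBox code.
// Assumption: pattern chars fit in one byte (e.g. ASCII keywords).
final class BackwardSearch {

    static int lastIndexOf(char[] pattern, byte[] buf, int endOff) {
        int end = Math.min(endOff, buf.length); // never read past the buffer
        // Start where the whole pattern still fits before 'end', then move left.
        for (int start = end - pattern.length; start >= 0; start--) {
            int k = 0;
            while (k < pattern.length && buf[start + k] == (byte) pattern[k]) {
                k++;
            }
            if (k == pattern.length) {
                return start; // first hit going backwards is the last occurrence
            }
        }
        return -1;
    }
}
```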
    • validate

      public static ValidationResult validate(File file) throws IOException
      Loads and validates the given file. Returns the validation result and closes the parsed PDF.
      Parameters:
      file - the file to be read and validated
      Returns:
      the validation result
      Throws:
      IOException - in case of a file reading or parsing error