Class PDFParser

All Implemented Interfaces:
ICOSParser
Direct Known Subclasses:
PreflightParser

public class PDFParser extends COSParser
  • Field Details

    • LOG

      private static final org.apache.commons.logging.Log LOG
  • Constructor Details

    • PDFParser

      public PDFParser(RandomAccessRead source) throws IOException
      Constructor. Unrestricted main memory will be used for buffering PDF streams.
      Parameters:
      source - input representing the PDF.
      Throws:
      IOException - If something went wrong.
    • PDFParser

      public PDFParser(RandomAccessRead source, String decryptionPassword) throws IOException
      Constructor. Unrestricted main memory will be used for buffering PDF streams.
      Parameters:
      source - input representing the pdf.
      decryptionPassword - password to be used for decryption.
      Throws:
      IOException - If something went wrong.
    • PDFParser

      public PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore, String alias) throws IOException
      Constructor. Unrestricted main memory will be used for buffering PDF streams.
      Parameters:
      source - input representing the pdf.
      decryptionPassword - password to be used for decryption.
      keyStore - key store to be used for decryption when using public key security
      alias - alias to be used for decryption when using public key security
      Throws:
      IOException - If something went wrong.
    • PDFParser

      public PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore, String alias, RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction) throws IOException
      Constructor.
      Parameters:
      source - input representing the pdf.
      decryptionPassword - password to be used for decryption.
      keyStore - key store to be used for decryption when using public key security
      alias - alias to be used for decryption when using public key security
      streamCacheCreateFunction - a function to create an instance of the stream cache
      Throws:
      IOException - If something went wrong.
  • Method Details

    • initialParse

      protected void initialParse() throws IOException
      The initial parse first reads only the trailer, the startxref offset, and all xref tables, which yields a pointer (offset) to every object in the PDF. It can handle linearized PDFs, which have an xref at the end of the file pointing to an xref at the beginning. Finally, the root object is parsed.
      Throws:
      InvalidPasswordException - If the password is incorrect.
      IOException - If something went wrong.
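      The first step described above, locating the xref offset, can be sketched with the standard library alone. This is a minimal illustration of the PDF file layout (the file tail holds the keyword startxref followed by a decimal byte offset), not PDFBox's actual implementation. The class StartXrefSketch and the method findXrefOffset are hypothetical names introduced for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper for illustration only; not part of PDFBox. */
final class StartXrefSketch {

    private static final String KEYWORD = "startxref";

    /**
     * Returns the byte offset written after the last "startxref" keyword,
     * or -1 if the keyword or a valid number after it cannot be found.
     */
    static long findXrefOffset(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char,
        // so string indices are the same as byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);

        // The last occurrence belongs to the most recent xref section.
        int keywordPos = text.lastIndexOf(KEYWORD);
        if (keywordPos < 0) {
            return -1;
        }

        // Skip whitespace (space, CR, LF, tab) between keyword and number.
        int i = keywordPos + KEYWORD.length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }

        // Collect the run of ASCII digits that forms the offset.
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        if (i == start) {
            return -1;
        }
        try {
            return Long.parseLong(text.substring(start, i));
        } catch (NumberFormatException e) {
            return -1; // digit run too long to be a valid offset
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: StartXrefSketch <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("xref offset: " + findXrefOffset(pdf));
    }
}
```

      A real parser would read only the tail of the file instead of loading it fully, and would then follow the xref chain from the returned offset.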
    • parse

      public PDDocument parse() throws IOException
      Parses the stream and populates the PDDocument object. The keystore stream is closed once parsing is done. Lenient mode is active by default.
      Returns:
      the populated PDDocument
      Throws:
      InvalidPasswordException - If the password is incorrect.
      IOException - If there is an error reading from the stream or corrupt data is found.
    • parse

      public PDDocument parse(boolean lenient) throws IOException
      Parses the stream and populates the PDDocument object. The keystore stream is closed once parsing is done.
      Parameters:
      lenient - activate leniency if set to true
      Returns:
      the populated PDDocument
      Throws:
      InvalidPasswordException - If the password is incorrect.
      IOException - If there is an error reading from the stream or corrupt data is found.
    • createDocument

      protected PDDocument createDocument() throws IOException
      Creates the resulting document. May be overridden if the parser uses another class as the document.
      Returns:
      the resulting document
      Throws:
      IOException - if the method is called before parsing the document
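      The override hook described above follows the template-method pattern: the parse step calls a protected factory method, and a subclass swaps in its own document class. The sketch below shows that pattern with standard-library code only. Every class in it (SketchDocument, SketchParser, CustomDocument, CustomParser) is a hypothetical stand-in for illustration, not a PDFBox type, and the parsing work is reduced to a placeholder.

```java
import java.io.IOException;

/** Hypothetical stand-in for the default document class. */
class SketchDocument {
    final String root;

    SketchDocument(String root) {
        this.root = root;
    }
}

/** Hypothetical stand-in for a specialised document class. */
class CustomDocument extends SketchDocument {
    CustomDocument(String root) {
        super(root);
    }
}

/** Hypothetical base parser that exposes a protected factory method. */
class SketchParser {
    private String parsedRoot; // stays null until parse() has run

    public SketchDocument parse() throws IOException {
        parsedRoot = "catalog"; // placeholder for the real parsing work
        return createDocument();
    }

    /** Gives subclasses access to the parse result; fails if nothing was parsed yet. */
    protected String parsedRoot() throws IOException {
        if (parsedRoot == null) {
            throw new IOException("createDocument() called before parsing");
        }
        return parsedRoot;
    }

    /** Factory hook: subclasses override this to return another document class. */
    protected SketchDocument createDocument() throws IOException {
        return new SketchDocument(parsedRoot());
    }
}

/** Subclass that only replaces the document class, not the parsing logic. */
class CustomParser extends SketchParser {
    @Override
    protected CustomDocument createDocument() throws IOException {
        return new CustomDocument(parsedRoot());
    }
}
```

      With this layout, calling parse() on a CustomParser returns a CustomDocument without any change to the parsing code itself, and calling createDocument() before parse() raises an IOException, matching the contract stated above.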
    • load

      @Deprecated public static PDDocument load(File file) throws IOException
      Deprecated.
      Parses a PDF. Unrestricted main memory will be used for buffering PDF streams.
      Parameters:
      file - file to be loaded
      Returns:
      loaded document
      Throws:
      InvalidPasswordException - If the file required a non-empty password.
      IOException - in case of a file reading or parsing error
    • load

      @Deprecated public static PDDocument load(File file, String password) throws IOException
      Deprecated.
      Parses a PDF. Unrestricted main memory will be used for buffering PDF streams.
      Parameters:
      file - file to be loaded
      password - password to be used for decryption
      Returns:
      loaded document
      Throws:
      InvalidPasswordException - If the password is incorrect.
      IOException - in case of a file reading or parsing error