Class PdfReader

java.lang.Object
com.aowagie.text.pdf.PdfReader
All Implemented Interfaces:
PdfViewerPreferences

public class PdfReader extends Object implements PdfViewerPreferences
Reads a PDF document.
Author:
Paulo Soares (psoares@consiste.pt), Kazuya Ujihara
  • Constructor Details

    • PdfReader

      protected PdfReader()
    • PdfReader

      public PdfReader(String filename) throws IOException
      Reads and parses a PDF document.
      Parameters:
      filename - the file name of the document
      Throws:
      IOException - on error
    • PdfReader

      public PdfReader(byte[] pdfIn) throws IOException
      Reads and parses a PDF document.
      Parameters:
      pdfIn - the byte array with the document
      Throws:
      IOException - on error
    • PdfReader

      public PdfReader(byte[] pdfIn, byte[] ownerPassword) throws IOException
      Reads and parses a PDF document.
      Parameters:
      pdfIn - the byte array with the document
      ownerPassword - the password to read the document
      Throws:
      IOException - on error
    • PdfReader

      public PdfReader(String filename, Certificate certificate, Key certificateKey, String certificateKeyProvider) throws IOException
      Reads and parses a PDF document.
      Parameters:
      filename - the file name of the document
      certificate - the certificate to read the document
      certificateKey - the private key of the certificate
      certificateKeyProvider - the security provider for certificateKey
      Throws:
      IOException - on error
    • PdfReader

      public PdfReader(InputStream is) throws IOException
      Reads and parses a PDF document.
      Parameters:
      is - the InputStream containing the document. The stream is read to the end but is not closed
      Throws:
      IOException - on error
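      The InputStream constructor reads the stream to the end and leaves closing to the caller. The following is a minimal sketch of that contract using only the JDK. StreamDrainSketch and readFully are hypothetical names that are not part of this library, and the library's own reading code may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StreamDrainSketch {
    /** Reads every remaining byte from the stream; deliberately does not close it. */
    static byte[] readFully(InputStream is) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = is.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "%PDF-1.4\n".getBytes(StandardCharsets.ISO_8859_1));
        try {
            byte[] pdfBytes = readFully(in);
            System.out.println(pdfBytes.length + " bytes read");
        } finally {
            in.close(); // closing stays the caller's responsibility
        }
    }
}
```

      A caller that passes a stream to the constructor would close it in the same way, typically in a finally block or a try-with-resources statement.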
  • Method Details

    • getSafeFile

      public RandomAccessFileOrArray getSafeFile()
      Gets a new file instance of the original PDF document.
      Returns:
      a new file instance of the original PDF document
    • getPdfReaderInstance

      protected com.aowagie.text.pdf.PdfReaderInstance getPdfReaderInstance(PdfWriter writer)
    • getNumberOfPages

      public int getNumberOfPages()
      Gets the number of pages in the document.
      Returns:
      the number of pages in the document
    • getCatalog

      public PdfDictionary getCatalog()
      Returns the document's catalog. This dictionary is not a copy; any changes are reflected in the catalog.
      Returns:
      the document's catalog
    • getAcroForm

      public PRAcroForm getAcroForm()
      Returns the document's acroform, if it has one.
      Returns:
      the document's acroform
    • getPageRotation

      public int getPageRotation(int index)
      Gets the page rotation. This value can be 0, 90, 180 or 270.
      Parameters:
      index - the page number. The first page is 1
      Returns:
      the page rotation
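      A /Rotate entry in a PDF file is a multiple of 90 and, in practice, may be negative or larger than 360. The following is a minimal sketch, assuming such a value is folded into the range 0 to 270 as the description above implies. RotationSketch is a hypothetical name and this is not the library's code.

```java
final class RotationSketch {
    /** Folds an arbitrary multiple of 90 degrees into the range 0..270. */
    static int normalize(int rotate) {
        int n = rotate % 360;       // Java's % keeps the sign of the dividend
        return n < 0 ? n + 360 : n; // shift negative remainders back into range
    }

    public static void main(String[] args) {
        System.out.println(normalize(-90)); // 270
        System.out.println(normalize(450)); // 90
    }
}
```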
    • getPageRotation

      public int getPageRotation(PdfDictionary page)
    • getPageSizeWithRotation

      public Rectangle getPageSizeWithRotation(int index)
      Gets the page size, taking rotation into account. The result is a Rectangle built from the value of the /MediaBox key with the /Rotate key applied.
      Parameters:
      index - the page number. The first page is 1
      Returns:
      the page size as a Rectangle, with rotation applied
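      To illustrate what taking rotation into account means for the page dimensions, here is a minimal sketch. It assumes that a rotation of 90 or 270 degrees swaps width and height, and that 0 or 180 leaves them unchanged. RotatedSizeSketch is a hypothetical name, and plain floats stand in for the library's Rectangle.

```java
final class RotatedSizeSketch {
    /** Returns {width, height} as displayed after applying the rotation. */
    static float[] sizeWithRotation(float width, float height, int rotation) {
        int n = ((rotation % 360) + 360) % 360;      // fold into 0..359
        boolean quarterTurn = (n == 90 || n == 270); // sideways pages swap axes
        return quarterTurn
                ? new float[] {height, width}
                : new float[] {width, height};
    }

    public static void main(String[] args) {
        // An A4 portrait MediaBox (595 x 842 points) rotated by 90 degrees.
        float[] size = sizeWithRotation(595f, 842f, 90);
        System.out.println(size[0] + " x " + size[1]);
    }
}
```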
    • getPageSize

      public Rectangle getPageSize(int index)
      Gets the page size without taking rotation into account. This is the value of the /MediaBox key.
      Parameters:
      index - the page number. The first page is 1
      Returns:
      the page size
    • getInfo

      public HashMap getInfo()
      Returns the content of the document information dictionary as a HashMap with String keys and String values.
      Returns:
      content of the document information dictionary
    • readPdf

      protected void readPdf() throws IOException
      Throws:
      IOException
    • getPdfObjectRelease

      public static PdfObject getPdfObjectRelease(PdfObject obj)
      Parameters:
      obj - object to release
      Returns:
      a PdfObject
    • getPdfObject

      public static PdfObject getPdfObject(PdfObject obj)
      Reads a PdfObject resolving an indirect reference if needed.
      Parameters:
      obj - the PdfObject to read
      Returns:
      the resolved PdfObject
    • getPdfObject

      public PdfObject getPdfObject(int idx)
      Parameters:
      idx - index to get
      Returns:
      a PdfObject
    • addPdfObject

      public PRIndirectReference addPdfObject(PdfObject obj)
      Parameters:
      obj - object to add
      Returns:
      an indirect reference
    • readPages

      protected void readPages() throws IOException
      Throws:
      IOException
    • readDocObj

      protected void readDocObj() throws IOException
      Throws:
      IOException
    • rebuildXref

      protected void rebuildXref() throws IOException
      Throws:
      IOException
    • FlateDecode

      public static byte[] FlateDecode(byte[] in, boolean strict)
      Helper that decodes data compressed with the FlateDecode filter.
      Parameters:
      in - the input data
      strict - true to require a well-formed stream; false to attempt to read a corrupted stream
      Returns:
      the decoded data
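      The strict flag can be pictured with the JDK's java.util.zip classes. The following is a sketch under stated assumptions, not the library's implementation: it assumes that a strict decode gives up (returning null here) when the data is damaged, while a lenient decode returns whatever could be recovered before the error. FlateSketch, inflate and deflate are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.InflaterInputStream;

final class FlateSketch {
    /** Inflates zlib data. Assumed policy: strict gives null on error, lenient gives partial output. */
    static byte[] inflate(byte[] in, boolean strict) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A 1-byte buffer in lenient mode salvages every byte decoded before the failure.
        byte[] buf = new byte[strict ? 4096 : 1];
        try (InflaterInputStream zip = new InflaterInputStream(new ByteArrayInputStream(in))) {
            int n;
            while ((n = zip.read(buf)) >= 0) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            return strict ? null : out.toByteArray();
        }
    }

    /** Compresses data so the sketch has something to decode. */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] packed = deflate("BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.ISO_8859_1));
        byte[] unpacked = inflate(packed, true);
        System.out.println(new String(unpacked, StandardCharsets.ISO_8859_1));
    }
}
```

      Under the assumed policy, a caller could try the strict decode first and fall back to the lenient one only when the strict attempt fails.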
    • isRebuilt

      public boolean isRebuilt()
      Checks if the document had errors and was rebuilt.
      Returns:
      true if rebuilt.
    • getPageN

      public PdfDictionary getPageN(int pageNum)
      Gets the dictionary that represents a page.
      Parameters:
      pageNum - the page number. 1 is the first
      Returns:
      the page dictionary
    • getPageNRelease

      public PdfDictionary getPageNRelease(int pageNum)
      Parameters:
      pageNum - the page number
      Returns:
      the page dictionary
    • releasePage

      public void releasePage(int pageNum)
      Parameters:
      pageNum - the page number
    • resetReleasePage

      public void resetReleasePage()
    • getPageOrigRef

      public PRIndirectReference getPageOrigRef(int pageNum)
      Gets the page reference to this page.
      Parameters:
      pageNum - the page number. 1 is the first
      Returns:
      the page reference
    • getPageContent

      public byte[] getPageContent(int pageNum, RandomAccessFileOrArray file) throws IOException
      Gets the contents of the page.
      Parameters:
      pageNum - the page number. 1 is the first
      file - the location of the PDF document
      Returns:
      the content
      Throws:
      IOException - on error
    • killXref

      protected void killXref(PdfObject obj)
    • getStreamBytes

      public static byte[] getStreamBytes(PRStream stream) throws IOException
      Get the content from a stream applying the required filters.
      Parameters:
      stream - the stream
      Returns:
      the stream content
      Throws:
      IOException - on error
    • isTampered

      public boolean isTampered()
      Checks if the document was changed.
      Returns:
      true if the document was changed, false otherwise
    • setTampered

      public void setTampered(boolean tampered)
      Sets the tampered state. A tampered PdfReader cannot be reused in PdfStamper.
      Parameters:
      tampered - the tampered state
    • getMetadata

      public byte[] getMetadata() throws IOException
      Gets the XML metadata.
      Returns:
      the XML metadata
      Throws:
      IOException - on error
    • getLastXref

      public int getLastXref()
      Gets the byte address of the last xref table.
      Returns:
      the byte address of the last xref table
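      In the PDF file format, the byte address of the last cross-reference section is written after the startxref keyword near the end of the file. The following is a minimal sketch of locating it in raw bytes. StartXrefSketch and lastXrefOffset are hypothetical names, the sketch returns a long where this method returns an int, and the library's own parser may work differently.

```java
import java.nio.charset.StandardCharsets;

final class StartXrefSketch {
    /** Returns the number after the last "startxref" keyword, or -1 if absent or malformed. */
    static long lastXrefOffset(byte[] pdf) {
        // ISO-8859-1 maps each byte to one char, so string indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }
        int i = keyword + "startxref".length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++; // skip the line break after the keyword
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++; // consume the decimal offset
        }
        return i == start ? -1 : Long.parseLong(text.substring(start, i));
    }

    public static void main(String[] args) {
        byte[] tail = "trailer\n<< /Size 8 >>\nstartxref\n1234\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(lastXrefOffset(tail)); // 1234
    }
}
```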
    • getXrefSize

      public int getXrefSize()
      Gets the number of xref objects.
      Returns:
      the number of xref objects
    • getEofPos

      public int getEofPos()
      Gets the byte address of the %%EOF marker.
      Returns:
      the byte address of the %%EOF marker
    • getPdfVersion

      public char getPdfVersion()
      Gets the PDF version. Only the last character of the version is returned. For example, version 1.4 is returned as '4'.
      Returns:
      the PDF version
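      A PDF file starts with a header such as %PDF-1.4. As a minimal sketch of how the single version character relates to that header, assuming the character is taken from the position after the "1." prefix (PdfVersionSketch and lastVersionChar are hypothetical names, not the library's code):

```java
final class PdfVersionSketch {
    /** Extracts the minor version character from a header such as "%PDF-1.4". */
    static char lastVersionChar(String header) {
        if (header == null || !header.startsWith("%PDF-") || header.length() < 8) {
            throw new IllegalArgumentException("not a PDF header: " + header);
        }
        return header.charAt(7); // "%PDF-" is 5 chars, then the major digit and the dot
    }

    public static void main(String[] args) {
        System.out.println(lastVersionChar("%PDF-1.4")); // 4
    }
}
```

      Because the result is a char, compare it with character literals such as '4' rather than with the number 4.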
    • isEncrypted

      public boolean isEncrypted()
      Returns true if the PDF is encrypted.
      Returns:
      true if the PDF is encrypted
    • getPermissions

      public int getPermissions()
      Gets the encryption permissions. The value can be passed directly to PdfWriter.setEncryption().
      Returns:
      the encryption permissions
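      The permissions value is a bit field. The following is a minimal sketch of testing individual bits, using the bit positions from the user access permissions table in the PDF specification. The constant names below are hypothetical; the library is likely to define its own constants, which should be preferred when available.

```java
final class PermissionBitsSketch {
    // Bit positions are 1-based in the PDF specification, so bit 3 is 1 << 2.
    static final int PRINT = 1 << 2;               // bit 3: print
    static final int MODIFY = 1 << 3;              // bit 4: modify contents
    static final int COPY = 1 << 4;                // bit 5: copy or extract
    static final int ANNOTATE = 1 << 5;            // bit 6: add or modify annotations
    static final int FILL_FORMS = 1 << 8;          // bit 9: fill in form fields
    static final int ACCESSIBILITY = 1 << 9;       // bit 10: extract for accessibility
    static final int ASSEMBLE = 1 << 10;           // bit 11: assemble the document
    static final int PRINT_HIGH_QUALITY = 1 << 11; // bit 12: high-quality print

    /** Returns true if every bit of the flag is set in the permissions value. */
    static boolean allows(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    public static void main(String[] args) {
        int permissions = -1852; // sample value with only the printing bits enabled
        System.out.println("print: " + allows(permissions, PRINT));
        System.out.println("copy:  " + allows(permissions, COPY));
    }
}
```

      Permission values are usually negative because the unused high bits are set to 1, so test individual bits instead of comparing the whole number.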
    • getTrailer

      public PdfDictionary getTrailer()
      Gets the trailer dictionary.
      Returns:
      the trailer dictionary
    • getNamedDestination

      public HashMap getNamedDestination()
      Gets all the named destinations as a HashMap. The key is the name and the value is the destinations array.
      Returns:
      all the named destinations
    • getNamedDestinationFromNames

      public HashMap getNamedDestinationFromNames()
      Gets the named destinations from the /Dests key in the catalog as a HashMap. The key is the name and the value is the destinations array.
      Returns:
      the named destinations
    • getNamedDestinationFromStrings

      public HashMap getNamedDestinationFromStrings()
      Gets the named destinations from the /Names key in the catalog as a HashMap. The key is the name and the value is the destinations array.
      Returns:
      the named destinations
    • close

      public void close()
      Closes the reader.
    • getAcroFields

      public AcroFields getAcroFields()
      Gets a read-only version of AcroFields.
      Returns:
      a read-only version of AcroFields
    • getJavaScript

      public String getJavaScript() throws IOException
      Gets the global document JavaScript.
      Returns:
      the global document JavaScript
      Throws:
      IOException - on error
    • setViewerPreferences

      public void setViewerPreferences(int preferences)
      Sets the viewer preferences as the sum of several constants.
      Specified by:
      setViewerPreferences in interface PdfViewerPreferences
      Parameters:
      preferences - the viewer preferences
    • addViewerPreference

      public void addViewerPreference(PdfName key, PdfObject value)
      Adds a viewer preference.
      Specified by:
      addViewerPreference in interface PdfViewerPreferences
      Parameters:
      key - a key for a viewer preference
      value - a value for the viewer preference
    • getSimpleViewerPreferences

      public int getSimpleViewerPreferences()
      Returns a bitset representing the PageMode and PageLayout viewer preferences. It does not include any information from the ViewerPreferences dictionary.
      Returns:
      an int that contains the Viewer Preferences.
    • isAppendable

      public boolean isAppendable()
      Getter for property appendable.
      Returns:
      Value of property appendable.
    • setAppendable

      public void setAppendable(boolean appendable)
      Setter for property appendable.
      Parameters:
      appendable - New value of property appendable.
    • isNewXrefType

      public boolean isNewXrefType()
      Getter for property newXrefType.
      Returns:
      Value of property newXrefType.
    • getFileLength

      public int getFileLength()
      Getter for property fileLength.
      Returns:
      Value of property fileLength.
    • isHybridXref

      public boolean isHybridXref()
      Getter for property hybridXref.
      Returns:
      Value of property hybridXref.
    • removeUsageRights

      public void removeUsageRights()
      Removes any usage rights that this PDF may have. Only Adobe can grant usage rights, and any PDF modification with iText invalidates them. Invalidated usage rights may confuse Acrobat, so it is advisable to remove them altogether.
    • getCertificationLevel

      public int getCertificationLevel()
      Gets the certification level for this document. The return value is one of PdfSignatureAppearance.NOT_CERTIFIED, PdfSignatureAppearance.CERTIFIED_NO_CHANGES_ALLOWED, PdfSignatureAppearance.CERTIFIED_FORM_FILLING or PdfSignatureAppearance.CERTIFIED_FORM_FILLING_AND_ANNOTATIONS.

      No signature validation is performed; use the methods available in AcroFields for that.

      Returns:
      the certification level for this document
    • isOpenedWithFullPermissions

      public final boolean isOpenedWithFullPermissions()
      Checks if the document was opened with the owner password so that the end application can decide what level of access restrictions to apply. If the document is not encrypted it will return true.
      Returns:
      true if the document was opened with the owner password or if it's not encrypted, false if the document was opened with the user password
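      The rule described above reduces to a single boolean expression. A minimal sketch with hypothetical names, where the two flags stand in for state the reader tracks internally:

```java
final class FullPermissionsSketch {
    /**
     * Mirrors the documented rule: unencrypted documents always report full
     * permissions; encrypted ones only when the owner password was used.
     */
    static boolean openedWithFullPermissions(boolean encrypted, boolean ownerPasswordUsed) {
        return !encrypted || ownerPasswordUsed;
    }

    public static void main(String[] args) {
        // Encrypted document opened with the user password: restrictions apply.
        System.out.println(openedWithFullPermissions(true, false)); // false
    }
}
```

      An application could use the result to decide whether to enforce the restrictions reported by getPermissions().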
    • getCryptoMode

      public int getCryptoMode()
    • isMetadataEncrypted

      public boolean isMetadataEncrypted()