Class COSDocument

java.lang.Object
org.apache.pdfbox.cos.COSBase
org.apache.pdfbox.cos.COSDocument
All Implemented Interfaces:
Closeable, AutoCloseable, COSObjectable

public class COSDocument extends COSBase implements Closeable
This is the in-memory representation of the PDF document. You must call close() on this object when you are done using it.
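Because the class implements Closeable, the close() requirement is usually met with try-with-resources. The following is a minimal sketch of that pattern. It uses a hypothetical stand-in class, ScratchBackedResource, instead of COSDocument so that it runs without PDFBox on the classpath. The closed flag and isClosed() method mirror the members documented on this page, but their behavior here is an assumption and not the library's implementation.

```java
import java.io.Closeable;

/** Hypothetical stand-in for a document that holds scratch storage. */
final class ScratchBackedResource implements Closeable {
    private boolean closed;

    /** Releases the storage; calling it again has no further effect. */
    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

final class CloseDemo {
    /** Uses the resource inside try-with-resources and returns it afterwards. */
    static ScratchBackedResource useAndClose() {
        ScratchBackedResource resource = new ScratchBackedResource();
        try (ScratchBackedResource r = resource) {
            // Work with the resource here; close() runs even if this block throws.
            if (r.isClosed()) {
                throw new IllegalStateException("resource closed too early");
            }
        }
        return resource;
    }

    public static void main(String[] args) {
        System.out.println("closed after use: " + useAndClose().isClosed());
    }
}
```

With the real class, the same shape would apply: declare the document in the try header and let the block's exit call close().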
  • Field Details

    • LOG

      private static final org.apache.commons.logging.Log LOG
      Log instance.
    • version

      private float version
    • objectPool

      private final Map<COSObjectKey,COSObject> objectPool
      Maps each COSObjectKey to its COSObject. Note that references to these objects are also stored in COSDictionary objects that map a name to a specific object.
    • xrefTable

      private final Map<COSObjectKey,Long> xrefTable
      Maps object and generation id to object byte offsets.
    • streams

      private final List<COSStream> streams
      List containing all streams that are created when creating a new PDF.
    • trailer

      private COSDictionary trailer
      Document trailer dictionary.
    • isDecrypted

      private boolean isDecrypted
      Signals that the document has already been decrypted.
    • startXref

      private long startXref
    • closed

      private boolean closed
    • isXRefStream

      private boolean isXRefStream
    • hasHybridXRef

      private boolean hasHybridXRef
    • streamCache

      private final RandomAccessStreamCache streamCache
    • highestXRefObjectNumber

      private long highestXRefObjectNumber
      Used for incremental saving, to prevent XRef object numbers from being reused.
    • parser

      private final ICOSParser parser
    • documentState

      private final COSDocumentState documentState
  • Constructor Details

    • COSDocument

      public COSDocument()
      Constructor. Uses main memory to buffer PDF streams.
    • COSDocument

      public COSDocument(ICOSParser parser)
      Constructor. Uses main memory to buffer PDF streams.
      Parameters:
      parser - Parser to be used to parse the document on demand
    • COSDocument

      public COSDocument(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction)
      Constructor that will use the provided function to create a stream cache for the storage of the PDF streams.
      Parameters:
      streamCacheCreateFunction - a function to create an instance of a stream cache
    • COSDocument

      public COSDocument(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction, ICOSParser parser)
      Constructor that will use the provided function to create a stream cache for the storage of the PDF streams.
      Parameters:
      streamCacheCreateFunction - a function to create an instance of a stream cache
      parser - Parser to be used to parse the document on demand
  • Method Details

    • getStreamCache

      private RandomAccessStreamCache getStreamCache(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction)
    • createCOSStream

      public COSStream createCOSStream()
      Creates a new COSStream using the current configuration for scratch files.
      Returns:
      the new COSStream
    • createCOSStream

      public COSStream createCOSStream(COSDictionary dictionary, long startPosition, long streamLength) throws IOException
      Creates a new COSStream using the current configuration for scratch files. Not for public use. Only COSParser should call this method.
      Parameters:
      dictionary - the corresponding dictionary
      startPosition - the start position within the source
      streamLength - the stream length
      Returns:
      the new COSStream
      Throws:
      IOException - if the random access view can't be read
    • getLinearizedDictionary

      public COSDictionary getLinearizedDictionary()
      Gets the dictionary containing the linearization information if the PDF is linearized.
      Returns:
      the dictionary containing the linearization information
    • getObjectsByType

      public List<COSObject> getObjectsByType(COSName type)
      This will get all dictionary objects of the given type.
      Parameters:
      type - The type of the object.
      Returns:
      This will return all objects with the specified type.
    • getObjectsByType

      public List<COSObject> getObjectsByType(COSName type1, COSName type2)
      This will get all dictionary objects that match either of the given types.
      Parameters:
      type1 - The first possible type of the object, mandatory.
      type2 - The second possible type of the object, usually an abbreviation, optional.
      Returns:
      This will return all objects with the specified type(s).
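The two-type lookup described above can be pictured as a filter that accepts a mandatory type and an optional second one. The sketch below is an illustration only: it models dictionaries as plain Map<String, String> values with a "Type" entry, and the class TypeFilterSketch, the method byType, and the type names "Alpha" and "A" are all hypothetical. It does not use or reproduce the PDFBox implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TypeFilterSketch {
    /**
     * Returns the dictionaries whose "Type" entry equals type1 or,
     * when type2 is not null, equals type2.
     */
    static List<Map<String, String>> byType(List<Map<String, String>> dictionaries,
                                            String type1, String type2) {
        List<Map<String, String>> matches = new ArrayList<>();
        for (Map<String, String> dictionary : dictionaries) {
            String type = dictionary.get("Type");
            if (type1.equals(type) || (type2 != null && type2.equals(type))) {
                matches.add(dictionary);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Map<String, String>> dictionaries = List.of(
                Map.of("Type", "Alpha"),   // full name
                Map.of("Type", "A"),       // abbreviated name
                Map.of("Type", "Beta"));
        System.out.println(byType(dictionaries, "Alpha", "A").size());
        System.out.println(byType(dictionaries, "Alpha", null).size());
    }
}
```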
    • getObjectsByType

      private List<COSObject> getObjectsByType(List<COSObjectKey> keys, COSName type1, COSName type2)
    • setVersion

      public void setVersion(float versionValue)
      This will set the header version of this PDF document.
      Parameters:
      versionValue - The version of the PDF document.
    • getVersion

      public float getVersion()
      This will get the version extracted from the header of this PDF document.
      Returns:
      The header version.
    • setDecrypted

      public void setDecrypted()
      Signals that the document is decrypted completely.
    • isDecrypted

      public boolean isDecrypted()
      Indicates whether an encrypted PDF has already been decrypted after parsing.
      Returns:
      true if the PDF is decrypted.
    • isEncrypted

      public boolean isEncrypted()
      This will tell if this is an encrypted document.
      Returns:
      true if this document is encrypted.
    • getEncryptionDictionary

      public COSDictionary getEncryptionDictionary()
      This will get the encryption dictionary if the document is encrypted or null if the document is not encrypted.
      Returns:
      The encryption dictionary.
    • setEncryptionDictionary

      public void setEncryptionDictionary(COSDictionary encDictionary)
      This will set the encryption dictionary. It should only be called when encrypting the document.
      Parameters:
      encDictionary - The encryption dictionary.
    • getDocumentID

      public COSArray getDocumentID()
      This will get the document ID.
      Returns:
      The document id.
    • setDocumentID

      public void setDocumentID(COSArray id)
      This will set the document ID.
      Parameters:
      id - The document id.
    • getTrailer

      public COSDictionary getTrailer()
      This will get the document trailer.
      Returns:
      the document trailer dictionary
    • setTrailer

      public void setTrailer(COSDictionary newTrailer)
      This will set the document trailer. Note from the source: this setter was added by MIT, and it may not be appropriate to support it because the trailer is a persistence construct.
      Parameters:
      newTrailer - the document trailer dictionary
    • getHighestXRefObjectNumber

      public long getHighestXRefObjectNumber()
      Internal PDFBox use only. Gets the object number of the highest XRef stream. This is needed to avoid reusing such a number in incremental saving.
      Returns:
      The object number of the highest XRef stream, or 0 if there was no XRef stream.
    • setHighestXRefObjectNumber

      public void setHighestXRefObjectNumber(long highestXRefObjectNumber)
      Internal PDFBox use only. Sets the object number of the highest XRef stream. This is needed to avoid reusing such a number in incremental saving.
      Parameters:
      highestXRefObjectNumber - The object number of the highest XRef stream.
    • accept

      public void accept(ICOSVisitor visitor) throws IOException
      Visitor pattern double-dispatch method.
      Specified by:
      accept in class COSBase
      Parameters:
      visitor - The object to notify when visiting this object.
      Throws:
      IOException - If an error occurs while visiting this object.
    • close

      public void close() throws IOException
      This will close all storage and delete the temporary files.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException - If there is an error closing resources.
    • isClosed

      public boolean isClosed()
      Returns true if this document has been closed.
      Returns:
      true if the document is already closed, false otherwise
    • getObjectFromPool

      public COSObject getObjectFromPool(COSObjectKey key)
      This will get an object from the pool.
      Parameters:
      key - The object key.
      Returns:
      The object in the pool or a new one if it has not been parsed yet.
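The "existing object or a new one" contract described above is a get-or-create lookup. The sketch below shows one way to express it with Map.computeIfAbsent. The classes PoolKey, PooledObject, and ObjectPoolSketch are hypothetical stand-ins for COSObjectKey, COSObject, and the document's object pool; how PDFBox actually builds and parses pooled objects is not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical key: object number plus generation number. */
final class PoolKey {
    final long number;
    final int generation;

    PoolKey(long number, int generation) {
        this.number = number;
        this.generation = generation;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof PoolKey)) {
            return false;
        }
        PoolKey that = (PoolKey) other;
        return number == that.number && generation == that.generation;
    }

    @Override
    public int hashCode() {
        return Objects.hash(number, generation);
    }
}

/** Hypothetical placeholder for an object whose content is parsed later. */
final class PooledObject {
    final PoolKey key;

    PooledObject(PoolKey key) {
        this.key = key;
    }
}

final class ObjectPoolSketch {
    private final Map<PoolKey, PooledObject> pool = new HashMap<>();

    /** Returns the pooled object for the key, creating a placeholder on first access. */
    PooledObject getFromPool(PoolKey key) {
        return pool.computeIfAbsent(key, PooledObject::new);
    }

    int size() {
        return pool.size();
    }
}
```

Two lookups with equal keys return the same instance in this sketch, so every reference to an object number and generation resolves to one shared placeholder.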
    • addXRefTable

      public void addXRefTable(Map<COSObjectKey,Long> xrefTableValues)
      Populates the XRef map with the given values. Each entry maps an ObjectKey to a byte offset in the file.
      Parameters:
      xrefTableValues - xref table entries to be added
    • getXrefTable

      public Map<COSObjectKey,Long> getXrefTable()
      Returns the xrefTable which is a mapping of ObjectKeys to byte offsets in the file.
      Returns:
      mapping of ObjectKeys to byte offsets
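As an illustration of the data structure behind addXRefTable and getXrefTable, the sketch below keeps a map from an object key to a byte offset and merges new entries into it. XrefKey and XrefTableSketch are hypothetical names, and the merge uses putAll, so a later entry for the same key replaces the earlier offset. That overwrite behavior is an assumption of this sketch, not something this page states about PDFBox.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical key: object number plus generation number. */
final class XrefKey {
    final long number;
    final int generation;

    XrefKey(long number, int generation) {
        this.number = number;
        this.generation = generation;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof XrefKey)) {
            return false;
        }
        XrefKey that = (XrefKey) other;
        return number == that.number && generation == that.generation;
    }

    @Override
    public int hashCode() {
        return Objects.hash(number, generation);
    }
}

final class XrefTableSketch {
    private final Map<XrefKey, Long> offsets = new HashMap<>();

    /** Merges entries; an existing key takes the newly supplied offset. */
    void addAll(Map<XrefKey, Long> entries) {
        offsets.putAll(entries);
    }

    /** Returns the byte offset for the object, or null if it is unknown. */
    Long offsetOf(long number, int generation) {
        return offsets.get(new XrefKey(number, generation));
    }

    int size() {
        return offsets.size();
    }
}
```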
    • setStartXref

      public void setStartXref(long startXrefValue)
      This method sets the startxref value of the document. This will only be needed for incremental updates.
      Parameters:
      startXrefValue - the value for startXref
    • getStartXref

      public long getStartXref()
      Returns the startXref position of the parsed document. This will only be needed for incremental updates.
      Returns:
      a long with the old position of the startxref
    • isXRefStream

      public boolean isXRefStream()
      Determines if the trailer is an XRef stream or not.
      Returns:
      true if the trailer is an XRef stream
    • setIsXRefStream

      public void setIsXRefStream(boolean isXRefStreamValue)
      Sets isXRefStream to the given value. XRef streams require PDF version 1.5 or higher, so make sure the version of your PDF is at least 1.5.
      Parameters:
      isXRefStreamValue - the new value for isXRefStream
    • hasHybridXRef

      public boolean hasHybridXRef()
      Determines if the PDF has hybrid cross references, that is, both plain tables and streams.
      Returns:
      true if the PDF has hybrid cross references
    • setHasHybridXRef

      public void setHasHybridXRef()
      Marks the PDF as a document using hybrid cross references.
    • getDocumentState

      public COSDocumentState getDocumentState()
      Returns the COSDocumentState of this COSDocument.
      Returns:
      The COSDocumentState of this COSDocument.
      See Also: