Package org.apache.pdfbox.cos
Class COSDocument
java.lang.Object
org.apache.pdfbox.cos.COSBase
org.apache.pdfbox.cos.COSDocument
- All Implemented Interfaces:
Closeable, AutoCloseable, COSObjectable
This is the in-memory representation of the PDF document. You must call
close() on this object when you are done using it; closing releases all storage and deletes the temporary files.
-
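The close() requirement follows the usual pattern for a resource that owns temporary storage. The sketch below is a minimal illustration that uses only the JDK: ScratchBackedDocument and CloseSketch are hypothetical stand-ins, not PDFBox classes, and the single scratch file is an assumption made for the example. It shows how try-with-resources guarantees that close() runs and that the temporary file is gone afterwards.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a document that buffers data in a scratch file.
// This is NOT COSDocument; it only mirrors the close()/isClosed() contract.
final class ScratchBackedDocument implements Closeable {
    private final Path scratchFile;
    private boolean closed;

    ScratchBackedDocument() throws IOException {
        scratchFile = Files.createTempFile("scratch-sketch", ".tmp");
    }

    Path scratchFile() {
        return scratchFile;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // a second close() is a no-op in this sketch
        }
        Files.deleteIfExists(scratchFile);
        closed = true;
    }
}

final class CloseSketch {
    // Opens a document, lets try-with-resources close it, and reports
    // whether the scratch file has been removed afterwards.
    static boolean scratchFileRemovedAfterClose() {
        Path scratch = null;
        try {
            try (ScratchBackedDocument doc = new ScratchBackedDocument()) {
                scratch = doc.scratchFile();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return scratch != null && !Files.exists(scratch);
    }

    public static void main(String[] args) {
        System.out.println("scratch file removed: " + scratchFileRemovedAfterClose());
    }
}
```

Because COSDocument implements AutoCloseable, the same try-with-resources shape should apply to it directly.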
Field Summary
Fields
Modifier and Type, Field - Description
private boolean closed
private final COSDocumentState documentState
private boolean hasHybridXRef
private long highestXRefObjectNumber - Used for incremental saving, to avoid XRef object numbers from being reused.
private boolean isDecrypted - Signal that document is already decrypted.
private boolean isXRefStream
private static final org.apache.commons.logging.Log LOG - Log instance.
private final Map<COSObjectKey, COSObject> objectPool - Maps ObjectKeys to a COSObject.
private final ICOSParser parser
private long startXref
private final RandomAccessStreamCache streamCache
streams - List containing all streams which are created when creating a new pdf.
private COSDictionary trailer - Document trailer dictionary.
private float version
private final Map<COSObjectKey, Long> xrefTable - Maps object and generation id to object byte offsets.
-
Constructor Summary
Constructors
Constructor - Description
COSDocument() - Constructor.
COSDocument(ICOSParser parser) - Constructor.
COSDocument(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction) - Constructor that will use the provided function to create a stream cache for the storage of the PDF streams.
COSDocument(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction, ICOSParser parser) - Constructor that will use the provided function to create a stream cache for the storage of the PDF streams.
-
Method Summary
Modifier and Type, Method - Description
void accept(ICOSVisitor visitor) - Visitor pattern double dispatch method.
void addXRefTable(Map<COSObjectKey, Long> xrefTableValues) - Populate XRef HashMap with given values.
void close() - This will close all storage and delete the temporary files.
createCOSStream() - Creates a new COSStream using the current configuration for scratch files.
createCOSStream(COSDictionary dictionary, long startPosition, long streamLength) - Creates a new COSStream using the current configuration for scratch files.
getDocumentID() - This will get the document ID.
getDocumentState() - Returns the COSDocumentState of this COSDocument.
getEncryptionDictionary() - This will get the encryption dictionary if the document is encrypted, or null if the document is not encrypted.
long getHighestXRefObjectNumber() - Internal PDFBox use only.
getLinearizedDictionary() - Get the dictionary containing the linearization information if the pdf is linearized.
getObjectFromPool(COSObjectKey key) - This will get an object from the pool.
getObjectsByType(List<COSObjectKey> keys, COSName type1, COSName type2)
getObjectsByType(COSName type) - This will get all dictionary objects of the given type.
getObjectsByType(COSName type1, COSName type2) - This will get all dictionary objects of the given type(s).
long getStartXref() - Returns the startXref position of the parsed document.
private RandomAccessStreamCache getStreamCache(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction)
getTrailer() - This will get the document trailer.
float getVersion() - This will get the version extracted from the header of this PDF document.
getXrefTable() - Returns the xrefTable, which is a mapping of ObjectKeys to byte offsets in the file.
boolean hasHybridXRef() - Determines if the pdf has hybrid cross references, both plain tables and streams.
boolean isClosed() - Returns true if this document has been closed.
boolean isDecrypted() - Indicates if an encrypted pdf is already decrypted after parsing.
boolean isEncrypted() - This will tell if this is an encrypted document.
boolean isXRefStream() - Determines if the trailer is a XRef stream or not.
void setDecrypted() - Signals that the document is decrypted completely.
void setDocumentID(COSArray id) - This will set the document ID.
void setEncryptionDictionary(COSDictionary encDictionary) - This will set the encryption dictionary; this should only be called when encrypting the document.
void setHasHybridXRef() - Marks the pdf as a document using hybrid cross references.
void setHighestXRefObjectNumber(long highestXRefObjectNumber) - Internal PDFBox use only.
void setIsXRefStream(boolean isXRefStreamValue) - Sets isXRefStream to the given value.
void setStartXref(long startXrefValue) - This method sets the startxref value of the document.
void setTrailer(COSDictionary newTrailer) - This will set the document trailer.
void setVersion(float versionValue) - This will set the header version of this PDF document.
-
Field Details
-
LOG
private static final org.apache.commons.logging.Log LOG
Log instance.
-
version
private float version -
objectPool
Maps ObjectKeys to a COSObject. Note that references to these objects are also stored in COSDictionary objects that map a name to a specific object. -
xrefTable
Maps object and generation id to object byte offsets. -
streams
List containing all streams which are created when creating a new pdf. -
trailer
Document trailer dictionary. -
isDecrypted
private boolean isDecrypted
Signal that document is already decrypted.
-
startXref
private long startXref -
closed
private boolean closed -
isXRefStream
private boolean isXRefStream -
hasHybridXRef
private boolean hasHybridXRef -
streamCache
-
highestXRefObjectNumber
private long highestXRefObjectNumber
Used for incremental saving, to avoid XRef object numbers from being reused.
-
parser
-
documentState
-
-
Constructor Details
-
COSDocument
public COSDocument()
Constructor. Uses main memory to buffer PDF streams.
-
COSDocument
Constructor. Uses main memory to buffer PDF streams.
- Parameters:
parser - Parser to be used to parse the document on demand
-
COSDocument
Constructor that will use the provided function to create a stream cache for the storage of the PDF streams.
- Parameters:
streamCacheCreateFunction - a function to create an instance of a stream cache
-
COSDocument
public COSDocument(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction, ICOSParser parser)
Constructor that will use the provided function to create a stream cache for the storage of the PDF streams.
- Parameters:
streamCacheCreateFunction - a function to create an instance of a stream cache
parser - Parser to be used to parse the document on demand
-
-
Method Details
-
getStreamCache
private RandomAccessStreamCache getStreamCache(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction) -
createCOSStream
Creates a new COSStream using the current configuration for scratch files.
- Returns:
- the new COSStream
-
createCOSStream
public COSStream createCOSStream(COSDictionary dictionary, long startPosition, long streamLength) throws IOException
Creates a new COSStream using the current configuration for scratch files. Not for public use. Only COSParser should call this method.
- Parameters:
dictionary - the corresponding dictionary
startPosition - the start position within the source
streamLength - the stream length
- Returns:
- the new COSStream
- Throws:
IOException - if the random access view can't be read
-
getLinearizedDictionary
Get the dictionary containing the linearization information if the pdf is linearized.
- Returns:
- the dictionary containing the linearization information
-
getObjectsByType
This will get all dictionary objects of the given type.
- Parameters:
type - The type of the object.
- Returns:
- This will return all objects with the specified type.
-
getObjectsByType
This will get all dictionary objects of the given type(s).
- Parameters:
type1 - The first possible type of the object, mandatory.
type2 - The second possible type of the object, usually an abbreviation, optional.
- Returns:
- This will return all objects with the specified type(s).
-
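The two-type lookup can be pictured as a filter over dictionaries that accepts either a full type name or its abbreviation. The following is only a sketch under stated assumptions: plain Map<String, String> instances stand in for COS dictionaries, the "Type" key and the class name TypeFilterSketch are hypothetical, and the names "Example" and "Ex" are made up for illustration. It does not reproduce how COSDocument actually walks its object pool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical filter: a Map<String, String> stands in for a dictionary,
// and its "Type" entry stands in for the dictionary's type name.
final class TypeFilterSketch {
    // Returns every dictionary whose type equals type1 or, if given, type2.
    // type2 is optional and may be null.
    static List<Map<String, String>> byType(List<Map<String, String>> dictionaries,
                                            String type1, String type2) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> dict : dictionaries) {
            String type = dict.get("Type");
            if (type != null && (type.equals(type1) || type.equals(type2))) {
                result.add(dict);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> dicts = new ArrayList<>();
        dicts.add(Collections.singletonMap("Type", "Example")); // full name
        dicts.add(Collections.singletonMap("Type", "Ex"));      // abbreviation
        dicts.add(Collections.singletonMap("Type", "Other"));   // not matched
        System.out.println(byType(dicts, "Example", "Ex").size());
    }
}
```

Passing null as the second type reduces the filter to the single-type variant.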
getObjectsByType
-
setVersion
public void setVersion(float versionValue)
This will set the header version of this PDF document.
- Parameters:
versionValue - The version of the PDF document.
-
getVersion
public float getVersion()
This will get the version extracted from the header of this PDF document.
- Returns:
- The header version.
-
setDecrypted
public void setDecrypted()
Signals that the document is decrypted completely.
-
isDecrypted
public boolean isDecrypted()
Indicates if an encrypted pdf is already decrypted after parsing.
- Returns:
- true indicates that the pdf is decrypted.
-
isEncrypted
public boolean isEncrypted()
This will tell if this is an encrypted document.
- Returns:
- true if this document is encrypted.
-
getEncryptionDictionary
This will get the encryption dictionary if the document is encrypted, or null if the document is not encrypted.
- Returns:
- The encryption dictionary.
-
setEncryptionDictionary
This will set the encryption dictionary. This should only be called when encrypting the document.
- Parameters:
encDictionary - The encryption dictionary.
-
getDocumentID
This will get the document ID.
- Returns:
- The document id.
-
setDocumentID
This will set the document ID. This should be an array of two strings. This method cannot be used to remove the document id by passing null or an empty array; it will be recreated. Only the first existing string is used when writing; the second one is always recreated. If you don't want this, you'll have to modify the COSWriter class; look for COSName.ID.
- Parameters:
id - The document id.
-
getTrailer
This will get the document trailer.
- Returns:
- the document trailer dict
-
setTrailer
This will set the document trailer. Source note: MIT added; maybe this should not be supported, as the trailer is a persistence construct.
- Parameters:
newTrailer - the document trailer dictionary
-
getHighestXRefObjectNumber
public long getHighestXRefObjectNumber()
Internal PDFBox use only. Get the object number of the highest XRef stream. This is needed to avoid reusing such a number in incremental saving.
- Returns:
- The object number of the highest XRef stream, or 0 if there was no XRef stream.
-
setHighestXRefObjectNumber
public void setHighestXRefObjectNumber(long highestXRefObjectNumber)
Internal PDFBox use only. Sets the object number of the highest XRef stream. This is needed to avoid reusing such a number in incremental saving.
- Parameters:
highestXRefObjectNumber - The object number of the highest XRef stream.
-
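Why this number matters for incremental saving can be shown with a small, JDK-only sketch. It assumes that a writer picks the next object number as one more than the largest number already in use, and that an XRef stream occupies an object number of its own. The class ObjectNumberSketch and its method are hypothetical; how PDFBox's writer actually consumes the value is not described on this page.

```java
// Hypothetical helper, not part of PDFBox: chooses an object number for a
// new object that collides neither with existing objects nor with the
// object number occupied by the highest XRef stream.
final class ObjectNumberSketch {
    // usedObjectNumbers: object numbers already present in the document.
    // highestXRefObjectNumber: 0 if the document had no XRef stream.
    static long nextFreeObjectNumber(long[] usedObjectNumbers,
                                     long highestXRefObjectNumber) {
        long highest = highestXRefObjectNumber;
        for (long number : usedObjectNumbers) {
            highest = Math.max(highest, number);
        }
        return highest + 1;
    }

    public static void main(String[] args) {
        long[] used = {1, 2, 3};
        // Without the XRef stream's number the next candidate would be 4;
        // if the XRef stream is object 4, that number must be skipped.
        System.out.println(nextFreeObjectNumber(used, 0));
        System.out.println(nextFreeObjectNumber(used, 4));
    }
}
```

The second call illustrates the collision this property is meant to prevent: ignoring the XRef stream's number would hand out an object number that is already taken.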
accept
Visitor pattern double dispatch method.
- Specified by:
accept in class COSBase
- Parameters:
visitor - The object to notify when visiting this object.
- Throws:
IOException - If an error occurs while visiting this object.
-
close
This will close all storage and delete the temporary files.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException - If there is an error closing resources.
-
isClosed
public boolean isClosed()
Returns true if this document has been closed.
- Returns:
- true if the document is already closed, false otherwise
-
getObjectFromPool
This will get an object from the pool.
- Parameters:
key - The object key.
- Returns:
- The object in the pool or a new one if it has not been parsed yet.
-
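The "existing object or a new one" behaviour described for the pool lookup is a get-or-create pattern. The sketch below illustrates it with the JDK only: LazyObject and ObjectPoolSketch are hypothetical stand-ins for COSObject and the object pool, and the string key built from object number and generation is an assumption made to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical placeholder for an indirect object whose content has not
// been parsed yet; only its identity is known at creation time.
final class LazyObject {
    final long number;
    final int generation;

    LazyObject(long number, int generation) {
        this.number = number;
        this.generation = generation;
    }
}

// Hypothetical pool: returns the pooled object for a key, creating and
// registering a placeholder the first time the key is requested.
final class ObjectPoolSketch {
    private final Map<String, LazyObject> pool = new HashMap<>();

    LazyObject get(long number, int generation) {
        String key = number + " " + generation;
        return pool.computeIfAbsent(key, k -> new LazyObject(number, generation));
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        ObjectPoolSketch pool = new ObjectPoolSketch();
        LazyObject first = pool.get(5, 0);
        LazyObject again = pool.get(5, 0);
        System.out.println(first == again); // same pooled instance
    }
}
```

Returning the same instance for the same key is what lets several dictionaries share one reference to an object, as the objectPool field description notes.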
addXRefTable
Populate XRef HashMap with given values. Each entry maps ObjectKeys to byte offsets in the file.
- Parameters:
xrefTableValues - xref table entries to be added
-
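The cross-reference mapping can be sketched as a map from an (object number, generation) key to a byte offset. This example uses only the JDK: ObjKey and XrefTableSketch are hypothetical stand-ins for COSObjectKey and the xrefTable field, and the choice that later entries overwrite earlier ones for the same key, as well as the -1 result for unknown keys, are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key type: object number plus generation number.
final class ObjKey {
    final long number;
    final int generation;

    ObjKey(long number, int generation) {
        this.number = number;
        this.generation = generation;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ObjKey)) {
            return false;
        }
        ObjKey that = (ObjKey) other;
        return number == that.number && generation == that.generation;
    }

    @Override
    public int hashCode() {
        return Objects.hash(number, generation);
    }
}

// Hypothetical table: maps keys to byte offsets in the file.
final class XrefTableSketch {
    private final Map<ObjKey, Long> offsets = new HashMap<>();

    // Adds all given entries; in this sketch an entry for an existing key
    // replaces the previous offset.
    void addAll(Map<ObjKey, Long> entries) {
        offsets.putAll(entries);
    }

    // Returns the byte offset for the object, or -1 if it is unknown.
    long offsetOf(long number, int generation) {
        Long offset = offsets.get(new ObjKey(number, generation));
        return offset == null ? -1L : offset;
    }

    public static void main(String[] args) {
        Map<ObjKey, Long> section = new HashMap<>();
        section.put(new ObjKey(3, 0), 120L);
        XrefTableSketch table = new XrefTableSketch();
        table.addAll(section);
        System.out.println(table.offsetOf(3, 0));
    }
}
```

The key needs value-based equals and hashCode; without them, a lookup with a freshly built key would never find the stored offset.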
getXrefTable
Returns the xrefTable, which is a mapping of ObjectKeys to byte offsets in the file.
- Returns:
- mapping of ObjectKeys to byte offsets
-
setStartXref
public void setStartXref(long startXrefValue)
This method sets the startxref value of the document. This will only be needed for incremental updates.
- Parameters:
startXrefValue - the value for startXref
-
getStartXref
public long getStartXref()
Returns the startXref position of the parsed document. This will only be needed for incremental updates.
- Returns:
- a long with the old position of the startxref
-
isXRefStream
public boolean isXRefStream()
Determines if the trailer is a XRef stream or not.
- Returns:
- true if the trailer is a XRef stream
-
setIsXRefStream
public void setIsXRefStream(boolean isXRefStreamValue)
Sets isXRefStream to the given value. Make sure that the version of your PDF is 1.5 or higher.
- Parameters:
isXRefStreamValue - the new value for isXRefStream
-
hasHybridXRef
public boolean hasHybridXRef()
Determines if the pdf has hybrid cross references, both plain tables and streams.
- Returns:
- true if the pdf has hybrid cross references
-
setHasHybridXRef
public void setHasHybridXRef()
Marks the pdf as a document using hybrid cross references.
-
getDocumentState
Returns the COSDocumentState of this COSDocument.
- Returns:
- The COSDocumentState of this COSDocument.