Class PDDocument
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
PreflightDocument
-
Field Summary
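Fields
Modifier and Type | Field | Description
private AccessPermission | accessPermission |
private boolean | allSecurityToBeRemoved |
private final COSDocument | document |
private PDDocumentCatalog | documentCatalog |
private Long | documentId |
private PDDocumentInformation | documentInformation |
private PDEncryption | encryption |
private final Set<TrueTypeFont> | fontsToClose |
private final Set<PDFont> | fontsToSubset |
private static final org.apache.commons.logging.Log | LOG |
private final RandomAccessRead | pdfSource |
private static final int[] | RESERVE_BYTE_RANGE | For signing: large reserve byte range used as placeholder in the saved PDF until the actual length of the PDF is known.
private ResourceCache | resourceCache |
private boolean | signatureAdded |
private SigningSupport | signingSupport |
private SignatureInterface | signInterface |
-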
Constructor Summary
Constructors
Constructor | Description
PDDocument() | Creates an empty PDF document.
PDDocument(COSDocument doc) | Constructor that uses an existing document.
PDDocument(COSDocument doc, RandomAccessRead source) | Constructor that uses an existing document.
PDDocument(COSDocument doc, RandomAccessRead source, AccessPermission permission) | Constructor that uses an existing document.
PDDocument(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction) | Creates an empty PDF document.
-
Method Summary
Modifier and Type | Method | Description
void | addPage(PDPage page) | This will add a page to the document.
void | addSignature(PDSignature sigObject) | Add parameters of signature to be created externally using default signature options.
void | addSignature(PDSignature sigObject, SignatureInterface signatureInterface) | Add a signature to be created using the instance of given interface.
void | addSignature(PDSignature sigObject, SignatureInterface signatureInterface, SignatureOptions options) | This will add a signature to the document.
void | addSignature(PDSignature sigObject, SignatureOptions options) | Add parameters of signature to be created externally.
private void | assignAcroFormDefaultResource(PDAcroForm acroForm, COSDictionary newDict) |
private void | assignAppearanceDictionary(PDAnnotationWidget firstWidget, COSDictionary apDict) |
private void | assignSignatureRectangle(PDAnnotationWidget firstWidget, COSDictionary annotDict) |
private boolean | checkSignatureAnnotation(List<PDAnnotation> annotations, PDAnnotationWidget widget) | Check if the widget already exists in the annotation list.
private boolean | checkSignatureField(Iterator<PDField> fieldIterator, PDSignatureField signatureField) | Check if the field already exists in the field list.
void | close() | This will close the underlying COSDocument object.
private PDSignatureField | findSignatureField(Iterator<PDField> fieldIterator, PDSignature sigObject) | Search acroform fields for signature field with specific signature dictionary.
AccessPermission | getCurrentAccessPermission() | Returns the access permissions granted when the document was decrypted.
COSDocument | getDocument() | This will get the low level document.
PDDocumentCatalog | getDocumentCatalog() | This will get the document CATALOG.
Long | getDocumentId() | Provides the document ID.
PDDocumentInformation | getDocumentInformation() | This will get the document info dictionary.
PDEncryption | getEncryption() | This will get the encryption dictionary for this document.
Set<PDFont> | getFontsToSubset() | Returns the list of fonts which will be subset before the document is saved.
PDSignature | getLastSignatureDictionary() | This will return the last signature from the field tree.
int | getNumberOfPages() | This will return the total page count of the PDF document.
PDPage | getPage(int pageIndex) | Returns the page at the given 0-based index.
PDPageTree | getPages() | Returns the page tree.
ResourceCache | getResourceCache() | Returns the resource cache associated with this document, or null if there is none.
List<PDSignature> | getSignatureDictionaries() | Retrieve all signature dictionaries from the document.
List<PDSignatureField> | getSignatureFields() | Retrieve all signature fields from the document.
float | getVersion() | Returns the PDF specification version this document conforms to.
PDPage | importPage(PDPage page) | This will import and copy the contents from another location.
boolean | isAllSecurityToBeRemoved() | Indicates if all security is removed or not when writing the pdf.
boolean | isEncrypted() | This will tell if this document is encrypted or not.
private void | prepareNonVisibleSignature(PDAnnotationWidget firstWidget) |
private void | prepareVisibleSignature(PDAnnotationWidget firstWidget, PDAcroForm acroForm, COSDocument visualSignature) |
void | protect(ProtectionPolicy policy) | Protects the document with a protection policy.
void | registerTrueTypeFontForClosing(TrueTypeFont ttf) | For internal PDFBox use when creating PDF documents: register a TrueTypeFont to make sure it is closed when the PDDocument is closed to avoid memory leaks.
void | removePage(int pageNumber) | Remove the page from the document.
void | removePage(PDPage page) | Remove the page from the document.
void | save(File file) | Save the document to a file using default compression.
void | save(File file, CompressParameters compressParameters) | Save the document using the given compression.
void | save(OutputStream output) | This will save the document to an output stream.
void | save(OutputStream output, CompressParameters compressParameters) | Save the document using the given compression.
void | save(String fileName) | Save the document to a file using default compression.
void | save(String fileName, CompressParameters compressParameters) | Save the document to a file using the given compression.
void | saveIncremental(OutputStream output) | Save the PDF as an incremental update.
void | saveIncremental(OutputStream output, Set<COSDictionary> objectsToWrite) | Save the PDF as an incremental update.
ExternalSigningSupport | saveIncrementalForExternalSigning(OutputStream output) | Save PDF incrementally without closing for external signature creation scenario.
void | setAllSecurityToBeRemoved(boolean removeAllSecurity) | Activates/Deactivates the removal of all security when writing the pdf.
void | setDocumentId(Long docId) | Sets the document ID to the given value.
void | setDocumentInformation(PDDocumentInformation info) | This will set the document information for this document.
void | setEncryptionDictionary(PDEncryption encryption) | This will set the encryption dictionary for this document.
void | setResourceCache(ResourceCache resourceCache) | Sets the resource cache associated with this document.
void | setVersion(float newVersion) | Sets the PDF specification version for this document.
private void | subsetDesignatedFonts() |
-
Field Details
-
RESERVE_BYTE_RANGE
private static final int[] RESERVE_BYTE_RANGE
For signing: large reserve byte range used as placeholder in the saved PDF until the actual length of the PDF is known. You'll need to fetch (with PDSignature.getByteRange()) and reassign this yourself (with PDSignature.setByteRange(int[])) only if you call saveIncrementalForExternalSigning() twice.
-
LOG
private static final org.apache.commons.logging.Log LOG
-
document
-
documentInformation
-
documentCatalog
-
encryption
-
allSecurityToBeRemoved
private boolean allSecurityToBeRemoved
-
documentId
-
pdfSource
-
accessPermission
-
fontsToSubset
-
fontsToClose
-
signInterface
-
signingSupport
-
resourceCache
-
signatureAdded
private boolean signatureAdded
-
-
Constructor Details
-
PDDocument
public PDDocument()
Creates an empty PDF document. You need to add at least one page for the document to be valid.
-
PDDocument
Creates an empty PDF document. You need to add at least one page for the document to be valid.
- Parameters:
streamCacheCreateFunction - a function to create an instance of a stream cache for buffering PDF streams
-
PDDocument
Constructor that uses an existing document. The COSDocument that is passed in must be valid.
- Parameters:
doc - The COSDocument that this document wraps.
-
PDDocument
Constructor that uses an existing document. The COSDocument that is passed in must be valid.
- Parameters:
doc - The COSDocument that this document wraps.
source - input representing the pdf
-
PDDocument
Constructor that uses an existing document. The COSDocument that is passed in must be valid.
- Parameters:
doc - The COSDocument that this document wraps.
source - input representing the pdf
permission - the access permissions of the pdf
-
-
Method Details
-
addPage
This will add a page to the document. This is a convenience method that will add the page to the root of the hierarchy and set the parent of the page to the root.
- Parameters:
page - The page to add to the document.
-
addSignature
Add parameters of signature to be created externally using default signature options. See the saveIncrementalForExternalSigning(OutputStream) method description for details on the external signature creation scenario.
Only one signature may be added in a document. To sign several times, load document, add signature, save incremental and close again.
- Parameters:
sigObject - is the PDSignatureField model
- Throws:
IOException - if there is an error creating required fields
IllegalStateException - if one attempts to add several signature fields.
-
addSignature
Add parameters of signature to be created externally. See the saveIncrementalForExternalSigning(OutputStream) method description for details on the external signature creation scenario.
Only one signature may be added in a document. To sign several times, load document, add signature, save incremental and close again.
- Parameters:
sigObject - is the PDSignatureField model
options - signature options
- Throws:
IOException - if there is an error creating required fields
IllegalStateException - if one attempts to add several signature fields.
-
addSignature
public void addSignature(PDSignature sigObject, SignatureInterface signatureInterface) throws IOException
Add a signature to be created using the instance of given interface.
Only one signature may be added in a document. To sign several times, load document, add signature, save incremental and close again.
- Parameters:
sigObject - is the PDSignatureField model
signatureInterface - is an interface whose implementation provides signing capabilities. Can be null if external signing is used.
- Throws:
IOException - if there is an error creating required fields
IllegalStateException - if one attempts to add several signature fields.
-
addSignature
public void addSignature(PDSignature sigObject, SignatureInterface signatureInterface, SignatureOptions options) throws IOException
This will add a signature to the document. If the 0-based page number in the options parameter is smaller than 0 or larger than max, the nearest valid page number will be used (i.e. 0 or max) and no exception will be thrown.
Only one signature may be added in a document. To sign several times, load document, add signature, save incremental and close again.
- Parameters:
sigObject - is the PDSignatureField model
signatureInterface - is an interface whose implementation provides signing capabilities. Can be null if external signing is used.
options - signature options
- Throws:
IOException - if there is an error creating required fields
IllegalStateException - if one attempts to add several signature fields.
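The page-number rule described for the options parameter amounts to clamping the requested index into the valid range. A minimal sketch of that rule, using only the JDK: the class and method names are hypothetical and not part of PDFBox, "max" is taken to mean the last valid 0-based index, and rejecting a page count below one is an assumption of this sketch.

```java
// Illustrative only: PageIndexClamp and clampPageIndex are hypothetical names,
// not PDFBox API. They restate the documented "nearest valid page" rule.
final class PageIndexClamp {
    private PageIndexClamp() {
    }

    /**
     * Maps a requested 0-based page index onto the range [0, pageCount - 1],
     * picking the nearest valid index instead of throwing.
     * Assumption of this sketch: a document without pages is rejected.
     */
    static int clampPageIndex(int requested, int pageCount) {
        if (pageCount <= 0) {
            throw new IllegalArgumentException("document has no pages");
        }
        int max = pageCount - 1;
        return Math.max(0, Math.min(requested, max));
    }
}
```

For a five-page document, a requested index of -3 maps to 0 and a requested index of 99 maps to 4, while an in-range index is returned unchanged.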
-
findSignatureField
Search acroform fields for signature field with specific signature dictionary.
- Parameters:
fieldIterator - iterator on all fields.
sigObject - signature object (the /V part).
- Returns:
- a signature field if found, or null if none was found.
-
checkSignatureField
private boolean checkSignatureField(Iterator<PDField> fieldIterator, PDSignatureField signatureField)
Check if the field already exists in the field list.
- Parameters:
fieldIterator - iterator on all fields.
signatureField - the signature field.
- Returns:
- true if the field already existed in the field list, false if not.
-
checkSignatureAnnotation
Check if the widget already exists in the annotation list.
- Parameters:
annotations - the list of PDAnnotation fields.
widget - the annotation widget.
- Returns:
- true if the widget already existed in the annotation list, false if not.
-
prepareNonVisibleSignature
-
prepareVisibleSignature
private void prepareVisibleSignature(PDAnnotationWidget firstWidget, PDAcroForm acroForm, COSDocument visualSignature)
-
assignSignatureRectangle
-
assignAppearanceDictionary
-
assignAcroFormDefaultResource
-
removePage
Remove the page from the document. Do not use this method if other pages link to this one or if your document has a structure tree for accessibility unless you are able to fix these as well. In such cases it is better to use the splitter() class which will do these fixes.
- Parameters:
page - The page to remove from the document.
-
removePage
public void removePage(int pageNumber)
Remove the page from the document. Do not use this method if other pages link to this one or if your document has a structure tree for accessibility unless you are able to fix these as well. In such cases it is better to use the splitter() class which will do these fixes.
- Parameters:
pageNumber - 0-based index of the page to remove.
-
importPage
This will import and copy the contents from another location. Currently the content stream is stored in a scratch file. The scratch file is associated with the document. If you are adding a page to this document from another document and want to copy the contents to this document's scratch file then use this method, otherwise just use the addPage() method.
Unlike addPage(), this method creates a new PDPage object. If your page has annotations, and if these link to pages not in the target document, then the target document might become huge. What you need to do is to delete page references of such annotations. See here for how to do this.
Inherited (global) resources are ignored because these can contain resources not needed for this page which could bloat your document, see PDFBOX-28 and related issues. If you need them, call importedPage.setResources(page.getResources());
This method should only be used to import a page from a loaded document, not from a generated document because these can contain unfinished parts, e.g. font subsetting information.
- Parameters:
page - The page to import.
- Returns:
- The page that was imported.
- Throws:
IOException- If there is an error copying the page.
-
getDocument
This will get the low level document.
- Returns:
- The document that this layer sits on top of.
-
getDocumentInformation
This will get the document info dictionary. If it doesn't exist, an empty document info dictionary is created in the document trailer.
In PDF 2.0 this is deprecated except for two entries, /CreationDate and /ModDate. For any other document level metadata, a metadata stream should be used instead, see PDDocumentCatalog.getMetadata().
- Returns:
- The document's /Info dictionary, never null.
-
setDocumentInformation
This will set the document information for this document.
In PDF 2.0 this is deprecated except for two entries, /CreationDate and /ModDate. For any other document level metadata, a metadata stream should be used instead, see PDDocumentCatalog.setMetadata(PDMetadata).
- Parameters:
info - The updated document information.
-
getDocumentCatalog
This will get the document CATALOG. This is guaranteed to not return null.
- Returns:
- The document's /Root dictionary
-
isEncrypted
public boolean isEncrypted()
This will tell if this document is encrypted or not.
- Returns:
- true if this document is encrypted.
-
getEncryption
This will get the encryption dictionary for this document. This will still return the parameters if the document was decrypted. As the encryption architecture in PDF documents is pluggable this returns an abstract class, but the only supported subclass at this time is a PDStandardEncryption object.
- Returns:
- The encryption dictionary (most likely a PDStandardEncryption object)
-
setEncryptionDictionary
This will set the encryption dictionary for this document.
- Parameters:
encryption - The encryption dictionary (most likely a PDStandardEncryption object)
-
getLastSignatureDictionary
This will return the last signature from the field tree. Note that this may not be the last in time when empty signature fields are created first but signed after other fields.
- Returns:
- the last signature as PDSignature.
-
getSignatureFields
Retrieve all signature fields from the document.
- Returns:
- a List of PDSignatureFields
-
getSignatureDictionaries
Retrieve all signature dictionaries from the document.
- Returns:
- a List of PDSignatures
-
registerTrueTypeFontForClosing
For internal PDFBox use when creating PDF documents: register a TrueTypeFont to make sure it is closed when the PDDocument is closed to avoid memory leaks. Users don't have to call this method; it is done by the appropriate PDFont classes.
- Parameters:
ttf - the TrueTypeFont to be registered
-
getFontsToSubset
Returns the list of fonts which will be subset before the document is saved.
-
save
Save the document to a file using default compression.
Don't use the input file as target as this will produce a corrupted file.
If encryption has been activated (with protect(ProtectionPolicy)), do not use the document after saving because the contents are now encrypted. The same applies if your file was created from parts of another file and that one is to be used after saving.
- Parameters:
fileName - The file to save as.
- Throws:
IOException - if the output could not be written
-
save
Save the document to a file using default compression.
Don't use the input file as target as this will produce a corrupted file.
If encryption has been activated (with protect(ProtectionPolicy)), do not use the document after saving because the contents are now encrypted. The same applies if your file was created from parts of another file and that one is to be used after saving.
- Parameters:
file - The file to save as.
- Throws:
IOException - if the output could not be written
-
save
This will save the document to an output stream.
Don't use the input file as target as this will produce a corrupted file.
If encryption has been activated (with protect(ProtectionPolicy)), do not use the document after saving because the contents are now encrypted. The same applies if your file was created from parts of another file and that one is to be used after saving.
- Parameters:
output - The stream to write to. It is recommended to wrap it in a BufferedOutputStream, unless it is already buffered.
- Throws:
IOException - if the output could not be written
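The buffering recommendation for the output parameter can be applied with a small JDK-only helper. This is a sketch rather than anything PDFBox provides: OutputBuffering and ensureBuffered are hypothetical names, and the check only recognises BufferedOutputStream itself, so other stream types that buffer internally would be wrapped a second time (harmless, but redundant).

```java
import java.io.BufferedOutputStream;
import java.io.OutputStream;

// Illustrative only: OutputBuffering is a hypothetical helper, not PDFBox API.
final class OutputBuffering {
    private OutputBuffering() {
    }

    /**
     * Returns the stream unchanged if it is already a BufferedOutputStream,
     * otherwise wraps it so that many small writes are batched.
     */
    static OutputStream ensureBuffered(OutputStream out) {
        if (out instanceof BufferedOutputStream) {
            return out;
        }
        return new BufferedOutputStream(out);
    }
}
```

The returned stream is what would be handed to save(OutputStream); the caller remains responsible for flushing and closing it.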
-
save
Save the document using the given compression.
Don't use the input file as target as this will produce a corrupted file.
If encryption has been activated (with protect(ProtectionPolicy)), do not use the document after saving because the contents are now encrypted. The same applies if your file was created from parts of another file and that one is to be used after saving.
- Parameters:
file - The file to save as.
compressParameters - The parameters for the document's compression.
- Throws:
IOException - if the output could not be written
-
save
Save the document to a file using the given compression.
Don't use the input file as target as this will produce a corrupted file.
If encryption has been activated (with protect(ProtectionPolicy)), do not use the document after saving because the contents are now encrypted. The same applies if your file was created from parts of another file and that one is to be used after saving.
- Parameters:
fileName - The file to save as.
compressParameters - The parameters for the document's compression.
- Throws:
IOException - if the output could not be written
-
save
Save the document using the given compression.
Don't use the input file as target as this will produce a corrupted file.
If encryption has been activated (with protect(ProtectionPolicy)), do not use the document after saving because the contents are now encrypted. The same applies if your file was created from parts of another file and that one is to be used after saving.
- Parameters:
output - The stream to write to. It is recommended to wrap it in a BufferedOutputStream, unless it is already buffered.
compressParameters - The parameters for the document's compression.
- Throws:
IOException - if the output could not be written
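Every save overload warns against using the input file as the target. One way a caller might guard against that before saving is sketched below with JDK classes only. SaveTargetGuard and its methods are hypothetical names, not PDFBox API, and falling back to a normalized path comparison when either file does not exist yet is an assumption of this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: SaveTargetGuard is a hypothetical helper, not PDFBox API.
final class SaveTargetGuard {
    private SaveTargetGuard() {
    }

    /** Returns true if both paths denote the same file. */
    static boolean isSameFile(Path source, Path target) {
        try {
            if (Files.exists(source) && Files.exists(target)) {
                // Resolves symbolic links and relative segments via the file system.
                return Files.isSameFile(source, target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // At least one file does not exist yet: compare normalized absolute paths.
        Path a = source.toAbsolutePath().normalize();
        Path b = target.toAbsolutePath().normalize();
        return a.equals(b);
    }

    /** Throws if the save target would overwrite the file the document was loaded from. */
    static void requireDistinct(Path source, Path target) {
        if (isSameFile(source, target)) {
            throw new IllegalArgumentException(
                    "save target must differ from the input file: " + target);
        }
    }
}
```

Calling requireDistinct(inputPath, outputPath) before any of the save methods turns the documented corruption risk into an early, explicit failure.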
-
subsetDesignatedFonts
- Throws:
IOException
-
saveIncremental
Save the PDF as an incremental update. This is only possible if the PDF was loaded from a file or a stream, not if the document was created in PDFBox itself. There must be a path of objects that have COSUpdateInfo.isNeedToBeUpdated() set, starting from the document catalog. For signatures this is taken care of by PDFBox itself.
Other usages of this method are for experienced users only. You will usually never need it. It is useful only if you are required to keep the current revision and append the changes. A typical use case is changing a signed file without invalidating the signature.
If your modification includes annotations, make sure these link back to their page by calling PDAnnotation.setPage(PDPage). Although this is optional, not doing it can cause trouble when PDFs get signed. (PDFBox already does this for signature widget annotations.)
Another problem with page-based modifications can occur if the page tree isn't flat: there won't be a closed update path from the catalog to the page. To fix this, add code like this:
    COSDictionary parent = page.getCOSObject().getCOSDictionary(COSName.PARENT);
    while (parent != null)
    {
        parent.setNeedToBeUpdated(true);
        parent = parent.getCOSDictionary(COSName.PARENT);
    }
Don't use the input file as target as this will produce a corrupted file.
- Parameters:
output - stream to write to. It must never point to the source file or that one will be harmed!
- Throws:
IOException - if the output could not be written
IllegalStateException - if the document was not loaded from a file or a stream.
-
saveIncremental
public void saveIncremental(OutputStream output, Set<COSDictionary> objectsToWrite) throws IOException
Save the PDF as an incremental update. This is only possible if the PDF was loaded from a file or a stream, not if the document was created in PDFBox itself. This allows including objects even if there is no path of objects that have COSUpdateInfo.isNeedToBeUpdated() set, so the incremental update gets smaller. Only dictionaries are supported; if you need to update other object classes, then add their parent dictionary.
This method is for experienced users only. You will usually never need it. It is useful only if you are required to keep the current revision and append the changes. A typical use case is changing a signed file without invalidating the signature. To know which objects are getting changed, you need to have some understanding of the PDF specification, and look at the saved file with an editor to verify that you are updating the correct objects. You should also inspect the page and document structures of the file with PDFDebugger.
If your modification includes annotations, make sure these link back to their page by calling PDAnnotation.setPage(PDPage). Although this is optional, not doing it can cause trouble when PDFs get signed. (PDFBox already does this for signature widget annotations.)
Don't use the input file as target as this will produce a corrupted file.
- Parameters:
output - stream to write to. It must never point to the source file or that one will be harmed!
objectsToWrite - objects that must be part of the incremental saving.
- Throws:
IOException - if the output could not be written
IllegalStateException - if the document was not loaded from a file or a stream.
-
saveIncrementalForExternalSigning
public ExternalSigningSupport saveIncrementalForExternalSigning(OutputStream output) throws IOException
Save PDF incrementally without closing for external signature creation scenario. The general sequence is:
    PDDocument pdDocument = ...;
    OutputStream outputStream = ...;
    SignatureOptions signatureOptions = ...; // options to specify fine tuned signature options or null for defaults
    PDSignature pdSignature = ...;

    // add signature parameters to be used when creating signature dictionary
    pdDocument.addSignature(pdSignature, signatureOptions);
    // prepare PDF for signing and obtain helper class to be used
    ExternalSigningSupport externalSigningSupport = pdDocument.saveIncrementalForExternalSigning(outputStream);
    // get data to be signed
    InputStream dataToBeSigned = externalSigningSupport.getContent();
    // invoke signature service
    byte[] signature = sign(dataToBeSigned);
    // set resulted CMS signature
    externalSigningSupport.setSignature(signature);

    // last step is to close the document
    pdDocument.close();
Note that after calling this method, only the close() method may be invoked on the PDDocument instance, and only AFTER the ExternalSigningSupport instance has been used.
Don't use the input file as target as this will produce a corrupted file.
- Parameters:
output - stream to write the final PDF. It must never point to the source file or that one will be harmed!
- Returns:
- instance to be used for external signing and setting CMS signature
- Throws:
IOException - if the output could not be written
IllegalStateException - if the document was not loaded from a file or a stream or signature options were not set.
-
getPage
Returns the page at the given 0-based index.
This method is too slow to get all the pages from a large PDF document (1000 pages or more). For such documents, use the iterator of getPages() instead.
- Parameters:
pageIndex - the 0-based page index
- Returns:
- the page at the given index.
- Throws:
IllegalStateException - if the requested index isn't found or doesn't point to a valid page dictionary.
IndexOutOfBoundsException - if the requested index is higher than the page count.
-
getPages
Returns the page tree.
- Returns:
- the page tree
-
getNumberOfPages
public int getNumberOfPages()
This will return the total page count of the PDF document.
- Returns:
- The total number of pages in the PDF document.
-
close
This will close the underlying COSDocument object.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException - If there is an error releasing resources.
-
protect
Protects the document with a protection policy. This method only marks the document for encryption; the content is actually encrypted when the document is saved. It also calls setAllSecurityToBeRemoved(boolean) with a false argument if it was set to true previously and logs a warning.
Do not use the document after saving, because the structures are encrypted. The same applies if your file was created from parts of another file and that one is to be used after saving.
- Parameters:
policy - The protection policy.
- Throws:
IOException - if there isn't any suitable security handler.
-
getCurrentAccessPermission
Returns the access permissions granted when the document was decrypted. If the document was not decrypted this method returns the access permission for a document owner (i.e. can do everything). The returned object is in read-only mode so that permissions cannot be changed. Methods providing access to content should rely on this object to verify if the current user is allowed to proceed.
- Returns:
- the access permissions for the current user on the document.
-
isAllSecurityToBeRemoved
public boolean isAllSecurityToBeRemoved()
Indicates if all security is removed or not when writing the pdf.
- Returns:
- true if all security shall be removed, otherwise false
-
setAllSecurityToBeRemoved
public void setAllSecurityToBeRemoved(boolean removeAllSecurity)
Activates/Deactivates the removal of all security when writing the pdf.
- Parameters:
removeAllSecurity - remove all security if set to true
-
getDocumentId
Provides the document ID. This is not the trailer document ID but the time used to create it. Use COSDocument.getDocumentID() for the trailer document ID. Read PDFBOX-1613 for more details about the purpose.
- Returns:
- the document ID
-
setDocumentId
Sets the document ID to the given value. This is not the trailer document ID but the time used to create it. Use COSDocument.setDocumentID(COSArray) for the trailer document ID. Read PDFBOX-1613 for more details about the purpose.
- Parameters:
docId - the new document ID
-
getVersion
public float getVersion()
Returns the PDF specification version this document conforms to.
- Returns:
- the PDF version (e.g. 1.4f)
-
setVersion
public void setVersion(float newVersion)
Sets the PDF specification version for this document.
- Parameters:
newVersion - the new PDF version (e.g. 1.4f)
-
getResourceCache
Returns the resource cache associated with this document, or null if there is none.
- Returns:
- the resource cache of the document
-
setResourceCache
Sets the resource cache associated with this document.
- Parameters:
resourceCache - A resource cache, or null.
-