Class PdfReader
java.lang.Object
com.itextpdf.kernel.pdf.PdfReader
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
SignatureUtil.ContentsChecker
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
protected static class
static enum PdfReader.StrictnessLevel: Enumeration representing the strictness level for reading.
(package private) static class PdfReader.XrefProcessor: Class containing a callback which is called on every xref table reading.
-
Field Summary
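Fields (Modifier and Type, Field, Description):
protected static boolean correctStreamLength
private PdfIndirectReference currentIndirectReference
protected PdfEncryption decrypt
static final PdfReader.StrictnessLevel DEFAULT_STRICTNESS_LEVEL: The default PdfReader.StrictnessLevel to be used.
protected boolean encrypted
private static final byte[] endobj
private static final byte[] endstream
private static final String endstream1
private static final String endstream2
private static final String endstream3
private static final String endstream4
protected long eofPos
protected boolean fixedXref
protected PdfVersion headerPdfVersion
protected boolean hybridXref
protected long lastXref
private boolean memorySavingMode
private PdfConformance pdfConformance
protected PdfDocument pdfDocument
protected ReaderProperties properties
protected boolean rebuiltXref
private PdfReader.StrictnessLevel strictnessLevel
protected PdfTokenizer tokens
protected PdfDictionary trailer
private boolean unethicalReading
private XMPMeta xmpMeta
private PdfReader.XrefProcessor xrefProcessor
protected boolean xrefStm
-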
Constructor Summary
Constructors (Constructor, Description):
PdfReader(IRandomAccessSource byteSource, ReaderProperties properties): Constructs a new PdfReader.
PdfReader(IRandomAccessSource byteSource, ReaderProperties properties, boolean closeStream)
PdfReader(File file): Reads and parses a PDF document.
PdfReader(File file, ReaderProperties properties): Reads and parses a PDF document.
PdfReader(InputStream is): Reads and parses a PDF document.
PdfReader(InputStream is, ReaderProperties properties): Reads and parses a PDF document.
PdfReader(String filename): Reads and parses a PDF document.
PdfReader(String filename, ReaderProperties properties): Reads and parses a PDF document.
-
Method Summary
Methods (Modifier and Type, Method, Description):
private void checkPdfStreamLength(PdfStream pdfStream)
void close(): Close PdfTokenizer.
byte[] computeUserPassword(): Computes user password if standard encryption handler is used with Standard40, Standard128 or AES128 encryption algorithm.
private PdfObject createPdfNullInstance(boolean readAsDirect)
static byte[] decodeBytes(byte[] b, PdfDictionary streamDictionary): Decode bytes applying the filters specified in the provided dictionary using default filter handlers.
static byte[] decodeBytes(byte[] b, PdfDictionary streamDictionary, Map<PdfName, IFilterHandler> filterHandlers): Decode a byte[] applying the filters specified in the provided dictionary using the provided filter handlers.
protected void fixXref()
int getCryptoMode(): Gets encryption algorithm and access permissions.
long getFileLength(): Provides the size of the opened file.
long getLastXref(): Gets position of the last Cross-Reference table.
byte[] getModifiedFileId(): Gets modified file ID, the second element in PdfName.ID key of trailer.
private static PdfTokenizer getOffsetTokeniser(IRandomAccessSource byteSource, boolean closeStream): Utility method that checks the provided byte source to see if it has junk bytes at the beginning.
byte[] getOriginalFileId(): Gets original file ID, the first element in PdfName.ID key of trailer.
getPdfConformance: Gets the declared PDF conformance of the source document that is being read.
int getPermissions(): Gets the encryption permissions.
getPropertiesCopy: Gets a copy of ReaderProperties used to create this instance of PdfReader.
getSafeFile: Gets a new file instance of the original PDF document.
getStrictnessLevel: Get the current PdfReader.StrictnessLevel of the reader.
protected PdfNumber getXrefPrev(PdfObject prevObjectToCheck)
boolean hasFixedXref(): If any exception generated while reading PdfObject, PdfReader will try to fix offsets of all objects.
boolean hasHybridXref(): Some documents contain hybrid XRef, for more information see "7.5.8.4 Compatibility with Applications That Do Not Support Compressed Reference Streams" in PDF 32000-1:2008 spec.
boolean hasRebuiltXref(): If any exception generated while reading XRef section, PdfReader will try to rebuild it.
boolean hasXrefStm(): Indicates whether the document has Cross-Reference Streams.
boolean isCloseStream: Gets whether close() method shall close input stream.
private boolean isCurrentObjectATrailer()
boolean isEncrypted(): Checks if the PdfDocument read with this PdfReader is encrypted.
(package private) boolean isMemorySavingMode()
boolean isOpenedWithFullPermission(): Checks if the document was opened with the owner password so that the end application can decide what level of access restrictions to apply.
private static void logXrefException
private void processArrayReadError()
private void processXref(PdfXrefTable xrefTable)
protected PdfArray readArray(boolean objStm)
private void readDecryptObj()
protected PdfDictionary readDictionary(boolean objStm)
protected PdfObject readObject(boolean readAsDirect)
protected PdfObject readObject(boolean readAsDirect, boolean objStm)
protected PdfObject readObject(PdfIndirectReference reference)
private PdfObject readObject(PdfIndirectReference reference, boolean fixXref)
protected void readObjectStream(PdfStream objectStream)
protected void readPdf(): Parses the entire PDF
protected PdfName readPdfName(boolean readAsDirect)
protected PdfObject readReference(boolean readAsDirect)
readStream(PdfStream stream, boolean decode): Reads, decrypts and optionally decodes stream bytes into ByteArrayInputStream.
byte[] readStreamBytes(PdfStream stream, boolean decode): Reads, decrypt and optionally decode stream bytes.
byte[] readStreamBytesRaw(PdfStream stream): Reads and decrypt stream bytes.
protected void readXref()
protected PdfDictionary readXrefSection
protected boolean readXrefStream(long ptr)
protected void rebuildXref
void setCloseStream(boolean closeStream): Sets whether close() method shall close input stream.
setMemorySavingMode(boolean memorySavingMode): Defines if memory saving mode is enabled.
setStrictnessLevel(PdfReader.StrictnessLevel strictnessLevel): Set the PdfReader.StrictnessLevel for the reader.
private void setTrailerFromTrailerIndex(Long trailerIndex)
setUnethicalReading(boolean unethicalReading): The iText is not responsible if you decide to change the value of this parameter.
(package private) void setXrefProcessor(PdfReader.XrefProcessor xrefProcessor)
-
Field Details
-
DEFAULT_STRICTNESS_LEVEL
The default PdfReader.StrictnessLevel to be used. -
endstream1
-
endstream2
-
endstream3
-
endstream4
-
endstream
private static final byte[] endstream -
endobj
private static final byte[] endobj -
correctStreamLength
protected static boolean correctStreamLength -
unethicalReading
private boolean unethicalReading -
memorySavingMode
private boolean memorySavingMode -
strictnessLevel
-
currentIndirectReference
-
xrefProcessor
-
tokens
-
decrypt
-
headerPdfVersion
-
lastXref
protected long lastXref -
eofPos
protected long eofPos -
trailer
-
pdfDocument
-
properties
-
encrypted
protected boolean encrypted -
rebuiltXref
protected boolean rebuiltXref -
hybridXref
protected boolean hybridXref -
fixedXref
protected boolean fixedXref -
xrefStm
protected boolean xrefStm -
xmpMeta
-
pdfConformance
-
-
Constructor Details
-
PdfReader
Constructs a new PdfReader.
- Parameters:
byteSource - source of bytes for the reader
properties - properties of the created reader
- Throws:
IOException - if an I/O error occurs
-
PdfReader
Reads and parses a PDF document.
- Parameters:
is - the InputStream containing the document. If the stream is an instance of RASInputStream, its IRandomAccessSource is extracted. Otherwise the stream is read to the end but is not closed.
properties - properties of the created reader
- Throws:
IOException - on error
-
PdfReader
Reads and parses a PDF document.
- Parameters:
file - the File containing the document.
- Throws:
IOException - on error
FileNotFoundException - when the specified File is not found
-
PdfReader
Reads and parses a PDF document.
- Parameters:
is - the InputStream containing the document. If the stream is an instance of RASInputStream, its IRandomAccessSource is extracted. Otherwise the stream is read to the end but is not closed.
- Throws:
IOException - on error
-
PdfReader
Reads and parses a PDF document.
- Parameters:
filename - the file name of the document
properties - properties of the created reader
- Throws:
IOException - on error
-
PdfReader
Reads and parses a PDF document.
- Parameters:
filename - the file name of the document
- Throws:
IOException - on error
-
PdfReader
Reads and parses a PDF document.
- Parameters:
file - the file of the document
properties - properties of the created reader
- Throws:
IOException - on error
-
PdfReader
PdfReader(IRandomAccessSource byteSource, ReaderProperties properties, boolean closeStream) throws IOException
- Throws:
IOException
-
-
Method Details
-
close
Close PdfTokenizer.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException - on error.
-
setUnethicalReading
iText is not responsible if you decide to change the value of this parameter.
- Parameters:
unethicalReading - true to enable unethicalReading, false to disable it. By default unethicalReading is disabled.
- Returns:
this PdfReader instance.
-
setMemorySavingMode
Defines whether memory saving mode is enabled.
By default memory saving mode is disabled, trading memory for speed.
If memory saving mode is enabled, document processing might slow down, but reading will be less memory demanding.
- Parameters:
memorySavingMode - true to enable memory saving mode, false to disable it.
- Returns:
this PdfReader instance.
-
getStrictnessLevel
Get the current PdfReader.StrictnessLevel of the reader.
- Returns:
the current PdfReader.StrictnessLevel
-
setStrictnessLevel
Set the PdfReader.StrictnessLevel for the reader. If the argument is null, then the DEFAULT_STRICTNESS_LEVEL will be used.
- Parameters:
strictnessLevel - the PdfReader.StrictnessLevel to set
- Returns:
this PdfReader instance
-
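isCloseStream
Gets whether the close() method shall close the input stream.
-
setCloseStream
Sets whether the close() method shall close the input stream.
-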
hasRebuiltXref
public boolean hasRebuiltXref()
If an exception is thrown while reading the XRef section, PdfReader tries to rebuild it.
- Returns:
true if PdfReader rebuilt the Cross-Reference section.
- Throws:
PdfException - if the method has been invoked before the PDF document was read.
-
hasHybridXref
public boolean hasHybridXref()
Some documents contain a hybrid XRef. For more information, see "7.5.8.4 Compatibility with Applications That Do Not Support Compressed Reference Streams" in the PDF 32000-1:2008 spec.
- Returns:
true if the document has a hybrid Cross-Reference section.
- Throws:
PdfException - if the method has been invoked before the PDF document was read.
-
hasXrefStm
public boolean hasXrefStm()
Indicates whether the document has Cross-Reference Streams.
- Returns:
true if the document has Cross-Reference Streams.
- Throws:
PdfException - if the method has been invoked before the PDF document was read.
-
hasFixedXref
public boolean hasFixedXref()
If an exception is thrown while reading a PdfObject, PdfReader tries to fix the offsets of all objects.
The value returned by this method might change over time, because reading of PdfObjects can be postponed, even until the document is closed.
- Returns:
true if PdfReader fixed the offsets of PdfObjects.
- Throws:
PdfException - if the method has been invoked before the PDF document was read.
-
getLastXref
public long getLastXref()
Gets the position of the last Cross-Reference table.
- Returns:
-1 if the Cross-Reference table has been rebuilt, otherwise the position of the last Cross-Reference table.
- Throws:
PdfException - if the method has been invoked before the PDF document was read.
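For orientation: a PDF file normally ends with a startxref keyword followed by the byte offset of the last cross-reference section, and that offset is the kind of position this method reports. The block below is a minimal, stdlib-only sketch of such a lookup. It is not iText's implementation; the names StartXrefSketch and findLastXref are hypothetical, and returning -1 when nothing can be parsed is an assumption that merely borrows the convention described above.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; not part of iText. */
final class StartXrefSketch {
    private static final String KEYWORD = "startxref";

    /**
     * Returns the number that follows the last "startxref" keyword,
     * or -1 if the keyword or the number is missing.
     */
    static long findLastXref(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so indices stay aligned.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int idx = text.lastIndexOf(KEYWORD);
        if (idx < 0) {
            return -1;
        }
        int pos = idx + KEYWORD.length();
        // Skip the whitespace between the keyword and the offset.
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        int start = pos;
        while (pos < text.length() && text.charAt(pos) >= '0' && text.charAt(pos) <= '9') {
            pos++;
        }
        if (pos == start) {
            return -1; // keyword present but no digits after it
        }
        return Long.parseLong(text.substring(start, pos));
    }
}
```

A real reader would scan only the tail of the file rather than decoding the whole byte array; the sketch trades that efficiency for brevity.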
-
readStreamBytes
Reads, decrypts and optionally decodes stream bytes. Note that this method doesn't store the actual bytes in any internal structures.
- Parameters:
stream - a PdfStream instance to be read and optionally decoded.
decode - true to get decoded stream bytes, false to leave them as originally encoded.
- Returns:
byte[] array.
- Throws:
IOException - on error.
-
readStreamBytesRaw
Reads and decrypts stream bytes. Note that this method doesn't store the actual bytes in any internal structures.
- Parameters:
stream - a PdfStream instance to be read
- Returns:
byte[] array.
- Throws:
IOException - on error.
-
readStream
Reads, decrypts and optionally decodes stream bytes into a ByteArrayInputStream. The caller is responsible for closing the returned stream.
- Parameters:
stream - a PdfStream instance to be read
decode - true to get the decoded stream, false to leave it as originally encoded.
- Returns:
InputStream, or null if reading failed.
- Throws:
IOException - on error.
-
decodeBytes
Decode bytes applying the filters specified in the provided dictionary using default filter handlers.
- Parameters:
b - the bytes to decode
streamDictionary - the dictionary that contains filter information
- Returns:
the decoded bytes
- Throws:
PdfException - if there are any problems decoding the bytes
-
decodeBytes
public static byte[] decodeBytes(byte[] b, PdfDictionary streamDictionary, Map<PdfName, IFilterHandler> filterHandlers)
Decode a byte[] applying the filters specified in the provided dictionary using the provided filter handlers.
- Parameters:
b - the bytes to decode
streamDictionary - the dictionary that contains filter information
filterHandlers - the map used to look up a handler for each type of filter
- Returns:
the decoded bytes
- Throws:
PdfException - if there are any problems decoding the bytes
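The handler-map idea can be illustrated without iText. The block below is a minimal sketch that looks up one handler per filter name and applies the handlers in the order the filters are listed, which is the order PDF prescribes for decoding. Everything in it is hypothetical and stdlib-only: FilterChainSketch, decode, inflate and defaultHandlers are illustration names, filter names are plain strings rather than PdfName, and a UnaryOperator stands in for IFilterHandler. Only FlateDecode is covered, on the assumption that the data is a plain zlib stream with no predictor.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Illustrative sketch only; not part of iText. */
final class FilterChainSketch {

    /** Inflates zlib-wrapped data, the format used by FlateDecode. */
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported Flate data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Invalid Flate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** A hypothetical default handler map containing only FlateDecode. */
    static Map<String, UnaryOperator<byte[]>> defaultHandlers() {
        Map<String, UnaryOperator<byte[]>> handlers = new HashMap<>();
        handlers.put("FlateDecode", FilterChainSketch::inflate);
        return handlers;
    }

    /** Applies the handler for each filter name, in the order given. */
    static byte[] decode(byte[] data, List<String> filters,
                         Map<String, UnaryOperator<byte[]>> handlers) {
        byte[] current = data;
        for (String filter : filters) {
            UnaryOperator<byte[]> handler = handlers.get(filter);
            if (handler == null) {
                throw new IllegalArgumentException("No handler for filter " + filter);
            }
            current = handler.apply(current);
        }
        return current;
    }
}
```

Passing a custom map is what makes the second overload useful: a caller can replace or add a handler for one filter while leaving the others unchanged.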
-
getSafeFile
Gets a new file instance of the original PDF document.
- Returns:
a new file instance of the original PDF document
-
getFileLength
public long getFileLength()
Provides the size of the opened file.
- Returns:
The size of the opened file.
-
isOpenedWithFullPermission
public boolean isOpenedWithFullPermission()
Checks if the document was opened with the owner password so that the end application can decide what level of access restrictions to apply. If the document is not encrypted it will return true.
- Returns:
true if the document was opened with the owner password or if it's not encrypted, false if the document was opened with the user password.
- Throws:
PdfException - if the method has been invoked before the PDF document was read.
-
getPermissions
public int getPermissions()
Gets the encryption permissions. It can be used directly in WriterProperties.setStandardEncryption(byte[], byte[], int, int). See ISO 32000-1, Table 22 for more details.
- Returns:
the encryption permissions.
- Throws:
PdfException - if the method has been invoked before the PDF document was read.
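The permissions value is a bit field. The block below is a minimal, stdlib-only sketch of how such an integer can be decoded, using the bit positions from ISO 32000-1, Table 22, where bit 1 is the lowest-order bit. The class PermissionBitsSketch, its constants and the describe method are hypothetical illustration names, not iText API; in real code the library's own constants would be the natural choice.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not part of iText. */
final class PermissionBitsSketch {
    // Bit numbers are 1-based in the specification, hence the shift by (bit - 1).
    static final int PRINT = 1 << 2;               // bit 3
    static final int MODIFY_CONTENTS = 1 << 3;     // bit 4
    static final int COPY = 1 << 4;                // bit 5
    static final int MODIFY_ANNOTATIONS = 1 << 5;  // bit 6
    static final int FILL_IN = 1 << 8;             // bit 9
    static final int SCREEN_READERS = 1 << 9;      // bit 10
    static final int ASSEMBLY = 1 << 10;           // bit 11
    static final int HIGH_QUALITY_PRINT = 1 << 11; // bit 12

    /** Lists the operations whose bits are set in the given permissions value. */
    static List<String> describe(int permissions) {
        List<String> granted = new ArrayList<>();
        if ((permissions & PRINT) != 0) granted.add("print");
        if ((permissions & MODIFY_CONTENTS) != 0) granted.add("modify contents");
        if ((permissions & COPY) != 0) granted.add("copy");
        if ((permissions & MODIFY_ANNOTATIONS) != 0) granted.add("modify annotations");
        if ((permissions & FILL_IN) != 0) granted.add("fill in forms");
        if ((permissions & SCREEN_READERS) != 0) granted.add("extract for accessibility");
        if ((permissions & ASSEMBLY) != 0) granted.add("assemble document");
        if ((permissions & HIGH_QUALITY_PRINT) != 0) granted.add("high-quality print");
        return granted;
    }
}
```

Stored values are often negative because the high-order reserved bits are set; a value such as -3904 has only reserved bits set, so describe returns an empty list for it.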
-
getCryptoMode
public int getCryptoMode()
Gets encryption algorithm and access permissions.
- Returns:
int value corresponding to a certain type of encryption.
- Throws:
PdfException - if the method has been invoked before the PDF document was read.
-
getPdfConformance
Gets the declared PDF conformance of the source document that is being read. Note that this information is provided via XMP metadata and is not verified by iText. The conformance is lazily initialized during the first call of this method.
- Returns:
conformance of the source document
-
computeUserPassword
public byte[] computeUserPassword()
Computes the user password if the standard encryption handler is used with the Standard40, Standard128 or AES128 encryption algorithm.
- Returns:
the user password, or null if a non-standard encryption handler was used or if the owner password wasn't used to open the document.
- Throws:
PdfException - if the method has been invoked before the PDF document was read.
-
getOriginalFileId
public byte[] getOriginalFileId()
Gets the original file ID, the first element in the PdfName.ID key of the trailer. If the size of the ID array does not equal 2, an empty array will be returned.
The returned value reflects the value that was written in the opened document. If the document is modified, the ultimate document id can be retrieved from PdfDocument.getOriginalDocumentId().
- Returns:
a byte array representing the original file ID.
- Throws:
PdfException - if the method has been invoked before the PDF document was read.
-
getModifiedFileId
public byte[] getModifiedFileId()
Gets the modified file ID, the second element in the PdfName.ID key of the trailer. If the size of the ID array does not equal 2, an empty array will be returned.
The returned value reflects the value that was written in the opened document. If the document is modified, the ultimate document id can be retrieved from PdfDocument.getModifiedDocumentId().
- Returns:
a byte array representing the modified file ID.
- Throws:
PdfException - if the method has been invoked before the PDF document was read.
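The rule shared by getOriginalFileId and getModifiedFileId (first or second element of a two-element ID array, otherwise an empty array) can be sketched with plain collections. This is an illustration only, not iText's code: FileIdSketch and its methods are hypothetical names, a List of byte arrays stands in for the trailer's ID array, and treating a null list like a malformed one is an added assumption.

```java
import java.util.List;

/** Illustrative sketch only; not part of iText. */
final class FileIdSketch {

    /** First element of a well-formed ID array, otherwise an empty array. */
    static byte[] originalId(List<byte[]> idArray) {
        return element(idArray, 0);
    }

    /** Second element of a well-formed ID array, otherwise an empty array. */
    static byte[] modifiedId(List<byte[]> idArray) {
        return element(idArray, 1);
    }

    private static byte[] element(List<byte[]> idArray, int index) {
        // Assumption: a missing (null) array is handled like one of the wrong size.
        if (idArray == null || idArray.size() != 2) {
            return new byte[0];
        }
        // Return a copy so callers cannot alter the stored identifier.
        return idArray.get(index).clone();
    }
}
```

Returning an empty array rather than null lets callers compare or print the result without a null check.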
-
isEncrypted
public boolean isEncrypted()
Checks if the PdfDocument read with this PdfReader is encrypted.
- Returns:
true if the document is encrypted, otherwise false.
- Throws:
PdfException - if the method has been invoked before the PDF document was read.
-
getPropertiesCopy
Gets a copy of ReaderProperties used to create this instance of PdfReader.
- Returns:
a copy of ReaderProperties used to create this instance of PdfReader
-
readPdf
Parses the entire PDF.
- Throws:
IOException - if an I/O error occurs.
-
readObjectStream
- Throws:
IOException
-
readObject
-
readObject
- Throws:
IOException
-
readReference
-
readObject
- Throws:
IOException
-
readPdfName
-
readDictionary
- Throws:
IOException
-
readArray
- Throws:
IOException
-
readXref
- Throws:
IOException
-
readXrefSection
- Throws:
IOException
-
readXrefStream
- Throws:
IOException
-
fixXref
- Throws:
IOException
-
rebuildXref
- Throws:
IOException
-
isCurrentObjectATrailer
private boolean isCurrentObjectATrailer() -
setTrailerFromTrailerIndex
- Throws:
IOException
-
getXrefPrev
-
isMemorySavingMode
boolean isMemorySavingMode() -
setXrefProcessor
-
processArrayReadError
private void processArrayReadError() -
readDecryptObj
private void readDecryptObj() -
readObject
-
checkPdfStreamLength
- Throws:
IOException
-
createPdfNullInstance
-
getOffsetTokeniser
private static PdfTokenizer getOffsetTokeniser(IRandomAccessSource byteSource, boolean closeStream) throws IOException
Utility method that checks the provided byte source to see if it has junk bytes at the beginning. If junk bytes are found, it constructs a tokeniser that ignores the junk. Otherwise, it constructs a tokeniser for the byte source as it is.
- Parameters:
byteSource - the source to check
- Returns:
a tokeniser that is guaranteed to start at the PDF header
- Throws:
IOException - if there is a problem reading the byte source
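The junk-byte check can be pictured as a search for the PDF header marker. The block below is a minimal, stdlib-only sketch under that assumption: it returns the index of the first "%PDF-" sequence, so that a caller could start reading from there. It is not iText's implementation, and HeaderOffsetSketch and findHeaderOffset are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; not part of iText. */
final class HeaderOffsetSketch {
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns the index of the first "%PDF-" marker, or -1 if none is present. */
    static int findHeaderOffset(byte[] data) {
        for (int i = 0; i + HEADER.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < HEADER.length; j++) {
                if (data[i + j] != HEADER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i; // number of junk bytes that precede the header
            }
        }
        return -1;
    }
}
```

An offset of 0 means the source can be used as it is; a positive offset is the amount of leading junk to skip.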
-
processXref
- Throws:
IOException
-
logXrefException
-