Class PDFFile
-
Field Summary
Fields
Modifier and Type | Field | Description
(package private) ByteBuffer | buf | A ByteBuffer containing the file data
(package private) Cache | cache | a mapping of page numbers to parsed PDF commands
private PDFDecrypter | defaultDecrypter | The default decrypter for streams and strings.
(package private) PDFObject | encrypt | the Encrypt PDFObject, from the trailer
static final int | FF_CHAR |
(package private) PDFObject | info | The Info PDFObject, from the trailer, for simple metadata
private int | majorVersion |
private int | minorVersion |
static final int | NUL_CHAR |
(package private) PDFXref[] | objIdx | the cross reference table mapping object numbers to locations in the PDF file
private boolean | printable | whether the file is printable or not (trailer -> Encrypt -> P & 0x4)
(package private) PDFObject | root | the root PDFObject, as specified in the PDF file
private boolean | saveable | whether the file is saveable or not (trailer -> Encrypt -> P & 0x10)
private static final String | VERSION_COMMENT | the comment text that begins the file, used to determine its version
private String | versionString |
-
Constructor Summary
Constructors
Constructor | Description
PDFFile(ByteBuffer buf) | get a PDFFile from a .pdf file.
PDFFile(ByteBuffer buf, boolean doNotParse) |
PDFFile(ByteBuffer buf, PDFPassword password) | get a PDFFile from a .pdf file.
-
Method Summary
Modifier and Type | Method | Description
private void | consumeWhitespace | Consume all sequential whitespace from the current buffer position, leaving the buffer positioned at non-whitespace
private PDFPage | createPage(int pagenum, PDFObject pageObj) | Create a PDF Page object by finding the relevant inherited properties
 | dereference(PDFXref ref, PDFDecrypter decrypter) | Used internally to track down PDFObject references.
private PDFObject | findPage(PDFObject pagedict, int start, int getPage, Map<String, PDFObject> resources) | Get the PDFObject representing the content of a particular page.
private byte[] | getContents(PDFObject pageObj) | get the stream representing the content of a particular page.
 | getDefaultDecrypter | Get the default decrypter for the document
private PDFObject | getInheritedValue(PDFObject pageObj, String propName) | Find a property value in a page that may be inherited.
int | getMajorVersion() | return the major version of the PDF header.
 | getMetadataKeys | Get the keys into the Info metadata, for use with getStringMetadata(String)
int | getMinorVersion() | return the minor version of the PDF header.
int | getNumPages() | return the number of pages in this PDFFile.
 | getOutline | Gets the outline tree as a tree of OutlineNode, which is a subclass of DefaultMutableTreeNode.
 | getPage(int pagenum) | Get the page commands for a given page in a separate thread.
 | getPage(int pagenum, boolean wait) | Get the page commands for a given page.
int | getPageNumber(PDFObject page) | Gets the page number (starting from 1) of the page represented by a particular PDFObject.
 | getRoot() | get the root PDFObject of this PDFFile.
 | getStringMetadata(String name) | Get metadata (e.g., Author, Title, Creator) from the Info dictionary as a string.
 | getVersionString | return the version string from the PDF header.
static boolean | isDelimiter(int c) | Is the argument a delimiter according to the PDF spec?
boolean | isPrintable() | Gets whether the owner of the file has given permission to print the file.
static boolean | isRegularCharacter(int c) | return true if the character is neither a whitespace character nor a delimiter.
boolean | isSaveable() | Gets whether the owner of the file has given permission to save a copy of the file.
static boolean | isWhiteSpace(int c) | Is the argument a white space character according to the PDF spec?
private boolean | nextItemIs(String match) | requires the next few characters (after whitespace) to match the argument.
private int | nextNonWhitespaceChar | Get the next non-white space character
private void | parseFile(PDFPassword password) | build the PDFFile reference table.
static Rectangle2D | parseNormalisedRectangle |
private void | processVersion(String versionString) | process a version string, to determine the major and minor versions of the file.
private PDFObject | readArray(int objNum, int objGen, PDFDecrypter decrypter) | read an [ array ].
private PDFObject | readDictionary(int objNum, int objGen, PDFDecrypter decrypter) | read an entire << dictionary >>.
private int | readHexDigit | read a character, and return its value as if it were a hexadecimal digit.
private int | readHexPair | return the 8-bit value represented by the next two hex characters.
private PDFObject | readHexString(int objNum, int objGen, PDFDecrypter decrypter) | read a < hex string >.
private PDFObject | readKeyword(char start) | read a bare keyword.
private String | readLine() | Read a line of text.
private PDFObject | readLiteralString(int objNum, int objGen, PDFDecrypter decrypter) | read a ( character string ).
private PDFObject | readName() | read a /name.
private int | readNum(byte[] sbuf, int pos, int numBytes) |
private PDFObject | readNumber(char start) | read a number.
private PDFObject | readObject(int objNum, int objGen, boolean numscan, PDFDecrypter decrypter) | read the next object with a special catch for numbers
private PDFObject | readObject(int objNum, int objGen, PDFDecrypter decrypter) | read the next object from the file
private PDFObject | readObjectDescription(int objNum, int objGen, PDFDecrypter decrypter) | read an entire PDFObject.
private ByteBuffer | readStream(PDFObject dict) | read the stream portion of a PDFObject.
private void | readTrailer(PDFPassword password) | read the cross reference table from a PDF file.
private void | readTrailer15(PDFPassword password) | read the cross reference table from a PDF file.
void | stop(int pageNum) | Stop the rendering of a particular image on this page
-
Field Details
-
NUL_CHAR
public static final int NUL_CHAR
-
FF_CHAR
public static final int FF_CHAR
-
versionString
-
majorVersion
private int majorVersion -
minorVersion
private int minorVersion -
VERSION_COMMENT
the comment text that begins the file, used to determine its version
-
buf
ByteBuffer buf
A ByteBuffer containing the file data
-
objIdx
PDFXref[] objIdx
the cross reference table mapping object numbers to locations in the PDF file
-
root
PDFObject root
the root PDFObject, as specified in the PDF file
-
encrypt
PDFObject encrypt
the Encrypt PDFObject, from the trailer
-
info
PDFObject info
The Info PDFObject, from the trailer, for simple metadata
-
cache
Cache cache
a mapping of page numbers to parsed PDF commands
-
printable
private boolean printable
whether the file is printable or not (trailer -> Encrypt -> P & 0x4)
-
saveable
private boolean saveable
whether the file is saveable or not (trailer -> Encrypt -> P & 0x10)
-
defaultDecrypter
The default decrypter for streams and strings. By default, no encryption is expected, and thus the IdentityDecrypter is used.
-
-
Constructor Details
-
PDFFile
get a PDFFile from a .pdf file. The file must be a random access file at the moment. It should really be a file mapping from the nio package.
Use the getPage(...) methods to get a page from the PDF file.
- Parameters:
buf - the ByteBuffer containing the PDF.
- Throws:
IOException - if there's a problem reading from the buffer
PDFParseException - if the document appears to be malformed, or its features are unsupported. If the file is encrypted in a manner that the product or platform does not support then the exception's cause will be an instance of UnsupportedEncryptionException.
PDFAuthenticationFailureException - if the file is password protected and requires a password
-
PDFFile
- Throws:
IOException
-
PDFFile
get a PDFFile from a .pdf file. The file must be a random access file at the moment. It should really be a file mapping from the nio package.
Use the getPage(...) methods to get a page from the PDF file.
- Parameters:
buf - the ByteBuffer containing the PDF.
password - the user or owner password
- Throws:
IOException - if there's a problem reading from the buffer
PDFParseException - if the document appears to be malformed, or its features are unsupported. If the file is encrypted in a manner that the product or platform does not support then the exception's cause will be an instance of UnsupportedEncryptionException.
PDFAuthenticationFailureException - if the file is password protected and the supplied password does not decrypt the document
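The constructors take a ByteBuffer, and the description above mentions an nio file mapping. The following is a minimal sketch of producing such a buffer with a read-only memory map. The class name PdfBufferLoader and the method mapReadOnly are hypothetical and not part of this library; only the java.nio calls are standard API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the documented library.
final class PdfBufferLoader {
    /** Maps the whole file read-only and returns it as a ByteBuffer. */
    static ByteBuffer mapReadOnly(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }
}
```

The resulting buffer could then be handed to PDFFile(ByteBuffer buf) or PDFFile(ByteBuffer buf, PDFPassword password) as listed in the constructor summary.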
-
-
Method Details
-
isPrintable
public boolean isPrintable()
Gets whether the owner of the file has given permission to print the file.
- Returns:
- true if it is okay to print the file
-
isSaveable
public boolean isSaveable()
Gets whether the owner of the file has given permission to save a copy of the file.
- Returns:
- true if it is okay to save the file
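The field descriptions give the masks used for these two flags (P & 0x4 for printing, P & 0x10 for saving). A minimal sketch of that bit test follows, assuming P is available as a plain int; the class and constant names are hypothetical, and how the real class treats files with no Encrypt dictionary is not shown here.

```java
// Hypothetical illustration of the permission masks named in the field descriptions.
final class PermissionBitsSketch {
    static final int PRINT_MASK = 0x4;   // trailer -> Encrypt -> P & 0x4
    static final int SAVE_MASK = 0x10;   // trailer -> Encrypt -> P & 0x10

    static boolean isPrintable(int p) {
        return (p & PRINT_MASK) != 0;
    }

    static boolean isSaveable(int p) {
        return (p & SAVE_MASK) != 0;
    }
}
```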
-
getRoot
get the root PDFObject of this PDFFile. You generally shouldn't need this, but we've left it open in case you want to go spelunking. -
getNumPages
public int getNumPages()
return the number of pages in this PDFFile. The pages will be numbered from 1 to getNumPages(), inclusive.
-
getStringMetadata
Get metadata (e.g., Author, Title, Creator) from the Info dictionary as a string.
- Parameters:
name - the name of the metadata key (e.g., Author)
- Returns:
- the info
- Throws:
IOException - if the metadata cannot be read
-
getMetadataKeys
Get the keys into the Info metadata, for use with getStringMetadata(String)
- Returns:
- the keys present in the Info dictionary
- Throws:
IOException - if the keys cannot be read
-
dereference
Used internally to track down PDFObject references. You should never need to call this.
Since this is the only public method for tracking down PDF objects, it is synchronized. This means that the PDFFile can only hunt down one object at a time, which prevents concurrent callers from disturbing the file's read position.
This call stores the current buffer position before any changes are made and restores it afterwards, so callers need not know that the position has changed.
- Throws:
IOException
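The save-and-restore behaviour described for dereference can be sketched as follows. This is an illustration only, not the library's code: PositionPreservingReader and readByteAt are hypothetical names, and a single-byte read stands in for parsing a whole object.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the "remember position, seek, read, restore" pattern.
final class PositionPreservingReader {
    private final ByteBuffer buf;

    PositionPreservingReader(ByteBuffer buf) {
        this.buf = buf;
    }

    /** Reads one byte at the given offset; synchronized so only one lookup runs at a time. */
    synchronized byte readByteAt(int offset) {
        int saved = buf.position();      // remember where the caller was
        try {
            buf.position(offset);        // jump to the target location
            return buf.get();
        } finally {
            buf.position(saved);         // restore, even if the read throws
        }
    }

    int position() {
        return buf.position();
    }
}
```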
-
isWhiteSpace
public static boolean isWhiteSpace(int c)
Is the argument a white space character according to the PDF spec? ISO 32000-1:2008, Table 1
-
isDelimiter
public static boolean isDelimiter(int c)
Is the argument a delimiter according to the PDF spec? ISO 32000-1:2008, Table 2
- Parameters:
c - the character to test
-
isRegularCharacter
public static boolean isRegularCharacter(int c)
return true if the character is neither a whitespace character nor a delimiter.
- Parameters:
c - the character to test
- Returns:
- boolean
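For reference, the three character classes can be sketched directly from ISO 32000-1:2008 Tables 1 and 2. The class name PdfCharClassSketch is hypothetical; this is not the library's source, only a restatement of the tables in code.

```java
// Hypothetical restatement of ISO 32000-1:2008 Tables 1 and 2.
final class PdfCharClassSketch {
    /** Table 1: NUL, HT, LF, FF, CR and SPACE. */
    static boolean isWhiteSpace(int c) {
        switch (c) {
            case 0: case 9: case 10: case 12: case 13: case 32:
                return true;
            default:
                return false;
        }
    }

    /** Table 2: ( ) < > [ ] { } / % */
    static boolean isDelimiter(int c) {
        switch (c) {
            case '(': case ')': case '<': case '>': case '[':
            case ']': case '{': case '}': case '/': case '%':
                return true;
            default:
                return false;
        }
    }

    /** Everything that is neither white space nor a delimiter. */
    static boolean isRegularCharacter(int c) {
        return !(isWhiteSpace(c) || isDelimiter(c));
    }
}
```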
-
readObject
read the next object from the file
- Parameters:
objNum - the object number of the object containing the object being read; negative only if the object number is unavailable (e.g., if reading from the trailer, or reading at the top level, in which case we can expect to be reading an object description)
objGen - the object generation of the object containing the object being read; negative only if the objNum is unavailable
decrypter - the decrypter to use
- Throws:
IOException
-
readObject
private PDFObject readObject(int objNum, int objGen, boolean numscan, PDFDecrypter decrypter) throws IOException
read the next object with a special catch for numbers
- Parameters:
numscan - if true, don't bother trying to see if a number is an object reference (used when already in the middle of testing for an object reference, and not otherwise)
objNum - the object number of the object containing the object being read; negative only if the object number is unavailable (e.g., if reading from the trailer, or reading at the top level, in which case we can expect to be reading an object description)
objGen - the object generation of the object containing the object being read; negative only if the objNum is unavailable
decrypter - the decrypter to use
- Throws:
IOException
-
nextNonWhitespaceChar
Get the next non-white space character
- Parameters:
buf - the buffer to read from
- Returns:
- the next non-whitespace character
-
consumeWhitespace
Consume all sequential whitespace from the current buffer position, leaving the buffer positioned at non-whitespace
- Parameters:
buf - the buffer to read from
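The two whitespace helpers can be sketched over a ByteBuffer as below. The class name WhitespaceSketch is hypothetical, the parameter type is assumed to be ByteBuffer, and returning -1 at end of buffer is an assumption of this sketch rather than documented behaviour.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; the real methods are private to PDFFile.
final class WhitespaceSketch {
    static boolean isWhiteSpace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    /** Advances past white space, leaving the position at the first non-whitespace byte. */
    static void consumeWhitespace(ByteBuffer buf) {
        while (buf.hasRemaining()) {
            int c = buf.get(buf.position()) & 0xFF;   // peek without consuming
            if (!isWhiteSpace(c)) {
                return;
            }
            buf.get();                                // consume the white space byte
        }
    }

    /** Skips white space, then reads one byte; -1 at end of buffer (assumption). */
    static int nextNonWhitespaceChar(ByteBuffer buf) {
        consumeWhitespace(buf);
        return buf.hasRemaining() ? (buf.get() & 0xFF) : -1;
    }
}
```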
-
nextItemIs
requires the next few characters (after whitespace) to match the argument.
- Parameters:
match - the next few characters after any whitespace that must be in the file
- Returns:
- true if the next characters match; false otherwise.
- Throws:
IOException
-
processVersion
process a version string, to determine the major and minor versions of the file.
- Parameters:
versionString
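As an illustration of splitting a version string into major and minor numbers, here is a minimal sketch. It assumes the string has the form "1.7" with any header prefix already removed; the exact format the real method accepts is not documented here, and VersionSketch and parseVersion are hypothetical names.

```java
// Hypothetical sketch; assumes input such as "1.7" with no "%PDF-" style prefix.
final class VersionSketch {
    /** Returns {major, minor}; the minor part defaults to 0 when absent. */
    static int[] parseVersion(String versionString) {
        String[] parts = versionString.trim().split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return new int[] { major, minor };
    }
}
```

Malformed input raises NumberFormatException in this sketch; whether the real method is stricter or more lenient is not stated in this page.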
-
getMajorVersion
public int getMajorVersion()
return the major version of the PDF header.
- Returns:
- int
-
getMinorVersion
public int getMinorVersion()
return the minor version of the PDF header.
- Returns:
- int
-
getVersionString
return the version string from the PDF header.
- Returns:
- String
-
readDictionary
read an entire << dictionary >>. The initial << has already been read.
- Parameters:
objNum - the object number of the object containing the dictionary being read; negative only if the object number is unavailable, which should only happen if we're reading a dictionary placed directly in the trailer
objGen - the object generation of the object containing the object being read; negative only if the objNum is unavailable
decrypter - the decrypter to use
- Returns:
- the Dictionary as a PDFObject.
- Throws:
IOException
-
readHexDigit
read a character, and return its value as if it were a hexadecimal digit.
- Returns:
- a number between 0 and 15 whose value matches the next hexadecimal character. Returns -1 if the next character isn't in [0-9a-fA-F]
- Throws:
IOException
-
readHexPair
return the 8-bit value represented by the next two hex characters. If the next two characters don't represent a hex value, return -1 and reset the read head. If there is only one hex character, return its value as if there were an implicit 0 after it.
- Throws:
IOException
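The rules for readHexDigit and readHexPair can be sketched over a ByteBuffer as follows. HexSketch is a hypothetical class; any whitespace handling the real methods perform is omitted, and in the single-digit case this sketch chooses to leave the non-hex character unread, which is an assumption.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the documented hex rules; whitespace handling is omitted.
final class HexSketch {
    /** Returns 0..15 for a hex digit, or -1 otherwise (also at end of buffer). */
    static int readHexDigit(ByteBuffer buf) {
        if (!buf.hasRemaining()) {
            return -1;
        }
        int c = buf.get() & 0xFF;
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** Returns the 8-bit value of the next two hex digits, or -1 with the position reset. */
    static int readHexPair(ByteBuffer buf) {
        int start = buf.position();
        int first = readHexDigit(buf);
        if (first < 0) {
            buf.position(start);          // nothing usable: reset the read head
            return -1;
        }
        int afterFirst = buf.position();
        int second = readHexDigit(buf);
        if (second < 0) {
            buf.position(afterFirst);     // assumption: leave the non-hex byte unread
            return first << 4;            // implicit trailing 0
        }
        return (first << 4) | second;
    }
}
```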
-
readHexString
read a < hex string >. The initial < has already been read.
- Parameters:
objNum - the object number of the object containing the dictionary being read; negative only if the object number is unavailable, which should only happen if we're reading a string placed directly in the trailer
objGen - the object generation of the object containing the object being read; negative only if the objNum is unavailable
decrypter - the decrypter to use
- Throws:
IOException
-
readLiteralString
private PDFObject readLiteralString(int objNum, int objGen, PDFDecrypter decrypter) throws IOException
read a ( character string ). The initial ( has already been read. Read until a *balanced* ) appears.
Section 3.2.3 of the PDF Reference version 1.7 defines the format of String objects. Regarding literal strings:
Within a literal string, the backslash (\) is used as an escape character for various purposes, such as to include newline characters, nonprinting ASCII characters, unbalanced parentheses, or the backslash character itself in the string. The character immediately following the backslash determines its precise interpretation (see Table 3.2). If the character following the backslash is not one of those shown in the table, the backslash is ignored.
*This only reads 8 bit basic character 'strings' so as to avoid a text string interpretation when one is not desired (e.g., for byte strings, as used by the decryption mechanism). For an interpretation of a string returned from this method, where the object type is defined as a 'text string' as per Section 3.8.1, Table 3.31 "PDF Data Types", PDFStringUtil.asTextString(java.lang.String) or PDFObject.getTextStringValue() must be employed.
- Parameters:
objNum - the object number of the object containing the dictionary being read; negative only if the object number is unavailable, which should only happen if we're reading a dictionary placed directly in the trailer
objGen - the object generation of the object containing the object being read; negative only if the objNum is unavailable
decrypter - the decrypter to use
- Throws:
IOException
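The balanced-parenthesis and backslash rules quoted above can be sketched as a small byte-level reader. This is not the library's implementation: LiteralStringSketch and readLiteral are hypothetical names, decryption is left out, and the result is returned as raw bytes rather than a PDFObject.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch; the opening '(' is assumed to have been consumed already.
final class LiteralStringSketch {
    static byte[] readLiteral(ByteBuffer buf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int depth = 1;                                   // the '(' already read
        while (buf.hasRemaining()) {
            int c = buf.get() & 0xFF;
            if (c == '(') {
                depth++;
                out.write(c);
            } else if (c == ')') {
                if (--depth == 0) {
                    return out.toByteArray();            // the balancing ')'
                }
                out.write(c);
            } else if (c == '\\' && buf.hasRemaining()) {
                readEscape(buf, out);
            } else {
                out.write(c);
            }
        }
        throw new IllegalStateException("unterminated literal string");
    }

    private static void readEscape(ByteBuffer buf, ByteArrayOutputStream out) {
        int e = buf.get() & 0xFF;
        switch (e) {
            case 'n': out.write('\n'); break;
            case 'r': out.write('\r'); break;
            case 't': out.write('\t'); break;
            case 'b': out.write('\b'); break;
            case 'f': out.write('\f'); break;
            case '\n': break;                            // line continuation
            case '\r':                                   // line continuation, CR or CRLF
                if (buf.hasRemaining() && buf.get(buf.position()) == '\n') buf.get();
                break;
            default:
                if (e >= '0' && e <= '7') {              // octal escape, up to 3 digits
                    int value = e - '0';
                    for (int i = 0; i < 2 && buf.hasRemaining(); i++) {
                        int d = buf.get(buf.position()) & 0xFF;
                        if (d < '0' || d > '7') break;
                        buf.get();
                        value = value * 8 + (d - '0');
                    }
                    out.write(value & 0xFF);
                } else {
                    out.write(e);                        // covers \( \) \\ and ignored backslashes
                }
        }
    }
}
```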
-
readLine
Read a line of text. This follows the semantics of readLine() in DataInput -- it reads character by character until a '\n' is encountered. If a '\r' is encountered, it is discarded. -
readArray
read an [ array ]. The initial [ has already been read. PDFObjects are read until ].
- Parameters:
objNum - the object number of the object containing the dictionary being read; negative only if the object number is unavailable, which should only happen if we're reading an array placed directly in the trailer
objGen - the object generation of the object containing the object being read; negative only if the objNum is unavailable
decrypter - the decrypter to use
- Throws:
IOException
-
readName
read a /name. The / has already been read.
- Throws:
IOException
-
readNumber
read a number. The initial digit or . or - is passed in as the argument.
- Throws:
IOException
-
readKeyword
read a bare keyword. The initial character is passed in as the argument.
- Throws:
IOException
-
readObjectDescription
private PDFObject readObjectDescription(int objNum, int objGen, PDFDecrypter decrypter) throws IOException
read an entire PDFObject. The intro line, which looks something like "4 0 obj" has already been read.
- Parameters:
objNum - the object number of the object being read, being the first number in the intro line (4 in "4 0 obj")
objGen - the object generation of the object being read, being the second number in the intro line (0 in "4 0 obj").
decrypter - the decrypter to use
- Throws:
IOException
-
readStream
read the stream portion of a PDFObject. Calls decodeStream to un-filter the stream as necessary.
- Parameters:
dict - the dictionary associated with this stream.
- Returns:
- a ByteBuffer with the encoded stream data
- Throws:
IOException
-
readTrailer
private void readTrailer(PDFPassword password) throws IOException, PDFAuthenticationFailureException, EncryptionUnsupportedByProductException, EncryptionUnsupportedByPlatformException
read the cross reference table from a PDF file. When this method is called, the file pointer must point to the start of the word "xref" in the file. Reads the xref table and the trailer dictionary. If the dictionary has a /Prev entry, move the file pointer and read the new trailer.
- Parameters:
password
- Throws:
IOException
PDFAuthenticationFailureException
EncryptionUnsupportedByProductException
EncryptionUnsupportedByPlatformException
-
readTrailer15
private void readTrailer15(PDFPassword password) throws IOException, PDFAuthenticationFailureException, EncryptionUnsupportedByProductException, EncryptionUnsupportedByPlatformException
read the cross reference table from a PDF file. When this method is called, the file pointer must point to the start of the word "xref" in the file. Reads the xref table and the trailer dictionary. If the dictionary has a /Prev entry, move the file pointer and read the new trailer.
- Parameters:
password
- Throws:
IOException
PDFAuthenticationFailureException
EncryptionUnsupportedByProductException
EncryptionUnsupportedByPlatformException
-
readNum
private int readNum(byte[] sbuf, int pos, int numBytes) -
parseFile
build the PDFFile reference table. Nothing in the PDFFile actually gets parsed, despite the name of this function. Things only get read and parsed when they're needed.
- Parameters:
password
- Throws:
IOException
-
getOutline
Gets the outline tree as a tree of OutlineNode, which is a subclass of DefaultMutableTreeNode. If there is no outline tree, this method returns null.
- Throws:
IOException
-
getPageNumber
Gets the page number (starting from 1) of the page represented by a particular PDFObject. The PDFObject must be a Page dictionary or a destination description (or an action).
- Returns:
- a number between 1 and the number of pages indicating the page number, or 0 if the PDFObject is not in the page tree.
- Throws:
IOException
-
getPage
Get the page commands for a given page in a separate thread.
- Parameters:
pagenum - the number of the page to get commands for
-
getPage
Get the page commands for a given page.
- Parameters:
pagenum - the number of the page to get commands for
wait - if true, do not exit until the page is complete.
-
stop
public void stop(int pageNum)
Stop the rendering of a particular image on this page
-
getContents
get the stream representing the content of a particular page.
- Parameters:
pageObj - the page object to get the contents of
- Returns:
- a concatenation of any content streams for the requested page.
- Throws:
IOException
-
createPage
Create a PDF Page object by finding the relevant inherited properties
- Parameters:
pageObj - the PDF object for the page to be created
- Throws:
IOException
-
findPage
private PDFObject findPage(PDFObject pagedict, int start, int getPage, Map<String, PDFObject> resources) throws IOException
Get the PDFObject representing the content of a particular page. Note that the number of the page need not have anything to do with the label on that page. If there are two blank pages, and then roman numerals for the page number, then passing in 6 will get page (iv).
- Parameters:
pagedict - the top of the pages tree
start - the page number of the first page in this dictionary
getPage - the number of the page to find; NOT the page's label.
resources - a HashMap that will be filled with any resource definitions encountered on the search for the page
- Throws:
IOException
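The page lookup described for findPage amounts to a walk over the page tree using per-subtree page counts. The sketch below illustrates that idea with a hypothetical Node type. It computes counts by recursion instead of reading a count stored in the file, treats any node without kids as a page, and omits the resource collection, so it is a simplification rather than the library's algorithm.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical page-tree model; labels are only there to identify pages in the example.
final class PageTreeSketch {
    static final class Node {
        final String label;
        final List<Node> kids;

        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }

        boolean isPage() {
            return kids.isEmpty();       // simplification: a node without kids is a page
        }

        int count() {
            if (isPage()) return 1;
            int total = 0;
            for (Node kid : kids) total += kid.count();
            return total;
        }
    }

    /** Finds page number target, where the first page under pages has number start. */
    static Node findPage(Node pages, int start, int target) {
        if (target < start) {
            return null;
        }
        int first = start;               // page number of the first page in the current kid
        for (Node kid : pages.kids) {
            int span = kid.count();
            if (target < first + span) {
                return kid.isPage() ? kid : findPage(kid, first, target);
            }
            first += span;               // skip this whole subtree
        }
        return null;                     // target is beyond the last page
    }
}
```

With two blank pages followed by a subtree labelled i, ii, iii, iv, asking for page 6 returns the node labelled iv, matching the example in the description.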
-
getInheritedValue
Find a property value in a page that may be inherited. If the value is not defined in the page itself, follow the page's "parent" links until the value is found or the top of the tree is reached.
- Parameters:
pageObj - the object representing the page
propName - the name of the property we are looking for
- Throws:
IOException
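The parent-following lookup can be sketched with a small stand-in type. PageNode and its fields are hypothetical and use plain strings in place of PDFObject values; returning null when the property is not found anywhere is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a page dictionary with a link to its parent.
final class PageNode {
    final PageNode parent;                               // null at the top of the tree
    final Map<String, String> props = new HashMap<>();

    PageNode(PageNode parent) {
        this.parent = parent;
    }

    /** Looks in this node first, then walks up the parent links. */
    static String getInheritedValue(PageNode page, String propName) {
        for (PageNode node = page; node != null; node = node.parent) {
            String value = node.props.get(propName);
            if (value != null) {
                return value;
            }
        }
        return null;                                     // assumption: not found anywhere
    }
}
```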
-
parseNormalisedRectangle
- Throws:
IOException
-
getDefaultDecrypter
Get the default decrypter for the document
- Returns:
- the default decrypter; never null, even for documents that aren't encrypted
-