Package org.apache.pdfbox.pdfparser
Class BruteForceParser
java.lang.Object
org.apache.pdfbox.pdfparser.BaseParser
org.apache.pdfbox.pdfparser.COSParser
org.apache.pdfbox.pdfparser.BruteForceParser
All Implemented Interfaces:
ICOSParser
Brute force parser to be used as a last resort if a malformed PDF can't be read.
Field Summary
Fields (modifier and type, field, description)
- private final Map<COSObjectKey, Long> bfSearchCOSObjectKeyOffsets: Contains all found objects of a brute force search.
- private boolean bfSearchTriggered
- private static final char[] EOF_MARKER: EOF-marker.
- private static final org.apache.commons.logging.Log LOG
- private static final long MINIMUM_SEARCH_OFFSET
- private static final char[] OBJ_MARKER: obj-marker.
- private static final char[] OBJ_STREAM: ObjStream-marker.
- private static final char[] TRAILER_MARKER: trailer-marker.
- private static final char[] XREF_STREAM
- private static final char[] XREF_TABLE
Fields inherited from class org.apache.pdfbox.pdfparser.COSParser
fileLen, initialParseDone, securityHandler, SYSPROP_EOFLOOKUPRANGE, xrefTrailerResolver
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, MAX_LENGTH_LONG, N, O, R, S, source, STREAM_STRING, T
Constructor Summary
Constructors (constructor, description)
- BruteForceParser(RandomAccessRead source, COSDocument document): Constructor.
Method Summary
Methods (modifier and type, method, description)
- private long bfSearchForLastEOFMarker(): Brute force search for the last EOF marker.
- private void bfSearchForObjects(): Brute force search for every object in the pdf.
- private Map<Long, COSObjectKey> bfSearchForObjStreamOffsets(): Search for all offsets of object streams within the given pdf.
- protected void bfSearchForObjStreams(XrefTrailerResolver trailerResolver, SecurityHandler<? extends ProtectionPolicy> securityHandler): Brute force search for all object streams of a pdf.
- private boolean bfSearchForTrailer(COSDictionary trailer): Brute force search for all trailer markers.
- protected long bfSearchForXRef(long xrefOffset): Search for the offset of the given xref table/stream among those found by a brute force search.
- bfSearchForXRefStreams(): Brute force search for all /XRef entries (streams).
- bfSearchForXRefTables(): Brute force search for all xref entries (tables).
- boolean bfSearchTriggered(): Indicates whether the brute force search for objects was triggered.
- private COSObject compareCOSObjects(COSObject newObject, Long newOffset, COSObject currentObject)
- private long findString(char[] string): Search for the given string.
- protected Map<COSObjectKey, Long> getBFCOSObjectOffsets(): Returns all found objects of a brute force search.
- private boolean isCatalog(COSDictionary dictionary): Tell if the dictionary is a PDF or FDF catalog.
- private boolean isInfo(COSDictionary dictionary): Tell if the dictionary is an info dictionary.
- protected COSDictionary rebuildTrailer(XrefTrailerResolver trailerResolver, SecurityHandler<? extends ProtectionPolicy> securityHandler): Rebuild the trailer dictionary if startxref can't be found.
- private boolean searchForTrailerItems(COSDictionary trailer): Search for the different parts of the trailer dictionary.
- private long searchNearestValue(List<Long> values, long offset)
Methods inherited from class org.apache.pdfbox.pdfparser.COSParser
checkPages, createRandomAccessReadView, dereferenceCOSObject, getAccessPermission, getEncryption, isLenient, isString, lastIndexOf, parseCOSStream, parseFDFHeader, parseObjectDynamically, parseObjectStreamObject, parsePDFHeader, parseXrefTable, prepareDecryption, resetTrailerResolver, retrieveTrailer, setEOFLookupRange, setLenient
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
getObjectKey, isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOF, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSString, parseDirObject, readExpectedChar, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipLinebreak, skipSpaces, skipWhiteSpaces
Field Details
XREF_TABLE
private static final char[] XREF_TABLE
XREF_STREAM
private static final char[] XREF_STREAM
MINIMUM_SEARCH_OFFSET
private static final long MINIMUM_SEARCH_OFFSET
EOF_MARKER
private static final char[] EOF_MARKER
EOF-marker.
OBJ_MARKER
private static final char[] OBJ_MARKER
obj-marker.
TRAILER_MARKER
private static final char[] TRAILER_MARKER
trailer-marker.
OBJ_STREAM
private static final char[] OBJ_STREAM
ObjStream-marker.
LOG
private static final org.apache.commons.logging.Log LOG
bfSearchCOSObjectKeyOffsets
private final Map<COSObjectKey, Long> bfSearchCOSObjectKeyOffsets
Contains all found objects of a brute force search.
bfSearchTriggered
private boolean bfSearchTriggered
Constructor Details
BruteForceParser
public BruteForceParser(RandomAccessRead source, COSDocument document) throws IOException
Constructor. Triggers a brute force search for all objects of the document.
Parameters:
source - input representing the pdf.
document - the corresponding COS document
Throws:
IOException - if the source data could not be read
Method Details
bfSearchTriggered
public boolean bfSearchTriggered()
Indicates whether the brute force search for objects was triggered.
Returns:
true if the search was triggered
getBFCOSObjectOffsets
protected Map<COSObjectKey, Long> getBFCOSObjectOffsets() throws IOException
Returns all found objects of a brute force search.
Returns:
map containing all found objects of a brute force search
Throws:
IOException - if something went wrong
bfSearchForObjects
private void bfSearchForObjects() throws IOException
Brute force search for every object in the pdf.
Throws:
IOException - if something went wrong
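A brute force object search can be pictured as a scan for object headers of the form "<number> <generation> obj", recording the offset of each one. The following is only a minimal sketch of that idea and not the PDFBox implementation: the class ObjectScanSketch, its scan method, the regular expression, and the use of a plain String key in place of COSObjectKey are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of an object header scan; not PDFBox code. */
final class ObjectScanSketch {

    // "<object number> <generation> obj", not preceded by a non-whitespace character.
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?<!\\S)(\\d+)\\s+(\\d+)\\s+obj\\b");

    /** Maps "number generation" keys to the byte offset of the object header. */
    static Map<String, Long> scan(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so char index equals byte offset.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<String, Long> offsets = new LinkedHashMap<>();
        Matcher matcher = OBJ_HEADER.matcher(text);
        while (matcher.find()) {
            // Assumption: a later definition of the same key replaces an earlier one.
            offsets.put(matcher.group(1) + " " + matcher.group(2), (long) matcher.start(1));
        }
        return offsets;
    }
}
```

Decoding the whole file into a String keeps the sketch short; a parser working on large inputs would more likely scan the input stream directly.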
bfSearchForXRef
protected long bfSearchForXRef(long xrefOffset) throws IOException
Search for the offset of the given xref table/stream among those found by a brute force search.
Parameters:
xrefOffset - the given offset to be searched for
Returns:
the offset of the xref entry
Throws:
IOException - if something went wrong
searchNearestValue
private long searchNearestValue(List<Long> values, long offset)
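This page gives no description for searchNearestValue. Judging only by its name and signature, it presumably picks, from a list of candidate offsets, the one closest to a given offset. The sketch below shows that reading; the class NearestOffsetSketch, the method nearest, the tie-breaking rule, and the -1 result for an empty list are assumptions, not documented behavior.

```java
import java.util.List;

/** Hypothetical illustration of a nearest-value lookup; not PDFBox code. */
final class NearestOffsetSketch {

    /** Returns the element of values closest to offset, or -1 if values is empty. */
    static long nearest(List<Long> values, long offset) {
        long best = -1;
        long bestDistance = Long.MAX_VALUE;
        for (long candidate : values) {
            long distance = Math.abs(offset - candidate);
            // Strict comparison: on a tie the earlier list element is kept.
            if (distance < bestDistance) {
                bestDistance = distance;
                best = candidate;
            }
        }
        return best;
    }
}
```

Under this reading, a damaged startxref value of 500 checked against candidates 100, 480 and 900 would resolve to 480.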
bfSearchForObjStreams
protected void bfSearchForObjStreams(XrefTrailerResolver trailerResolver, SecurityHandler<? extends ProtectionPolicy> securityHandler) throws IOException
Brute force search for all object streams of a pdf.
Parameters:
trailerResolver - the trailer resolver of the document
securityHandler - security handler to be used to decrypt encrypted documents
Throws:
IOException - if something went wrong
bfSearchForTrailer
private boolean bfSearchForTrailer(COSDictionary trailer) throws IOException
Brute force search for all trailer markers.
Parameters:
trailer - dictionary to be used as trailer dictionary
Throws:
IOException - if something went wrong
searchForTrailerItems
private boolean searchForTrailerItems(COSDictionary trailer) throws IOException
Search for the different parts of the trailer dictionary.
Parameters:
trailer - dictionary to be used as trailer dictionary
Returns:
true if the root was found, false if not.
Throws:
IOException - if something went wrong
compareCOSObjects
private COSObject compareCOSObjects(COSObject newObject, Long newOffset, COSObject currentObject)
bfSearchForLastEOFMarker
private long bfSearchForLastEOFMarker() throws IOException
Brute force search for the last EOF marker.
Throws:
IOException - if something went wrong
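Finding the last EOF marker amounts to locating the final occurrence of "%%EOF" in the file, which is the end-of-file marker defined by the PDF file format. The sketch below walks backwards over a byte array so that the first hit is the last occurrence. The class LastEofSketch, the method lastEofOffset, and the -1 result when no marker exists are assumptions for illustration; this page does not state how the real method scans or what it returns in that case.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of a backwards marker search; not PDFBox code. */
final class LastEofSketch {

    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the offset of the last "%%EOF" in data, or -1 if there is none. */
    static long lastEofOffset(byte[] data) {
        // Start at the last position where the marker could still fit.
        for (int start = data.length - EOF_MARKER.length; start >= 0; start--) {
            if (matchesAt(data, start)) {
                return start;
            }
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, int start) {
        for (int i = 0; i < EOF_MARKER.length; i++) {
            if (data[start + i] != EOF_MARKER[i]) {
                return false;
            }
        }
        return true;
    }
}
```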
bfSearchForObjStreamOffsets
private Map<Long, COSObjectKey> bfSearchForObjStreamOffsets() throws IOException
Search for all offsets of object streams within the given pdf.
Returns:
a map of all offsets for object streams
Throws:
IOException - if something went wrong
bfSearchForXRefTables
Brute force search for all xref entries (tables).
Throws:
IOException - if something went wrong
bfSearchForXRefStreams
Brute force search for all /XRef entries (streams).
Throws:
IOException - if something went wrong
isInfo
private boolean isInfo(COSDictionary dictionary)
Tell if the dictionary is an info dictionary.
Parameters:
dictionary - the dictionary to be checked
Returns:
true if the given dictionary is an info dictionary
isCatalog
private boolean isCatalog(COSDictionary dictionary)
Tell if the dictionary is a PDF or FDF catalog.
Parameters:
dictionary - the dictionary to be checked
Returns:
true if the given dictionary is a root dictionary
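This page does not state which criteria isInfo and isCatalog apply. The sketch below shows one plausible heuristic based on the PDF file format: a document catalog carries a Type entry with the value Catalog, an FDF catalog carries an FDF entry, and an info dictionary carries entries such as Title or Producer. The class DictionaryKindSketch, both method names, the key list, and the use of a plain Map instead of COSDictionary are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of dictionary classification; not PDFBox code. */
final class DictionaryKindSketch {

    // Entries commonly found in a document information dictionary.
    private static final List<String> INFO_KEYS = Arrays.asList(
            "Title", "Author", "Subject", "Keywords",
            "Creator", "Producer", "CreationDate", "ModDate");

    /** Assumed rule: Type is Catalog (PDF), or an FDF entry is present (FDF). */
    static boolean looksLikeCatalog(Map<String, Object> dictionary) {
        return "Catalog".equals(dictionary.get("Type")) || dictionary.containsKey("FDF");
    }

    /** Assumed rule: not a catalog, and at least one typical info entry is present. */
    static boolean looksLikeInfo(Map<String, Object> dictionary) {
        if (looksLikeCatalog(dictionary)) {
            return false;
        }
        for (String key : INFO_KEYS) {
            if (dictionary.containsKey(key)) {
                return true;
            }
        }
        return false;
    }
}
```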
findString
private long findString(char[] string) throws IOException
Search for the given string. The search starts at the current position and returns the start position of the string if it is found. -1 is returned if there is no further occurrence of the given string. After returning, the current position is either the end of the found string or the end of the input.
Parameters:
string - the string to be searched
Returns:
the start position of the found string
Throws:
IOException - if something went wrong
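The contract described for findString (search from the current position, return the start of the match or -1, and leave the position right behind the match or at the end of the input) can be sketched with a cursor over a byte array. This is an illustration under stated assumptions, not the PDFBox code: the real method reads from the parser's source, while the class MarkerScanner and its position accessor are hypothetical.

```java
/** Hypothetical illustration of a cursor-based marker search; not PDFBox code. */
final class MarkerScanner {

    private final byte[] data;
    private int position;

    MarkerScanner(byte[] data) {
        this.data = data;
    }

    int position() {
        return position;
    }

    /**
     * Returns the start offset of the next occurrence of marker at or after the
     * current position, or -1 if there is none. Afterwards the position is right
     * behind the match, or at the end of the input if nothing was found.
     */
    long findString(char[] marker) {
        for (int start = position; start + marker.length <= data.length; start++) {
            int matched = 0;
            // Compare unsigned byte values against the marker characters.
            while (matched < marker.length
                    && (data[start + matched] & 0xFF) == marker[matched]) {
                matched++;
            }
            if (matched == marker.length) {
                position = start + marker.length;
                return start;
            }
        }
        position = data.length;
        return -1;
    }
}
```

Calling findString repeatedly with the same marker walks through all occurrences in order, which is the pattern a brute force search for every obj or trailer marker would need.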
rebuildTrailer
protected COSDictionary rebuildTrailer(XrefTrailerResolver trailerResolver, SecurityHandler<? extends ProtectionPolicy> securityHandler) throws IOException
Rebuild the trailer dictionary if startxref can't be found.
Parameters:
trailerResolver - the trailer resolver of the document
securityHandler - security handler to be used to decrypt encrypted documents
Returns:
the rebuilt trailer dictionary
Throws:
IOException - if something went wrong