Class BruteForceParser

All Implemented Interfaces:
ICOSParser

public class BruteForceParser extends COSParser
Brute force parser to be used as a last resort if a malformed pdf can't be read.
  • Field Details

    • XREF_TABLE

      private static final char[] XREF_TABLE
    • XREF_STREAM

      private static final char[] XREF_STREAM
    • MINIMUM_SEARCH_OFFSET

      private static final long MINIMUM_SEARCH_OFFSET
    • EOF_MARKER

      private static final char[] EOF_MARKER
      EOF-marker.
    • OBJ_MARKER

      private static final char[] OBJ_MARKER
      obj-marker.
    • TRAILER_MARKER

      private static final char[] TRAILER_MARKER
      trailer-marker.
    • OBJ_STREAM

      private static final char[] OBJ_STREAM
      ObjStream-marker.
    • LOG

      private static final org.apache.commons.logging.Log LOG
    • bfSearchCOSObjectKeyOffsets

      private final Map<COSObjectKey,Long> bfSearchCOSObjectKeyOffsets
      Contains all found objects of a brute force search.
    • bfSearchTriggered

      private boolean bfSearchTriggered
  • Constructor Details

    • BruteForceParser

      public BruteForceParser(RandomAccessRead source, COSDocument document) throws IOException
      Constructor. Triggers a brute force search for all objects of the document.
      Parameters:
      source - input representing the pdf.
      document - the corresponding COS document
      Throws:
      IOException - if the source data could not be read
  • Method Details

    • bfSearchTriggered

      public boolean bfSearchTriggered()
      Indicates whether the brute force search for objects was triggered.
      Returns:
      true if the search was triggered
    • getBFCOSObjectOffsets

      protected Map<COSObjectKey,Long> getBFCOSObjectOffsets() throws IOException
      Returns all found objects of a brute force search.
      Returns:
      map containing all found objects of a brute force search
      Throws:
      IOException - if something went wrong
    • bfSearchForObjects

      private void bfSearchForObjects() throws IOException
      Brute force search for every object in the pdf.
      Throws:
      IOException - if something went wrong
    • bfSearchForXRef

      protected long bfSearchForXRef(long xrefOffset) throws IOException
      Search for the offset of the given xref table/stream among those found by a brute force search.
      Parameters:
      xrefOffset - the given offset to be searched for
      Returns:
      the offset of the xref entry
      Throws:
      IOException - if something went wrong
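
The lookup described above can be illustrated with a minimal sketch. It assumes the candidate closest to the requested offset is chosen, which is what the name of the private helper searchNearestValue suggests but which this page does not confirm. The class NearestOffsetSketch and its method nearest are hypothetical names and are not part of the library.

```java
import java.util.List;

// Hypothetical illustration only; not the library's implementation.
final class NearestOffsetSketch {

    /**
     * Returns the element of values closest to offset, or -1 if values is empty.
     * On a tie, the candidate that appears first in the list wins.
     */
    static long nearest(List<Long> values, long offset) {
        long best = -1;
        long bestDistance = Long.MAX_VALUE;
        for (long candidate : values) {
            long distance = Math.abs(offset - candidate);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = candidate;
            }
        }
        return best;
    }
}
```

With candidate offsets 100, 480 and 900, a damaged startxref value of 500 would resolve to 480 under this assumption.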
    • searchNearestValue

      private long searchNearestValue(List<Long> values, long offset)
    • bfSearchForObjStreams

      protected void bfSearchForObjStreams(XrefTrailerResolver trailerResolver, SecurityHandler<? extends ProtectionPolicy> securityHandler) throws IOException
      Brute force search for all object streams of a pdf.
      Parameters:
      trailerResolver - the trailer resolver of the document
      securityHandler - security handler to be used to decrypt encrypted documents
      Throws:
      IOException - if something went wrong
    • bfSearchForTrailer

      private boolean bfSearchForTrailer(COSDictionary trailer) throws IOException
      Brute force search for all trailer markers.
      Parameters:
      trailer - dictionary to be used as trailer dictionary
      Throws:
      IOException - if something went wrong
    • searchForTrailerItems

      private boolean searchForTrailerItems(COSDictionary trailer) throws IOException
      Search for the different parts of the trailer dictionary.
      Parameters:
      trailer - dictionary to be used as trailer dictionary
      Returns:
      true if the root was found, false if not.
      Throws:
      IOException - if something went wrong
    • compareCOSObjects

      private COSObject compareCOSObjects(COSObject newObject, Long newOffset, COSObject currentObject)
    • bfSearchForLastEOFMarker

      private long bfSearchForLastEOFMarker() throws IOException
      Brute force search for the last EOF marker.
      Throws:
      IOException - if something went wrong
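
As a rough illustration of locating the last EOF marker, the sketch below scans an in-memory byte array for the final occurrence of the text %%EOF. It is a simplification: the real method works on the parser's input source and may apply additional checks that this page does not describe. The class LastEofSketch and its method lastEofOffset are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the library's implementation.
final class LastEofSketch {

    /**
     * Returns the start offset of the last "%%EOF" in data, or -1 if there is none.
     * ISO-8859-1 maps each byte to exactly one char, so string indices equal byte offsets.
     */
    static long lastEofOffset(byte[] data) {
        String text = new String(data, StandardCharsets.ISO_8859_1);
        return text.lastIndexOf("%%EOF");
    }
}
```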
    • bfSearchForObjStreamOffsets

      private Map<Long,COSObjectKey> bfSearchForObjStreamOffsets() throws IOException
      Search for all offsets of object streams within the given pdf.
      Returns:
      a map of all offsets for object streams
      Throws:
      IOException - if something went wrong
    • bfSearchForXRefTables

      private List<Long> bfSearchForXRefTables() throws IOException
      Brute force search for all xref entries (tables).
      Throws:
      IOException - if something went wrong
    • bfSearchForXRefStreams

      private List<Long> bfSearchForXRefStreams() throws IOException
      Brute force search for all /XRef entries (streams).
      Throws:
      IOException - if something went wrong
    • isInfo

      private boolean isInfo(COSDictionary dictionary)
      Tell if the dictionary is an info dictionary.
      Parameters:
      dictionary - the dictionary to be checked
      Returns:
      true if the given dictionary is an info dictionary
    • isCatalog

      private boolean isCatalog(COSDictionary dictionary)
      Tell if the dictionary is a PDF or FDF catalog.
      Parameters:
      dictionary - the dictionary to be checked
      Returns:
      true if the given dictionary is a root dictionary
    • findString

      private long findString(char[] string) throws IOException
      Search for the given string. The search starts at the current position and returns the start position of the string if it was found. -1 is returned if there isn't any further occurrence of the given string. After returning, the current position is either the end of the found string or the end of the input.
      Parameters:
      string - the string to be searched
      Returns:
      the start position of the found string
      Throws:
      IOException - if something went wrong
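
The contract described for findString can be sketched on a plain byte array. This is an assumption-laden illustration: the real method reads from the parser's input source, and the class MarkerSearchSketch with its position field is a hypothetical stand-in for that source.

```java
// Hypothetical illustration only; not the library's implementation.
final class MarkerSearchSketch {
    private final byte[] data;
    private int position;

    MarkerSearchSketch(byte[] data) {
        this.data = data;
    }

    int position() {
        return position;
    }

    /**
     * Searches for marker starting at the current position.
     * Returns the start index of the match, or -1 if there is no further occurrence.
     * Afterwards the position is just past the match, or at the end of the input.
     */
    long findString(char[] marker) {
        for (int start = position; start + marker.length <= data.length; start++) {
            int matched = 0;
            while (matched < marker.length
                    && (data[start + matched] & 0xFF) == marker[matched]) {
                matched++;
            }
            if (matched == marker.length) {
                position = start + marker.length;
                return start;
            }
        }
        position = data.length;
        return -1;
    }
}
```

Because the position advances past each match, calling findString repeatedly with the same marker walks through every occurrence until -1 is returned.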
    • rebuildTrailer

      protected COSDictionary rebuildTrailer(XrefTrailerResolver trailerResolver, SecurityHandler<? extends ProtectionPolicy> securityHandler) throws IOException
      Rebuild the trailer dictionary if startxref can't be found.
      Parameters:
      trailerResolver - the trailer resolver of the document
      securityHandler - security handler to be used to decrypt encrypted documents
      Returns:
      the rebuilt trailer dictionary
      Throws:
      IOException - if something went wrong