Class PDFStreamParser

java.lang.Object
org.apache.pdfbox.pdfparser.BaseParser
org.apache.pdfbox.pdfparser.PDFStreamParser

public class PDFStreamParser extends BaseParser
Parses a PDF content stream and extracts its tokens (operators and their operands).
  • Field Details

    • LOG

      private static final org.apache.commons.logging.Log LOG
      Log instance.
    • NUMBER_PATTERN

      private static final Pattern NUMBER_PATTERN
    • MAX_BIN_CHAR_TEST_LENGTH

      private static final int MAX_BIN_CHAR_TEST_LENGTH
    • binCharTestArr

      private final byte[] binCharTestArr
    • inlineImageDepth

      private int inlineImageDepth
    • inlineOffset

      private long inlineOffset
  • Constructor Details

    • PDFStreamParser

      public PDFStreamParser(PDContentStream pdContentstream) throws IOException
      Constructor.
      Parameters:
      pdContentstream - The content stream to parse.
      Throws:
      IOException - If there is an error initializing the stream.
    • PDFStreamParser

      public PDFStreamParser(byte[] bytes)
      Constructor.
      Parameters:
      bytes - the bytes to parse.
  • Method Details

    • parse

      public List<Object> parse() throws IOException
      Parses all the tokens in the stream and closes the stream when parsing is finished.
      Returns:
      All of the tokens in the stream.
      Throws:
      IOException - If there is an error while parsing the stream.
    • parseNextToken

      public Object parseNextToken() throws IOException
      This will parse the next token in the stream.
      Returns:
      The next token in the stream or null if there are no more tokens in the stream.
      Throws:
      IOException - If an I/O error occurs while parsing the stream.
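      The contract above (parseNextToken() returns null once the stream is exhausted, and parse() returns every token) can be illustrated with a minimal sketch. TokenLoopSketch is a hypothetical stand-in that splits on whitespace; it is not PDFStreamParser and does not use PDFBox, and the loop shown is only one plausible way to build parse() on top of parseNextToken().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token source; it is NOT PDFStreamParser.
// Tokens here are simply whitespace-separated strings (an assumption).
final class TokenLoopSketch {
    private final String[] tokens;
    private int next = 0;

    TokenLoopSketch(String content) {
        String trimmed = content.trim();
        this.tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Mirrors the documented contract: the next token, or null when none remain.
    String parseNextToken() {
        return next < tokens.length ? tokens[next++] : null;
    }

    // Collects tokens until parseNextToken() signals the end with null.
    List<String> parse() {
        List<String> all = new ArrayList<>();
        String token;
        while ((token = parseNextToken()) != null) {
            all.add(token);
        }
        return all;
    }
}
```

      With the real class, the same null-terminated loop would be written against parseNextToken(), whose tokens are returned as Object rather than String.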
    • hasNoFollowingBinData

      private boolean hasNoFollowingBinData() throws IOException
      Looks ahead at a number of bytes and checks whether they contain only ASCII characters (no control sequences etc.) and whether these characters begin with a sequence of 1-3 non-blank characters between blanks.
      Returns:
      true if next bytes are probably printable ASCII characters starting with a PDF operator, otherwise false
      Throws:
      IOException - If an I/O error occurs while reading the stream.
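      The heuristic described above can be sketched on an in-memory lookahead buffer. This is an illustration only, not the PDFBox implementation: the class and method names are hypothetical, and the definitions of "blank" (space, tab, CR, LF) and "printable ASCII" (0x21 to 0x7E) are assumptions made for the sketch.

```java
// Hypothetical helper, not part of PDFBox: illustrates the heuristic only.
final class BinDataHeuristicSketch {

    // Assumption for this sketch: a blank is space, tab, CR or LF.
    static boolean isBlank(int b) {
        return b == ' ' || b == '\t' || b == '\r' || b == '\n';
    }

    // True if the bytes are plain ASCII text that begins with a
    // 1-3 character token with a blank on each side.
    static boolean looksLikeOperatorText(byte[] lookahead) {
        // Step 1: reject anything that is neither printable ASCII nor a blank.
        for (byte raw : lookahead) {
            int b = raw & 0xFF;
            boolean printable = b >= 0x21 && b <= 0x7E;
            if (!printable && !isBlank(b)) {
                return false;
            }
        }
        // Step 2: skip the leading blanks.
        int start = 0;
        while (start < lookahead.length && isBlank(lookahead[start] & 0xFF)) {
            start++;
        }
        // Step 3: find the end of the first run of non-blank characters.
        int end = start;
        while (end < lookahead.length && !isBlank(lookahead[end] & 0xFF)) {
            end++;
        }
        int tokenLength = end - start;
        // The token must be preceded by a blank, be 1-3 characters long,
        // and be followed by a blank.
        return start >= 1
                && tokenLength >= 1 && tokenLength <= 3
                && end < lookahead.length;
    }
}
```

      The result is only a guess, which matches the "probably" in the documented return value: binary data can still happen to look like short ASCII text.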
    • readOperator

      private String readOperator() throws IOException
      This will read an operator from the stream.
      Returns:
      The operator that was read from the stream.
      Throws:
      IOException - If there is an error reading from the stream.
    • isSpaceOrReturn

      private boolean isSpaceOrReturn(int c)
    • hasNextSpaceOrReturn

      private boolean hasNextSpaceOrReturn() throws IOException
      Checks if the next char is a space or a return.
      Returns:
      true if the next char is a space or a return
      Throws:
      IOException - if something went wrong
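      The two whitespace checks can be sketched with the standard library alone. This is a sketch under assumptions, not the PDFBox code: it treats "space or return" as line feed (10), carriage return (13) and space (32), and it uses a PushbackInputStream as a stand-in for the parser's own source so that the peeked byte is not consumed. All names except isSpaceOrReturn and hasNextSpaceOrReturn are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of PDFBox.
final class SpaceOrReturnSketch {

    // Assumption: "space or return" means LF (10), CR (13) or space (32).
    static boolean isSpaceOrReturn(int c) {
        return c == 10 || c == 13 || c == 32;
    }

    // Peeks at the next byte without consuming it.
    static boolean hasNextSpaceOrReturn(PushbackInputStream in) throws IOException {
        int c = in.read();
        if (c == -1) {
            return false; // end of stream: nothing to peek at
        }
        in.unread(c); // put the byte back so the caller still sees it
        return isSpaceOrReturn(c);
    }

    // Convenience for in-memory data; wraps the checked exception.
    static boolean nextIsSpaceOrReturn(byte[] data) {
        try (PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(data))) {
            return hasNextSpaceOrReturn(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```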
    • close

      public void close() throws IOException
      Close the underlying resource.
      Throws:
      IOException - if something went wrong