Class PDFParser

All Implemented Interfaces:
Runnable, Watchable

public class PDFParser extends BaseWatchable
PDFParser parses a PDF content stream and produces PDFCmds for a PDFPage. Application code should never interact with it directly: a PDFPage creates one only when needed, and the parser may run in its own thread.
  • Field Details

    • mDebugCommandIndex

      private int mDebugCommandIndex
    • stack

      private Stack<Object> stack
    • parserStates

      private Stack<PDFParser.ParserState> parserStates
    • state

      private PDFParser.ParserState state
    • path

      private GeneralPath path
    • clip

      private int clip
    • loc

      private int loc
    • resend

      private boolean resend
    • tok

      private PDFParser.Tok tok
    • catchexceptions

      private boolean catchexceptions
    • pageRef

      private final WeakReference<PDFPage> pageRef
      a weak reference to the page we render into. For the page to remain available, some other code must retain a strong reference to it.
    • cmds

      private PDFPage cmds
      the page that parsed commands are added to, held only within a single iteration. It must be released at the end of each iteration to ensure the page can be garbage-collected when it is no longer in use.
    • stream

      byte[] stream
    • resources

      HashMap<String, PDFObject> resources
    • errorwritten

      boolean errorwritten
    • autoAdjustStroke

      private boolean autoAdjustStroke
    • strokeOverprint

      private boolean strokeOverprint
    • strokeOverprintMode

      private int strokeOverprintMode
    • fillOverprint

      private boolean fillOverprint
    • fillOverprintMode

      private int fillOverprintMode
    • addAnnotation

      private boolean addAnnotation
  • Constructor Details

    • PDFParser

      public PDFParser(PDFPage cmds, byte[] stream, HashMap<String, PDFObject> resources)
      Don't call this constructor directly. Instead, use PDFFile.getPage(int pagenum) to get a PDFPage. There should never be any reason for a user to create, access, or hold on to a PDFParser.
  • Method Details

    • nextToken

      private PDFParser.Tok nextToken()
      get the next token.
    • readName

      private String readName()
      read a name (sequence of non-PDF-delimiting characters) from the stream.
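      A minimal sketch of what reading a name involves, assuming the leading '/' has already been consumed and the content stream is a byte[] read through a cursor, as the stream and loc fields suggest. NameReader and position() are hypothetical names, and the #xx hex decoding is included as an assumption about what a full reader would handle; this is not the class's actual code. White-space and delimiter characters follow the PDF specification's character sets.

```java
// Hypothetical sketch of PDF name reading; not the actual PDFParser.readName().
class NameReader {
    private final byte[] stream;
    private int loc; // assumed to point just past the leading '/'

    NameReader(byte[] stream, int loc) {
        this.stream = stream;
        this.loc = loc;
    }

    int position() {
        return loc;
    }

    // PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    // PDF delimiter characters
    static boolean isDelimiter(int c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }

    String readName() {
        StringBuilder sb = new StringBuilder();
        while (loc < stream.length) {
            int c = stream[loc] & 0xFF;
            if (isWhitespace(c) || isDelimiter(c)) {
                break; // leave the terminator for the next token
            }
            loc++;
            if (c == '#' && loc + 1 < stream.length) {
                // "#xx" encodes one byte as two hex digits
                int hi = Character.digit(stream[loc] & 0xFF, 16);
                int lo = Character.digit(stream[loc + 1] & 0xFF, 16);
                if (hi >= 0 && lo >= 0) {
                    sb.append((char) ((hi << 4) | lo));
                    loc += 2;
                    continue;
                }
            }
            sb.append((char) c);
        }
        return sb.toString();
    }
}
```

      Reading "Font#20Name/F1" yields "Font Name" and stops at the '/', which starts the next token.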
    • readNum

      private double readNum()
      read a floating point number from the stream
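      PDF numbers have an optional sign, digits, and at most one decimal point, with no exponent form, so "17", "-98", "4." and "-.002" are all valid. The sketch below scans such a token and delegates conversion to Double.parseDouble. NumberReader and its fields are hypothetical, modeled on the stream and loc fields; the real readNum() may convert the digits itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of PDF numeric-token parsing; not the actual PDFParser.readNum().
class NumberReader {
    private final byte[] stream;
    private int loc; // index of the first character of the number

    NumberReader(byte[] stream, int loc) {
        this.stream = stream;
        this.loc = loc;
    }

    int position() {
        return loc;
    }

    double readNum() {
        int start = loc;
        if (loc < stream.length && (stream[loc] == '+' || stream[loc] == '-')) {
            loc++; // optional sign
        }
        boolean seenDot = false;
        while (loc < stream.length) {
            byte c = stream[loc];
            if (c >= '0' && c <= '9') {
                loc++;
            } else if (c == '.' && !seenDot) {
                seenDot = true; // at most one decimal point
                loc++;
            } else {
                break;
            }
        }
        String text = new String(stream, start, loc - start, StandardCharsets.US_ASCII);
        // Double.parseDouble accepts "+17", "4." and "-.002";
        // it throws NumberFormatException if no digit was read (e.g. "-" or ".")
        return Double.parseDouble(text);
    }
}
```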
    • readString

      private String readString()

      read a String from the stream. Strings begin with a '(' character, which has already been read, and end with a balanced ')' character. A '\' character starts an escape sequence of up to three octal digits.

      Parentheses inside the string must be balanced, so a string may contain balanced parentheses without escaping them.

      Returns:
      the string with escape sequences replaced with their values
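      A minimal sketch of the balanced-parenthesis rule, assuming the opening '(' has already been consumed. A depth counter tracks nesting, so only the ')' that returns the depth to zero ends the string. Besides the octal escapes described above, the sketch handles the single-character escapes the PDF specification defines (\n, \r, \t, \b, \f, \(, \), \\); whether the real readString() handles all of them is not stated here, and line-continuation escapes are omitted. LiteralStringReader is a hypothetical name.

```java
// Hypothetical sketch of literal-string reading; not the actual PDFParser.readString().
class LiteralStringReader {
    private final byte[] stream;
    private int loc; // just past the opening '('

    LiteralStringReader(byte[] stream, int loc) {
        this.stream = stream;
        this.loc = loc;
    }

    int position() {
        return loc;
    }

    String readString() {
        StringBuilder sb = new StringBuilder();
        int depth = 1; // the opening '(' has already been consumed
        while (loc < stream.length) {
            char c = (char) (stream[loc++] & 0xFF);
            if (c == '(') {
                depth++; // nested '(' is kept as part of the string
            } else if (c == ')') {
                if (--depth == 0) {
                    return sb.toString(); // the balancing ')' ends the string
                }
            } else if (c == '\\' && loc < stream.length) {
                char e = (char) (stream[loc++] & 0xFF);
                switch (e) {
                    case 'n': sb.append('\n'); continue;
                    case 'r': sb.append('\r'); continue;
                    case 't': sb.append('\t'); continue;
                    case 'b': sb.append('\b'); continue;
                    case 'f': sb.append('\f'); continue;
                    default: break;
                }
                if (e >= '0' && e <= '7') {
                    // octal escape: up to three digits in total
                    int value = e - '0';
                    for (int i = 0; i < 2 && loc < stream.length
                            && stream[loc] >= '0' && stream[loc] <= '7'; i++) {
                        value = value * 8 + (stream[loc++] - '0');
                    }
                    sb.append((char) (value & 0xFF));
                } else {
                    sb.append(e); // \( \) \\ and unknown escapes: the backslash is dropped
                }
                continue;
            }
            sb.append(c);
        }
        throw new IllegalStateException("unterminated string");
    }
}
```

      For the input a(b)c\)d\101) the reader returns "a(b)c)dA": the inner (b) is balanced, \) is an escaped parenthesis, and \101 is octal for 'A'.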
    • readByteArray

      private String readByteArray()
      read a byte array from the stream. Byte arrays begin with a '<' character, which has already been read, and end with a '>' character. Each byte in the array is made up of two hex characters, the first supplying the high-order four bits. We translate the byte arrays into char arrays by combining two bytes into a character, and then translate the character array into a string. [JK FIXME this is probably a really bad idea!]
      Returns:
      the byte array, as a String
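      For comparison, a sketch that decodes a hex string into raw bytes instead of a String, which sidesteps the character-combining concern in the FIXME above. It pairs hex digits high nibble first and, as the PDF specification requires, treats a final unpaired digit as if followed by 0. HexStringReader is hypothetical, and this sketch simply skips every non-hex character, whereas a stricter parser would accept only white space there.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of hex-string decoding; not the actual PDFParser.readByteArray().
class HexStringReader {
    private final byte[] stream;
    private int loc; // just past the opening '<'

    HexStringReader(byte[] stream, int loc) {
        this.stream = stream;
        this.loc = loc;
    }

    byte[] readHexString() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1; // pending high-order nibble, or -1 if none
        while (loc < stream.length) {
            char c = (char) (stream[loc++] & 0xFF);
            if (c == '>') {
                if (high >= 0) {
                    out.write(high << 4); // odd digit count: pad the last digit with 0
                }
                return out.toByteArray();
            }
            int digit = Character.digit(c, 16);
            if (digit < 0) {
                continue; // skip white space (and, in this sketch, any other non-hex character)
            }
            if (high < 0) {
                high = digit;
            } else {
                out.write((high << 4) | digit);
                high = -1;
            }
        }
        throw new IllegalStateException("unterminated hex string");
    }
}
```

      Decoding "48 65 6C6C 6F>" produces the bytes of "Hello", and "7>" produces the single byte 0x70.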
    • setup

      public void setup()
      Called to prepare for a series of iterations.
      Overrides:
      setup in class BaseWatchable
    • iterate

      public int iterate() throws Exception
      Parse the stream. Commands are added to the PDFPage passed to the constructor as they are encountered.

      Page numbers in comments refer to the Adobe PDF specification.
      Commands are listed in Table A.1 of the PDF specification, ISO 32000-1:2008.

      Specified by:
      iterate in class BaseWatchable
      Returns:
      • Watchable.RUNNING when there are commands to be processed
      • Watchable.COMPLETED when the page is done and all the commands have been processed
      • Watchable.STOPPED if the page we are rendering into is no longer available
      Throws:
      Exception
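      The iterate() contract can be sketched as a loop that processes a little work per call and reports a status, combined with the weak page reference described under pageRef. The constant values, the Page type, and the pre-tokenized input are all hypothetical stand-ins: the real constants are defined by Watchable, and the real parser reads a byte stream rather than a String[].

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Watchable-style iterate() loop; not the actual PDFParser code.
class IterateSketch {
    static final int RUNNING = 1, COMPLETED = 2, STOPPED = 3; // assumed values

    static class Page {
        final List<String> cmds = new ArrayList<>();
    }

    private final WeakReference<Page> pageRef;
    private final String[] tokens; // stand-in for a parsed content stream
    private int pos;

    IterateSketch(Page page, String[] tokens) {
        this.pageRef = new WeakReference<>(page);
        this.tokens = tokens;
    }

    int iterate() {
        Page page = pageRef.get(); // strong reference held only for this call
        if (page == null) {
            return STOPPED; // the page was garbage-collected
        }
        if (pos >= tokens.length) {
            return COMPLETED;
        }
        page.cmds.add(tokens[pos++]); // process one command
        return pos < tokens.length ? RUNNING : COMPLETED;
    }

    // Calls iterate() until it stops reporting RUNNING.
    static int runToCompletion(IterateSketch parser) {
        int status;
        do {
            status = parser.iterate();
        } while (status == RUNNING);
        return status;
    }
}
```

      Because the local variable page goes out of scope after each call, the parser never keeps the page alive between iterations; only the caller's strong reference does.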
    • tryClosingPath

      private void tryClosingPath()
      Try to close the current path without throwing an exception if that fails. This is a workaround for PDFs with malformed content.
    • onNextObject

      private void onNextObject(PDFParser.Tok obj) throws PDFDebugger.DebugStopException
      Throws:
      PDFDebugger.DebugStopException
    • processQCmd

      private void processQCmd()
      Command processing for the Q operator, factored out so it can be used both directly and when processing the malformed QBT token, in which Q and BT run together without white space.
    • processBTCmd

      private void processBTCmd()
      Command processing for the BT operator, factored out so it can be used both directly and when processing the malformed QBT token, in which Q and BT run together without white space.
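      The reason for factoring out the Q and BT handling can be sketched with a state stack: q saves a copy of the current state, Q restores it, and a QBT token is dispatched as Q followed by BT. The State fields, the emitted log, and the tolerance of an unbalanced Q are hypothetical; the real ParserState holds more information, and this sketch uses ArrayDeque where the class uses Stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of q/Q/BT handling; not the actual PDFParser code.
class StateStackSketch {
    static final class State {
        boolean inText;
        double lineWidth = 1.0;

        State copy() {
            State s = new State();
            s.inText = inText;
            s.lineWidth = lineWidth;
            return s;
        }
    }

    private final Deque<State> saved = new ArrayDeque<>();
    State state = new State();
    final List<String> emitted = new ArrayList<>(); // log of actions, for illustration

    void processCmd(String cmd) {
        switch (cmd) {
            case "q":
                saved.push(state.copy());
                emitted.add("push");
                break;
            case "Q":
                processQCmd();
                break;
            case "BT":
                processBTCmd();
                break;
            case "QBT": // malformed token: Q and BT with no white space between them
                processQCmd();
                processBTCmd();
                break;
            default:
                emitted.add(cmd);
        }
    }

    private void processQCmd() {
        if (saved.isEmpty()) {
            return; // tolerate an unbalanced Q instead of failing
        }
        state = saved.pop();
        emitted.add("pop");
    }

    private void processBTCmd() {
        state.inText = true;
        emitted.add("BT");
    }
}
```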
    • cleanup

      public void cleanup()
      Clean up when iteration is done.
      Overrides:
      cleanup in class BaseWatchable
    • findResource

      private PDFObject findResource(String name, String inDict) throws IOException
      get a property from a named dictionary in the resources of this content stream.
      Parameters:
      name - the name of the property in the dictionary
      inDict - the name of the dictionary in the resources
      Returns:
      the value of the property in the dictionary
      Throws:
      IOException
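      The two-level lookup can be sketched with plain maps: first find the named dictionary (such as "Font" or "XObject") in the resources, then the property inside it. Modeling resources as nested Map objects instead of PDFObject dictionaries, and returning null when either level is missing, are assumptions of this sketch; ResourceLookup is a hypothetical name.

```java
import java.util.Map;

// Hypothetical sketch of a two-level resource lookup; not the actual PDFParser.findResource().
class ResourceLookup {
    static Object findResource(Map<String, Object> resources, String name, String inDict) {
        Object dict = resources.get(inDict); // e.g. "Font", "XObject", "ColorSpace"
        if (!(dict instanceof Map)) {
            return null; // no such dictionary in these resources
        }
        return ((Map<?, ?>) dict).get(name); // null if the property is absent
    }
}
```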
    • doXObject

      private void doXObject(PDFObject obj) throws IOException
      Insert a PDF object into the command stream. The object must either be an Image or a Form, which is a set of PDF commands in a stream.
      Parameters:
      obj - the object to insert, an Image or a Form.
      Throws:
      IOException
    • doImage

      private void doImage(PDFObject obj) throws IOException
      Parse image data into a Java BufferedImage and add the image command to the page.
      Parameters:
      obj - contains the image data, and a dictionary describing the width, height and color space of the image.
      Throws:
      IOException
    • doForm

      private void doForm(PDFObject obj) throws IOException
      Inject a stream of PDF commands onto the page. Optimized to cache a parsed stream of commands, so that each Form object only needs to be parsed once.
      Parameters:
      obj - a stream containing the PDF commands, a transformation matrix, bounding box, and resources.
      Throws:
      IOException
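      The "parse each Form only once" optimization amounts to memoizing the parsed commands per form object. The sketch keys a HashMap on the form instance (identity semantics, since Form does not override equals), which is an assumption; the real code may store the cached result differently. Form, parse(), and parseCount are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-form command caching; not the actual PDFParser.doForm().
class FormCacheSketch {
    static final class Form {
        final String content; // raw content stream of the form
        Form(String content) {
            this.content = content;
        }
        // equals/hashCode not overridden: each Form instance is its own cache key
    }

    private final Map<Form, List<String>> cache = new HashMap<>();
    int parseCount; // number of times a form stream was actually parsed

    List<String> commandsFor(Form form) {
        return cache.computeIfAbsent(form, f -> {
            parseCount++;
            return parse(f.content);
        });
    }

    // Stand-in "parser": splits the stream into whitespace-separated tokens.
    private static List<String> parse(String content) {
        return new ArrayList<>(Arrays.asList(content.trim().split("\\s+")));
    }
}
```

      A form drawn many times on a page (a logo or page header, for instance) is then parsed on first use and served from the cache afterwards.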
    • doPattern

      private PDFPaint doPattern(PatternSpace patternSpace) throws IOException
      Set the values into a PatternSpace and return the resulting PDFPaint.
      Throws:
      IOException
    • parseObject

      Parse the next object out of the PDF stream. This could be a Double, a String, a HashMap (dictionary), an Object[] (array), or a Tok containing a PDF command.
      Throws:
      PDFParseException
      PDFDebugger.DebugStopException
    • parseInlineImage

      private void parseInlineImage() throws IOException, PDFDebugger.DebugStopException
      Parse an inline image. An inline image starts with BI (already read), contains a dictionary up to ID, and then image data up to EI.
      Throws:
      IOException
      PDFDebugger.DebugStopException
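      The tricky part of an inline image is finding where the data ends, since EI is not length-prefixed. The PDF specification requires a single white-space byte after ID; this sketch then treats white space + "EI" + (white space or end of stream) as the terminator. That is a heuristic, because binary image data can contain the same byte pattern by accident. InlineImageScanner and its signature are hypothetical; the real method may also use the image dictionary to compute the expected data length.

```java
import java.util.Arrays;

// Hypothetical sketch of locating inline-image data; not the actual PDFParser.parseInlineImage().
class InlineImageScanner {
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    // 'start' is the index just past the ID operator.
    static byte[] readImageData(byte[] stream, int start) {
        int dataStart = start + 1; // skip the single white-space byte after ID
        for (int i = dataStart + 1; i + 1 < stream.length; i++) {
            boolean whiteBefore = isWhitespace(stream[i - 1] & 0xFF);
            boolean endAfter = i + 2 == stream.length || isWhitespace(stream[i + 2] & 0xFF);
            if (whiteBefore && stream[i] == 'E' && stream[i + 1] == 'I' && endAfter) {
                // exclude the white-space byte that precedes EI
                return Arrays.copyOfRange(stream, dataStart, i - 1);
            }
        }
        throw new IllegalStateException("EI not found");
    }
}
```

      Note that "EIB" inside the data is not mistaken for the terminator, because the byte after "EI" must be white space or the end of the stream.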
    • doShader

      private void doShader(PDFObject shaderObj) throws IOException
      build a shader from a dictionary.
      Throws:
      IOException
    • getFontFrom

      private PDFFont getFontFrom(String fontref) throws IOException
      get a PDFFont from the resources, given the resource name of the font.
      Parameters:
      fontref - the resource key for the font
      Throws:
      IOException
    • setGSState

      private void setGSState(String name) throws IOException
      add graphics state commands contained within a dictionary.
      Parameters:
      name - the resource name of the graphics state dictionary
      Throws:
      IOException
    • parseColorSpace

      private PDFColorSpace parseColorSpace(PDFObject csobj) throws IOException
      generate a PDFColorSpace description based on a PDFObject. The object could be a standard name, or the name of a resource in the ColorSpace dictionary, or a color space name with a defining dictionary or stream.
      Throws:
      IOException
    • popFloat

      private float popFloat() throws PDFParseException
      pop a single float value off the stack.
      Returns:
      the float value of the top of the stack
      Throws:
      PDFParseException - if the value on the top of the stack isn't a number
    • popFloat

      private float[] popFloat(int count) throws PDFParseException
      pop an array of float values off the stack. This is equivalent to filling an array from end to front by popping values off the stack.
      Parameters:
      count - the number of numbers to pop off the stack
      Returns:
      an array of length count
      Throws:
      PDFParseException - if any of the values popped off the stack are not numbers.
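      Operands are pushed in the order they appear in the stream, so the last operand is on top of the stack; filling the array from end to front restores stream order. This sketch assumes numbers are stored as Double (matching parseObject above) and uses IllegalArgumentException where the real method throws PDFParseException; OperandStack is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of popping numeric operands; not the actual PDFParser code.
class OperandStack {
    private final Deque<Object> stack = new ArrayDeque<>();

    void push(Object o) {
        stack.push(o);
    }

    float popFloat() {
        Object obj = stack.pop();
        if (!(obj instanceof Double)) {
            throw new IllegalArgumentException("Expected a number here.");
        }
        return ((Double) obj).floatValue();
    }

    // The last operand is on top, so fill the array from the end.
    float[] popFloat(int count) {
        float[] ary = new float[count];
        for (int i = count - 1; i >= 0; i--) {
            ary[i] = popFloat();
        }
        return ary;
    }
}
```

      For the operator "1 0 0 1 72 144 cm", popFloat(6) returns {1, 0, 0, 1, 72, 144} in that order.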
    • popInt

      private int popInt() throws PDFParseException
      pop a single integer value off the stack.
      Returns:
      the integer value of the top of the stack
      Throws:
      PDFParseException - if the top of the stack isn't a number.
    • popFloatArray

      private float[] popFloatArray() throws PDFParseException
      pop an array of float values off the stack.
      Returns:
      the array of float values
      Throws:
      PDFParseException - if any of the values popped off the stack are not numbers.
    • popString

      private String popString() throws PDFParseException
      pop a String off the stack.
      Returns:
      the String from the top of the stack
      Throws:
      PDFParseException - if the top of the stack is not a NAME or STR.
    • popObject

      private PDFObject popObject() throws PDFParseException
      pop a PDFObject off the stack.
      Returns:
      the PDFObject from the top of the stack
      Throws:
      PDFParseException - if the top of the stack does not contain a PDFObject.
    • popArray

      private Object[] popArray() throws PDFParseException
      pop an array off the stack
      Returns:
      the array of objects that is the top element of the stack
      Throws:
      PDFParseException - if the top element of the stack does not contain an array.
    • setStatus

      protected void setStatus(int status)
      Description copied from class: BaseWatchable
      Set the status of this watchable
      Overrides:
      setStatus in class BaseWatchable