Package org.apache.pdfbox.pdfparser
Class PDFStreamParser
java.lang.Object
org.apache.pdfbox.pdfparser.BaseParser
org.apache.pdfbox.pdfparser.PDFStreamParser
Parses a PDF byte stream and extracts its tokens, such as operands and operators.
Field Summary
Fields
private final byte[] binCharTestArr
private int inlineImageDepth
private long inlineOffset
private static final org.apache.commons.logging.Log LOG - Log instance.
private static final int MAX_BIN_CHAR_TEST_LENGTH
private static final Pattern NUMBER_PATTERN
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, MAX_LENGTH_LONG, N, O, R, S, source, STREAM_STRING, T
Constructor Summary
Constructors
PDFStreamParser(byte[] bytes) - Constructor.
PDFStreamParser(PDContentStream pdContentstream) - Constructor.
Method Summary
Methods
void close() - Close the underlying resource.
private boolean hasNextSpaceOrReturn() - Checks if the next char is a space or a return.
private boolean hasNoFollowingBinData() - Checks whether the upcoming bytes contain only ASCII characters (no control sequences etc.) and begin with a sequence of 1-3 non-blank characters between blanks.
private boolean isSpaceOrReturn(int c)
parse() - This will parse all the tokens in the stream.
parseNextToken() - This will parse the next token in the stream.
private String readOperator() - This will read an operator from the stream.
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
getObjectKey, isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOF, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSString, parseDirObject, readExpectedChar, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipLinebreak, skipSpaces, skipWhiteSpaces
Field Details
LOG
private static final org.apache.commons.logging.Log LOG
Log instance.
NUMBER_PATTERN
private static final Pattern NUMBER_PATTERN
MAX_BIN_CHAR_TEST_LENGTH
private static final int MAX_BIN_CHAR_TEST_LENGTH
binCharTestArr
private final byte[] binCharTestArr
inlineImageDepth
private int inlineImageDepth
inlineOffset
private long inlineOffset
Constructor Details
PDFStreamParser
Constructor.
Parameters:
pdContentstream - The content stream to parse.
Throws:
IOException - If there is an error initializing the stream.
PDFStreamParser
public PDFStreamParser(byte[] bytes)
Constructor.
Parameters:
bytes - the bytes to parse.
Method Details
parse
This will parse all the tokens in the stream. This will close the stream when it is finished parsing.
Returns:
All of the tokens in the stream.
Throws:
IOException - If there is an error while parsing the stream.
parseNextToken
This will parse the next token in the stream.
Returns:
The next token in the stream, or null if there are no more tokens in the stream.
Throws:
IOException - If an I/O error occurs while parsing the stream.
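The contract described for parseNextToken() (one token per call, null when the stream is exhausted) lends itself to a simple pull loop, and collecting the results gives the kind of list parse() is documented to return. The following is a minimal sketch of that pattern only. ToyTokenSource, nextToken() and readAll() are hypothetical names, and the whitespace splitting is a stand-in that does not reproduce PDFBox's tokenizing rules.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy tokenizer; a stand-in for illustration, not PDFBox code. */
class ToyTokenSource {
    private final byte[] data;
    private int pos;

    ToyTokenSource(byte[] data) {
        this.data = data;
    }

    /** Returns the next whitespace-delimited token, or null once the input is exhausted. */
    String nextToken() {
        // Skip any blanks in front of the token.
        while (pos < data.length && isBlank(data[pos])) {
            pos++;
        }
        if (pos >= data.length) {
            return null;
        }
        // Consume characters up to the next blank or the end of the input.
        int start = pos;
        while (pos < data.length && !isBlank(data[pos])) {
            pos++;
        }
        return new String(data, start, pos - start, StandardCharsets.US_ASCII);
    }

    /** Drains the source by calling nextToken() until it returns null. */
    List<String> readAll() {
        List<String> tokens = new ArrayList<>();
        String token;
        while ((token = nextToken()) != null) {
            tokens.add(token);
        }
        return tokens;
    }

    private static boolean isBlank(byte b) {
        return b == ' ' || b == '\n' || b == '\r' || b == '\t';
    }

    public static void main(String[] args) {
        byte[] content = "BT /F1 12 Tf ET".getBytes(StandardCharsets.US_ASCII);
        for (String token : new ToyTokenSource(content).readAll()) {
            System.out.println(token);
        }
    }
}
```

Pulling tokens one at a time lets a caller stop early, while the collecting variant is convenient when the whole stream is needed at once.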
hasNoFollowingBinData
Checks whether the upcoming bytes contain only ASCII characters (no control sequences etc.) and begin with a sequence of 1-3 non-blank characters between blanks.
Returns:
true if the next bytes are probably printable ASCII characters starting with a PDF operator, otherwise false
Throws:
IOException
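The check described above can be pictured as two passes over a lookahead buffer: first reject any byte that is not printable ASCII or a blank, then verify that the buffer starts with blanks, a run of 1-3 non-blank characters, and another blank. The sketch below is one possible reading of that description, not PDFBox's implementation. The class name, the method looksLikeOperatorText, the buffer-based signature, and the exact set of bytes treated as blanks are all assumptions.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of an "ASCII text follows" heuristic; not PDFBox code. */
class AsciiLookaheadSketch {

    /**
     * Returns true if the first len bytes of buf are printable ASCII or blanks
     * and start with 1-3 non-blank characters enclosed by blanks.
     * Assumes 0 <= len <= buf.length.
     */
    static boolean looksLikeOperatorText(byte[] buf, int len) {
        // Pass 1: any byte outside printable ASCII and the blank set suggests binary data.
        for (int i = 0; i < len; i++) {
            int b = buf[i] & 0xFF;
            boolean printable = b >= 0x21 && b <= 0x7E;
            if (!printable && !isBlank(b)) {
                return false;
            }
        }
        // Pass 2: require at least one leading blank.
        int i = 0;
        while (i < len && isBlank(buf[i] & 0xFF)) {
            i++;
        }
        if (i == 0 || i == len) {
            return false;
        }
        // Measure the first run of non-blank characters.
        int start = i;
        while (i < len && !isBlank(buf[i] & 0xFF)) {
            i++;
        }
        int runLength = i - start;
        // The run must be 1-3 characters long and be followed by a blank.
        return runLength >= 1 && runLength <= 3 && i < len;
    }

    // Assumption: space, LF, CR and TAB count as blanks.
    private static boolean isBlank(int b) {
        return b == ' ' || b == '\n' || b == '\r' || b == '\t';
    }

    public static void main(String[] args) {
        byte[] text = " Q q 1 0 0 1 cm".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeOperatorText(text, text.length));
    }
}
```

Because the result is only "probably" text, a caller would treat it as a hint rather than proof that no binary data follows.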
readOperator
This will read an operator from the stream.
Returns:
The operator that was read from the stream.
Throws:
IOException - If there is an error reading from the stream.
isSpaceOrReturn
private boolean isSpaceOrReturn(int c)
hasNextSpaceOrReturn
Checks if the next char is a space or a return.
Returns:
true if the next char is a space or a return
Throws:
IOException - if something went wrong
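A check on the next character without consuming it is usually written as a peek: read one byte, push it back, and classify it. The sketch below illustrates that idea with java.io.PushbackInputStream. It is not PDFBox's code: the class name is hypothetical, the stream type is a stand-in for whatever source the parser reads from, and treating "space or return" as the bytes 32 (space), 10 (LF) and 13 (CR) is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

/** Hypothetical sketch of a non-consuming "space or return" check; not PDFBox code. */
class SpaceOrReturnSketch {

    // Assumption: "space or return" means space (32), line feed (10) or carriage return (13).
    static boolean isSpaceOrReturn(int c) {
        return c == 32 || c == 10 || c == 13;
    }

    /** Peeks at the next byte and leaves the stream position unchanged. */
    static boolean hasNextSpaceOrReturn(PushbackInputStream in) throws IOException {
        int c = in.read();
        if (c != -1) {
            in.unread(c); // put the byte back so the caller still sees it
        }
        return isSpaceOrReturn(c); // -1 (end of stream) is neither space nor return
    }

    public static void main(String[] args) throws IOException {
        PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(new byte[] {'\n', 'Q'}));
        System.out.println(hasNextSpaceOrReturn(in));
    }
}
```

Keeping the classification in a separate isSpaceOrReturn(int) helper lets the same test be applied both to peeked bytes and to bytes that were already read.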
close
Close the underlying resource.
Throws:
IOException - if something went wrong