Package com.itextpdf.text.pdf.parser
Class TaggedPdfReaderTool
java.lang.Object
com.itextpdf.text.pdf.parser.TaggedPdfReaderTool
- Direct Known Subclasses:
CompareTool.CmpTaggedPdfReaderTool
Converts a tagged PDF document into an XML file.
- Since:
- 5.0.2
-
Field Summary
Fields
Modifier and Type        Field     Description
protected PrintWriter    out       The writer object to which the XML will be written.
protected PdfReader      reader    The reader object from which the content streams are read.
-
Constructor Summary
Constructors
Constructor              Description
TaggedPdfReaderTool()
-
Method Summary
Modifier and Type        Method                                                                   Description
void                     convertToXml(PdfReader reader, OutputStream os)                          Converts the tagged PDF read by the reader to XML, using the default charset.
void                     convertToXml(PdfReader reader, OutputStream os, String charset)          Converts the tagged PDF read by the reader to XML, using the given charset.
private static String    fixTagName(String tag)
void                     inspectChild                                                             Inspects a child of a structured element.
void                     inspectChildArray                                                        If the child of a structured element is an array, we need to loop over the elements.
void                     inspectChildDictionary                                                   If the child of a structured element is a dictionary, we inspect the child; we may also write a tag.
void                     inspectChildDictionary(PdfDictionary k, boolean inspectAttributes)       If the child of a structured element is a dictionary, we inspect the child; we may also write a tag.
void                     parseTag(String tag, PdfObject object, PdfDictionary page)               Searches for a tag in a page.
protected String         xmlName
-
Field Details
-
reader
The reader object from which the content streams are read. -
out
The writer object to which the XML will be written
-
-
Constructor Details
-
TaggedPdfReaderTool
public TaggedPdfReaderTool()
-
-
Method Details
-
convertToXml
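public void convertToXml(PdfReader reader, OutputStream os, String charset) throws IOException
Converts the tagged PDF read by the reader to XML and writes the result to the output stream, encoded in the given charset.
- Parameters:
reader - the PdfReader that has access to the PDF file
os - the OutputStream to which the resulting XML will be written
charset - the charset used to encode the data
- Throws:
IOException
- Since:
- 5.0.5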
-
convertToXml
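public void convertToXml(PdfReader reader, OutputStream os) throws IOException
Converts the tagged PDF read by the reader to XML and writes the result to the output stream. The output is encoded using the platform's default charset.
- Parameters:
reader - the PdfReader that has access to the PDF file
os - the OutputStream to which the resulting XML will be written
- Throws:
IOException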
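The effect of the charset argument, and of omitting it, can be sketched with JDK classes alone. This is an illustration under stated assumptions, not iText's implementation: the class CharsetSketch and its write methods are hypothetical stand-ins for the two convertToXml overloads, and treating the charset of the two-argument form as the JVM default is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical stand-in for the two convertToXml overloads; not iText code.
class CharsetSketch {

    // Three-argument form: the caller chooses the charset used to encode the XML.
    static void write(String xml, OutputStream os, String charset) throws IOException {
        // The PrintWriter plays the role of the "out" field of the tool.
        PrintWriter out = new PrintWriter(new OutputStreamWriter(os, charset));
        out.print(xml);
        out.flush(); // push buffered characters through the encoder
    }

    // Two-argument form: assumed here to fall back to the JVM default charset.
    static void write(String xml, OutputStream os) throws IOException {
        write(xml, os, Charset.defaultCharset().name());
    }

    // Demo helper: number of bytes produced when encoding with a given charset.
    static int byteCount(String xml, String charset) throws IOException {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        write(xml, os, charset);
        return os.size();
    }

    public static void main(String[] args) throws IOException {
        String xml = "<P>caf\u00e9</P>"; // 11 characters, one of them non-ASCII
        System.out.println(byteCount(xml, "UTF-8"));      // 12
        System.out.println(byteCount(xml, "ISO-8859-1")); // 11
    }
}
```

Because the same XML can produce different bytes under different charsets, passing the charset explicitly keeps the output reproducible across machines.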
-
inspectChild
Inspects a child of a structured element. This can be an array or a dictionary.
- Parameters:
k - the child to inspect
- Throws:
IOException
-
inspectChildArray
If the child of a structured element is an array, we need to loop over the elements.
- Parameters:
k - the child array to inspect
- Throws:
IOException
-
inspectChildDictionary
If the child of a structured element is a dictionary, we inspect the child; we may also write a tag.
- Parameters:
k - the child dictionary to inspect
- Throws:
IOException
-
inspectChildDictionary
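public void inspectChildDictionary(PdfDictionary k, boolean inspectAttributes) throws IOException
If the child of a structured element is a dictionary, we inspect the child; we may also write a tag.
- Parameters:
k - the child dictionary to inspect
- Throws:
IOException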
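The dispatch described for inspectChild, inspectChildArray and inspectChildDictionary can be sketched as a small recursive walk. The sketch uses plain Java collections instead of iText's PDF objects: a List stands in for an array child, a Map stands in for a dictionary child, and the keys "S" (tag name) and "K" (children) are modeled on the /S and /K entries of PDF structure elements. The class and helper names are hypothetical; this illustrates the control flow only and is not the library's code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in types only: List = array child, Map = dictionary child, String = leaf text.
class StructureWalkSketch {
    private final StringBuilder out = new StringBuilder();

    // A child can be an array or a dictionary; dispatch on its runtime type.
    void inspectChild(Object k) {
        if (k == null) {
            return;
        }
        if (k instanceof List) {
            inspectChildArray((List<?>) k);
        } else if (k instanceof Map) {
            inspectChildDictionary((Map<?, ?>) k);
        } else {
            out.append(k); // leaf: plain text in this sketch (not XML-escaped)
        }
    }

    // Array child: loop over the elements and inspect each one.
    void inspectChildArray(List<?> k) {
        for (Object element : k) {
            inspectChild(element);
        }
    }

    // Dictionary child: write an opening tag, inspect the kids, write the closing tag.
    void inspectChildDictionary(Map<?, ?> k) {
        Object tag = k.get("S");
        if (tag != null) {
            out.append('<').append(tag).append('>');
        }
        inspectChild(k.get("K"));
        if (tag != null) {
            out.append("</").append(tag).append('>');
        }
    }

    // Hypothetical helper that builds a dictionary-like element.
    static Map<String, Object> element(String tag, Object kids) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("S", tag);
        m.put("K", kids);
        return m;
    }

    static String demo() {
        Object tree = element("Document", Arrays.asList(
                element("H1", "Title"),
                element("P", Arrays.asList("Hello, ", "world"))));
        StructureWalkSketch walker = new StructureWalkSketch();
        walker.inspectChild(tree);
        return walker.out.toString();
    }

    public static void main(String[] args) {
        // prints <Document><H1>Title</H1><P>Hello, world</P></Document>
        System.out.println(demo());
    }
}
```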
-
xmlName
-
fixTagName
private static String fixTagName(String tag)
-
parseTag
public void parseTag(String tag, PdfObject object, PdfDictionary page) throws IOException
Searches for a tag in a page.
- Parameters:
tag - the name of the tag
object - an identifier to find the marked content
page - a page dictionary
- Throws:
IOException
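Finding the marked content that belongs to a tag can be pictured as filtering a page's content by an identifier. The sketch below is a hypothetical model, not iText's implementation: Chunk is an invented type pairing a marked-content identifier with text, the page is reduced to a list of such chunks, the identifier is assumed to be an integer, and wrapping the result in the tag is a choice made for this sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of "search for a tag in a page"; not iText code.
class ParseTagSketch {

    // Invented type: one run of page text and the marked-content id it belongs to.
    static final class Chunk {
        final int mcid;
        final String text;

        Chunk(int mcid, String text) {
            this.mcid = mcid;
            this.text = text;
        }
    }

    // Collects the text of every chunk on the page whose id matches,
    // in page order, and wraps it in the given tag.
    static String parseTag(String tag, int mcid, List<Chunk> page) {
        StringBuilder text = new StringBuilder();
        for (Chunk c : page) {
            if (c.mcid == mcid) {
                text.append(c.text);
            }
        }
        return "<" + tag + ">" + text + "</" + tag + ">";
    }

    // Sample page whose chunks for two different ids are interleaved.
    static List<Chunk> samplePage() {
        return Arrays.asList(
                new Chunk(0, "Title"),
                new Chunk(1, "Hello, "),
                new Chunk(0, " page"),
                new Chunk(1, "world"));
    }

    public static void main(String[] args) {
        System.out.println(parseTag("P", 1, samplePage()));  // <P>Hello, world</P>
        System.out.println(parseTag("H1", 0, samplePage())); // <H1>Title page</H1>
    }
}
```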
-