Package com.itextpdf.text.pdf.parser
Class TaggedPdfReaderTool
java.lang.Object
    com.itextpdf.text.pdf.parser.TaggedPdfReaderTool
-
Direct Known Subclasses:
CompareTool.CmpTaggedPdfReaderTool
public class TaggedPdfReaderTool extends java.lang.Object
Converts a tagged PDF document into an XML file.
Since:
5.0.2
-
Constructor Summary
Constructor: TaggedPdfReaderTool()
-
Method Summary
Modifier and Type, Method, Description
void convertToXml(PdfReader reader, java.io.OutputStream os)
    Converts the structure tree of a tagged PDF into XML, encoded with the default charset.
void convertToXml(PdfReader reader, java.io.OutputStream os, java.lang.String charset)
    Converts the structure tree of a tagged PDF into XML, encoded with the given charset.
private static java.lang.String fixTagName(java.lang.String tag)
void inspectChild(PdfObject k)
    Inspects a child of a structure element.
void inspectChildArray(PdfArray k)
    If the child of a structure element is an array, loops over its elements.
void inspectChildDictionary(PdfDictionary k)
    If the child of a structure element is a dictionary, inspects the child; may also write a tag.
void inspectChildDictionary(PdfDictionary k, boolean inspectAttributes)
    If the child of a structure element is a dictionary, inspects the child; may also write a tag.
void parseTag(java.lang.String tag, PdfObject object, PdfDictionary page)
    Searches for a tag in a page.
protected java.lang.String xmlName(PdfName name)
-
Field Detail
-
reader
protected PdfReader reader
The reader object from which the content streams are read.
-
out
protected java.io.PrintWriter out
The writer object to which the XML will be written.
-
-
Method Detail
-
convertToXml
public void convertToXml(PdfReader reader, java.io.OutputStream os, java.lang.String charset) throws java.io.IOException
Converts the structure tree of a tagged PDF into XML and writes it to the given stream.
Parameters:
reader - the PdfReader that has access to the PDF file
os - the OutputStream to which the resulting XML will be written
charset - the name of the charset used to encode the output
Throws:
java.io.IOException
Since:
5.0.5
-
convertToXml
public void convertToXml(PdfReader reader, java.io.OutputStream os) throws java.io.IOException
Converts the structure tree of a tagged PDF into XML and writes it to the given stream. The output is encoded using the default charset.
Parameters:
reader - the PdfReader that has access to the PDF file
os - the OutputStream to which the resulting XML will be written
Throws:
java.io.IOException
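The effect of the charset argument can be illustrated without iText. The sketch below is a hypothetical stand-in (CharsetSketch, writeXml and encode are names invented for illustration, not part of the library). It assumes the tool wraps the OutputStream in a PrintWriter, as the type of the out field suggests, and shows that the same XML text yields different bytes under different charsets.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical stand-in, not part of iText: shows how a charset name
// changes the bytes that an XML writer emits to an OutputStream.
final class CharsetSketch {

    // Writes one XML element to os, encoding characters with the named charset.
    static void writeXml(String text, OutputStream os, String charset) {
        PrintWriter out = new PrintWriter(new OutputStreamWriter(os, Charset.forName(charset)));
        out.print("<P>" + text + "</P>");
        out.flush(); // push buffered characters through to the stream
    }

    // Returns the encoded bytes so that they can be inspected.
    static byte[] encode(String text, String charset) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        writeXml(text, os, charset);
        return os.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an accented final letter
        System.out.println(encode(text, "UTF-8").length);      // 12: the accented letter takes two bytes
        System.out.println(encode(text, "ISO-8859-1").length); // 11: one byte per character
    }
}
```

With iText on the classpath, passing the charset explicitly, as in convertToXml(reader, os, "UTF-8"), keeps the output independent of the platform on which the code runs.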
-
inspectChild
public void inspectChild(PdfObject k) throws java.io.IOException
Inspects a child of a structure element. This can be an array or a dictionary.
Parameters:
k - the child to inspect
Throws:
java.io.IOException
-
inspectChildArray
public void inspectChildArray(PdfArray k) throws java.io.IOException
If the child of a structure element is an array, we need to loop over the elements.
Parameters:
k - the child array to inspect
Throws:
java.io.IOException
-
inspectChildDictionary
public void inspectChildDictionary(PdfDictionary k) throws java.io.IOException
If the child of a structure element is a dictionary, we inspect the child; we may also write a tag.
Parameters:
k - the child dictionary to inspect
Throws:
java.io.IOException
-
inspectChildDictionary
public void inspectChildDictionary(PdfDictionary k, boolean inspectAttributes) throws java.io.IOException
If the child of a structure element is a dictionary, we inspect the child; we may also write a tag.
Parameters:
k - the child dictionary to inspect
inspectAttributes - whether the attributes of the structure element are also inspected
Throws:
java.io.IOException
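The three inspect methods describe a recursive walk: a child is either an array, whose elements are visited in turn, or a dictionary, which may produce a tag around its own children. The following is a minimal sketch of that dispatch pattern, not the library's implementation. StructureWalkSketch and its helpers are hypothetical names. A List stands in for a PdfArray, a Map for a PdfDictionary, and a String for marked content. The keys "S" (structure type) and "K" (kids) follow the PDF structure element convention.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not iText code: a List plays the role of a PdfArray,
// a Map the role of a PdfDictionary, and a String the role of marked content.
final class StructureWalkSketch {

    // Dispatches on the kind of child: array, dictionary, or text leaf.
    static void inspectChild(Object k, StringBuilder out) {
        if (k instanceof List) {
            inspectChildArray((List<?>) k, out);
        } else if (k instanceof Map) {
            inspectChildDictionary((Map<?, ?>) k, out);
        } else if (k instanceof String) {
            out.append((String) k); // leaf text, emitted as-is in this sketch
        }
        // anything else (including null) is ignored
    }

    // An array child: visit every element in order.
    static void inspectChildArray(List<?> k, StringBuilder out) {
        for (Object element : k) {
            inspectChild(element, out);
        }
    }

    // A dictionary child: write an opening tag, recurse into the kids, close the tag.
    static void inspectChildDictionary(Map<?, ?> k, StringBuilder out) {
        Object tag = k.get("S");
        if (tag != null) out.append('<').append(tag).append('>');
        inspectChild(k.get("K"), out);
        if (tag != null) out.append("</").append(tag).append('>');
    }

    // Builds a stand-in structure element with a tag name and its kids.
    static Map<String, Object> element(String tag, Object kids) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("S", tag);
        m.put("K", kids);
        return m;
    }

    // Walks the tree from root and returns the accumulated XML text.
    static String toXml(Object root) {
        StringBuilder out = new StringBuilder();
        inspectChild(root, out);
        return out.toString();
    }

    public static void main(String[] args) {
        Object root = element("Document",
                Arrays.asList(element("H1", "Title"), element("P", "Body")));
        System.out.println(toXml(root)); // <Document><H1>Title</H1><P>Body</P></Document>
    }
}
```

The sketch omits attribute handling, XML escaping and the lookup of marked content on a page, which the real class delegates to methods such as parseTag.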
-
xmlName
protected java.lang.String xmlName(PdfName name)
-
fixTagName
private static java.lang.String fixTagName(java.lang.String tag)
-
parseTag
public void parseTag(java.lang.String tag, PdfObject object, PdfDictionary page) throws java.io.IOException
Searches for a tag in a page.
Parameters:
tag - the name of the tag
object - an identifier to find the marked content
page - a page dictionary
Throws:
java.io.IOException
-
-