Package com.itextpdf.kernel.utils
Class TaggedPdfReaderTool
- java.lang.Object
  - com.itextpdf.kernel.utils.TaggedPdfReaderTool

public class TaggedPdfReaderTool extends java.lang.Object
Converts a tagged PDF document into an XML file.
Nested Class Summary
- private class TaggedPdfReaderTool.MarkedContentEventListener
Field Summary
- protected PdfDocument document
- private java.util.Set<PdfObject> inspectedStructTreeElems
- protected java.io.OutputStreamWriter out
- protected java.util.Map<PdfDictionary,java.util.Map<java.lang.Integer,java.lang.String>> parsedTags
- protected java.lang.String rootTag
Constructor Summary
- TaggedPdfReaderTool(PdfDocument document): Constructs a TaggedPdfReaderTool via a given PdfDocument.
Method Summary
- void convertToXml(java.io.OutputStream os): Converts the current tag structure into an XML file with the default encoding (UTF-8).
- void convertToXml(java.io.OutputStream os, java.lang.String charset): Converts the current tag structure into an XML file with the provided encoding.
- protected static java.lang.String escapeXML(java.lang.String s, boolean onlyASCII): NOTE: copied from the iText 5 XMLUtils class. Escapes a string with the appropriate XML codes.
- protected static java.lang.String fixTagName(java.lang.String tag): Fixes the specified tag name so that it is a valid XML tag name.
- protected void inspectAttributes(PdfStructElem kid): Inspects the attributes dictionary of a StructTreeRoot child.
- protected void inspectKid(IStructureNode kid): Inspects a child of the StructTreeRoot.
- protected void inspectKids(java.util.List<IStructureNode> kids): Inspects the children of the StructTreeRoot.
- static boolean isValidCharacterValue(int c): Checks if a character value should be escaped/unescaped.
- protected void parseTag(PdfMcr kid): Parses the tag of a Marked Content Reference (MCR) kid of the StructTreeRoot.
- TaggedPdfReaderTool setRootTag(java.lang.String rootTagName): Sets the name of the root tag of the resultant XML file.
Field Detail
-
document
protected PdfDocument document
-
out
protected java.io.OutputStreamWriter out
-
rootTag
protected java.lang.String rootTag
-
parsedTags
protected java.util.Map<PdfDictionary,java.util.Map<java.lang.Integer,java.lang.String>> parsedTags
-
inspectedStructTreeElems
private final java.util.Set<PdfObject> inspectedStructTreeElems
Constructor Detail
-
TaggedPdfReaderTool
public TaggedPdfReaderTool(PdfDocument document)
Constructs a TaggedPdfReaderTool via a given PdfDocument.
Parameters:
document - the document to read the tag structure from
Method Detail
-
isValidCharacterValue
public static boolean isValidCharacterValue(int c)
Checks if a character value should be escaped/unescaped.
Parameters:
c - a character value
Returns:
true if it's OK to escape or unescape this value.
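The Javadoc does not say which values pass this check. A minimal sketch, assuming it follows the XML 1.0 Char production (tab, line feed, carriage return, and the ranges #x20 to #xD7FF, #xE000 to #xFFFD, and #x10000 to #x10FFFF); the class name XmlCharCheck is hypothetical and this is not the iText source.

```java
/**
 * Hypothetical sketch: mirrors the XML 1.0 "Char" production, which is
 * assumed (not confirmed by the Javadoc) to be what isValidCharacterValue checks.
 */
final class XmlCharCheck {

    static boolean isValidCharacterValue(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isValidCharacterValue('A'));     // true
        System.out.println(isValidCharacterValue(0x0));     // false: NUL is not allowed in XML 1.0
        System.out.println(isValidCharacterValue(0x1F600)); // true: supplementary code point
    }
}
```

Lone surrogate values such as 0xD800 fall in the gap between the first two ranges and are rejected under this assumption.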
-
convertToXml
public void convertToXml(java.io.OutputStream os) throws java.io.IOException
Converts the current tag structure into an XML file with the default encoding (UTF-8).
Parameters:
os - the output stream to save the XML file to
Throws:
java.io.IOException - in case of any I/O error
-
convertToXml
public void convertToXml(java.io.OutputStream os, java.lang.String charset) throws java.io.IOException
Converts the current tag structure into an XML file with the provided encoding.
Parameters:
os - the output stream to save the XML file to
charset - the charset of the resultant XML file
Throws:
java.io.IOException - in case of any I/O error
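The two convertToXml overloads differ only in the encoding. A hedged sketch of how they might fit together: the one-argument form delegates with "UTF-8" (the documented default), and the two-argument form wraps the stream in an OutputStreamWriter, matching the type of the out field. The class name CharsetOverloadSketch is hypothetical, and the structure-tree output is replaced by a fixed placeholder element.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

/**
 * Hypothetical sketch of the two convertToXml overloads. The real method
 * writes the document's tag structure; here a placeholder element is written.
 */
final class CharsetOverloadSketch {

    protected OutputStreamWriter out;

    public void convertToXml(OutputStream os) throws IOException {
        convertToXml(os, "UTF-8"); // default encoding stated in the Javadoc
    }

    public void convertToXml(OutputStream os, String charset) throws IOException {
        // Charset.forName throws UnsupportedCharsetException for unknown names.
        out = new OutputStreamWriter(os, Charset.forName(charset));
        out.write("<p>caf\u00e9</p>"); // placeholder for the real tree output
        out.flush(); // push buffered characters to the underlying stream
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream utf8 = new ByteArrayOutputStream();
        new CharsetOverloadSketch().convertToXml(utf8);
        ByteArrayOutputStream latin1 = new ByteArrayOutputStream();
        new CharsetOverloadSketch().convertToXml(latin1, "ISO-8859-1");
        // The accented e takes two bytes in UTF-8 but one byte in ISO-8859-1.
        System.out.println(utf8.size() + " vs " + latin1.size()); // 12 vs 11
    }
}
```

The byte counts differ only because of the single non-ASCII character, which is a quick way to confirm that the charset argument actually reaches the writer.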
-
setRootTag
public TaggedPdfReaderTool setRootTag(java.lang.String rootTagName)
Sets the name of the root tag of the resultant XML file.
Parameters:
rootTagName - the name of the root tag
Returns:
this object
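Because setRootTag returns this object, it can be chained with other calls. A minimal sketch, assuming (not stated in the Javadoc) that the root tag wraps all converted content and that no wrapper is written when the root tag is null; the class RootTagSketch and its writeTo method are hypothetical stand-ins for convertToXml.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Hypothetical sketch of a fluent root-tag setter. Assumptions: the root tag
 * wraps the converted content, and a null root tag means no wrapper element.
 */
final class RootTagSketch {

    protected String rootTag;

    public RootTagSketch setRootTag(String rootTagName) {
        this.rootTag = rootTagName;
        return this; // returning this lets calls be chained
    }

    public void writeTo(Writer out, String body) throws IOException {
        if (rootTag != null) {
            out.write("<" + rootTag + ">");
        }
        out.write(body);
        if (rootTag != null) {
            out.write("</" + rootTag + ">");
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        new RootTagSketch().setRootTag("document").writeTo(sw, "<P>Hello</P>");
        System.out.println(sw); // <document><P>Hello</P></document>
    }
}
```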
-
inspectKids
protected void inspectKids(java.util.List<IStructureNode> kids)
Inspects the children of the StructTreeRoot.
Parameters:
kids - the list of direct kids of the StructTreeRoot
-
inspectKid
protected void inspectKid(IStructureNode kid)
Inspects a child of the StructTreeRoot.
Parameters:
kid - the direct kid of the StructTreeRoot
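inspectKids and inspectKid describe a recursive walk over the structure tree. The sketch below models that walk on a toy tree; Node is a hypothetical stand-in for IStructureNode, and the identity-based visited set reflects an assumption that the private inspectedStructTreeElems field exists to avoid inspecting the same element twice. Text is appended unescaped for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of inspectKids/inspectKid on a toy tree. The visited
 * set plays the role assumed for the inspectedStructTreeElems field.
 */
final class TreeWalkSketch {

    static final class Node {
        final String role;
        final String text; // leaf content, or null for grouping elements
        final List<Node> kids = new ArrayList<>();

        Node(String role, String text) {
            this.role = role;
            this.text = text;
        }
    }

    private final Set<Node> inspected = Collections.newSetFromMap(new IdentityHashMap<>());
    private final StringBuilder out = new StringBuilder();

    void inspectKids(List<Node> kids) {
        for (Node kid : kids) {
            inspectKid(kid);
        }
    }

    void inspectKid(Node kid) {
        if (!inspected.add(kid)) {
            return; // already visited: avoids duplicate output and infinite loops
        }
        out.append('<').append(kid.role).append('>');
        if (kid.text != null) {
            out.append(kid.text);
        }
        inspectKids(kid.kids); // recurse into this element's own kids
        out.append("</").append(kid.role).append('>');
    }

    static String convert(List<Node> rootKids) {
        TreeWalkSketch walker = new TreeWalkSketch();
        walker.inspectKids(rootKids);
        return walker.out.toString();
    }

    public static void main(String[] args) {
        Node doc = new Node("Document", null);
        Node p = new Node("P", "Hello");
        doc.kids.add(p);
        doc.kids.add(p); // same element referenced twice
        System.out.println(convert(List.of(doc))); // <Document><P>Hello</P></Document>
    }
}
```

An identity set is used rather than equals-based hashing because two distinct elements with the same role and text must still both be emitted.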
-
inspectAttributes
protected void inspectAttributes(PdfStructElem kid)
Inspects the attributes dictionary of a StructTreeRoot child.
Parameters:
kid - the direct kid of the StructTreeRoot
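The Javadoc does not say how attributes appear in the XML output. One plausible reading, sketched here purely as an assumption, is that each attribute becomes an XML attribute on the element's opening tag. The attributes are modeled as a Map<String, String> instead of a PdfDictionary, and AttributeSketch and openTag are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: renders a structure element's attributes (modeled as
 * a name -> value map instead of a PdfDictionary) as XML attributes on the
 * opening tag. The real key mapping used by iText is not documented here.
 */
final class AttributeSketch {

    static String openTag(String role, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder("<").append(role);
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            sb.append(' ').append(e.getKey()).append("=\"")
              .append(escapeAttr(e.getValue())).append('"');
        }
        return sb.append('>').toString();
    }

    // Minimal escaping so values are safe inside double quotes; '&' goes first.
    private static String escapeAttr(String v) {
        return v.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>(); // keeps insertion order
        attrs.put("Placement", "Block");
        attrs.put("TextAlign", "Center");
        System.out.println(openTag("P", attrs)); // <P Placement="Block" TextAlign="Center">
    }
}
```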
-
parseTag
protected void parseTag(PdfMcr kid)
Parses the tag of a Marked Content Reference (MCR) kid of the StructTreeRoot.
Parameters:
kid - the direct PdfMcr kid of the StructTreeRoot
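The type of the parsedTags field (PdfDictionary to Map<Integer, String>) suggests that marked-content text is extracted once per page dictionary and then looked up by marked-content ID (MCID). The sketch below shows that caching idea under this assumption; pages are modeled as String keys, page parsing is a supplied function, and McrLookupSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of a per-page cache shaped like the parsedTags field
 * (page -> (MCID -> text)). Pages are String keys instead of PdfDictionary.
 */
final class McrLookupSketch {

    private final Map<String, Map<Integer, String>> parsedTags = new HashMap<>();
    private final Function<String, Map<Integer, String>> pageParser;
    int parseCount = 0; // how many times a page's content was actually parsed

    McrLookupSketch(Function<String, Map<Integer, String>> pageParser) {
        this.pageParser = pageParser;
    }

    /** Returns the text for the given MCID on the given page, or "" if absent. */
    String parseTag(String page, int mcid) {
        Map<Integer, String> byMcid = parsedTags.computeIfAbsent(page, p -> {
            parseCount++; // only runs the first time this page is requested
            return pageParser.apply(p);
        });
        return byMcid.getOrDefault(mcid, "");
    }

    public static void main(String[] args) {
        McrLookupSketch tool = new McrLookupSketch(page -> Map.of(0, "Title", 1, "Body text"));
        System.out.println(tool.parseTag("page1", 0)); // Title
        System.out.println(tool.parseTag("page1", 1)); // Body text
        System.out.println(tool.parseCount);           // 1: the page was parsed once
    }
}
```

Caching per page matters because every MCR kid on a page would otherwise trigger a fresh pass over that page's content stream.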
-
fixTagName
protected static java.lang.String fixTagName(java.lang.String tag)
Fixes the specified tag name so that it is a valid XML tag name.
Parameters:
tag - the tag name to fix
Returns:
the fixed tag name.
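The Javadoc does not list the repair rules. An ASCII-only sketch, assuming an invalid first character becomes '_' and any later invalid character becomes '-'; the real XML name rules also admit many non-ASCII letters, which this simplification replaces. TagNameSketch is a hypothetical name.

```java
/**
 * Hypothetical, ASCII-only sketch of turning a string into a valid XML
 * element name. The replacement characters are an assumption.
 */
final class TagNameSketch {

    private static boolean isNameStart(char c) {
        return c == ':' || c == '_' || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    private static boolean isNameChar(char c) {
        return isNameStart(c) || c == '-' || c == '.' || (c >= '0' && c <= '9');
    }

    static String fixTagName(String tag) {
        StringBuilder sb = new StringBuilder(tag.length());
        for (int k = 0; k < tag.length(); k++) {
            char c = tag.charAt(k);
            if (k == 0) {
                sb.append(isNameStart(c) ? c : '_'); // names cannot start with a digit, '-', etc.
            } else {
                sb.append(isNameChar(c) ? c : '-'); // spaces and punctuation are not allowed
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fixTagName("H1"));         // H1
        System.out.println(fixTagName("1stLevel"));   // _stLevel
        System.out.println(fixTagName("Table Cell")); // Table-Cell
    }
}
```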
-
escapeXML
protected static java.lang.String escapeXML(java.lang.String s, boolean onlyASCII)
NOTE: copied from the iText 5 XMLUtils class.
Escapes a string with the appropriate XML codes.
Parameters:
s - the string to be escaped
onlyASCII - if true, codes above 127 will always be escaped with &#nn;
Returns:
the escaped string
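A sketch of the documented onlyASCII behavior, assuming the five predefined XML entities are used for markup characters and that characters failing an XML 1.0 validity check are dropped. EscapeSketch is hypothetical and is not the iText 5 code; in particular it iterates by code point, so a character such as U+1F600 becomes a single &#128512; reference when onlyASCII is true.

```java
/**
 * Hypothetical sketch of XML escaping with an onlyASCII switch. Assumptions:
 * predefined entities for markup characters, invalid XML 1.0 characters
 * dropped, and decimal references for code points above 127 when onlyASCII.
 */
final class EscapeSketch {

    static boolean isValidCharacterValue(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    static String escapeXML(String s, boolean onlyASCII) {
        StringBuilder sb = new StringBuilder();
        // Iterate by code point so supplementary characters stay intact.
        s.codePoints().forEach(c -> {
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (!isValidCharacterValue(c)) {
                        break; // drop characters XML 1.0 cannot represent
                    }
                    if (onlyASCII && c > 127) {
                        sb.append("&#").append(c).append(';'); // decimal character reference
                    } else {
                        sb.appendCodePoint(c);
                    }
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXML("a < b & \"c\"", false)); // a &lt; b &amp; &quot;c&quot;
        System.out.println(escapeXML("caf\u00e9", true));      // caf&#233;
        System.out.println(escapeXML("caf\u00e9", false));     // unchanged
    }
}
```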