Package com.itextpdf.text.pdf.parser
Class SimpleTextExtractionStrategy
java.lang.Object
com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy
- All Implemented Interfaces:
RenderListener, TextExtractionStrategy
A simple text extraction renderer.
This renderer keeps track of the current y position of each string. If it detects
that the y position has changed, it inserts a line break into the output. If the
PDF renders text in anything other than top-to-bottom order, the extracted text will
not be a true representation of how it appears on the page.
This renderer also uses a simple strategy based on the font metrics to determine if
a blank space should be inserted into the output.
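The two heuristics described above can be illustrated with a minimal, self-contained sketch. It does not use iText and is not the library's implementation: the `Chunk` class, the `extract` method, the 1-unit same-line tolerance, and the "gap wider than half a space" rule are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of the heuristics described above; not the iText implementation. */
class SimpleExtractionSketch {

    /** Hypothetical stand-in for one rendered string and its position on the page. */
    static final class Chunk {
        final String text;
        final float startX;      // x coordinate where the string starts
        final float endX;        // x coordinate where the string ends
        final float y;           // baseline y position
        final float spaceWidth;  // width of a space character in the string's font

        Chunk(String text, float startX, float endX, float y, float spaceWidth) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
            this.y = y;
            this.spaceWidth = spaceWidth;
        }
    }

    /** Assumed tolerance: y differences up to this value count as the same line. */
    static final float SAME_LINE_TOLERANCE = 1f;

    /** Concatenates chunks in render order, inserting line breaks and spaces heuristically. */
    static String extract(List<Chunk> chunksInRenderOrder) {
        StringBuilder result = new StringBuilder();
        Chunk last = null;
        for (Chunk chunk : chunksInRenderOrder) {
            if (last != null) {
                if (Math.abs(chunk.y - last.y) > SAME_LINE_TOLERANCE) {
                    // The y position changed, so start a new line.
                    result.append('\n');
                } else if (needsSpace(result, last, chunk)) {
                    // Same line, but the gap is wide enough to look like a word break.
                    result.append(' ');
                }
            }
            result.append(chunk.text);
            last = chunk;
        }
        return result.toString();
    }

    /** Assumed rule: insert a space when the gap exceeds half the width of a space. */
    private static boolean needsSpace(StringBuilder result, Chunk last, Chunk chunk) {
        boolean alreadySpaced = result.length() > 0
                && result.charAt(result.length() - 1) == ' ';
        boolean startsWithSpace = chunk.text.startsWith(" ");
        float gap = chunk.startX - last.endX;
        return !alreadySpaced && !startsWithSpace && gap > chunk.spaceWidth / 2f;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = Arrays.asList(
                new Chunk("Hello", 0f, 30f, 700f, 5f),
                new Chunk("world", 34f, 60f, 700f, 5f),
                new Chunk("Next", 0f, 20f, 686f, 5f));
        System.out.println(extract(chunks));
    }
}
```

Because the sketch only compares each chunk with the one rendered immediately before it, chunks that arrive out of visual order (for example, a footer drawn before the title) come out in render order, which is the limitation noted above.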
- Since:
- 2.1.5
-
Field Summary
- lastStart
- lastEnd
- result: used to store the resulting String.
Constructor Summary
- SimpleTextExtractionStrategy(): Creates a new text extraction renderer.
Method Summary
- protected final void appendTextChunk(CharSequence text): Used to actually append text to the text results.
- void beginTextBlock(): Called when a new text block is beginning (i.e. BT).
- void endTextBlock(): Called when a text block has ended (i.e. ET).
- String getResultantText(): Returns the result so far.
- void renderImage(ImageRenderInfo renderInfo): No-op method; this renderer isn't interested in image events.
- void renderText(TextRenderInfo renderInfo): Captures text using a simplified algorithm for inserting hard returns and spaces.
-
Field Details
-
lastStart
-
lastEnd
-
result
Used to store the resulting String.
-
-
Constructor Details
-
SimpleTextExtractionStrategy
public SimpleTextExtractionStrategy()
Creates a new text extraction renderer.
-
-
Method Details
-
beginTextBlock
public void beginTextBlock()
Description copied from interface: RenderListener
Called when a new text block is beginning (i.e. BT).
- Specified by:
- beginTextBlock in interface RenderListener
- Since:
- 5.0.1
-
endTextBlock
public void endTextBlock()
Description copied from interface: RenderListener
Called when a text block has ended (i.e. ET).
- Specified by:
- endTextBlock in interface RenderListener
- Since:
- 5.0.1
-
getResultantText
public String getResultantText()
Returns the result so far.
- Specified by:
- getResultantText in interface TextExtractionStrategy
- Returns:
- a String with the resulting text.
-
appendTextChunk
protected final void appendTextChunk(CharSequence text)
Used to actually append text to the text results. Subclasses can use this to insert text that wouldn't normally be included in text parsing (e.g. the result of OCR performed against image content).
- Parameters:
- text - the text to append to the text results accumulated so far
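The extension point described above can be sketched as follows. So that the example runs without iText on the classpath, it uses a hypothetical stand-in base class, `StrategyStandIn`, which mirrors only the `appendTextChunk` / `getResultantText` / `renderImage` shape documented here; with the real library you would presumably extend `SimpleTextExtractionStrategy` and override `renderImage(ImageRenderInfo)` instead. The `OcrAwareStrategy` class, the `byte[]` image parameter, and the OCR function are all assumptions made for illustration.

```java
import java.util.function.Function;

/** Hypothetical subclass that appends OCR output whenever an image is rendered. */
class OcrAwareStrategy extends StrategyStandIn {

    // Hypothetical OCR engine: maps raw image bytes to recognized text.
    private final Function<byte[], String> ocr;

    OcrAwareStrategy(Function<byte[], String> ocr) {
        this.ocr = ocr;
    }

    @Override
    public void renderImage(byte[] imageBytes) {
        String recognized = ocr.apply(imageBytes);
        if (recognized != null && !recognized.isEmpty()) {
            // Insert text that ordinary text parsing would never see.
            appendTextChunk(recognized);
        }
    }

    public static void main(String[] args) {
        OcrAwareStrategy strategy =
                new OcrAwareStrategy(bytes -> "[text recognized in image]");
        strategy.appendTextChunk("Page text. ");
        strategy.renderImage(new byte[0]);
        System.out.println(strategy.getResultantText());
    }
}

/** Stand-in for the documented class; models only the members used in this sketch. */
class StrategyStandIn {
    private final StringBuilder result = new StringBuilder();

    /** Appends text to the accumulated result; final, so subclasses call it rather than override it. */
    protected final void appendTextChunk(CharSequence text) {
        result.append(text);
    }

    /** Returns the result so far. */
    public String getResultantText() {
        return result.toString();
    }

    /** No-op by default; the byte[] parameter is a placeholder for real image data. */
    public void renderImage(byte[] imageBytes) {
    }
}
```

Since `appendTextChunk` is `protected final`, a subclass can add to the result but cannot change how text is appended, which keeps the accumulated output consistent however many hooks a subclass adds.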
-
renderText
public void renderText(TextRenderInfo renderInfo)
Captures text using a simplified algorithm for inserting hard returns and spaces.
- Specified by:
- renderText in interface RenderListener
- Parameters:
- renderInfo - render info
-
renderImage
public void renderImage(ImageRenderInfo renderInfo)
No-op method; this renderer isn't interested in image events.
- Specified by:
- renderImage in interface RenderListener
- Parameters:
- renderInfo - information specifying what to render
- Since:
- 5.0.1