Class LocationTextExtractionStrategy
- java.lang.Object
-
- com.itextpdf.kernel.pdf.canvas.parser.listener.LocationTextExtractionStrategy
-
- All Implemented Interfaces:
IEventListener, ITextExtractionStrategy
public class LocationTextExtractionStrategy extends java.lang.Object implements ITextExtractionStrategy
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
- static interface LocationTextExtractionStrategy.ITextChunkLocationStrategy
- private static class LocationTextExtractionStrategy.ITextChunkLocationStrategyImpl
- private static class LocationTextExtractionStrategy.TextChunkMarks
-
Field Summary
Fields (Modifier and Type, Field, Description):
- private static boolean DUMP_STATE: set to true for debugging
- private TextRenderInfo lastTextRenderInfo
- private java.util.List<TextChunk> locationalResult: a summary of all found text
- private boolean rightToLeftRunDirection
- private LocationTextExtractionStrategy.ITextChunkLocationStrategy tclStrat
- private boolean useActualText
-
Constructor Summary
Constructors (Constructor, Description):
- LocationTextExtractionStrategy(): Creates a new text extraction renderer.
- LocationTextExtractionStrategy(LocationTextExtractionStrategy.ITextChunkLocationStrategy strat): Creates a new text extraction renderer, with a custom strategy for creating new TextChunkLocation objects based on the input of the TextRenderInfo.
-
Method Summary
Methods (Modifier and Type, Method, Description):
- private void dumpState(): Used for debugging only.
- private boolean endsWithSpace(java.lang.String str): Checks if the string ends with a space character; false if the string is empty or ends with a non-space character.
- void eventOccurred(IEventData data, EventType type): Called when some event occurs during parsing a content stream.
- private CanvasTag findLastTagWithActualText(java.util.List<CanvasTag> canvasTagHierarchy)
- java.lang.String getResultantText(): Returns the text that has been processed so far.
- java.util.Set<EventType> getSupportedEvents(): Provides the set of event types this listener supports.
- protected boolean isChunkAtWordBoundary(TextChunk chunk, TextChunk previousChunk): Determines if a space character should be inserted between a previous chunk and the current chunk.
- boolean isUseActualText(): Gets the value of the property which determines if /ActualText will be used when extracting the text.
- LocationTextExtractionStrategy setRightToLeftRunDirection(boolean rightToLeftRunDirection): Sets if text flows from left to right or from right to left.
- LocationTextExtractionStrategy setUseActualText(boolean useActualText): Changes the behavior of text extraction so that if the parameter is set to true, the /ActualText marked content property will be used instead of raw decoded bytes.
- private void sortWithMarks(java.util.List<TextChunk> textChunks)
- private boolean startsWithSpace(java.lang.String str): Checks if the string starts with a space character; false if the string is empty or starts with a non-space character.
-
-
-
Field Detail
-
DUMP_STATE
private static boolean DUMP_STATE
set to true for debugging
-
locationalResult
private final java.util.List<TextChunk> locationalResult
a summary of all found text
-
tclStrat
private final LocationTextExtractionStrategy.ITextChunkLocationStrategy tclStrat
-
useActualText
private boolean useActualText
-
rightToLeftRunDirection
private boolean rightToLeftRunDirection
-
lastTextRenderInfo
private TextRenderInfo lastTextRenderInfo
-
-
Constructor Detail
-
LocationTextExtractionStrategy
public LocationTextExtractionStrategy()
Creates a new text extraction renderer.
-
LocationTextExtractionStrategy
public LocationTextExtractionStrategy(LocationTextExtractionStrategy.ITextChunkLocationStrategy strat)
Creates a new text extraction renderer, with a custom strategy for creating new TextChunkLocation objects based on the input of the TextRenderInfo.
Parameters:
strat - the custom strategy
-
-
Method Detail
-
setUseActualText
public LocationTextExtractionStrategy setUseActualText(boolean useActualText)
Changes the behavior of text extraction so that if the parameter is set to true, the /ActualText marked content property will be used instead of raw decoded bytes. Beware: the logic is not stable yet.
Parameters:
useActualText - true to use /ActualText, false otherwise
Returns:
this object
-
setRightToLeftRunDirection
public LocationTextExtractionStrategy setRightToLeftRunDirection(boolean rightToLeftRunDirection)
Sets whether text flows from left to right or from right to left. Call this method with a true argument to extract Arabic, Hebrew or other text with a right-to-left writing direction.
Parameters:
rightToLeftRunDirection - value specifying whether the direction should be right to left
Returns:
this object
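Both setters return this object, which allows calls to be chained. The pattern can be sketched in isolation as follows. FluentSettersSketch is a hypothetical stand-in class written for this illustration, not the library class, and its isRightToLeftRunDirection getter is an assumption added only so the example can print the state.

```java
// Hypothetical stand-in that mirrors only the two documented setters;
// it is NOT com.itextpdf's LocationTextExtractionStrategy.
public class FluentSettersSketch {
    private boolean useActualText;
    private boolean rightToLeftRunDirection;

    // Stores the flag and returns this object so calls can be chained.
    public FluentSettersSketch setUseActualText(boolean useActualText) {
        this.useActualText = useActualText;
        return this;
    }

    // Stores the flag and returns this object so calls can be chained.
    public FluentSettersSketch setRightToLeftRunDirection(boolean rightToLeftRunDirection) {
        this.rightToLeftRunDirection = rightToLeftRunDirection;
        return this;
    }

    public boolean isUseActualText() {
        return useActualText;
    }

    // Assumed getter, present here for illustration only.
    public boolean isRightToLeftRunDirection() {
        return rightToLeftRunDirection;
    }

    public static void main(String[] args) {
        // Configure both options in a single chained expression.
        FluentSettersSketch strategy = new FluentSettersSketch()
                .setUseActualText(true)
                .setRightToLeftRunDirection(true);
        System.out.println(strategy.isUseActualText());
        System.out.println(strategy.isRightToLeftRunDirection());
    }
}
```

With the real class, the same chaining style would apply to a newly constructed LocationTextExtractionStrategy before it is handed to the parser.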
-
isUseActualText
public boolean isUseActualText()
Gets the value of the property which determines if /ActualText will be used when extracting the text.
Returns:
true if /ActualText value is used, false otherwise
-
eventOccurred
public void eventOccurred(IEventData data, EventType type)
Description copied from interface: IEventListener
Called when some event occurs during parsing a content stream.
Specified by:
eventOccurred in interface IEventListener
Parameters:
data - Combines the data required for processing corresponding event type.
type - Event type.
-
getSupportedEvents
public java.util.Set<EventType> getSupportedEvents()
Description copied from interface: IEventListener
Provides the set of event types this listener supports. Returns null if all possible event types are supported.
Specified by:
getSupportedEvents in interface IEventListener
Returns:
Set of event types supported by this listener or null if all possible event types are supported.
-
getResultantText
public java.lang.String getResultantText()
Description copied from interface: ITextExtractionStrategy
Returns the text that has been processed so far.
Specified by:
getResultantText in interface ITextExtractionStrategy
Returns:
String instance with the current resultant text
-
isChunkAtWordBoundary
protected boolean isChunkAtWordBoundary(TextChunk chunk, TextChunk previousChunk)
Determines if a space character should be inserted between a previous chunk and the current chunk. This method is exposed as a callback so subclasses can fine-tune the algorithm for determining whether a space should be inserted or not. By default, this method will insert a space if there is a gap of more than half the font space character width between the end of the previous chunk and the beginning of the current chunk. It will also indicate that a space is needed if the starting point of the new chunk appears *before* the end of the previous chunk (i.e. overlapping text).
Parameters:
chunk - the new chunk being evaluated
previousChunk - the chunk that appeared immediately before the current chunk
Returns:
true if the two chunks represent different words (i.e. should have a space between them), false otherwise.
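The default rule described above can be sketched without the library. This is a minimal illustration of the documented description only, not the library's actual implementation: WordBoundarySketch and its parameters are hypothetical, and chunk positions are assumed to be reduced to one-dimensional coordinates along the baseline of left-to-right text.

```java
// Hypothetical, library-free sketch of the documented default rule.
public class WordBoundarySketch {

    /**
     * @param previousEnd  position where the previous chunk ends (assumed 1D coordinate)
     * @param currentStart position where the current chunk starts (assumed 1D coordinate)
     * @param spaceWidth   width of the font's space character
     * @return true if a space should separate the two chunks
     */
    static boolean isAtWordBoundary(float previousEnd, float currentStart, float spaceWidth) {
        float gap = currentStart - previousEnd;
        if (gap < 0) {
            // The new chunk starts before the previous one ends (overlapping text).
            return true;
        }
        // A gap wider than half a space character counts as a word break.
        return gap > spaceWidth / 2.0f;
    }

    public static void main(String[] args) {
        float spaceWidth = 4f;
        System.out.println(isAtWordBoundary(10f, 13f, spaceWidth)); // gap 3 exceeds 2
        System.out.println(isAtWordBoundary(10f, 11f, spaceWidth)); // gap 1 does not exceed 2
        System.out.println(isAtWordBoundary(10f, 9f, spaceWidth));  // overlapping chunks
    }
}
```

A subclass that overrides isChunkAtWordBoundary could apply a different threshold in the same place, for example to be stricter with tightly kerned fonts.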
-
startsWithSpace
private boolean startsWithSpace(java.lang.String str)
Checks if the string starts with a space character; false if the string is empty or starts with a non-space character.
Parameters:
str - the string to be checked
Returns:
true if the string starts with a space character, false if the string is empty or starts with a non-space character
-
endsWithSpace
private boolean endsWithSpace(java.lang.String str)
Checks if the string ends with a space character; false if the string is empty or ends with a non-space character.
Parameters:
str - the string to be checked
Returns:
true if the string ends with a space character, false if the string is empty or ends with a non-space character
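The documented contract of the two space checks can be written as the following sketch. The class SpaceHelpersSketch and the joinWords helper are hypothetical names. How the library combines these checks internally is not stated here, so the joining logic is an assumption meant to show one plausible use: avoiding a doubled space between adjacent pieces of text.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, library-free sketch of the two documented checks.
public class SpaceHelpersSketch {

    // True only if the string is non-empty and its first character is a space.
    static boolean startsWithSpace(String str) {
        return !str.isEmpty() && str.charAt(0) == ' ';
    }

    // True only if the string is non-empty and its last character is a space.
    static boolean endsWithSpace(String str) {
        return !str.isEmpty() && str.charAt(str.length() - 1) == ' ';
    }

    // Assumed usage: join pieces, adding a space only where neither neighbor supplies one.
    static String joinWords(List<String> pieces) {
        StringBuilder sb = new StringBuilder();
        String previous = null;
        for (String piece : pieces) {
            if (previous != null && !endsWithSpace(previous) && !startsWithSpace(piece)) {
                sb.append(' ');
            }
            sb.append(piece);
            previous = piece;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "Hello world again there"
        System.out.println(joinWords(Arrays.asList("Hello", " world", "again ", "there")));
    }
}
```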
-
dumpState
private void dumpState()
Used for debugging only
-
findLastTagWithActualText
private CanvasTag findLastTagWithActualText(java.util.List<CanvasTag> canvasTagHierarchy)
-
sortWithMarks
private void sortWithMarks(java.util.List<TextChunk> textChunks)
-
-