Package org.apache.fontbox.ttf.gsub
Class CompoundCharacterTokenizer
java.lang.Object
org.apache.fontbox.ttf.gsub.CompoundCharacterTokenizer
Takes text that contains compound glyphs to substitute and splits it into chunks: the parts that should be
substituted and the parts that can be processed normally.
-
Field Summary
Fields -
Constructor Summary
Constructors
Constructor- Description
CompoundCharacterTokenizer(Pattern pattern)- Deprecated.
CompoundCharacterTokenizer(Set<String> compoundWords)- Constructor.
Method Summary
-
Field Details
-
GLYPH_ID_SEPARATOR
- See Also:
-
regexExpression
-
-
Constructor Details
-
CompoundCharacterTokenizer
Constructor. Calls getRegexFromTokens, which returns strings like (_79_99_)|(_80_99_)|(_92_99_), and creates a regular expression that is assigned to regexExpression. See the code in GlyphArraySplitterRegexImpl for how these strings are created. It is assumed that the compound words are sorted in descending order of length.
- Parameters:
compoundWords- A set of strings like _79_99_, _80_99_ or _92_99_.
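The regex construction described above can be sketched as follows. This is a minimal illustration, not the FontBox source: the class name CompoundRegexSketch and the method regexFromTokens are hypothetical, and the sketch assumes the words contain only digits and the separator, so no regex escaping is applied.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringJoiner;
import java.util.regex.Pattern;

final class CompoundRegexSketch
{
    // Hypothetical helper (an assumption, not the FontBox method): wraps each
    // compound word in a capturing group and joins the groups with '|'.
    // Assumes the words contain no regex metacharacters.
    static String regexFromTokens(Set<String> compoundWords)
    {
        StringJoiner joiner = new StringJoiner(")|(", "(", ")");
        for (String word : compoundWords)
        {
            joiner.add(word);
        }
        return joiner.toString();
    }

    public static void main(String[] args)
    {
        // A LinkedHashSet keeps insertion order, so the caller controls
        // the order of the alternatives in the resulting expression.
        Set<String> words = new LinkedHashSet<>();
        words.add("_79_99_");
        words.add("_80_99_");
        words.add("_92_99_");
        Pattern pattern = Pattern.compile(regexFromTokens(words));
        System.out.println(pattern.pattern());
    }
}
```

Java's regex engine tries alternatives from left to right, which is one reason the ordering assumption matters: if a shorter word precedes a longer word that begins with it, the shorter alternative matches first.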
-
CompoundCharacterTokenizer
Deprecated. Constructor.
- Parameters:
pattern-
-
-
Method Details
-
validateCompoundWords
Validates the compound words. They should not be null or empty and should start and end with the GLYPH_ID_SEPARATOR.
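A validation along these lines might look like the sketch below. It is an assumption-laden illustration rather than the FontBox code: the separator value "_" is inferred from the example strings on this page, the exception type is a guess, and "null or empty" is read here as applying to the set as a whole.

```java
import java.util.Set;

final class CompoundWordValidationSketch
{
    // Assumed value; this page names GLYPH_ID_SEPARATOR but does not show it.
    static final String SEPARATOR = "_";

    // Hypothetical stand-in for validateCompoundWords.
    static void validate(Set<String> compoundWords)
    {
        if (compoundWords == null || compoundWords.isEmpty())
        {
            throw new IllegalArgumentException("Compound words must not be null or empty");
        }
        for (String word : compoundWords)
        {
            // Every word must be delimited by the separator on both sides.
            if (word == null || !word.startsWith(SEPARATOR) || !word.endsWith(SEPARATOR))
            {
                throw new IllegalArgumentException(
                        "Compound word must start and end with " + SEPARATOR + ": " + word);
            }
        }
    }
}
```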
tokenize
Tokenize a string into tokens.
- Parameters:
text- A string like "_66_71_71_74_79_70_"
- Returns:
A list of tokens like "_66_", "_71_71_", "74_79_70_". The "_" is sometimes missing at the beginning or end; this has to be cleaned up by the caller.
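The splitting behaviour can be illustrated with a small sketch. It is not the FontBox implementation: TokenizeSketch, its tokenize method and the withSeparators cleanup helper are hypothetical names, and the compound word "_71_71_" used in the test run is chosen only to mirror the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizeSketch
{
    // Splits text into alternating chunks: the text between matches and the
    // matches themselves. Empty chunks are skipped.
    static List<String> tokenize(Pattern compoundPattern, String text)
    {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = compoundPattern.matcher(text);
        int previousEnd = 0;
        while (matcher.find())
        {
            if (matcher.start() > previousEnd)
            {
                tokens.add(text.substring(previousEnd, matcher.start()));
            }
            tokens.add(matcher.group());
            previousEnd = matcher.end();
        }
        if (previousEnd < text.length())
        {
            tokens.add(text.substring(previousEnd));
        }
        return tokens;
    }

    // Hypothetical caller-side cleanup: restores a missing leading or
    // trailing "_" on a token.
    static String withSeparators(String token)
    {
        String result = token.startsWith("_") ? token : "_" + token;
        return result.endsWith("_") ? result : result + "_";
    }

    public static void main(String[] args)
    {
        Pattern pattern = Pattern.compile("(_71_71_)");
        for (String token : tokenize(pattern, "_66_71_71_74_79_70_"))
        {
            System.out.println(token + " -> " + withSeparators(token));
        }
    }
}
```

In this sketch the match consumes the separators on both of its sides, so the neighbouring chunks lose one "_" each; that is the kind of gap the caller is expected to repair.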
-
getRegexFromTokens
-
CompoundCharacterTokenizer(java.util.Set)