Package org.apache.uima.internal.util
Class CharacterUtils
- java.lang.Object
-
- org.apache.uima.internal.util.CharacterUtils
-
public class CharacterUtils extends java.lang.Object
Collection of utilities for character handling. Contains utilities for semi-automatically creating lexer rules.
-
-
Nested Class Summary
Nested Classes
private static class CharacterUtils.CharRange
Represents a character range.
-
Constructor Summary
Constructors
CharacterUtils()
Constructor for CharacterUtils.
-
Method Summary
All Methods (all are static, concrete methods)
-
private static java.util.ArrayList<CharacterUtils.CharRange> getCharacterRanges(int[] charSpecs)
-
static java.util.ArrayList<CharacterUtils.CharRange> getDigitRange()
Generate an ArrayList of CharRanges for what Java considers to be a digit.
-
static java.util.ArrayList<CharacterUtils.CharRange> getLetterRange()
Generate an ArrayList of CharRanges for what Java considers to be a letter.
-
private static boolean isType(char c, int[] types)
-
static void main(java.lang.String[] args)
-
static void printAntlrLexRule(java.lang.String name, java.util.ArrayList<CharacterUtils.CharRange> charRanges)
-
static void printJavaCCLexRule(java.lang.String name, java.util.ArrayList<CharacterUtils.CharRange> charRanges)
-
static java.lang.String toHexString(char c)
Create a hex representation of the UTF-16 encoding of a Java char.
-
static java.lang.String toUnicodeChar(char c)
Create a hex representation of the UTF-16 encoding of a Java char.
-
-
-
Method Detail
-
isType
private static final boolean isType(char c, int[] types)
-
getCharacterRanges
private static java.util.ArrayList<CharacterUtils.CharRange> getCharacterRanges(int[] charSpecs)
-
toUnicodeChar
public static java.lang.String toUnicodeChar(char c)
Create a hex representation of the UTF-16 encoding of a Java char. This is the representation that's understood by Java when reading source code.
Parameters:
c - The char to be encoded.
Returns:
String Hex representation of character. For example, the result of encoding 'A' would be "\u0041".
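The encoding described above can be sketched as follows. This is a minimal illustration, not the UIMA implementation: the class name UnicodeEscapeSketch and the method toUnicodeEscape are hypothetical, and the use of lowercase hex digits is an assumption, since the documentation does not say which case toUnicodeChar emits.

```java
// Hypothetical stand-in for illustration; not the UIMA source.
final class UnicodeEscapeSketch {

    // Formats one UTF-16 code unit as a Java source escape:
    // a backslash, the letter u, then exactly four hex digits.
    // Assumption: lowercase hex digits.
    static String toUnicodeEscape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        // 'A' is code unit 0x0041, so the four hex digits are 0041.
        System.out.println(toUnicodeEscape('A'));
    }
}
```

The cast to int matters: the %x conversion accepts integer types but rejects a char argument.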
-
toHexString
public static java.lang.String toHexString(char c)
Create a hex representation of the UTF-16 encoding of a Java char. This is the representation that's understood by the JavaCC lexer.
Parameters:
c - The char to be encoded.
Returns:
String Hex representation of character. For example, the result of encoding 'A' would be "0x0041".
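A comparable sketch for the 0x form is shown below. The class name HexStringSketch and the method toHex are hypothetical, and the choice of lowercase hex digits is an assumption. The only thing taken from the documentation is that 'A' encodes to "0x0041", which implies zero-padding to four digits.

```java
// Hypothetical stand-in for illustration; not the UIMA source.
final class HexStringSketch {

    // Formats one UTF-16 code unit as "0x" followed by four hex digits.
    // Assumption: lowercase hex digits.
    static String toHex(char c) {
        return String.format("0x%04x", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(toHex('A')); // prints 0x0041
    }
}
```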
-
getLetterRange
public static java.util.ArrayList<CharacterUtils.CharRange> getLetterRange()
Generate an ArrayList of CharRanges for what Java considers to be a letter. Intended as input to Unicode-agnostic lexers such as ANTLR.
Returns:
ArrayList A list of character ranges.
-
getDigitRange
public static java.util.ArrayList<CharacterUtils.CharRange> getDigitRange()
Generate an ArrayList of CharRanges for what Java considers to be a digit. Intended as input to Unicode-agnostic lexers such as ANTLR.
Returns:
ArrayList A list of character ranges.
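One way such range lists could be produced is to scan every char value and merge consecutive matches into inclusive ranges. The sketch below rests on stated assumptions: CharRangeSketch, Range and collectRanges are hypothetical names, the real CharRange class is private and its fields are not documented here, and the real methods may classify characters differently (the isType(char, int[]) signature suggests a lookup by character type rather than Character.isLetter or Character.isDigit).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-in for illustration; not the UIMA source.
final class CharRangeSketch {

    // Hypothetical inclusive range of UTF-16 code units.
    static final class Range {
        final char start;
        final char end;

        Range(char start, char end) {
            this.start = start;
            this.end = end;
        }
    }

    // Scans all 65536 char values and merges consecutive matches into ranges.
    // An int loop counter is used so the loop can terminate after 0xFFFF.
    static List<Range> collectRanges(IntPredicate matches) {
        List<Range> ranges = new ArrayList<>();
        int runStart = -1; // -1 means no run is currently open
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (matches.test(c)) {
                if (runStart < 0) {
                    runStart = c; // open a new run
                }
            } else if (runStart >= 0) {
                ranges.add(new Range((char) runStart, (char) (c - 1)));
                runStart = -1; // close the run
            }
        }
        if (runStart >= 0) {
            // A run that reaches the last char value is closed here.
            ranges.add(new Range((char) runStart, Character.MAX_VALUE));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<Range> digits = collectRanges(c -> Character.isDigit((char) c));
        List<Range> letters = collectRanges(c -> Character.isLetter((char) c));
        Range d = digits.get(0);
        Range l = letters.get(0);
        // The first digit range is '0'..'9' and the first letter range is 'A'..'Z'.
        System.out.printf("first digit range: 0x%04x..0x%04x%n", (int) d.start, (int) d.end);
        System.out.printf("first letter range: 0x%04x..0x%04x%n", (int) l.start, (int) l.end);
    }
}
```

The total number of ranges depends on the Unicode version supported by the running JDK, so a generated lexer rule may differ between Java releases.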
-
printAntlrLexRule
public static void printAntlrLexRule(java.lang.String name, java.util.ArrayList<CharacterUtils.CharRange> charRanges)
-
printJavaCCLexRule
public static void printJavaCCLexRule(java.lang.String name, java.util.ArrayList<CharacterUtils.CharRange> charRanges)
-
main
public static void main(java.lang.String[] args)
-
-