Package org.w3c.tidy
Class EncodingUtils
- java.lang.Object
  - org.w3c.tidy.EncodingUtils
public final class EncodingUtils extends java.lang.Object
Version:
- $Revision$ ($Author$)
Author:
- Fabrizio Giustina
Field Summary
Modifier and Type  Field                       Description
static int         FSM_ASCII                   States for ISO 2022: state ASCII.
static int         FSM_ESC                     State ESC.
static int         FSM_ESCD                    State ESCD.
static int         FSM_ESCDP                   State ESCDP.
static int         FSM_ESCP                    State ESCP.
static int         FSM_NONASCII                State NONASCII.
static int         HIGH_UTF16_SURROGATE        UTF-16 high surrogate.
static int         LOW_UTF16_SURROGATE         UTF-16 low surrogate.
static int         MAX_UTF16_FROM_UCS4         Maximum UTF-16 value.
static int         MAX_UTF8_FROM_UCS4          Maximum valid UTF-8 character value.
static int         UNICODE_BOM                 The default (big-endian) Unicode BOM.
static int         UNICODE_BOM_BE              The big-endian (default) Unicode BOM.
static int         UNICODE_BOM_LE              The little-endian Unicode BOM.
static int         UNICODE_BOM_UTF8            The UTF-8 Unicode BOM.
static int         UTF16_HIGH_SURROGATE_BEGIN  UTF-16 surrogate pair areas: high surrogates begin.
static int         UTF16_HIGH_SURROGATE_END    UTF-16 surrogate pair areas: high surrogates end.
static int         UTF16_LOW_SURROGATE_BEGIN   UTF-16 surrogate pair areas: low surrogates begin.
static int         UTF16_LOW_SURROGATE_END     UTF-16 surrogate pair areas: low surrogates end.
static int         UTF16_SURROGATES_BEGIN      UTF-16 surrogates begin.
Method Summary
Modifier and Type     Method                  Description
protected static int  decodeMacRoman(int c)   Converts a MacRoman character code to Unicode.
protected static int  decodeWin1252(int c)    Converts a Windows-1252 character code to Unicode.
Field Detail
-
UNICODE_BOM_BE
public static final int UNICODE_BOM_BE
The big-endian (default) Unicode BOM.
See Also:
- Constant Field Values
-
UNICODE_BOM
public static final int UNICODE_BOM
The default (big-endian) Unicode BOM.
See Also:
- Constant Field Values
-
UNICODE_BOM_LE
public static final int UNICODE_BOM_LE
The little-endian Unicode BOM.
See Also:
- Constant Field Values
-
UNICODE_BOM_UTF8
public static final int UNICODE_BOM_UTF8
The UTF-8 Unicode BOM.
See Also:
- Constant Field Values
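One way to use the four BOM constants is to compare them against the first bytes of a stream to choose a decoder. The sketch below assumes they hold the standard BOM byte patterns 0xFEFF (big-endian), 0xFFFE (little-endian) and 0xEFBBBF (UTF-8); those values are not listed on this page, and the class name BomSniffer is hypothetical, not part of JTidy.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical BOM detector; not part of JTidy. */
final class BomSniffer {
    // Assumed to mirror UNICODE_BOM_BE, UNICODE_BOM_LE and UNICODE_BOM_UTF8.
    static final int BOM_BE = 0xFEFF;
    static final int BOM_LE = 0xFFFE;
    static final int BOM_UTF8 = 0xEFBBBF;

    private BomSniffer() {}

    /** Returns the charset implied by a leading BOM, or null if none is present. */
    static Charset detect(byte[] b) {
        if (b.length >= 3) {
            // Pack the first three bytes big-endian and compare with the UTF-8 BOM.
            int three = ((b[0] & 0xFF) << 16) | ((b[1] & 0xFF) << 8) | (b[2] & 0xFF);
            if (three == BOM_UTF8) {
                return StandardCharsets.UTF_8;
            }
        }
        if (b.length >= 2) {
            // Pack the first two bytes big-endian; FE FF is BE, FF FE is LE.
            int two = ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
            if (two == BOM_BE) {
                return StandardCharsets.UTF_16BE;
            }
            if (two == BOM_LE) {
                return StandardCharsets.UTF_16LE;
            }
        }
        return null;
    }
}
```

The three-byte UTF-8 pattern is checked first because it is the longest. This sketch ignores UTF-32 BOMs, so a UTF-32LE stream (FF FE 00 00) would be reported as UTF-16LE.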
-
FSM_ASCII
public static final int FSM_ASCII
States for ISO 2022. A document in an ISO-2022-based encoding uses ESC sequences called "designators" to switch character sets. The designators defined and used in ISO-2022-JP are as follows, where ? stands for the final byte that selects the character set:
- "ESC" + "(" + ? for ISO 646 variants
- "ESC" + "$" + ? and "ESC" + "$" + "(" + ? for multibyte character sets
State ASCII.
See Also:
- Constant Field Values
-
FSM_ESC
public static final int FSM_ESC
State ESC (an "ESC" byte has been read).
See Also:
- Constant Field Values
-
FSM_ESCD
public static final int FSM_ESCD
State ESCD ("ESC" + "$" has been read).
See Also:
- Constant Field Values
-
FSM_ESCDP
public static final int FSM_ESCDP
State ESCDP ("ESC" + "$" + "(" has been read).
See Also:
- Constant Field Values
-
FSM_ESCP
public static final int FSM_ESCP
State ESCP ("ESC" + "(" has been read).
See Also:
- Constant Field Values
-
FSM_NONASCII
public static final int FSM_NONASCII
State NONASCII (a multibyte character set has been designated).
See Also:
- Constant Field Values
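Taken together, the FSM_* states describe a small machine that follows ISO-2022-JP designators byte by byte. The sketch below is one reading of the state names, assuming ESCD means "ESC $ seen", ESCDP "ESC $ ( seen" and ESCP "ESC ( seen"; the numeric values and the Iso2022Tracker class are illustrative, not taken from JTidy.

```java
/** Hypothetical ISO-2022-JP designator tracker; state names mirror the FSM_* constants. */
final class Iso2022Tracker {
    // Illustrative values only; the real FSM_* constants may differ.
    static final int FSM_ASCII = 0;
    static final int FSM_ESC = 1;
    static final int FSM_ESCD = 2;
    static final int FSM_ESCDP = 3;
    static final int FSM_ESCP = 4;
    static final int FSM_NONASCII = 5;

    private static final int ESC = 0x1B;
    private int state = FSM_ASCII;

    /** Feeds one byte; returns true if it is a data byte of a multibyte character set. */
    boolean feed(int b) {
        if (b == ESC) {                 // any ESC starts a new designator
            state = FSM_ESC;
            return false;
        }
        switch (state) {
            case FSM_ESC:               // after ESC: '$' = multibyte, '(' = ISO 646 variant
                state = (b == '$') ? FSM_ESCD : (b == '(') ? FSM_ESCP : FSM_ASCII;
                return false;
            case FSM_ESCD:              // after ESC $: an optional '(' precedes the final byte
                state = (b == '(') ? FSM_ESCDP : FSM_NONASCII;
                return false;
            case FSM_ESCDP:             // after ESC $ (: this is the final byte
                state = FSM_NONASCII;
                return false;
            case FSM_ESCP:              // after ESC (: this is the final byte
                state = FSM_ASCII;
                return false;
            case FSM_NONASCII:          // inside a multibyte set
                return true;
            default:                    // FSM_ASCII: single-byte text
                return false;
        }
    }
}
```

For example, the bytes ESC $ B switch the tracker into the multibyte state, and ESC ( B switch it back to ASCII; the designator bytes themselves are never reported as data.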
-
MAX_UTF8_FROM_UCS4
public static final int MAX_UTF8_FROM_UCS4
Maximum valid UTF-8 character value.
See Also:
- Constant Field Values
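MAX_UTF8_FROM_UCS4 bounds the code points a UTF-8 encoder should accept. A minimal encoder might look like this, assuming the constant equals U+10FFFF (the Unicode maximum; the actual value is on the Constant Field Values page, not here). Rejecting the surrogate range D800 to DFFF is this sketch's own choice, and Utf8Sketch is a hypothetical name.

```java
/** Hypothetical UCS-4 to UTF-8 encoder bounded by MAX_UTF8_FROM_UCS4; not part of JTidy. */
final class Utf8Sketch {
    // Assumed value of MAX_UTF8_FROM_UCS4.
    static final int MAX_UTF8_FROM_UCS4 = 0x10FFFF;

    private Utf8Sketch() {}

    /** Encodes one code point; throws for values out of range or in the surrogate range. */
    static byte[] encode(int cp) {
        if (cp < 0 || cp > MAX_UTF8_FROM_UCS4 || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not a valid scalar value: " + Integer.toHexString(cp));
        }
        if (cp < 0x80) {                // 1 byte: 0xxxxxxx
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            return new byte[] {(byte) (0xC0 | (cp >> 6)), (byte) (0x80 | (cp & 0x3F))};
        }
        if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        return new byte[] {             // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))};
    }
}
```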
-
MAX_UTF16_FROM_UCS4
public static final int MAX_UTF16_FROM_UCS4
Maximum UTF-16 value (the largest UCS-4 code point that UTF-16 can represent).
See Also:
- Constant Field Values
-
LOW_UTF16_SURROGATE
public static final int LOW_UTF16_SURROGATE
UTF-16 low surrogate.
See Also:
- Constant Field Values
-
UTF16_SURROGATES_BEGIN
public static final int UTF16_SURROGATES_BEGIN
UTF-16 surrogates begin.
See Also:
- Constant Field Values
-
UTF16_LOW_SURROGATE_BEGIN
public static final int UTF16_LOW_SURROGATE_BEGIN
UTF-16 surrogate pair areas: low surrogates begin.
See Also:
- Constant Field Values
-
UTF16_LOW_SURROGATE_END
public static final int UTF16_LOW_SURROGATE_END
UTF-16 surrogate pair areas: low surrogates end.
See Also:
- Constant Field Values
-
UTF16_HIGH_SURROGATE_BEGIN
public static final int UTF16_HIGH_SURROGATE_BEGIN
UTF-16 surrogate pair areas: high surrogates begin.
See Also:
- Constant Field Values
-
UTF16_HIGH_SURROGATE_END
public static final int UTF16_HIGH_SURROGATE_END
UTF-16 surrogate pair areas: high surrogates end.
See Also:
- Constant Field Values
-
HIGH_UTF16_SURROGATE
public static final int HIGH_UTF16_SURROGATE
UTF-16 high surrogate.
See Also:
- Constant Field Values
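The surrogate constants set the bounds used when UTF-16 represents a code point from UTF16_SURROGATES_BEGIN up to MAX_UTF16_FROM_UCS4 as a high/low pair. The sketch below assumes the standard Unicode ranges (high surrogates D800 to DBFF, low surrogates DC00 to DFFF, supplementary code points 0x10000 to 0x10FFFF); it does not use LOW_UTF16_SURROGATE or HIGH_UTF16_SURROGATE, whose exact values this page does not state. Utf16Sketch is a hypothetical name.

```java
/** Hypothetical surrogate-pair helpers; bounds mirror the UTF16_* constants. */
final class Utf16Sketch {
    // Assumed values (standard Unicode ranges), not read from JTidy's constant table.
    static final int UTF16_SURROGATES_BEGIN = 0x10000;
    static final int UTF16_HIGH_SURROGATE_BEGIN = 0xD800;
    static final int UTF16_HIGH_SURROGATE_END = 0xDBFF;
    static final int UTF16_LOW_SURROGATE_BEGIN = 0xDC00;
    static final int UTF16_LOW_SURROGATE_END = 0xDFFF;
    static final int MAX_UTF16_FROM_UCS4 = 0x10FFFF;

    private Utf16Sketch() {}

    /** Splits a supplementary code point into {high, low}. */
    static int[] split(int cp) {
        if (cp < UTF16_SURROGATES_BEGIN || cp > MAX_UTF16_FROM_UCS4) {
            throw new IllegalArgumentException("code point needs no surrogate pair or is out of range");
        }
        int v = cp - UTF16_SURROGATES_BEGIN;                // 20-bit offset
        int high = UTF16_HIGH_SURROGATE_BEGIN + (v >> 10);  // top 10 bits
        int low = UTF16_LOW_SURROGATE_BEGIN + (v & 0x3FF);  // bottom 10 bits
        return new int[] {high, low};
    }

    /** Combines a high/low surrogate pair back into a code point. */
    static int combine(int high, int low) {
        if (high < UTF16_HIGH_SURROGATE_BEGIN || high > UTF16_HIGH_SURROGATE_END
                || low < UTF16_LOW_SURROGATE_BEGIN || low > UTF16_LOW_SURROGATE_END) {
            throw new IllegalArgumentException("not a surrogate pair");
        }
        return UTF16_SURROGATES_BEGIN
                + ((high - UTF16_HIGH_SURROGATE_BEGIN) << 10)
                + (low - UTF16_LOW_SURROGATE_BEGIN);
    }
}
```

For example, U+1F600 splits into the pair D83D DE00, and combining DBFF DFFF yields 0x10FFFF, the top of the range.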
Method Detail
-
decodeWin1252
protected static int decodeWin1252(int c)
Converts a Windows-1252 character code to its Unicode code point.
Parameters:
- c: the character to decode
Returns:
- the decoded character
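Windows-1252 differs from ISO-8859-1 only in bytes 0x80 to 0x9F, so a decoder needs just a 32-entry table. Because decodeWin1252 is protected in a final class, code outside org.w3c.tidy cannot call it; the stand-alone sketch below (hypothetical class Win1252Sketch) is not JTidy's source, and mapping the five unassigned bytes to U+FFFD is this sketch's own choice.

```java
/** Hypothetical stand-alone equivalent of decodeWin1252; not JTidy's source. */
final class Win1252Sketch {
    // Unicode values for bytes 0x80-0x9F; 0xFFFD marks the five unassigned bytes.
    private static final int[] C1 = {
        0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
        0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
    };

    private Win1252Sketch() {}

    /** Maps one Windows-1252 byte value (0-255) to a Unicode code point. */
    static int decode(int c) {
        if (c >= 0x80 && c <= 0x9F) {
            return C1[c - 0x80];
        }
        // Other values are returned unchanged: bytes 0x00-0x7F and 0xA0-0xFF
        // coincide with ISO-8859-1 and therefore with U+0000-U+00FF.
        return c;
    }
}
```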
-
decodeMacRoman
protected static int decodeMacRoman(int c)
Converts a MacRoman character code to its Unicode code point.
Parameters:
- c: the character to decode
Returns:
- the decoded character
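MacRoman shares 0x00 to 0x7F with ASCII but assigns its own repertoire to every byte from 0x80 to 0xFF, so a full 128-entry table is needed. The sketch below follows Apple's published mapping, including 0xDB as the euro sign U+20AC (older Mac OS versions used the currency sign U+00A4) and 0xF0 as the Apple logo in the private use area (U+F8FF). MacRomanSketch is a hypothetical class, not JTidy's source, and JTidy's own table may differ in details.

```java
/** Hypothetical stand-alone equivalent of decodeMacRoman; not JTidy's source. */
final class MacRomanSketch {
    // Unicode values for MacRoman bytes 0x80-0xFF, eight per row.
    private static final int[] HIGH = {
        0x00C4, 0x00C5, 0x00C7, 0x00C9, 0x00D1, 0x00D6, 0x00DC, 0x00E1,
        0x00E0, 0x00E2, 0x00E4, 0x00E3, 0x00E5, 0x00E7, 0x00E9, 0x00E8,
        0x00EA, 0x00EB, 0x00ED, 0x00EC, 0x00EE, 0x00EF, 0x00F1, 0x00F3,
        0x00F2, 0x00F4, 0x00F6, 0x00F5, 0x00FA, 0x00F9, 0x00FB, 0x00FC,
        0x2020, 0x00B0, 0x00A2, 0x00A3, 0x00A7, 0x2022, 0x00B6, 0x00DF,
        0x00AE, 0x00A9, 0x2122, 0x00B4, 0x00A8, 0x2260, 0x00C6, 0x00D8,
        0x221E, 0x00B1, 0x2264, 0x2265, 0x00A5, 0x00B5, 0x2202, 0x2211,
        0x220F, 0x03C0, 0x222B, 0x00AA, 0x00BA, 0x03A9, 0x00E6, 0x00F8,
        0x00BF, 0x00A1, 0x00AC, 0x221A, 0x0192, 0x2248, 0x2206, 0x00AB,
        0x00BB, 0x2026, 0x00A0, 0x00C0, 0x00C3, 0x00D5, 0x0152, 0x0153,
        0x2013, 0x2014, 0x201C, 0x201D, 0x2018, 0x2019, 0x00F7, 0x25CA,
        0x00FF, 0x0178, 0x2044, 0x20AC, 0x2039, 0x203A, 0xFB01, 0xFB02,
        0x2021, 0x00B7, 0x201A, 0x201E, 0x2030, 0x00C2, 0x00CA, 0x00C1,
        0x00CB, 0x00C8, 0x00CD, 0x00CE, 0x00CF, 0x00CC, 0x00D3, 0x00D4,
        0xF8FF, 0x00D2, 0x00DA, 0x00DB, 0x00D9, 0x0131, 0x02C6, 0x02DC,
        0x00AF, 0x02D8, 0x02D9, 0x02DA, 0x00B8, 0x02DD, 0x02DB, 0x02C7
    };

    private MacRomanSketch() {}

    /** Maps one MacRoman byte value (0-255) to a Unicode code point. */
    static int decode(int c) {
        if (c >= 0x80 && c <= 0xFF) {
            return HIGH[c - 0x80];
        }
        // Values below 0x80 are ASCII; anything else is passed through unchanged.
        return c;
    }
}
```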