Class Utf8Safe
There are several variants of UTF-8. The one implemented by this class is the restricted definition of UTF-8 introduced in Unicode 3.1, which mandates the rejection of "overlong" byte sequences as well as rejection of 3-byte surrogate codepoint byte sequences. Note that the UTF-8 decoder included in Oracle's JDK has been modified to also reject "overlong" byte sequences, but (as of 2011) still accepts 3-byte surrogate codepoint byte sequences.
The byte sequences considered valid by this class are exactly those that can be roundtrip converted to Strings and back to bytes using the UTF-8 charset, without loss:
Arrays.equals(bytes, new String(bytes, Internal.UTF_8).getBytes(Internal.UTF_8))
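The round-trip property above can be checked directly with the JDK. The following is a minimal sketch, not ObjectBox code. It assumes java.nio.charset.StandardCharsets.UTF_8 as a stand-in for the Internal.UTF_8 constant, and RoundTripCheck and isRoundTrippable are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class RoundTripCheck {
    // Hypothetical helper (not part of ObjectBox): true iff bytes -> String -> bytes
    // reproduces the input exactly. Decoding replaces ill-formed input with U+FFFD,
    // and re-encoding replaces unpaired surrogates with '?', so any ill-formed
    // sequence changes the bytes and fails the comparison.
    public static boolean isRoundTrippable(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return Arrays.equals(bytes, decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] valid = "h\u00e9llo \u20ac".getBytes(StandardCharsets.UTF_8);
        byte[] overlongNul = {(byte) 0xC0, (byte) 0x80};                   // overlong form of U+0000
        byte[] encodedSurrogate = {(byte) 0xED, (byte) 0xA0, (byte) 0x80}; // U+D800 as 3 bytes
        System.out.println(isRoundTrippable(valid));            // true
        System.out.println(isRoundTrippable(overlongNul));      // false
        System.out.println(isRoundTrippable(encodedSurrogate)); // false
    }
}
```

The encoded-surrogate case fails the comparison either way: a decoder that rejects it produces U+FFFD, and a decoder that accepts it produces a lone surrogate that String.getBytes then replaces with '?'.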
See the Unicode Standard, Table 3-6 (UTF-8 Bit Distribution) and Table 3-7 (Well-Formed UTF-8 Byte Sequences).
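Table 3-7 of the Unicode Standard can be turned directly into a validator. The sketch below is illustrative only and is not how Utf8Safe itself is implemented. WellFormedUtf8 and isWellFormed are hypothetical names. The lead byte selects the sequence length and the permitted range of the second byte. That range is where overlong forms, encoded surrogates and code points above U+10FFFF are excluded.

```java
public final class WellFormedUtf8 {
    // Hypothetical validator: checks every sequence against the byte ranges
    // listed in Unicode Table 3-7 (Well-Formed UTF-8 Byte Sequences).
    public static boolean isWellFormed(byte[] b) {
        int i = 0;
        while (i < b.length) {
            int b0 = b[i] & 0xFF;
            if (b0 <= 0x7F) { i++; continue; }          // 00..7F: single byte
            int n;                                       // total sequence length
            int lo = 0x80, hi = 0xBF;                    // allowed range of the second byte
            if (b0 >= 0xC2 && b0 <= 0xDF) n = 2;
            else if (b0 == 0xE0) { n = 3; lo = 0xA0; }   // excludes overlong 3-byte forms
            else if (b0 == 0xED) { n = 3; hi = 0x9F; }   // excludes surrogates U+D800..U+DFFF
            else if (b0 >= 0xE1 && b0 <= 0xEF) n = 3;
            else if (b0 == 0xF0) { n = 4; lo = 0x90; }   // excludes overlong 4-byte forms
            else if (b0 == 0xF4) { n = 4; hi = 0x8F; }   // excludes code points above U+10FFFF
            else if (b0 >= 0xF1 && b0 <= 0xF3) n = 4;
            else return false;                           // 80..C1 and F5..FF never start a sequence
            if (i + n > b.length) return false;          // truncated sequence
            int b1 = b[i + 1] & 0xFF;
            if (b1 < lo || b1 > hi) return false;
            for (int k = 2; k < n; k++) {
                int bk = b[i + k] & 0xFF;
                if (bk < 0x80 || bk > 0xBF) return false; // remaining bytes: 80..BF
            }
            i += n;
        }
        return true;
    }
}
```

Only the second byte's range depends on the lead byte. All later continuation bytes are always 80..BF, so the check is split that way.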
Nested Class Summary
Nested Classes
Modifier and Type: (package private) static class
Nested classes/interfaces inherited from class io.objectbox.flatbuffers.Utf8: Utf8.DecodeUtil
Constructor Summary
Constructors: Utf8Safe()
Method Summary
private static int computeEncodedLength(CharSequence sequence): Returns the number of bytes in the UTF-8-encoded form of sequence.
String decodeUtf8(ByteBuffer buffer, int offset, int length): Decodes the given UTF-8 portion of the ByteBuffer into a String.
static String decodeUtf8Array(byte[] bytes, int index, int size)
static String decodeUtf8Buffer(ByteBuffer buffer, int offset, int length)
int encodedLength: Returns the number of bytes in the UTF-8-encoded form of sequence.
private static int encodedLengthGeneral(CharSequence sequence, int start)
void encodeUtf8(CharSequence in, ByteBuffer out): Encodes the given characters to the target ByteBuffer using UTF-8 encoding.
private static int encodeUtf8Array(CharSequence in, byte[] out, int offset, int length)
private static void encodeUtf8Buffer(CharSequence in, ByteBuffer out)
Methods inherited from class io.objectbox.flatbuffers.Utf8: encodeUtf8CodePoint, getDefault, setDefault
Constructor Details
Utf8Safe
public Utf8Safe()
Method Details
computeEncodedLength
Returns the number of bytes in the UTF-8-encoded form of sequence. For a string, this method is equivalent to string.getBytes(UTF_8).length, but is more efficient in both time and space.
Throws:
IllegalArgumentException - if sequence contains ill-formed UTF-16 (unpaired surrogates)
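The documented contract can be illustrated with a simple per-char count. This is a sketch that covers only the behaviour stated above. EncodedLengthSketch and utf8Length are hypothetical names. The real method is private and may be organised differently; the separate encodedLengthGeneral(CharSequence sequence, int start) helper in the summary suggests a split between a fast path and a general path. This sketch also omits any overflow check for extremely long inputs.

```java
public final class EncodedLengthSketch {
    // Hypothetical re-implementation of the documented contract: the UTF-8 length
    // of `sequence`, throwing IllegalArgumentException on unpaired surrogates.
    public static int utf8Length(CharSequence sequence) {
        int bytes = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (c < 0x80) {
                bytes += 1;                          // ASCII
            } else if (c < 0x800) {
                bytes += 2;                          // U+0080..U+07FF
            } else if (!Character.isSurrogate(c)) {
                bytes += 3;                          // rest of the BMP
            } else if (Character.isHighSurrogate(c) && i + 1 < sequence.length()
                    && Character.isLowSurrogate(sequence.charAt(i + 1))) {
                bytes += 4;                          // surrogate pair: supplementary code point
                i++;                                 // skip the low surrogate
            } else {
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            }
        }
        return bytes;
    }
}
```

For any String s without unpaired surrogates, utf8Length(s) equals s.getBytes(StandardCharsets.UTF_8).length, which is the equivalence the description states. The benefit is that no byte array is allocated.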
encodedLengthGeneral
decodeUtf8Array
decodeUtf8Buffer
encodedLength
Description copied from class: Utf8
Returns the number of bytes in the UTF-8-encoded form of sequence. For a string, this method is equivalent to string.getBytes(UTF_8).length, but is more efficient in both time and space.
Specified by:
encodedLength in class Utf8
decodeUtf8
Decodes the given UTF-8 portion of the ByteBuffer into a String.
Specified by:
decodeUtf8 in class Utf8
Throws:
IllegalArgumentException - if the input is not valid UTF-8.
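The same contract (decode a region of a ByteBuffer and fail on invalid input) can be approximated with the JDK's CharsetDecoder set to report errors. This is a sketch under stated assumptions, not a drop-in replacement. It relies on the JDK decoder's own strictness, and the class description notes that older JDK decoders accepted 3-byte surrogate sequences. StrictDecodeSketch and its decodeUtf8 method are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public final class StrictDecodeSketch {
    // Hypothetical helper: decodes buffer[offset, offset + length) as UTF-8 and throws
    // IllegalArgumentException on malformed input instead of substituting U+FFFD.
    public static String decodeUtf8(ByteBuffer buffer, int offset, int length) {
        // duplicate() shares the bytes but has its own position and limit,
        // so the caller's buffer state is left untouched.
        ByteBuffer region = buffer.duplicate();
        region.clear();
        region.limit(offset + length);
        region.position(offset);
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(region)
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(
                    "Invalid UTF-8 in range starting at offset " + offset, e);
        }
    }
}
```

Working on a duplicate rather than the buffer itself matches the absolute offset and length parameters of the documented method. Those parameters suggest the caller's position is not meant to change.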
encodeUtf8Buffer
encodeUtf8Array
encodeUtf8
Encodes the given characters to the target ByteBuffer using UTF-8 encoding. Selects an optimal algorithm based on the type of ByteBuffer (i.e. heap or direct) and the capabilities of the platform.
Specified by:
encodeUtf8 in class Utf8
Parameters:
in - the source string to be encoded
out - the target buffer to receive the encoded string.
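The heap-versus-direct selection can be sketched as a dispatch on ByteBuffer.hasArray(). This is an illustration only. It assumes the split implied by the private encodeUtf8Array and encodeUtf8Buffer methods listed in the summary, but its bodies are not those methods. EncodeDispatchSketch and all of its methods are hypothetical. Unlike a production encoder, it may leave partial output in the buffer when it runs out of space.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public final class EncodeDispatchSketch {
    // Hypothetical helper: writes the UTF-8 form of `in` at out.position() and
    // advances the position past the written bytes.
    public static void encodeUtf8(CharSequence in, ByteBuffer out) {
        if (out.hasArray()) {
            // Heap buffer: write straight into the backing array.
            int start = out.arrayOffset() + out.position();
            int end = encodeToArray(in, out.array(), start, out.remaining());
            out.position(end - out.arrayOffset());
        } else {
            // Direct (or read-only) buffer: fall back to relative put() calls.
            encodeToBuffer(in, out);
        }
    }

    // Writes into out[offset, offset + length) and returns the index after the last byte.
    private static int encodeToArray(CharSequence in, byte[] out, int offset, int length) {
        byte[] tmp = new byte[4];
        int pos = offset;
        int limit = offset + length;
        for (int i = 0; i < in.length(); ) {
            int cp = nextCodePoint(in, i);
            i += Character.charCount(cp);
            int n = encodeCodePoint(cp, tmp);
            if (pos + n > limit) throw new BufferOverflowException();
            System.arraycopy(tmp, 0, out, pos, n);
            pos += n;
        }
        return pos;
    }

    private static void encodeToBuffer(CharSequence in, ByteBuffer out) {
        byte[] tmp = new byte[4];
        for (int i = 0; i < in.length(); ) {
            int cp = nextCodePoint(in, i);
            i += Character.charCount(cp);
            out.put(tmp, 0, encodeCodePoint(cp, tmp)); // throws BufferOverflowException if full
        }
    }

    // Returns the code point starting at index i, rejecting unpaired surrogates.
    private static int nextCodePoint(CharSequence s, int i) {
        char c = s.charAt(i);
        if (!Character.isSurrogate(c)) return c;
        if (Character.isHighSurrogate(c) && i + 1 < s.length()
                && Character.isLowSurrogate(s.charAt(i + 1))) {
            return Character.toCodePoint(c, s.charAt(i + 1));
        }
        throw new IllegalArgumentException("Unpaired surrogate at index " + i);
    }

    // Encodes one code point into buf and returns the number of bytes used (1 to 4).
    private static int encodeCodePoint(int cp, byte[] buf) {
        if (cp < 0x80) { buf[0] = (byte) cp; return 1; }
        if (cp < 0x800) {
            buf[0] = (byte) (0xC0 | (cp >>> 6));
            buf[1] = (byte) (0x80 | (cp & 0x3F));
            return 2;
        }
        if (cp < 0x10000) {
            buf[0] = (byte) (0xE0 | (cp >>> 12));
            buf[1] = (byte) (0x80 | ((cp >>> 6) & 0x3F));
            buf[2] = (byte) (0x80 | (cp & 0x3F));
            return 3;
        }
        buf[0] = (byte) (0xF0 | (cp >>> 18));
        buf[1] = (byte) (0x80 | ((cp >>> 12) & 0x3F));
        buf[2] = (byte) (0x80 | ((cp >>> 6) & 0x3F));
        buf[3] = (byte) (0x80 | (cp & 0x3F));
        return 4;
    }
}
```

The heap path must add arrayOffset() when indexing the backing array. This matters for buffers created with slice(), whose index 0 does not coincide with index 0 of the backing array.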