Class Charsets

java.lang.Object
kala.compress.utils.Charsets

public class Charsets extends Object
Utility methods for charsets. See kala.compress.archivers.zip.ZipEncoding.
Since:
1.21.0.1
  • Field Details

    • HEX_CHARS

      private static final char[] HEX_CHARS
    • NATIVE_CHARSET

      private static final Charset NATIVE_CHARSET
  • Constructor Details

    • Charsets

      public Charsets()
  • Method Details

    • nativeCharset

      public static Charset nativeCharset()
      Returns the platform default charset. Users can override it by setting the system property 'kala.compress.native.charset'.
      Since:
      1.21.0.1
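      A minimal JDK-only sketch of the lookup described above. It assumes the property value is a name accepted by Charset.forName and that an illegal or unsupported value falls back to the JVM default; the class name NativeCharsetSketch is hypothetical. The real class appears to cache its result in the private NATIVE_CHARSET field rather than re-reading the property on every call.

```java
import java.nio.charset.Charset;

public class NativeCharsetSketch {
    // Property name as documented for Charsets.nativeCharset().
    static final String PROPERTY = "kala.compress.native.charset";

    // Hypothetical re-implementation: reads the override on every call,
    // whereas the real class likely resolves it once into NATIVE_CHARSET.
    static Charset nativeCharset() {
        String name = System.getProperty(PROPERTY);
        if (name != null) {
            try {
                return Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Assumption: an illegal or unsupported name falls back to the default.
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(nativeCharset());
    }
}
```

      Usage: java -Dkala.compress.native.charset=IBM437 ... . Since the real value is probably resolved when the class is initialized, the property should be set before Charsets is first used. Note that on JDK 18 and later Charset.defaultCharset() is UTF-8 unless overridden (JEP 400), and the operating system's encoding is exposed as the native.encoding system property; which of the two the library treats as the "platform default" is not stated here.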
    • isUTF8

      public static boolean isUTF8(Charset charset)
      Tests whether the given charset is UTF-8. A null charset is also treated as UTF-8.
      Since:
      1.27.1-0
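      Treating null as UTF-8 matches toCharset(Charset), which maps null to UTF-8. The following is a sketch under that assumption, not the library's code; the real method may, for example, also compare charset alias names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class IsUtf8Sketch {
    // Sketch: null counts as UTF-8, as documented ("UTF-8 or null").
    static boolean isUTF8(Charset charset) {
        // Charset.equals compares canonical names, so a Charset obtained
        // through an alias such as "utf8" still equals UTF_8.
        return charset == null || StandardCharsets.UTF_8.equals(charset);
    }
}
```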
    • toCharset

      public static Charset toCharset(String name)
      Returns a charset object for the named charset. Use this method instead of kala.compress.archivers.zip.ZipEncodingHelper#getZipEncoding(String).
      Parameters:
      name - The name of the charset, or null for UTF-8.
      Returns:
      A charset object for the named encoding
      Throws:
      IllegalCharsetNameException - If the given charset name is illegal
      UnsupportedCharsetException - If no support for the named charset is available in this instance of the Java virtual machine
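      The documented behavior maps directly onto Charset.forName, which throws exactly the two exceptions listed above. A minimal sketch (the class name is hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ToCharsetByNameSketch {
    // Sketch: null selects UTF-8; any other name is passed to Charset.forName,
    // which throws IllegalCharsetNameException or UnsupportedCharsetException.
    static Charset toCharset(String name) {
        return name == null ? StandardCharsets.UTF_8 : Charset.forName(name);
    }
}
```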
    • toCharset

      public static Charset toCharset(String name, Charset defaultCharset)
      Returns a charset object for the named charset. If the requested character set cannot be found, defaultCharset will be used instead.
      Parameters:
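      name - The name of the charset, or null for UTF-8.
      defaultCharset - The charset to return if the named charset cannot be found.
      Returns:
      A charset object for the named charset, or defaultCharset if it cannot be found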
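      A sketch of the fallback variant. It assumes that an illegal name is treated the same way as an unknown one (the description above only mentions "cannot be found"); both JDK exceptions extend IllegalArgumentException, so one catch covers them. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ToCharsetWithFallbackSketch {
    static Charset toCharset(String name, Charset defaultCharset) {
        if (name == null) {
            return StandardCharsets.UTF_8; // documented: null means UTF-8
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers UnsupportedCharsetException and IllegalCharsetNameException.
            // Assumption: both fall back to defaultCharset.
            return defaultCharset;
        }
    }
}
```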
    • toCharset

      public static Charset toCharset(Charset charset)
      Returns the given charset, or UTF-8 if the given charset is null.
      Parameters:
      charset - A charset, or null.
      Returns:
      the given Charset, or UTF-8 if the given Charset is null
    • toCharset

      public static Charset toCharset(Charset charset, Charset defaultCharset)
      Returns the given charset, or defaultCharset if the given charset is null.
      Parameters:
      charset - A charset, or null.
      defaultCharset - The charset to return if charset is null.
      Returns:
      the given Charset, or defaultCharset if the given Charset is null
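      Both Charset overloads reduce to null checks. The sketch below assumes the two-argument form returns defaultCharset for a null input, which is what its parameter name suggests; check this against the library source before relying on it. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ToCharsetOverloadsSketch {
    // Documented: null maps to UTF-8.
    static Charset toCharset(Charset charset) {
        return charset != null ? charset : StandardCharsets.UTF_8;
    }

    // Assumption: defaultCharset is the fallback for a null charset.
    static Charset toCharset(Charset charset, Charset defaultCharset) {
        return charset != null ? charset : defaultCharset;
    }
}
```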
    • canEncode

      public static boolean canEncode(Charset charset, String name)
      Checks whether the given string can be losslessly encoded using the given charset. Use this method instead of kala.compress.archivers.zip.ZipEncoding#canEncode(String).
      Parameters:
      charset - The charset to test against.
      name - A file name or ZIP comment.
      Returns:
      Whether the given name can be encoded without any loss.
      See Also:
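      The JDK already offers this check as CharsetEncoder.canEncode(CharSequence). A sketch using a fresh encoder per call; the real class has a private encoderFor(Charset) helper that may create or configure encoders differently. Note that Charset.newEncoder() throws UnsupportedOperationException for decode-only charsets.

```java
import java.nio.charset.Charset;

public class CanEncodeSketch {
    // Sketch: true only if every character of name maps to the charset.
    static boolean canEncode(Charset charset, String name) {
        return charset.newEncoder().canEncode(name);
    }
}
```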
    • encode

      public static ByteBuffer encode(Charset charset, String name)
      Encodes a file name or comment into a byte buffer suitable for storing in a serialized ZIP entry.

      Examples for CP 437 (in pseudo-notation, right-hand side is C-style notation):

        encode("€_for_Dollar.txt") = "%U20AC_for_Dollar.txt"
        encode("Ölfässer.txt") = "\231lf\204sser.txt"
       
      Use this method instead of org.apache.commons.compress.archivers.zip.ZipEncoding#encode(String).
      Parameters:
      charset - The charset to encode with.
      name - A file name or ZIP comment.
      Returns:
      A byte buffer with a backing array containing the encoded name. Unmappable characters or malformed character sequences are mapped to a sequence of UTF-16 code units, each written in the format %Uxxxx. It is assumed that the byte buffer is positioned at the beginning of the encoded result, that it has a backing array, and that its limit points to the end of the encoded result.
      Throws:
      IOException - on error
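      The %Uxxxx fallback can be illustrated with a simplified, per-code-point sketch that reproduces both CP 437 examples above. It is not the library's algorithm (which works on the whole buffer through private helpers such as encodeFully and encodeSurrogate); because it encodes each code point separately, it is only suitable for stateless charsets without a BOM, such as CP 437 or UTF-8, and would repeat the BOM for UTF-16. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Encodable code points are written with the charset; everything else
    // becomes %Uxxxx, one escape per UTF-16 code unit.
    static ByteBuffer encode(Charset charset, String name) {
        CharsetEncoder enc = charset.newEncoder();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < name.length()) {
            int cp = name.codePointAt(i); // a lone surrogate is returned as-is
            String unit = new String(Character.toChars(cp));
            if (enc.canEncode(unit)) {
                // Charset.encode never throws; safe here because canEncode passed.
                ByteBuffer bytes = charset.encode(unit);
                byte[] b = new byte[bytes.remaining()];
                bytes.get(b);
                out.write(b, 0, b.length);
            } else {
                for (char c : unit.toCharArray()) {
                    out.write('%');
                    out.write('U');
                    for (int shift = 12; shift >= 0; shift -= 4) {
                        out.write(HEX[(c >> shift) & 0xF]);
                    }
                }
            }
            i += Character.charCount(cp);
        }
        // Position 0, limit = length, backed by an array, as the contract expects.
        return ByteBuffer.wrap(out.toByteArray());
    }
}
```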
    • decode

      public static String decode(Charset charset, byte[] data) throws IOException
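      Decodes the given bytes into a string using the given charset. Use this method instead of org.apache.commons.compress.archivers.zip.ZipEncoding#decode(byte[]).
      Parameters:
      charset - The charset to decode with.
      data - The byte values to decode.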
      Returns:
      The decoded string.
      Throws:
      IOException - on error
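      A minimal decoding sketch. The documentation does not say how invalid input is handled; this sketch assumes malformed or unmappable bytes are reported rather than replaced, which surfaces as a CharacterCodingException (a subclass of IOException). The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class DecodeSketch {
    // Assumption: invalid input is reported instead of silently replaced.
    static String decode(Charset charset, byte[] data) throws IOException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data))
                .toString();
    }
}
```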
    • growBufferBy

      private static ByteBuffer growBufferBy(ByteBuffer buffer, int increment)
    • encodeFully

      private static ByteBuffer encodeFully(CharsetEncoder enc, CharBuffer cb, ByteBuffer out)
    • encodeSurrogate

      private static CharBuffer encodeSurrogate(CharBuffer cb, char c)
    • encoderFor

      private static CharsetEncoder encoderFor(Charset charset)
    • decoderFor

      private static CharsetDecoder decoderFor(Charset charset)
    • estimateInitialBufferSize

      private static int estimateInitialBufferSize(CharsetEncoder enc, int charChount)
      Estimates the initial encoded size (in bytes) for a character buffer.

      The estimate assumes that one character uses the maximum-length encoding, while the rest use an average-size encoding. This accounts for any BOM for UTF-16, at the expense of a couple of extra bytes for UTF-8 encoded ASCII.

      Parameters:
      enc - encoder to use for estimates
      charChount - number of characters in string
      Returns:
      estimated size in bytes.
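      The description translates into one call to maxBytesPerChar() plus (charCount - 1) calls' worth of averageBytesPerChar(), rounded up. The sketch below is a reconstruction from the prose, not the library's source; the class name is hypothetical. For an encoder whose maximum and average are both 1 byte per character (US-ASCII), the estimate equals the character count.

```java
import java.nio.charset.CharsetEncoder;

public class InitialSizeSketch {
    // One character at the maximum size (room for a UTF-16 BOM),
    // the remaining characters at the encoder's average size.
    static int estimateInitialBufferSize(CharsetEncoder enc, int charCount) {
        float first = enc.maxBytesPerChar();
        float rest = (charCount - 1) * enc.averageBytesPerChar();
        return (int) Math.ceil(first + rest);
    }
}
```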
    • estimateIncrementalEncodingSize

      private static int estimateIncrementalEncodingSize(CharsetEncoder enc, int charCount)
      Estimates the size needed for the remaining characters.
      Parameters:
      enc - encoder to use for estimates
      charCount - number of characters remaining
      Returns:
      estimated size in bytes.
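      The incremental estimate is presumably used to grow the output buffer when encoding overflows. growBufferBy and encodeFully are private and undocumented in this listing, so the loop below is a hypothetical reconstruction of that pattern using the standard CharsetEncoder.encode(CharBuffer, ByteBuffer, boolean) and flush(ByteBuffer) protocol, not the library's code. It assumes the remaining characters encode at averageBytesPerChar(); all names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

public class GrowingEncodeSketch {
    static int estimateIncrementalEncodingSize(CharsetEncoder enc, int charCount) {
        return (int) Math.ceil(charCount * enc.averageBytesPerChar());
    }

    // Hypothetical helper: copy the bytes written so far into a larger buffer.
    static ByteBuffer growBufferBy(ByteBuffer buffer, int increment) {
        buffer.flip();
        ByteBuffer bigger = ByteBuffer.allocate(buffer.capacity() + increment);
        bigger.put(buffer); // bigger is left positioned after the copied bytes
        return bigger;
    }

    // Encodes all of text, growing the output buffer whenever it overflows.
    static ByteBuffer encodeAll(CharsetEncoder enc, CharSequence text, int initialCapacity)
            throws CharacterCodingException {
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(initialCapacity);
        enc.reset();
        while (true) {
            CoderResult result = enc.encode(in, out, true);
            if (result.isUnderflow()) {
                break; // all input consumed
            }
            if (result.isOverflow()) {
                // Never grow by zero, even when few characters remain.
                int increment = Math.max(1, estimateIncrementalEncodingSize(enc, in.remaining()));
                out = growBufferBy(out, increment);
            } else {
                result.throwException(); // malformed or unmappable input
            }
        }
        while (enc.flush(out).isOverflow()) {
            out = growBufferBy(out, 8);
        }
        out.flip();
        return out;
    }
}
```

      Starting from a deliberately tiny buffer (for example, a capacity of 1) forces several overflow-and-grow rounds, which is a convenient way to exercise the growth path.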