Class UnicodeInputStream

java.lang.Object
java.io.InputStream
jodd.io.UnicodeInputStream
All Implemented Interfaces:
Closeable, AutoCloseable

public class UnicodeInputStream extends InputStream
Unicode input stream for detecting UTF encodings and reading BOM characters. Detects the following BOMs:
  • UTF-8
  • UTF-16BE
  • UTF-16LE
  • UTF-32BE
  • UTF-32LE
  • Field Details

    • MAX_BOM_SIZE

      public static final int MAX_BOM_SIZE
    • internalInputStream

      private final PushbackInputStream internalInputStream
    • initialized

      private boolean initialized
    • BOMSize

      private int BOMSize
    • encoding

      private Charset encoding
    • targetEncoding

      private final Charset targetEncoding
    • BOM_UTF32_BE

      public static final byte[] BOM_UTF32_BE
    • BOM_UTF32_LE

      public static final byte[] BOM_UTF32_LE
    • BOM_UTF8

      public static final byte[] BOM_UTF8
    • BOM_UTF16_BE

      public static final byte[] BOM_UTF16_BE
    • BOM_UTF16_LE

      public static final byte[] BOM_UTF16_LE
  • Constructor Details

    • UnicodeInputStream

      public UnicodeInputStream(InputStream in, Charset targetEncoding)
      Creates a new Unicode stream. It works in one of two modes: detect mode or read mode.

      Detect mode is active when the target encoding is not specified. In detect mode, the stream tries to detect the encoding from the BOM, if one exists. If there is no BOM, no encoding is detected.

      Read mode is active when the target encoding is set. The stream then reads the optional BOM for the given encoding. If there is no BOM, nothing is skipped.
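Read mode can be illustrated with a minimal, self-contained sketch. It is not the jodd implementation: the class `BomSkipSketch` and its methods `bomFor` and `skipBom` are hypothetical names introduced here, and the byte values are the standard Unicode BOM sequences. The sketch consumes the BOM of the given target encoding only when the stream actually starts with it, and otherwise pushes the bytes back so that nothing is skipped.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of "read mode"; not the jodd source.
final class BomSkipSketch {

    // Standard Unicode BOM bytes for a charset, or an empty array if it has none here.
    static byte[] bomFor(Charset cs) {
        switch (cs.name()) {
            case "UTF-8":    return new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
            case "UTF-16BE": return new byte[] {(byte) 0xFE, (byte) 0xFF};
            case "UTF-16LE": return new byte[] {(byte) 0xFF, (byte) 0xFE};
            case "UTF-32BE": return new byte[] {0x00, 0x00, (byte) 0xFE, (byte) 0xFF};
            case "UTF-32LE": return new byte[] {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00};
            default:         return new byte[0];
        }
    }

    // Returns a stream positioned after the target encoding's BOM, if it was present.
    static PushbackInputStream skipBom(InputStream raw, Charset target) throws IOException {
        byte[] bom = bomFor(target);
        PushbackInputStream in = new PushbackInputStream(raw, Math.max(bom.length, 1));
        byte[] head = new byte[bom.length];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) break; // end of stream before a full BOM could be read
            n += r;
        }
        boolean bomPresent = n == bom.length && Arrays.equals(head, bom);
        if (!bomPresent && n > 0) {
            in.unread(head, 0, n); // no matching BOM: push everything back
        }
        return in;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'x'};
        InputStream in = skipBom(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println((char) in.read()); // first byte after the UTF-8 BOM
    }
}
```

In this sketch, a BOM that belongs to a different encoding than the target is treated as ordinary data and left in the stream.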

  • Method Details

    • getDetectedEncoding

      public Charset getDetectedEncoding()
      Returns the detected UTF encoding, or null if no UTF encoding has been detected (i.e. no BOM). If the stream has not been read yet, it will be initialized first.
    • init

      protected void init() throws IOException
      Detects and decodes the encoding from the BOM. Reads ahead four bytes and checks them for BOM marks. Extra bytes are pushed back to the stream, so only the BOM bytes are skipped.
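The read-ahead and push-back step described above might look roughly like the following self-contained sketch. It is an illustration under stated assumptions, not the jodd source: the class `BomDetectSketch` and its members are hypothetical, the byte values are the standard Unicode BOM sequences, and the UTF-32 charsets are looked up by name with `Charset.forName`, which assumes the running JDK provides them (OpenJDK does).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.Charset;

// Hypothetical sketch of BOM detection; not the jodd source.
final class BomDetectSketch {
    static final int MAX_BOM_SIZE = 4;

    // Standard Unicode BOM byte sequences.
    static final byte[] UTF32_BE = {0x00, 0x00, (byte) 0xFE, (byte) 0xFF};
    static final byte[] UTF32_LE = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00};
    static final byte[] UTF8     = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    static final byte[] UTF16_BE = {(byte) 0xFE, (byte) 0xFF};
    static final byte[] UTF16_LE = {(byte) 0xFF, (byte) 0xFE};

    final PushbackInputStream in; // positioned just after the BOM, if any
    Charset encoding;             // stays null when no BOM is found
    int bomSize = -1;             // stays -1 when no BOM is found

    BomDetectSketch(InputStream raw) throws IOException {
        in = new PushbackInputStream(raw, MAX_BOM_SIZE);
        byte[] head = new byte[MAX_BOM_SIZE];
        int n = 0;
        while (n < head.length) { // read ahead up to four bytes
            int r = in.read(head, n, head.length - n);
            if (r < 0) break;
            n += r;
        }
        // Check 4-byte BOMs first: the UTF-32LE BOM begins with the UTF-16LE BOM.
        if (startsWith(head, n, UTF32_BE))      found("UTF-32BE", UTF32_BE);
        else if (startsWith(head, n, UTF32_LE)) found("UTF-32LE", UTF32_LE);
        else if (startsWith(head, n, UTF8))     found("UTF-8", UTF8);
        else if (startsWith(head, n, UTF16_BE)) found("UTF-16BE", UTF16_BE);
        else if (startsWith(head, n, UTF16_LE)) found("UTF-16LE", UTF16_LE);

        // Push back everything that is not part of the BOM.
        int skipped = Math.max(bomSize, 0);
        if (n > skipped) in.unread(head, skipped, n - skipped);
    }

    private void found(String charsetName, byte[] bom) {
        encoding = Charset.forName(charsetName);
        bomSize = bom.length;
    }

    private static boolean startsWith(byte[] head, int n, byte[] bom) {
        if (n < bom.length) return false;
        for (int i = 0; i < bom.length; i++) {
            if (head[i] != bom[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        BomDetectSketch s = new BomDetectSketch(new ByteArrayInputStream(data));
        System.out.println(s.encoding + " " + s.bomSize + " " + (char) s.in.read());
        // prints: UTF-8 3 h
    }
}
```

The order of the checks matters in this sketch: testing the UTF-16LE BOM before the UTF-32LE BOM would misreport UTF-32LE input as UTF-16LE, because both begin with the bytes FF FE.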
      Throws:
      IOException
    • close

      public void close() throws IOException
      Closes the input stream. If the stream was not used before closing, the encoding will be unavailable.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class InputStream
      Throws:
      IOException
    • read

      public int read() throws IOException
      Reads a byte from the stream.
      Specified by:
      read in class InputStream
      Throws:
      IOException
    • getBOMSize

      public int getBOMSize()
      Returns the BOM size in bytes, or -1 if no BOM was found.