Class UnicodeReader

java.lang.Object
java.io.Reader
org.fife.io.UnicodeReader
All Implemented Interfaces:
Closeable, AutoCloseable, Readable

public class UnicodeReader extends Reader
A reader capable of identifying Unicode streams by their byte order marks (BOMs). This class will recognize the following encodings:
  • UTF-8
  • UTF-16LE
  • UTF-16BE
  • UTF-32LE
  • UTF-32BE
If the stream is not found to be any of the above, then a default encoding is used for reading. The user can specify this default encoding, or a system default will be used.
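The five encodings above each have a well-known BOM byte sequence. The sketch below shows one way such a lookup could be written; `BomTable` and `detect` are hypothetical names chosen for illustration, and the order of the checks is an assumption rather than something taken from the UnicodeReader source.

```java
/** Hypothetical helper, not part of UnicodeReader: maps leading bytes to an encoding name. */
final class BomTable {

    private BomTable() {}

    /**
     * Returns the encoding implied by the BOM at the start of head,
     * or null if no recognized BOM is present.
     */
    static String detect(byte[] head) {
        // The 4-byte UTF-32 marks are tested before the 2-byte UTF-16 marks,
        // because the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    /** Compares the first prefix.length bytes of data, as unsigned values, to prefix. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detect(utf8));                  // prints UTF-8
        System.out.println(detect(new byte[] {'h', 'i'})); // prints null
    }
}
```

A `null` result corresponds to the case described above, where the caller falls back to a default encoding.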

For optimum performance, it is recommended that you wrap all instances of UnicodeReader with a java.io.BufferedReader.
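Because UnicodeReader is a `java.io.Reader`, the wrapping recommended above follows the usual pattern. The sketch below uses a `StringReader` as a stand-in so that it compiles without the library on the classpath; `BufferedWrapSketch`, `readLines`, and the file name in the comment are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical example class showing the BufferedReader wrapping pattern. */
final class BufferedWrapSketch {

    private BufferedWrapSketch() {}

    /** Reads all lines from source through a BufferedReader, closing both when done. */
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // StringReader is a stand-in. With the library available, the argument
        // would instead be something like new UnicodeReader(new File("notes.txt")).
        System.out.println(readLines(new StringReader("first\nsecond"))); // prints [first, second]
    }
}
```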

This class is largely adapted from the workaround in the description of Java Bug 4508058.

Version:
0.9
  • Field Details

    • internalIn

      private InputStreamReader internalIn
      The input stream from which we're really reading.
    • encoding

      private String encoding
      The encoding being used. We keep our own instead of using the string returned by java.io.InputStreamReader since that class does not return user-friendly names.
    • BOM_SIZE

      private static final int BOM_SIZE
      The size of a BOM.
  • Constructor Details

    • UnicodeReader

      public UnicodeReader(String file) throws IOException
      This utility constructor is here because you will usually use a UnicodeReader on files.

      Creates a reader using the encoding specified by the BOM in the file; if there is no recognized BOM, then a system default encoding is used.

      Parameters:
      file - The file from which you want to read.
      Throws:
      IOException - If an error occurs when checking for/reading the BOM.
      SecurityException - If a security manager exists and its checkRead method denies read access to the file.
    • UnicodeReader

      public UnicodeReader(File file) throws IOException
      This utility constructor is here because you will usually use a UnicodeReader on files.

      Creates a reader using the encoding specified by the BOM in the file; if there is no recognized BOM, then a system default encoding is used.

      Parameters:
      file - The file from which you want to read.
      Throws:
      IOException - If an error occurs when checking for/reading the BOM.
      SecurityException - If a security manager exists and its checkRead method denies read access to the file.
    • UnicodeReader

      public UnicodeReader(File file, String defaultEncoding) throws IOException
      This utility constructor is here because you will usually use a UnicodeReader on files.

      Creates a reader using the encoding specified by the BOM in the file; if there is no recognized BOM, then a specified default encoding is used.

      Parameters:
      file - The file from which you want to read.
      defaultEncoding - The encoding to use if no BOM is found. If this value is null, a system default is used.
      Throws:
      IOException - If an error occurs when checking for/reading the BOM.
      SecurityException - If a security manager exists and its checkRead method denies read access to the file.
    • UnicodeReader

      public UnicodeReader(File file, Charset defaultCharset) throws IOException
      This utility constructor is here because you will usually use a UnicodeReader on files.

      Creates a reader using the encoding specified by the BOM in the file; if there is no recognized BOM, then a specified default encoding is used.

      Parameters:
      file - The file from which you want to read.
      defaultCharset - The encoding to use if no BOM is found. If this value is null, a system default is used.
      Throws:
      IOException - If an error occurs when checking for/reading the BOM.
      SecurityException - If a security manager exists and its checkRead method denies read access to the file.
    • UnicodeReader

      public UnicodeReader(InputStream in) throws IOException
      Creates a reader using the encoding specified by the BOM in the stream; if there is no recognized BOM, then a system default encoding is used.
      Parameters:
      in - The input stream from which to read.
      Throws:
      IOException - If an error occurs when checking for/reading the BOM.
    • UnicodeReader

      public UnicodeReader(InputStream in, String defaultEncoding) throws IOException
      Creates a reader using the encoding specified by the BOM in the stream; if there is no recognized BOM, then defaultEncoding is used.
      Parameters:
      in - The input stream from which to read.
      defaultEncoding - The encoding to use if no recognized BOM is found. If this value is null, a system default is used.
      Throws:
      IOException - If an error occurs when checking for/reading the BOM.
    • UnicodeReader

      public UnicodeReader(InputStream in, Charset defaultCharset) throws IOException
      Creates a reader using the encoding specified by the BOM in the stream; if there is no recognized BOM, then defaultCharset is used.
      Parameters:
      in - The input stream from which to read.
      defaultCharset - The encoding to use if no recognized BOM is found. If this value is null, a system default is used.
      Throws:
      IOException - If an error occurs when checking for/reading the BOM.
  • Method Details

    • close

      public void close() throws IOException
      Closes this reader.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in class Reader
      Throws:
      IOException
    • getEncoding

      public String getEncoding()
      Returns the encoding being used to read this input stream (i.e., the encoding of the file). If a BOM was recognized, then the specific Unicode type is returned; otherwise, either the default encoding passed into the constructor or the system default is returned.
      Returns:
      The encoding of the stream.
    • init

      protected void init(InputStream in, String defaultEncoding) throws IOException
      Reads ahead four bytes and checks for a BOM. Any extra bytes are pushed back onto the stream; only the BOM bytes are skipped.
      Parameters:
      in - The input stream from which to read.
      defaultEncoding - The encoding to use if no BOM was recognized. If this value is null, then a system default is used.
      Throws:
      IOException - If an error occurs when trying to read a BOM.
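The read-ahead and unread steps described for init can be illustrated with a `java.io.PushbackInputStream`. This is a minimal sketch under stated assumptions, not the UnicodeReader source: `BomPushbackSketch` and `open` are hypothetical names, only the UTF-8 BOM is checked for brevity, and UTF-8 is also used as the stand-in default encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of read-ahead and unread; not the UnicodeReader source. */
final class BomPushbackSketch {

    private static final int BOM_SIZE = 4; // size of the read-ahead window, in bytes

    private BomPushbackSketch() {}

    /** Opens a reader over in, skipping a UTF-8 BOM if one is present. */
    static Reader open(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, BOM_SIZE);
        byte[] head = new byte[BOM_SIZE];

        // Read up to four bytes; a short stream may yield fewer.
        int n = 0;
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) break;
            n += r;
        }

        // Assumption for brevity: only the UTF-8 BOM (EF BB BF) is recognized here.
        int bomLength = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            bomLength = 3;
        }

        // Push everything after the BOM back so the decoder still sees it.
        if (n > bomLength) {
            pushback.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(pushback, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        Reader reader = open(new ByteArrayInputStream(withBom));
        System.out.println((char) reader.read()); // prints h
    }
}
```

The pushback buffer is sized to the read-ahead window, so unreading the non-BOM bytes cannot overflow it.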
    • read

      public int read(char[] cbuf, int off, int len) throws IOException
      Read characters into a portion of an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.
      Specified by:
      read in class Reader
      Parameters:
      cbuf - The buffer into which to read.
      off - The offset at which to start storing characters.
      len - The maximum number of characters to read.
      Returns:
      The number of characters read, or -1 if the end of the stream has been reached.
      Throws:
      IOException
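The return contract above is the standard `Reader` one, so a caller typically loops until -1 and uses only the characters actually read on each pass. The sketch below shows that loop with a `StringReader` standing in for a UnicodeReader; `ReadLoopSketch` and `drain` are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical example of consuming a Reader through read(char[], int, int). */
final class ReadLoopSketch {

    private ReadLoopSketch() {}

    /** Reads reader to the end of the stream and returns everything read. */
    static String drain(Reader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] cbuf = new char[8]; // deliberately small so the loop runs more than once
        int count;
        while ((count = reader.read(cbuf, 0, cbuf.length)) != -1) {
            out.append(cbuf, 0, count); // only the first count chars are valid
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(drain(new StringReader("longer than one buffer"))); // prints longer than one buffer
    }
}
```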