- java.lang.Object
  - kala.compress.harmony.pack200.Codec
    - kala.compress.harmony.pack200.BHSDCodec
public final class BHSDCodec extends Codec
A BHSD codec is a means of encoding integer values as a sequence of bytes, or vice versa, using a specified "BHSD" encoding mechanism. It uses a variable-length encoding and a modified sign representation such that small numbers are represented as a single byte, whilst larger numbers take more bytes to encode. The number may be signed or unsigned; if it is signed, it can be weighted towards positive numbers or equally distributed using a one's complement. The Codec also supports delta coding, where a sequence of numbers is represented as a series of first-order differences. So a delta encoding of the integers [1..10] would be represented as a sequence of ten 1s. This allows the absolute value of a coded integer to fall outside of the 'small number' range, whilst still being encoded as a single byte.
A BHSD codec is configured with four parameters:
- B: The maximum number of bytes that each value is encoded as. B must be in the range [1..5]. For a pass-through coding (where each byte is encoded as itself, aka Codec.BYTE1), B is 1 (each byte takes a maximum of 1 byte).
- H: The radix of the integer. A value is encoded as a sequence of bytes, where byte n is multiplied by H^n. So the number 1234 may be represented as the sequence 4 3 2 1 with a radix (H) of 10, since 4 + 3×10 + 2×100 + 1×1000 = 1234. Other decompositions are also possible; 34 0 2 1 also encodes 1234. (These examples illustrate only the positional weighting and ignore the L constraint that follows.) The co-parameter L is defined as 256 - H. This is important because a byte less than L ends the sequence: only the last byte may be less than L, and all prior bytes must be greater than or equal to L. The B-th byte always ends the sequence.
- S: Whether the codec represents signed values. This may take three values: 0 (unsigned), 1 (signed, one's complement) or 2 (signed, two's complement).
- D: Whether the codec represents a delta encoding. This may be 0 (no delta) or 1 (delta encoding). D = 1 indicates that values are cumulative; the sequence 1 1 1 1 1 represents the sequence 1 2 3 4 5. For this reason, the codec supports two variants of decode: one with and one without a last parameter. If the codec is a non-delta encoding, the value is ignored if passed. If the codec is a delta encoding, calling the variant without the extra parameter is a run-time error; the previously decoded value must be passed in. (It was designed this way to support multi-threaded access without requiring a new instance of the Codec to be cloned for each use.)
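The byte-level rule behind B, H and L can be sketched as follows. This is a minimal, hypothetical illustration of unsigned, non-delta (B,H) coding only (S = 0, D = 0); the class and method names are invented for this example and are not the BHSDCodec implementation. It assumes, as described above, that byte n carries weight H^n, that a byte below L ends a value, and that the B-th byte always ends it.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of unsigned (B,H) coding; not the BHSDCodec source. */
final class BhCodingSketch {

    /** Encodes a non-negative value z using at most B bytes and radix H (so L = 256 - H). */
    static byte[] encode(long z, int b, int h) {
        if (b < 1 || b > 5 || h < 1 || h > 256 || z < 0) {
            throw new IllegalArgumentException("unsupported parameters or negative value");
        }
        int l = 256 - h;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int n = 0; n < b; n++) {
            if (z < l || n == b - 1) {
                // Final byte: either it is below L, or the B-byte limit has been reached.
                if (z > 255) {
                    throw new IllegalArgumentException("value out of range for this (B,H)");
                }
                out.write((int) z);
                break;
            }
            // Continuation byte: the unique value in [L, 255] congruent to z modulo H.
            long byteN = l + (z - l) % h;
            out.write((int) byteN);
            z = (z - byteN) / h;
        }
        return out.toByteArray();
    }

    /** Decodes one value: byte n is weighted by H^n, and a byte below L ends the value. */
    static long decode(byte[] bytes, int b, int h) {
        int l = 256 - h;
        long z = 0;
        long weight = 1; // H^n
        for (int n = 0; n < b; n++) {
            int x = bytes[n] & 0xFF;
            z += x * weight;
            if (x < l) {
                break;
            }
            weight *= h;
        }
        return z;
    }
}
```

With (B,H) = (5,64), L is 192, so in this sketch 191 fits in a single byte while 192 needs two, and 1000 becomes the two bytes 232 12 (232 + 12×64).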
Codecs are notated as (B,H,S,D), and either D or S,D may be omitted if zero. Thus Codec.BYTE1 is denoted (1,256,0,0) or (1,256). The toString() method prints out the condensed form of the encoding. Often, the last character in the name (Codec.BYTE1, Codec.UNSIGNED5) gives a clue as to the B value. Those that start with U (Codec.UDELTA5, Codec.UNSIGNED5) are unsigned; otherwise, in most cases, they are signed. The presence of the word Delta (Codec.DELTA5, Codec.UDELTA5) indicates that a delta encoding is used.
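The condensed notation can be sketched as a small formatting helper. The class and method names are hypothetical; the helper simply follows the rule stated here (omit D when it is zero, and omit both S and D when both are zero) and is not the toString() source.

```java
/** Hypothetical helper that prints (B,H,S,D) in condensed form. */
final class NotationSketch {

    static String notation(int b, int h, int s, int d) {
        StringBuilder sb = new StringBuilder("(").append(b).append(',').append(h);
        if (s != 0 || d != 0) {
            sb.append(',').append(s); // S is kept whenever S or D is non-zero
        }
        if (d != 0) {
            sb.append(',').append(d); // D is only shown when it is non-zero
        }
        return sb.append(')').toString();
    }
}
```

For example, notation(1, 256, 0, 0) gives "(1,256)", while notation(5, 64, 0, 1) keeps the zero S and gives "(5,64,0,1)".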
-
Field Summary
Fields (Modifier and Type, Field, Description):
- private int b: The maximum number of bytes in each coding word.
- private long cardinality
- private int d: Whether delta encoding is used (0=false, 1=true).
- private int h: The radix of the encoding.
- private int l: The co-parameter of h; 256-h.
- private long largest
- private long[] powers: radix^i powers
- private int s: Represents signed numbers or not, 0 (unsigned), 1 (signed, one's complement) or 2 (signed, two's complement).
- private long smallest
-
Constructor Summary
Constructors (Constructor, Description):
- BHSDCodec(int b, int h): Constructs an unsigned, non-delta Codec with the given B and H values.
- BHSDCodec(int b, int h, int s): Constructs a non-delta Codec with the given B, H and S values.
- BHSDCodec(int b, int h, int s, int d): Constructs a Codec with the given B, H, S and D values.
-
Method Summary
Methods (Modifier and Type, Method, Description):
- private long calculateLargest()
- private long calculateSmallest()
- long cardinality(): Returns the cardinality of this codec; that is, the number of distinct values that it can represent.
- int decode(java.io.InputStream in): Decodes a sequence of bytes from the given input stream, returning the decoded value.
- int decode(java.io.InputStream in, long last): Decodes a sequence of bytes from the given input stream, returning the decoded value.
- int[] decodeInts(int n, java.io.InputStream in): Decodes a sequence of n values from in.
- int[] decodeInts(int n, java.io.InputStream in, int firstValue): Decodes a sequence of n values from in.
- byte[] encode(int value): Encodes a single value into a sequence of bytes.
- byte[] encode(int value, int last): Encodes a single value into a sequence of bytes.
- boolean encodes(long value): Returns true if this encoding can code the given value.
- boolean equals(java.lang.Object o)
- int getB(): Gets the B.
- int getH(): Gets the H.
- int getL(): Gets the L.
- int getS(): Gets the S.
- int hashCode()
- boolean isDelta(): Returns true if this codec is a delta codec.
- boolean isSigned(): Returns true if this codec is a signed codec.
- long largest(): Returns the largest value that this codec can represent.
- long smallest(): Returns the smallest value that this codec can represent.
- java.lang.String toString(): Returns the codec in the form (1,256) or (1,64,1,1).
-
Field Detail
-
b
private final int b
The maximum number of bytes in each coding word. B must be in the range [1..5]. For a pass-through coding (where each byte is encoded as itself, aka Codec.BYTE1), B is 1 (each byte takes a maximum of 1 byte).
-
d
private final int d
Whether delta encoding is used (0=false,1=true).
-
h
private final int h
The radix of the encoding.
-
l
private final int l
The co-parameter of h; 256-h.
-
s
private final int s
Whether signed numbers are represented: 0 (unsigned), 1 (signed, one's complement) or 2 (signed, two's complement).
-
cardinality
private long cardinality
-
smallest
private final long smallest
-
largest
private final long largest
-
powers
private final long[] powers
Powers of the radix: powers[i] = h^i.
-
Constructor Detail
-
BHSDCodec
public BHSDCodec(int b, int h)
Constructs an unsigned, non-delta Codec with the given B and H values.
- Parameters:
b - the maximum number of bytes that a value can be encoded as [1..5]
h - the radix of the encoding [1..256]
-
BHSDCodec
public BHSDCodec(int b, int h, int s)
Constructs a non-delta Codec with the given B, H and S values.
- Parameters:
b - the maximum number of bytes that a value can be encoded as [1..5]
h - the radix of the encoding [1..256]
s - whether the encoding represents signed numbers (s=0 is unsigned; s=1 is signed with one's complement; s=2 is signed with two's complement)
-
BHSDCodec
public BHSDCodec(int b, int h, int s, int d)
Constructs a Codec with the given B, H, S and D values.
- Parameters:
b - the maximum number of bytes that a value can be encoded as [1..5]
h - the radix of the encoding [1..256]
s - whether the encoding represents signed numbers (s=0 is unsigned; s=1 is signed with one's complement; s=2 is signed with two's complement)
d - whether this is a delta encoding (d=0 is non-delta; d=1 is delta)
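The S parameter decides how an unsigned coded value z is turned into a signed value. This page only names the variants, so the following sketch assumes the sign rule from the Pack200 specification: with mask = 2^S - 1, a code whose low S bits are all ones decodes to the negative number ~(z >>> S), and any other code decodes to z - (z >>> S). Under that rule S = 1 alternates non-negative and negative values, and S = 2 maps three of every four codes to non-negative values. The class and method names are hypothetical.

```java
/** Hypothetical sketch of the S (sign) mapping; the rule is assumed, not taken from this page. */
final class SignMappingSketch {

    /** Maps an unsigned code z (z >= 0) to a signed value for S = 0, 1 or 2. */
    static long toSigned(long z, int s) {
        if (s == 0) {
            return z;
        }
        long mask = (1L << s) - 1;
        if ((z & mask) == mask) {
            return ~(z >>> s);   // low S bits all ones: a negative value
        }
        return z - (z >>> s);    // otherwise: a non-negative value
    }

    /** Inverse of toSigned: maps a signed value back to its unsigned code. */
    static long toUnsigned(long x, int s) {
        if (s == 0) {
            return x;
        }
        long mask = (1L << s) - 1;
        if (x < 0) {
            return (~x << s) | mask;   // equivalently (-x << s) - 1
        }
        return x + x / mask;           // every 2^S-th code is reserved for negatives
    }
}
```

Under this assumed rule, codes 0, 1, 2, 3 decode to 0, -1, 1, -2 for S = 1, and to 0, 1, 2, -1 for S = 2.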
-
Method Detail
-
calculateLargest
private long calculateLargest()
-
calculateSmallest
private long calculateSmallest()
-
cardinality
public long cardinality()
Returns the cardinality of this codec; that is, the number of distinct values that it can represent.
- Returns:
- the cardinality of this codec
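For an unsigned (B,H) coding the cardinality can be counted directly from the byte rule in the class description: a value that ends with a byte below L after i + 1 bytes has i continuation bytes (H choices each) and L choices for the last byte, and a value that uses all B bytes with a last byte of L or more adds H^B further sequences. This sketch assumes every such byte sequence decodes to a different value; the helper name is hypothetical, and it is not necessarily how cardinality() computes its result.

```java
/** Hypothetical helper: counts the values an unsigned (B,H) coding can represent. */
final class CardinalitySketch {

    /** C = L * (1 + H + ... + H^(B-1)) + H^B, with L = 256 - H. */
    static long cardinality(int b, int h) {
        long l = 256 - h;
        long sum = 0;
        long power = 1; // H^i
        for (int i = 0; i < b; i++) {
            sum += l * power; // sequences ending after i + 1 bytes with a byte below L
            power *= h;
        }
        return sum + power;   // plus H^B sequences of B bytes whose last byte is L or more
    }
}
```

For example, (1,256) gives 256 and (5,64) gives 4,346,097,856, which is more than 2^32.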
-
decode
public int decode(java.io.InputStream in) throws java.io.IOException, Pack200Exception
Description copied from class: Codec
Decodes a sequence of bytes from the given input stream, returning the decoded value. Note that this method can only be applied to non-delta encodings.
- Specified by:
decode in class Codec
- Parameters:
in - the input stream to read from
- Returns:
- the decoded value
- Throws:
java.io.IOException - if there is a problem reading from the underlying input stream
Pack200Exception - if the encoding is a delta encoding
-
decode
public int decode(java.io.InputStream in, long last) throws java.io.IOException, Pack200Exception
Description copied from class: Codec
Decodes a sequence of bytes from the given input stream, returning the decoded value. If this encoding is a delta encoding (d=1), then the previous value must be passed in as a parameter. If it is a non-delta encoding, then it does not matter what value is passed in, so it makes sense for the value to be passed in by default using code similar to:

    long last = 0;
    while (condition) {
        last = codec.decode(in, last);
        // do something with last
    }

- Specified by:
decode in class Codec
- Parameters:
in - the input stream to read from
last - the previous value read, which must be supplied if the codec is a delta encoding
- Returns:
- the decoded value
- Throws:
java.io.IOException - if there is a problem reading from the underlying input stream
Pack200Exception - if there is a problem decoding the value or if the value is invalid
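The cumulative behaviour behind the last parameter can be sketched independently of the byte format. The helpers below (hypothetical names) compute first-order differences, which is what a D = 1 codec codes, and rebuild the original values by carrying a running last value, mirroring the decode loop above. They assume the first value is coded relative to 0, which matches the [1..10] example in the class description.

```java
/** Hypothetical sketch of delta (D = 1) coding at the value level. */
final class DeltaSketch {

    /** Replaces each value by its difference from the previous one (the first from 0). */
    static int[] toDeltas(int[] values) {
        int[] out = new int[values.length];
        int last = 0;
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] - last;
            last = values[i];
        }
        return out;
    }

    /** Rebuilds absolute values by adding each difference to the running last value. */
    static int[] fromDeltas(int[] deltas) {
        int[] out = new int[deltas.length];
        int last = 0;
        for (int i = 0; i < deltas.length; i++) {
            last += deltas[i];
            out[i] = last;
        }
        return out;
    }
}
```

Here toDeltas turns 1..10 into ten 1s, and fromDeltas turns 1 1 1 1 1 back into 1 2 3 4 5.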
-
decodeInts
public int[] decodeInts(int n, java.io.InputStream in) throws java.io.IOException, Pack200Exception
Description copied from class: Codec
Decodes a sequence of n values from in. This should probably be used in most cases, since some codecs (such as PopulationCodec) only work when the number of values to be read is known.
- Overrides:
decodeInts in class Codec
- Parameters:
n - the number of values to decode
in - the input stream to read from
- Returns:
- an array of int values corresponding to the values decoded
- Throws:
java.io.IOException - if there is a problem reading from the underlying input stream
Pack200Exception - if there is a problem decoding the value or if the value is invalid
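Reading a known number of values from a stream can be sketched for the unsigned, non-delta case. This is a hypothetical helper, not the BHSDCodec method: it applies the (B,H) byte rule from the class description to each value in turn and, as an assumption of this sketch, reports a truncated stream with java.io.EOFException.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of reading n unsigned, non-delta (B,H) values from a stream. */
final class DecodeIntsSketch {

    static int[] decodeInts(int n, InputStream in, int b, int h) throws IOException {
        int l = 256 - h;
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            long z = 0;
            long weight = 1; // H^k for the k-th byte of this value
            for (int k = 0; k < b; k++) {
                int x = in.read();
                if (x == -1) {
                    throw new EOFException("stream ended while decoding value " + i);
                }
                z += x * weight;
                if (x < l) {
                    break; // a byte below L ends this value
                }
                weight *= h;
            }
            result[i] = (int) z; // narrowed to int, matching the int[] result type
        }
        return result;
    }
}
```

With (B,H) = (5,64), the byte stream 5 192 0 232 12 decodes to the three values 5, 192 and 1000.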
-
decodeInts
public int[] decodeInts(int n, java.io.InputStream in, int firstValue) throws java.io.IOException, Pack200Exception
Description copied from class: Codec
Decodes a sequence of n values from in.
- Overrides:
decodeInts in class Codec
- Parameters:
n - the number of values to decode
in - the input stream to read from
firstValue - the first value in the band, if it has already been read
- Returns:
- an array of int values corresponding to the values decoded, with firstValue as the first value in the array
- Throws:
java.io.IOException - if there is a problem reading from the underlying input stream
Pack200Exception - if there is a problem decoding the value or if the value is invalid
-
encode
public byte[] encode(int value) throws Pack200Exception
Description copied from class: Codec
Encodes a single value into a sequence of bytes. Note that this method can only be used for non-delta encodings.
- Specified by:
encode in class Codec
- Parameters:
value - the value to encode
- Returns:
- the encoded bytes
- Throws:
Pack200Exception - TODO
-
encode
public byte[] encode(int value, int last) throws Pack200Exception
Description copied from class: Codec
Encodes a single value into a sequence of bytes.
- Specified by:
encode in class Codec
- Parameters:
value - the value to encode
last - the previous value encoded (for delta encodings)
- Returns:
- the encoded bytes
- Throws:
Pack200Exception - TODO
-
encodes
public boolean encodes(long value)
Returns true if this encoding can code the given value.
- Parameters:
value - the value to check
- Returns:
true if the encoding can encode this value
-
equals
public boolean equals(java.lang.Object o)
- Overrides:
equals in class java.lang.Object
-
getB
public int getB()
Gets the B.
- Returns:
- the b
-
getH
public int getH()
Gets the H.
- Returns:
- the h
-
getL
public int getL()
Gets the L.
- Returns:
- the l
-
getS
public int getS()
Gets the S.
- Returns:
- the s
-
hashCode
public int hashCode()
- Overrides:
hashCode in class java.lang.Object
-
isDelta
public boolean isDelta()
Returns true if this codec is a delta codec.
- Returns:
- true if this codec is a delta codec
-
isSigned
public boolean isSigned()
Returns true if this codec is a signed codec.
- Returns:
- true if this codec is a signed codec
-
largest
public long largest()
Returns the largest value that this codec can represent.
- Returns:
- the largest value that this codec can represent.
-
smallest
public long smallest()
Returns the smallest value that this codec can represent.
- Returns:
- the smallest value that this codec can represent.
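For small codecs, the smallest and largest representable values can be checked by brute force: enumerate every unsigned code below the cardinality and apply the sign mapping. This sketch assumes the Pack200 sign rule and the byte-counting cardinality for unsigned (B,H) codings, ignores delta codecs, and does not clip results to the int range, so it may differ from smallest() and largest() for large codecs. All names are hypothetical.

```java
/** Hypothetical brute-force check of the value range of a small, non-delta (B,H,S) codec. */
final class RangeSketch {

    /** Assumed Pack200-style sign mapping from an unsigned code z to a signed value. */
    static long toSigned(long z, int s) {
        if (s == 0) {
            return z;
        }
        long mask = (1L << s) - 1;
        return (z & mask) == mask ? ~(z >>> s) : z - (z >>> s);
    }

    /** Number of unsigned codes: L * (1 + H + ... + H^(B-1)) + H^B, with L = 256 - H. */
    static long cardinality(int b, int h) {
        long l = 256 - h, sum = 0, pow = 1;
        for (int i = 0; i < b; i++) {
            sum += l * pow;
            pow *= h;
        }
        return sum + pow;
    }

    /** Returns {smallest, largest} by enumerating every code; only practical for small codecs. */
    static long[] range(int b, int h, int s) {
        long c = cardinality(b, h);
        if (c > 1_000_000) {
            throw new IllegalArgumentException("too many codes to enumerate");
        }
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (long z = 0; z < c; z++) {
            long v = toSigned(z, s);
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new long[] {min, max};
    }
}
```

Under these assumptions, (1,256) covers 0 to 255, (1,256,1) covers -128 to 127, and (1,256,2) covers -64 to 191, showing how S = 2 favours non-negative values.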
-
toString
public java.lang.String toString()
Returns the codec in the form (1,256) or (1,64,1,1). Note that trailing zero fields are not shown.
- Overrides:
toString in class java.lang.Object
-