Class S2PointVectorCoder
- java.lang.Object
-
- com.google.common.geometry.S2PointVectorCoder
-
@GwtCompatible class S2PointVectorCoder extends java.lang.Object implements S2Coder<java.util.List<S2Point>>
An encoder/decoder ofLists.Values from the
Listreturned bydecode(Bytes, Cursor)are decoded only when they are accessed. This allows for very fast initialization and no additional memory use beyond the encoded data. The encoded data is not owned by theList; typically it points into a large contiguous buffer that contains other encoded data as well.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private static classS2PointVectorCoder.Baseprivate static classS2PointVectorCoder.ByteArrayOutputA thin wrapper overByteArrayOutputStreamwhich allows the last written byte to be removed.private static classS2PointVectorCoder.CellPointRepresents a point that can be encoded as anS2CellIdcenter.private static classS2PointVectorCoder.FormatControls whether to optimize for speed or size when encoding points.private static classS2PointVectorCoder.MutableBlockCodeRepresents the encoding parameters to be used for a given block (consisting ofBLOCK_SIZEencodable 64-bit values).
-
Field Summary
Fields Modifier and Type Field Description private static intBLOCK_SHIFTThe left shift factor forBLOCK_SIZE.private static intBLOCK_SIZES2CellIds are represented in a special 64-bit format and are encoded in fixed-size blocks.(package private) static S2PointVectorCoderCOMPACTAn instance of aS2PointVectorCoderwhich encodes/decodesS2Points in the COMPACT encoding format.private static intENCODING_FORMAT_BITSprivate static byteENCODING_FORMAT_MASKprivate static longEXCEPTIONThe exception value in the COMPACT encoding format.(package private) static S2PointVectorCoderFASTAn instance of aS2PointVectorCoderwhich encodes/decodesS2Points in the FAST encoding format.private static intFORMAT_COMPACTThe value of the COMPACT encoding format.private static intFORMAT_FASTThe value of the FAST encoding format.private static intSIZEOF_S2POINTThe size of an encodedS2Pointin bytes (3 doubles * 8 bytes per double).private S2PointVectorCoder.Formattype
-
Constructor Summary
Constructors Modifier Constructor Description privateS2PointVectorCoder(S2PointVectorCoder.Format type)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private static intbaseShift(int level, int baseBits)Returns the number of bits thatbaseshould be right-shifted in order to encode only its leadingbaseBitsbits, assuming that all points are encoded at the givenS2CellIdlevel.private static longbitMask(int n)Returns a bit mask withnlow-order 1 bits, for0 <= n <= 64.private static booleancanEncode(long dMin, long dMax, int deltaBits, int overlapBits, boolean haveExceptions)Returns true if the range of values[dMin, dMax]can be encoded using the specified parameters (deltaBits,overlapBits, andhaveExceptions).private static S2PointVectorCoder.BasechooseBase(com.google.common.primitives.ImmutableLongArray values, int level, boolean haveExceptions)Returns the global minimum valueS2PointVectorCoder.Base.baseand the number of bits that should be used to encode it (S2PointVectorCoder.Base.baseBits).private static intchooseBestLevel(java.util.List<S2Point> points, java.util.List<S2PointVectorCoder.CellPoint> cellPoints)Returns theS2CellIdlevel for which the greatest number of the given points can be represented as the center of anS2CellId, or -1 if there is no S2CellId that would result in significant space savings.private static com.google.common.primitives.ImmutableLongArrayconvertCellsToValues(java.util.List<S2PointVectorCoder.CellPoint> cellPoints, int level)Given a vector of points inS2PointVectorCoder.CellPointformat and anS2CellIdlevel that has been chosen for encoding, returns a vector of 64-bit values that should be encoded in order to represent these points.java.util.List<S2Point>decode(PrimitiveArrays.Bytes data, PrimitiveArrays.Cursor cursor)private static java.util.List<S2Point>decodeCompact(PrimitiveArrays.Bytes data, PrimitiveArrays.Cursor cursor)private static java.util.List<S2Point>decodeFast(PrimitiveArrays.Bytes data, PrimitiveArrays.Cursor cursor)voidencode(java.util.List<S2Point> values, java.io.OutputStream output)Encodesvaluetooutput.private static voidencodeCompact(java.util.List<S2Point> values, java.io.OutputStream output)Encodes a vector ofS2Points, optimizing for space.private static voidencodeFast(java.util.List<S2Point> values, java.io.OutputStream output)private static voidgetBlockCode(S2PointVectorCoder.MutableBlockCode code, com.google.common.primitives.ImmutableLongArray values, long base, boolean haveExceptions)Given a vector of 64-bit values to be encoded and anS2CellIdlevel, returns the optimal encoding parameters that should be used to encode each block.private static intmaxBitsForLevel(int level)Returns the maximum number of bits per value at the givenS2CellIdlevel.
-
-
-
Field Detail
-
FAST
static final S2PointVectorCoder FAST
An instance of aS2PointVectorCoderwhich encodes/decodesS2Points in the FAST encoding format. The FAST format is optimized for fast encoding/decoding.
-
COMPACT
static final S2PointVectorCoder COMPACT
An instance of aS2PointVectorCoderwhich encodes/decodesS2Points in the COMPACT encoding format. The COMPACT format is optimized for disk usage and memory footprint.
-
type
private final S2PointVectorCoder.Format type
-
FORMAT_FAST
private static final int FORMAT_FAST
The value of the FAST encoding format.- See Also:
- Constant Field Values
-
FORMAT_COMPACT
private static final int FORMAT_COMPACT
The value of the COMPACT encoding format.- See Also:
- Constant Field Values
-
ENCODING_FORMAT_BITS
private static final int ENCODING_FORMAT_BITS
- See Also:
- Constant Field Values
-
ENCODING_FORMAT_MASK
private static final byte ENCODING_FORMAT_MASK
- See Also:
- Constant Field Values
-
SIZEOF_S2POINT
private static final int SIZEOF_S2POINT
The size of an encodedS2Pointin bytes (3 doubles * 8 bytes per double).- See Also:
- Constant Field Values
-
BLOCK_SHIFT
private static final int BLOCK_SHIFT
The left shift factor forBLOCK_SIZE.- See Also:
- Constant Field Values
-
BLOCK_SIZE
private static final int BLOCK_SIZE
S2CellIds are represented in a special 64-bit format and are encoded in fixed-size blocks.BLOCK_SIZErepresents the number of values per block. Block sizes of 4, 8, 16, and 32 were tested, andBLOCK_SIZE== 16 seems to offer the best compression. (Note thatBLOCK_SIZE == 32requires some code modifications which have since been removed).- See Also:
- Constant Field Values
-
EXCEPTION
private static final long EXCEPTION
The exception value in the COMPACT encoding format.
-
-
Constructor Detail
-
S2PointVectorCoder
private S2PointVectorCoder(S2PointVectorCoder.Format type)
-
-
Method Detail
-
encode
public void encode(java.util.List<S2Point> values, java.io.OutputStream output) throws java.io.IOException
Description copied from interface:S2CoderEncodesvaluetooutput.
-
decode
public java.util.List<S2Point> decode(PrimitiveArrays.Bytes data, PrimitiveArrays.Cursor cursor)
Description copied from interface:S2CoderDecodes a value of typeS2Coderfromdatastarting atcursor.position.cursor.positionis updated to the position of the first byte indatafollowing the encoded value.
-
encodeFast
private static void encodeFast(java.util.List<S2Point> values, java.io.OutputStream output) throws java.io.IOException
- Throws:
java.io.IOException
-
decodeFast
private static java.util.List<S2Point> decodeFast(PrimitiveArrays.Bytes data, PrimitiveArrays.Cursor cursor)
-
encodeCompact
private static void encodeCompact(java.util.List<S2Point> values, java.io.OutputStream output) throws java.io.IOException
Encodes a vector ofS2Points, optimizing for space.OVERVIEW
We attempt to represent each S2Point as the center of an S2CellId. All S2CellIds must be at the same level. Any points that cannot be encoded exactly as S2CellId centers are stored as exceptions using 24 bytes each. If there are so many exceptions that the COMPACT encoding does not save significant space, we give up and use the uncompressed encoding.
The first step is to choose the best S2CellId level. This requires converting each point to (face, si, ti) coordinates and checking whether the point can be represented exactly as an S2CellId center at some level. We then build a histogram of S2CellId levels (just like the similar code in S2Polygon.encode) and choose the best level (or give up, if there are not enough S2CellId-encodable points).
The simplest approach would then be to take all the S2CellIds and right-shift them to remove all the constant bits at the chosen level. This would give the best spatial locality and hence the smallest deltas. However instead we give up some spatial locality and use the similar but faster transformation described below.
Each encodable point is first converted to the (sj, tj) representation defined below:
These two values encode the (face, si, ti) tuple using (2 * level + 3) bits. To see this, recall that "si" and "ti" are 31-bit values that all share a common suffix consisting of a "1" bit followed by (30 - level) "0" bits. The code above right-shifts these values to remove the constant bits and then prepends the bits for "face", yielding a total of (level + 2) bits for "sj" and (level + 1) bits for "tj".sj = (((face & 3) << 30) | (si >> 1)) >> (30 - level); tj = (((face & 4) << 29) | ti) >> (31 - level);We then combine (sj, tj) into one 64-bit value by interleaving bit pairs:
(We could also interleave individual bits, but it is faster this way). The result is similar to right-shifting an S2CellId by (61 - 2 * level), except that it is faster to decode and the spatial locality is not quite as good.v = interleaveBitPairs(sj, tj);The 64-bit values are divided into blocks of size 8, and then each value is encoded as the sum of a base value, a per-block offset, and a per-value delta within that block:
v[i,j] = base + offset[i] + delta[i, j]where "i" represents a block and "j" represents an entry in that block.
The deltas for each block are encoded using a fixed number of 4-bit nibbles (1-16 nibbles per delta). This allows any delta to be accessed in constant time.
The "offset" for each block is a 64-bit value encoded in 0-8 bytes. The offset is left-shifted such that it overlaps the deltas by a configurable number of bits (either 0 or 4), called the "overlap". The overlap and offset length (0-8 bytes) are specified per block. The reason for the overlap is that it allows fewer delta bits to be used in some cases. For example if base == 0 and the range within a block is 0xf0 to 0x110, then rather than using 12-bits deltas with an offset of 0, the overlap lets us use 8-bits deltas with an offset of 0xf0 (saving 7 bytes per block).
The global minimum value "base" is encoded using 0-7 bytes starting with the most-significant non-zero bit possible for the chosen level. For example, if (level == 7) then the encoded values have at most 17 bits, so if "base" is encoded in 1 byte then it is shifted to occupy bits 9-16.
Example: at level == 15, there are at most 33 non-zero value bits. The following shows the bit positions covered by "base", "offset", and "delta" assuming that "base" and "offset" are encoded in 2 bytes each, deltas are encoded in 2 nibbles (1 byte) each, and "overlap" is 4 bits:
Base: 1111111100000000----------------- Offset: -------------1111111100000000---- Delta: -------------------------00000000 Overlap: ^^^^The numbers (0 or 1) in this diagram denote the byte number of the encoded value. Notice that "base" is shifted so that it starts at the leftmost possible bit, "delta" always starts at the rightmost possible bit (bit 0), and "offset" is shifted so that it overlaps "delta" by the chosen "overlap" (either 0 or 4 bits). Also note that all of these values are summed, and therefore each value can affect higher-order bits due to carries.
NOTE(user): Encoding deltas in 4-bit rather than 8-bit length increments reduces encoded sizes by about 7%. Allowing a 4-bit overlap between the offset and deltas reduces encoded sizes by about 1%. Both optimizations make the code more complex but don't affect running times significantly.
ENCODING DETAILS
Now we can move on to the actual encodings. First, there is a 2 byte header encoded as follows:
Byte 0, bits 0-2: encodingFormat (COMPACT) Byte 0, bit 3: haveExceptions Byte 0, bits 4-7: (lastBlockSize - 1) Byte 1, bits 0-2: baseBytes Byte 1, bits 3-7: level (0-30)This is followed by an EncodedStringVector containing the encoded blocks. Each block contains BLOCK_SIZE (8) values. The total size of the EncodedS2PointVector is not stored explicitly, but instead is calculated as
num_values == BLOCK_SIZE * (numBlocks - 1) + lastBlockSize
(An empty vector has numBlocks == 0 and lastBlockSize == BLOCK_SIZE.)
Each block starts with a 1 byte header containing the following:
Byte 0, bits 0-2: (offsetBytes - overlapNibbles) Byte 0, bit 3: overlapNibbles Byte 0, bits 4-7: (deltaNibbles - 1)"overlapNibbles" is either 0 or 1 (indicating an overlap of 0 or 4 bits), while "offsetBytes" is in the range 0-8 (indicating the number of bytes used to encode the offset for this block). Note that some combinations cannot be encoded: in particular, offsetBytes == 0 can only be encoded with an overlap of 0 bits, and offsetBytes == 8 can only be encoded with an overlap of 4 bits. This allows us to encode offset lengths of 0-8 rather than just 0-7 without using an extra bit. (Note that the combinations that can't be encoded are not useful anyway).
The header is followed by "offsetBytes" bytes for the offset, and then (4 * deltaNibbles) bytes for the deltas.
If there are any points that could not be represented as S2CellIds, then "haveExceptions" in the header is true. In that case the delta values within each block are encoded as (delta + 8), and values 0-7 are used to represent exceptions. If a block has exceptions, they are encoded immediately following the array of deltas, and are referenced by encoding the corresponding exception index (0-7) as the delta.
TODO(user): A vector containing a single leaf cell is currently encoded as 13 bytes (2 byte header, 7 byte base, 1 byte block count, 1 byte block length, 1 byte block header, 1 byte delta). However if this case occurs often, a better solution would be implement a separate format that encodes the leading k bytes of an S2CellId. It would have a one-byte header consisting of the encoding format (3 bits) and the number of bytes encoded (3 bits), followed by the S2CellId bytes. The extra 2 header bits could be used to store single points using other encodings, e.g. E7.
If we wind up using 8-value blocks, we could also use the extra bit in the first byte of the header to indicate that there is only one value, and then skip the 2nd byte of header and the EncodedStringVector. But this would be messy because it also requires special cases while decoding. Essentially this would be a sub-format within the COMPACT format.
- Throws:
java.io.IOException
-
decodeCompact
private static java.util.List<S2Point> decodeCompact(PrimitiveArrays.Bytes data, PrimitiveArrays.Cursor cursor)
-
bitMask
private static long bitMask(int n)
Returns a bit mask withnlow-order 1 bits, for0 <= n <= 64.
-
maxBitsForLevel
private static int maxBitsForLevel(int level)
Returns the maximum number of bits per value at the givenS2CellIdlevel.
-
baseShift
private static int baseShift(int level, int baseBits)Returns the number of bits thatbaseshould be right-shifted in order to encode only its leadingbaseBitsbits, assuming that all points are encoded at the givenS2CellIdlevel.
-
chooseBestLevel
private static int chooseBestLevel(java.util.List<S2Point> points, java.util.List<S2PointVectorCoder.CellPoint> cellPoints)
Returns theS2CellIdlevel for which the greatest number of the given points can be represented as the center of anS2CellId, or -1 if there is no S2CellId that would result in significant space savings.Adds the
S2CellIdrepresentation of each point (if any) tocellPoints.
-
convertCellsToValues
private static com.google.common.primitives.ImmutableLongArray convertCellsToValues(java.util.List<S2PointVectorCoder.CellPoint> cellPoints, int level)
Given a vector of points inS2PointVectorCoder.CellPointformat and anS2CellIdlevel that has been chosen for encoding, returns a vector of 64-bit values that should be encoded in order to represent these points. Points that cannot be represented losslessly as the center of anS2CellIdat the chosen level are indicated by the valueEXCEPTION.
-
chooseBase
private static S2PointVectorCoder.Base chooseBase(com.google.common.primitives.ImmutableLongArray values, int level, boolean haveExceptions)
Returns the global minimum valueS2PointVectorCoder.Base.baseand the number of bits that should be used to encode it (S2PointVectorCoder.Base.baseBits).
-
canEncode
private static boolean canEncode(long dMin, long dMax, int deltaBits, int overlapBits, boolean haveExceptions)Returns true if the range of values[dMin, dMax]can be encoded using the specified parameters (deltaBits,overlapBits, andhaveExceptions).
-
getBlockCode
private static void getBlockCode(S2PointVectorCoder.MutableBlockCode code, com.google.common.primitives.ImmutableLongArray values, long base, boolean haveExceptions)
Given a vector of 64-bit values to be encoded and anS2CellIdlevel, returns the optimal encoding parameters that should be used to encode each block.
-
-