Package org.apache.uima.cas.impl
Class BinaryCasSerDes4.Serializer
- java.lang.Object
-
- org.apache.uima.cas.impl.BinaryCasSerDes4.Serializer
-
- Enclosing class:
- BinaryCasSerDes4
private class BinaryCasSerDes4.Serializer extends java.lang.Object
Class instantiated once per serialization. Multiple serializations in parallel are supported, with multiple instances of this class.
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
class BinaryCasSerDes4.Serializer.SerializeModifiedFSs
-
Field Summary
Fields (Modifier and Type, Field, Description)
private java.io.ByteArrayOutputStream[] baosZipSources
private CASImpl baseCas
private BinaryCasSerDes bcsd
private java.io.DataOutputStream byte_dos
private BinaryCasSerDes4.CompressLevel compressLevel
private BinaryCasSerDes4.CompressStrat compressStrategy
private java.io.DataOutputStream control_dos
private CommonSerDesSequential csds
private boolean doMeasurement
private java.io.DataOutputStream[] dosZipSources
private java.io.DataOutputStream double_Exponent_dos
private java.io.DataOutputStream double_Mantissa_Sign_dos
private java.io.DataOutputStream float_Exponent_dos
private java.io.DataOutputStream float_Mantissa_Sign_dos
private Obj2IntIdentityHashMap<TOP> fs2seq
    Converts between FSs and "sequential" numbers. This is for compression efficiency and is also needed for backwards compatibility with v2 serialization forms, where index information was written using "sequential" numbers. Note: this may be an identity map, but may not be in the case of V3, where some FSs are GC'd. Contrast with fs2addr and addr2fs in csds, which use the pseudo v2 addresses as the int.
private java.io.DataOutputStream fsIndexes_dos
private int heapEnd
    end of heap, in v2 pseudo-addr coordinates = addr of last + length of last
private int heapStart
    start of heap, in v2 pseudo-addr coordinates
private boolean isDelta
private boolean isTsi
private MarkerImpl mark
private boolean only1CommonString
private OptimizeStrings os
private TOP prevFs
private TOP[] prevFsByType
    For differencing when reading and writing.
private java.io.DataOutputStream serializedOut
private SerializationMeasures sm
private java.io.DataOutputStream strLength_dos
private java.io.DataOutputStream strOffset_dos
private java.io.DataOutputStream strSeg_dos
private java.io.DataOutputStream typeCode_dos
private PositiveIntSet uimaSerializableSavedToCas
    Set of FSes on which UimaSerializable _save_to_cas_data has already been called.
-
Constructor Summary
Constructors (Modifier, Constructor, Description)
private Serializer(CASImpl cas, java.io.DataOutputStream serializedOut, MarkerImpl mark, SerializationMeasures sm, BinaryCasSerDes4.CompressLevel compressLevel, BinaryCasSerDes4.CompressStrat compressStrategy, boolean isTsi)
-
Method Summary
Methods (Modifier and Type, Method, Description)
private void collectAndZip()
    Method: write with deflation into a single byte array stream; skip if not worth deflating; skip the Slot_Control stream; record in the Slot_Control stream, for each deflated stream: the Slot index, the number of compressed bytes, the number of uncompressed bytes. Add to header: nbr of compressed entries, the Slot_Control stream size, the Slot_Control stream, all the zipped streams.
private int compressFsxPart(int[] fsIndexes, int fsNdxStart, CommonSerDesSequential csds)
private int encodeIntSign(int v)
private void extractStrings(TOP fs)
    Adds strings to the OptimizeStrings object. If delta, only processes FSs that are new; modified string values are picked up when scanning FsChange items.
private void extractStringsFromModifications(CASImpl.FsChange fsChange)
    For delta, for each fsChange element, extract any strings.
private int fs2seq(TOP fs)
private int getPrevArray0HeapRef()
private int getPrevArray0Int()
private boolean isNoPrevArrayValue(CommonArrayFS prevCommonArray)
private void serialize()
    Form 4 serialization is tied to the layout of V2 Feature Structures in heaps.
private void serializeArray(TOP fs)
private int serializeArrayLength(TOP fs)
private void serializeByKind(TOP fs, FeatureImpl feat)
private void serializeIndexedFeatureStructures(CommonSerDesSequential csds)
private void writeDiff(int kind, int v, int prev)
    Encoding: bit 6 = sign: 1 = negative; bit 7 = delta: 1 = delta
private void writeDouble(long raw)
private void writeFloat(int raw)
    Need to support NAN sets, 0x7fc....
private void writeFs(TOP fs)
private void writeLong(long v, long prev)
private void writeString(java.lang.String s)
    String encoding: Length = 0: used for null, no offset written; Length = 1: used for "", no offset written; Length > 0 (subtract 1): used for actual string length; Length < 0: use (-length) as slot index (minimum is 1, slot 0 is NULL). For length > 0, write also the offset.
private void writeStringInfo()
    Write the compressed string table(s)
private void writeUnsignedByte(java.io.DataOutputStream s, int v)
private void writeVnumber(int kind, int v)
private void writeVnumber(int kind, long v)
private void writeVnumber(java.io.DataOutputStream s, int v)
private void writeVnumber(java.io.DataOutputStream s, long v)
-
-
-
Field Detail
-
serializedOut
private final java.io.DataOutputStream serializedOut
-
baseCas
private final CASImpl baseCas
-
bcsd
private final BinaryCasSerDes bcsd
-
mark
private final MarkerImpl mark
-
sm
private final SerializationMeasures sm
-
baosZipSources
private final java.io.ByteArrayOutputStream[] baosZipSources
-
dosZipSources
private final java.io.DataOutputStream[] dosZipSources
-
heapStart
private int heapStart
start of heap, in v2 pseudo-addr coordinates
-
heapEnd
private int heapEnd
end of heap, in v2 pseudo-addr coordinates = addr of last + length of last
-
isDelta
private final boolean isDelta
-
isTsi
private final boolean isTsi
-
doMeasurement
private final boolean doMeasurement
-
os
private final OptimizeStrings os
-
compressLevel
private final BinaryCasSerDes4.CompressLevel compressLevel
-
compressStrategy
private final BinaryCasSerDes4.CompressStrat compressStrategy
-
prevFsByType
private final TOP[] prevFsByType
For differencing when reading and writing. Also used for arrays to difference the 0th element.
-
prevFs
private TOP prevFs
-
only1CommonString
private boolean only1CommonString
-
byte_dos
private final java.io.DataOutputStream byte_dos
-
typeCode_dos
private final java.io.DataOutputStream typeCode_dos
-
strOffset_dos
private final java.io.DataOutputStream strOffset_dos
-
strLength_dos
private final java.io.DataOutputStream strLength_dos
-
float_Mantissa_Sign_dos
private final java.io.DataOutputStream float_Mantissa_Sign_dos
-
float_Exponent_dos
private final java.io.DataOutputStream float_Exponent_dos
-
double_Mantissa_Sign_dos
private final java.io.DataOutputStream double_Mantissa_Sign_dos
-
double_Exponent_dos
private final java.io.DataOutputStream double_Exponent_dos
-
fsIndexes_dos
private final java.io.DataOutputStream fsIndexes_dos
-
control_dos
private final java.io.DataOutputStream control_dos
-
strSeg_dos
private final java.io.DataOutputStream strSeg_dos
-
csds
private final CommonSerDesSequential csds
-
fs2seq
private final Obj2IntIdentityHashMap<TOP> fs2seq
Converts between FSs and "sequential" numbers. This is for compression efficiency and is also needed for backwards compatibility with v2 serialization forms, where index information was written using "sequential" numbers. Note: this may be an identity map, but may not be in the case of V3, where some FSs are GC'd. Contrast with fs2addr and addr2fs in csds, which use the pseudo v2 addresses as the int.
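The following is a minimal sketch of the sequential numbering idea, not the UIMA implementation. It assumes a plain java.util.IdentityHashMap in place of Obj2IntIdentityHashMap, uses null list entries to stand in for FSs that were GC'd, and assumes numbering starts at 1; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class SeqNumberSketch {
    /** Assigns 1, 2, 3, ... to the non-null entries of byId, keyed by object identity. */
    static Map<Object, Integer> assignSeq(List<Object> byId) {
        Map<Object, Integer> seq = new IdentityHashMap<>();
        int next = 1; // assumption: numbering starts at 1
        for (Object fs : byId) {
            if (fs == null) {
                continue; // stands in for an FS that was garbage collected
            }
            seq.put(fs, next++);
        }
        return seq;
    }

    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();
        // Positions play the role of source ids; two of them were collected.
        Map<Object, Integer> seq = assignSeq(Arrays.asList(null, a, null, b));
        System.out.println("id 1 -> seq " + seq.get(a));
        System.out.println("id 3 -> seq " + seq.get(b));
    }
}
```

Once any entry has been dropped, the sequential number of every later entry differs from its source id, which is why the map is not always an identity map.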
-
uimaSerializableSavedToCas
private PositiveIntSet uimaSerializableSavedToCas
Set of FSes on which UimaSerializable _save_to_cas_data has already been called.
-
-
Constructor Detail
-
Serializer
private Serializer(CASImpl cas, java.io.DataOutputStream serializedOut, MarkerImpl mark, SerializationMeasures sm, BinaryCasSerDes4.CompressLevel compressLevel, BinaryCasSerDes4.CompressStrat compressStrategy, boolean isTsi)
- Parameters:
cas
serializedOut
mark
sm
compressLevel
compressStrategy
-
-
Method Detail
-
serialize
private void serialize() throws java.io.IOException
Form 4 serialization is tied to the layout of V2 Feature Structures in heaps. It does not walk the indexes to serialize just those FSs that are reachable. For V3, it scans the CASImpl.id2fs information and serializes those (except those which have been GC'd). The seq numbers of the target incrementing sequentially will be different from the source id's if some FSs were GC'd. To determine for delta what new strings and new
- Throws:
java.io.IOException
-
writeStringInfo
private void writeStringInfo() throws java.io.IOException
Write the compressed string table(s)
- Throws:
java.io.IOException
-
writeFs
private void writeFs(TOP fs) throws java.io.IOException
- Throws:
java.io.IOException
-
serializeIndexedFeatureStructures
private void serializeIndexedFeatureStructures(CommonSerDesSequential csds) throws java.io.IOException
- Throws:
java.io.IOException
-
compressFsxPart
private int compressFsxPart(int[] fsIndexes, int fsNdxStart, CommonSerDesSequential csds) throws java.io.IOException
- Throws:
java.io.IOException
-
serializeArray
private void serializeArray(TOP fs) throws java.io.IOException
- Throws:
java.io.IOException
-
getPrevArray0HeapRef
private int getPrevArray0HeapRef()
-
getPrevArray0Int
private int getPrevArray0Int()
-
isNoPrevArrayValue
private boolean isNoPrevArrayValue(CommonArrayFS prevCommonArray)
-
serializeByKind
private void serializeByKind(TOP fs, FeatureImpl feat) throws java.io.IOException
- Throws:
java.io.IOException
-
serializeArrayLength
private int serializeArrayLength(TOP fs) throws java.io.IOException
- Throws:
java.io.IOException
-
collectAndZip
private void collectAndZip() throws java.io.IOException
Method:
  write with deflation into a single byte array stream
    skip if not worth deflating
    skip the Slot_Control stream
    record in the Slot_Control stream, for each deflated stream:
      the Slot index
      the number of compressed bytes
      the number of uncompressed bytes
  add to header:
    nbr of compressed entries
    the Slot_Control stream size
    the Slot_Control stream
    all the zipped streams
- Throws:
java.io.IOException - passthru
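The layout described for collectAndZip can be sketched as follows. This is an illustration under stated assumptions, not the UIMA code: "not worth deflating" is taken to mean a null or empty stream, sizes are written as fixed 4-byte ints rather than variable-length numbers, and the class and method names are hypothetical. Only java.util.zip and java.io classes from the JDK are used.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

final class CollectAndZipSketch {
    /** Packs the slot streams as: entry count, control size, control stream, zipped streams. */
    static byte[] pack(byte[][] slots) {
        try {
            ByteArrayOutputStream control = new ByteArrayOutputStream();
            DataOutputStream controlDos = new DataOutputStream(control);
            ByteArrayOutputStream zipped = new ByteArrayOutputStream();
            int entries = 0;
            for (int slot = 0; slot < slots.length; slot++) {
                byte[] raw = slots[slot];
                if (raw == null || raw.length == 0) {
                    continue; // assumption: empty streams are the ones not worth deflating
                }
                int before = zipped.size();
                Deflater deflater = new Deflater(Deflater.BEST_SPEED, true);
                DeflaterOutputStream dout = new DeflaterOutputStream(zipped, deflater);
                dout.write(raw);
                dout.finish(); // flush the compressed bytes without closing 'zipped'
                deflater.end();
                controlDos.writeInt(slot);                   // the slot index
                controlDos.writeInt(zipped.size() - before); // compressed bytes
                controlDos.writeInt(raw.length);             // uncompressed bytes
                entries++;
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(out);
            dos.writeInt(entries);        // nbr of compressed entries
            dos.writeInt(control.size()); // control stream size
            control.writeTo(dos);         // control stream, itself not deflated
            zipped.writeTo(dos);          // all the zipped streams
            dos.flush();
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] packed = pack(new byte[][] {new byte[1000], new byte[0], new byte[] {1, 2, 3}});
        System.out.println("packed bytes: " + packed.length);
    }
}
```

Keeping the control stream uncompressed lets a reader learn each stream's slot and sizes before inflating anything.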
-
writeLong
private void writeLong(long v, long prev) throws java.io.IOException
- Throws:
java.io.IOException
-
writeString
private void writeString(java.lang.String s) throws java.io.IOException
String encoding
  Length = 0 - used for null, no offset written
  Length = 1 - used for "", no offset written
  Length > 0 (subtract 1): used for actual string length
  Length < 0 - use (-length) as slot index (minimum is 1, slot 0 is NULL)
For length > 0, write also the offset.
- Throws:
java.io.IOException - passthru
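The length codes listed for writeString can be illustrated with a small sketch. It only models the code values; it does not write any streams. The class, the method names, and the reuseSlot parameter (a positive slot index meaning the string was already written) are hypothetical, and the order in which the cases are tested is an assumption.

```java
final class StringLengthCode {
    /**
     * Returns the length code for s.
     * reuseSlot > 0 means the string was already written and is referenced by slot index.
     */
    static int encode(String s, int reuseSlot) {
        if (s == null) {
            return 0;              // Length = 0: null
        }
        if (s.isEmpty()) {
            return 1;              // Length = 1: empty string
        }
        if (reuseSlot > 0) {
            return -reuseSlot;     // Length < 0: (-length) is the slot index, minimum 1
        }
        return s.length() + 1;     // Length > 0: subtract 1 to get the real length
    }

    /** An offset accompanies the code only when a real string length was written. */
    static boolean offsetFollows(int code) {
        return code > 1;
    }

    /** Recovers the string length from a code that carries one. */
    static int decodeLength(int code) {
        if (code <= 1) {
            throw new IllegalArgumentException("code carries no string length: " + code);
        }
        return code - 1;
    }

    public static void main(String[] args) {
        System.out.println(encode(null, 0));
        System.out.println(encode("", 0));
        System.out.println(encode("abc", 0));
        System.out.println(encode("abc", 2));
    }
}
```

Shifting real lengths up by one is what frees the values 0 and 1 to mark null and the empty string.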
-
writeFloat
private void writeFloat(int raw) throws java.io.IOException
Need to support NAN sets:
  0x7fc.... for NAN
  0xff8.... for NAN, negative infinity
  0x7f8 for NAN, positive infinity
Because 0 occurs frequently, we reserve exp of 0 for the value 0.
- Parameters:
raw - the number to write
- Throws:
java.io.IOException
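A rough sketch of splitting a float into an exponent part and a mantissa-plus-sign part, with exponent code 0 reserved for the value 0, is shown below. The exact bit layout is an assumption for illustration (the sign is placed in the low bit of the mantissa value, and the real exponent is stored plus one); the actual writeFloat may transform the mantissa further. The class and method names are hypothetical.

```java
final class FloatSplitSketch {
    /** Returns {exponentCode, mantissaAndSign}; exponentCode 0 is reserved for raw bits 0. */
    static int[] split(float f) {
        int raw = Float.floatToRawIntBits(f); // raw bits keep NaN payloads intact
        if (raw == 0) {
            return new int[] {0, 0};          // the frequent value 0 costs one small code
        }
        int exponentCode = ((raw >>> 23) & 0xff) + 1;                 // shifted by 1: 0 is reserved
        int mantissaAndSign = ((raw & 0x007fffff) << 1) | (raw >>> 31); // assumption: sign in bit 0
        return new int[] {exponentCode, mantissaAndSign};
    }

    /** Inverse of split. */
    static float join(int exponentCode, int mantissaAndSign) {
        if (exponentCode == 0) {
            return 0.0f;
        }
        int raw = ((mantissaAndSign & 1) << 31)
                | ((exponentCode - 1) << 23)
                | (mantissaAndSign >>> 1);
        return Float.intBitsToFloat(raw);
    }

    public static void main(String[] args) {
        float[] samples = {0.0f, -0.0f, 1.5f, Float.NaN, Float.NEGATIVE_INFINITY};
        for (float f : samples) {
            int[] parts = split(f);
            System.out.println(f + " -> exp " + parts[0] + ", mantissa+sign " + parts[1]);
        }
    }
}
```

Because the split works on raw bits, NaN and infinity patterns survive a round trip unchanged, and negative zero stays distinct from the reserved zero code.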
-
writeVnumber
private void writeVnumber(int kind, int v) throws java.io.IOException
- Throws:
java.io.IOException
-
writeVnumber
private void writeVnumber(int kind, long v) throws java.io.IOException
- Throws:
java.io.IOException
-
writeVnumber
private void writeVnumber(java.io.DataOutputStream s, int v) throws java.io.IOException
- Throws:
java.io.IOException
-
writeVnumber
private void writeVnumber(java.io.DataOutputStream s, long v) throws java.io.IOException
- Throws:
java.io.IOException
-
writeUnsignedByte
private void writeUnsignedByte(java.io.DataOutputStream s, int v) throws java.io.IOException
- Throws:
java.io.IOException
-
writeDouble
private void writeDouble(long raw) throws java.io.IOException
- Throws:
java.io.IOException
-
encodeIntSign
private int encodeIntSign(int v)
-
writeDiff
private void writeDiff(int kind, int v, int prev) throws java.io.IOException
Encoding:
  bit 6 = sign: 1 = negative
  bit 7 = delta: 1 = delta
- Parameters:
kind - the kind of slot
i - runs from iHeap + 3 to end of array
- Throws:
java.io.IOException - passthru
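The sign and delta flags in the writeDiff encoding can be sketched as below. This is an illustration under a loudly stated assumption: the bit numbers are read as counting from the most significant bit of a byte, so that "bit 7" is the lowest-order bit and "bit 6" the next one. The rule for choosing between the delta and the absolute value (whichever has the smaller magnitude, and only when both values have the same sign) is also an assumption, as are the class and method names.

```java
final class DiffCode {
    /** Encodes v relative to prev: magnitude shifted left by 2, then sign and delta flags. */
    static long encode(int v, int prev) {
        long absV = Math.abs((long) v);
        long diff = (long) v - prev;
        long absDiff = Math.abs(diff);
        boolean sameSign = (v > 0 && prev > 0) || (v < 0 && prev < 0);
        if (sameSign && absDiff < absV) {
            // delta form: low bit set (assumed to be the documented "bit 7")
            return (absDiff << 2) | (diff < 0 ? 2L : 0L) | 1L;
        }
        // absolute form: low bit clear, second bit is the sign (assumed "bit 6")
        return (absV << 2) | (v < 0 ? 2L : 0L);
    }

    /** Inverse of encode, given the same previous value. */
    static int decode(long code, int prev) {
        long magnitude = code >>> 2;
        long signed = (code & 2L) != 0 ? -magnitude : magnitude;
        return (int) ((code & 1L) != 0 ? prev + signed : signed);
    }

    public static void main(String[] args) {
        long code = encode(1005, 1000);
        System.out.println("code = " + code + ", decoded = " + decode(code, 1000));
    }
}
```

When consecutive values are close together, the delta form yields a small magnitude, which a variable-length number writer can then store in fewer bytes.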
-
extractStrings
private void extractStrings(TOP fs)
Adds strings to the OptimizeStrings object. If delta, only processes FSs that are new; modified string values are picked up when scanning FsChange items.
- Parameters:
fs - feature structure
-
extractStringsFromModifications
private void extractStringsFromModifications(CASImpl.FsChange fsChange)
For delta, for each fsChange element, extract any strings.
- Parameters:
fsChange
-
fs2seq
private int fs2seq(TOP fs)
-
-