This C++ API example demonstrates how to run AlexNet's conv3 and relu3 with int8 data type.
Example code: cnn_inference_int8.cpp
Configure tensor shapes
Next, the example configures the scales used to quantize f32 data into int8. For this example, the scaling value is chosen as an arbitrary number, although in a realistic scenario, it should be calculated from a set of precomputed values as previously mentioned.
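One common way to obtain such a scale from precomputed values is to divide the largest representable integer by the largest absolute value observed in calibration data. The sketch below assumes that convention; `scale_from_range` and its parameters are hypothetical names and are not part of the example or of oneDNN.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: derive a quantization scale from the largest
// absolute value in a set of calibration samples.
// q_max is the largest integer of the target type: 255 for u8, 127 for s8.
float scale_from_range(const std::vector<float> &samples, float q_max) {
    float max_abs = 0.f;
    for (float v : samples)
        max_abs = std::max(max_abs, std::fabs(v));
    // Fall back to a scale of 1 for all-zero data to avoid dividing by zero.
    return max_abs > 0.f ? q_max / max_abs : 1.f;
}
```

For instance, samples whose largest magnitude is 4 would map to an s8 scale of 127 / 4 under this convention.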
The source, weights, bias, and destination use the single-scale format with mask set to 0, while the output scales of the convolution (conv_scales) use the array format with mask = 2. Bit 1 of the mask is set, which selects the output-channel dimension, so there is one scale per output channel.
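The relationship between a mask and the number of scale values can be sketched as follows. Bit d of the mask being set means the scale varies along dimension d. The helper name `scale_count` is hypothetical, and the shape used in the usage note is only illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of scale values implied by a mask: multiply the sizes of all
// dimensions whose bit is set in the mask. A mask of 0 yields one scale.
std::size_t scale_count(const std::vector<std::int64_t> &dims, int mask) {
    std::size_t count = 1;
    for (std::size_t d = 0; d < dims.size(); ++d)
        if (mask & (1 << d))
            count *= static_cast<std::size_t>(dims[d]);
    return count;
}
```

With an illustrative destination shape of {8, 384, 13, 13} (batch, channels, height, width), mask 0 gives a single scale and mask 2 gives 384 scales, one per output channel.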
Create the memory primitives for user data (source, weights, and bias). The user data will be in its original 32-bit floating point format.
Create a memory descriptor for each convolution parameter. The convolution data uses 8-bit integer values, so the memory descriptors are configured as 8-bit unsigned (u8) for source and destination, and 8-bit signed (s8) for bias and weights.
Note: The destination type is chosen as unsigned because the convolution applies a ReLU operation, so its results are \(\geq 0\).
auto conv_src_md = memory::desc({conv_src_tz}, dt::u8, tag::any);
auto conv_bias_md = memory::desc({conv_bias_tz}, dt::s8, tag::any);
auto conv_weights_md = memory::desc({conv_weights_tz}, dt::s8, tag::any);
auto conv_dst_md = memory::desc({conv_dst_tz}, dt::u8, tag::any);
Create a convolution descriptor passing the int8 memory descriptors as parameters.
Int8-specific parameters of an int8 primitive are configured through primitive attributes. Create an attributes object for the convolution and configure it accordingly.
The ReLU layer from AlexNet is executed through the PostOps feature. Create a PostOps object and configure it to execute an eltwise relu operation.
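To illustrate why a ReLU post-op pairs naturally with an unsigned destination, the following scalar sketch models what happens to a single output element: an int32 accumulator is scaled, clamped at zero by the ReLU, then rounded and saturated to u8. This is only a conceptual model under those assumptions; `requantize_relu_u8` is a hypothetical name, and the library's fused kernels are not implemented this way line for line.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Conceptual model of one output element of an int8 convolution with a
// ReLU post-op: scale the int32 accumulator, apply ReLU, then round and
// saturate to the u8 range [0, 255].
std::uint8_t requantize_relu_u8(std::int32_t acc, float scale) {
    float v = static_cast<float>(acc) * scale;
    v = std::max(v, 0.f);                    // ReLU: negative results become 0
    v = std::min(std::nearbyint(v), 255.f);  // round, then saturate at 255
    return static_cast<std::uint8_t>(v);
}
```

Because the ReLU removes every negative value before the final conversion, none of the 256 codes of u8 is spent on values that can never occur.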
Create a primitive descriptor using the convolution descriptor and passing along the int8 attributes in the constructor. The primitive descriptor for the convolution will contain the specific memory formats for the computation.
Create a memory object for each of the convolution's data parameters (source, bias, weights, and destination). Using the convolution primitive descriptor as the creation parameter enables oneDNN to configure the memory formats for the convolution.
Scaling parameters are passed to the reorder primitive through primitive attributes.
User memory must be transformed into convolution-friendly memory, both in data type (int8) and in memory format. A reorder layer performs the data transformation from f32 (the original user data) into int8 format (the data used for the convolution). In addition, the reorder transforms the user data into the required memory format (as explained in the simple_net example).
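The data-type half of that transformation can be sketched in plain C++ as below, assuming the quantized value is the scaled input rounded to the nearest integer and saturated to the s8 range. The function `quantize_s8` is a hypothetical stand-in for illustration only; it does not change the memory layout, which the real reorder also handles.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Quantize f32 values to s8: multiply by the scale, round to the nearest
// integer, and saturate to [-128, 127].
std::vector<std::int8_t> quantize_s8(const std::vector<float> &src, float scale) {
    std::vector<std::int8_t> dst;
    dst.reserve(src.size());
    for (float v : src) {
        float q = std::nearbyint(v * scale);
        q = std::min(std::max(q, -128.f), 127.f);  // saturate instead of wrapping
        dst.push_back(static_cast<std::int8_t>(q));
    }
    return dst;
}
```

Saturation matters here: a value that overflows the s8 range is pinned to the nearest limit rather than wrapping around to the opposite sign.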
Create the convolution primitive and add it to the net. The int8 example computes the same Convolution + ReLU layers as the AlexNet simple-net.cpp example, but using the int8 and PostOps approach. Although performance is not measured here, in practice the int8 version would be expected to require less computation time to achieve similar results.
Finally, dst memory may be dequantized from int8 into the original f32 format. Create a memory primitive for the user data in the original 32-bit floating point format and then apply a reorder to transform the computation output data.
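Dequantization is the inverse mapping. A minimal sketch, assuming the u8 data was produced as q = round(x * scale), divides each value by the same scale to recover an f32 approximation. The name `dequantize_u8` is hypothetical and stands in for the scaled reorder.

```cpp
#include <cstdint>
#include <vector>

// Dequantize u8 values back to f32 by undoing the quantization scale.
// The result approximates the original data up to rounding error.
std::vector<float> dequantize_u8(const std::vector<std::uint8_t> &src, float scale) {
    std::vector<float> dst;
    dst.reserve(src.size());
    for (std::uint8_t q : src)
        dst.push_back(static_cast<float>(q) / scale);
    return dst;
}
```

The round trip is lossy: values that were rounded or saturated during quantization do not return to their exact original f32 values.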