This C++ API example demonstrates how to build an AlexNet neural network topology for forward-pass inference.
Example code: cnn_inference_f32.cpp
Some key takeaways include:
The example implements the AlexNet layers as numbered primitives (for example, conv1, pool1, conv2).
Initialize an engine and stream. The last parameter in the call represents the index of the engine.
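A minimal sketch of this step, assuming oneDNN's C++ API (`dnnl.hpp`) with a CPU engine:

```cpp
#include "dnnl.hpp"
using namespace dnnl;

// The last parameter (0) is the index of the engine of that kind.
engine eng(engine::kind::cpu, 0);
stream s(eng);
```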
Create a vector for the primitives and a vector to hold memory that will be used as arguments.
Allocate buffers for input and output data, weights, and bias.
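The two steps above might look as follows; the buffer sizes shown are AlexNet's conv1 shapes (batch x 3 x 227 x 227 input, 96 filters of 3 x 11 x 11) and are illustrative:

```cpp
// One vector for the primitives, one for the per-primitive arguments.
std::vector<primitive> net;
std::vector<std::unordered_map<int, memory>> net_args;

const int batch = 1;
std::vector<float> user_src(batch * 3 * 227 * 227);
std::vector<float> user_dst(batch * 1000);
std::vector<float> conv1_weights(96 * 3 * 11 * 11);
std::vector<float> conv1_bias(96);
```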
Create memory that describes data layout in the buffers. This example uses tag::nchw (batch-channels-height-width) for input data and tag::oihw for weights.
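A sketch of creating memory objects over the user buffers, assuming the `eng` engine and the buffers from the previous steps:

```cpp
// User data in nchw (batch-channels-height-width) layout.
auto user_src_memory = memory(
        {{batch, 3, 227, 227}, memory::data_type::f32,
                memory::format_tag::nchw},
        eng, user_src.data());
// User weights in oihw (output-input-height-width) layout.
auto user_weights_memory = memory(
        {{96, 3, 11, 11}, memory::data_type::f32,
                memory::format_tag::oihw},
        eng, conv1_weights.data());
auto conv1_user_bias_memory = memory(
        {{96}, memory::data_type::f32, memory::format_tag::x},
        eng, conv1_bias.data());
```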
Create memory descriptors with layout tag::any. The any format enables the convolution primitive to choose the data format that will result in the best performance based on its input parameters (convolution kernel sizes, strides, padding, and so on). If the resulting format is different from nchw, the user data must be transformed to the format required by the convolution (as explained below).
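For example, the conv1 descriptors could be declared with `format_tag::any` like this (shapes are AlexNet's conv1 dimensions):

```cpp
// Let the convolution pick the optimal layouts for these tensors.
auto conv1_src_md = memory::desc({batch, 3, 227, 227},
        memory::data_type::f32, memory::format_tag::any);
auto conv1_weights_md = memory::desc({96, 3, 11, 11},
        memory::data_type::f32, memory::format_tag::any);
auto conv1_bias_md = memory::desc({96},
        memory::data_type::f32, memory::format_tag::any);
auto conv1_dst_md = memory::desc({batch, 96, 55, 55},
        memory::data_type::f32, memory::format_tag::any);
```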
Create a convolution descriptor by specifying propagation kind, convolution algorithm, shapes of input, weights, bias, output, convolution strides, padding, and kind of padding. Propagation kind is set to prop_kind::forward_inference to optimize for inference execution and omit computations that are necessary only for backward propagation.
Create a convolution primitive descriptor. Once created, this descriptor has specific formats instead of the any format specified in the convolution descriptor.
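A sketch of these two steps, assuming the memory descriptors from above and an API version with separate operation descriptors (as this walkthrough describes):

```cpp
memory::dims conv1_strides = {4, 4};
memory::dims conv1_padding = {0, 0};

// forward_inference omits computations needed only for backpropagation.
auto conv1_desc = convolution_forward::desc(prop_kind::forward_inference,
        algorithm::convolution_direct, conv1_src_md, conv1_weights_md,
        conv1_bias_md, conv1_dst_md, conv1_strides, conv1_padding,
        conv1_padding);

// The primitive descriptor resolves tag::any into concrete formats.
auto conv1_prim_desc = convolution_forward::primitive_desc(conv1_desc, eng);
```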
Check whether the data and weights formats required by the convolution differ from the user formats. If they differ, change the layout using a reorder primitive.
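This check-and-reorder pattern might be sketched as follows, assuming `conv1_prim_desc`, the user memory objects, and the `net`/`net_args` vectors from the earlier steps:

```cpp
// Fall back to the user memory if no reorder is needed.
auto conv1_src_memory = user_src_memory;
if (conv1_prim_desc.src_desc() != user_src_memory.get_desc()) {
    conv1_src_memory = memory(conv1_prim_desc.src_desc(), eng);
    net.push_back(reorder(user_src_memory, conv1_src_memory));
    net_args.push_back({{DNNL_ARG_FROM, user_src_memory},
            {DNNL_ARG_TO, conv1_src_memory}});
}
// The same check applies to the weights memory.
```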
Create a memory primitive for output.
Create a convolution primitive and add it to the net.
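The two steps above could look like this, assuming `conv1_prim_desc` and the (possibly reordered) source, weights, and bias memory objects from the previous steps:

```cpp
// Output memory in whatever format the convolution chose.
auto conv1_dst_memory = memory(conv1_prim_desc.dst_desc(), eng);

net.push_back(convolution_forward(conv1_prim_desc));
net_args.push_back({{DNNL_ARG_SRC, conv1_src_memory},
        {DNNL_ARG_WEIGHTS, conv1_weights_memory},
        {DNNL_ARG_BIAS, conv1_user_bias_memory},
        {DNNL_ARG_DST, conv1_dst_memory}});
```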
Create the ReLU primitive. For better performance, keep the input data format for ReLU (as well as for other operation primitives until another convolution or inner product is encountered) the same as the one chosen for the convolution. Also note that ReLU is done in-place by reusing the conv1 memory.
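An in-place ReLU over the conv1 output might be sketched as follows, assuming an eltwise API with a separate operation descriptor:

```cpp
// Reuse the convolution's chosen format; no reorder needed.
auto relu1_desc = eltwise_forward::desc(prop_kind::forward_inference,
        algorithm::eltwise_relu, conv1_dst_memory.get_desc(),
        /*alpha=*/0.0f);
auto relu1_prim_desc = eltwise_forward::primitive_desc(relu1_desc, eng);

net.push_back(eltwise_forward(relu1_prim_desc));
// In-place execution: conv1's destination serves as both src and dst.
net_args.push_back({{DNNL_ARG_SRC, conv1_dst_memory},
        {DNNL_ARG_DST, conv1_dst_memory}});
```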
For training execution, pooling requires a private workspace memory to perform the backward pass. However, pooling should not use a workspace for inference, because this is detrimental to performance.
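A sketch of inference-mode pooling (pool1, 3x3 max pooling with stride 2 in AlexNet); because the propagation kind is forward_inference, no workspace memory is created or passed:

```cpp
memory::dims pool1_kernel = {3, 3};
memory::dims pool1_strides = {2, 2};
memory::dims pool1_padding = {0, 0};
auto pool1_dst_md = memory::desc({batch, 96, 27, 27},
        memory::data_type::f32, memory::format_tag::any);

// forward_inference: the primitive allocates no workspace.
auto pool1_desc = pooling_forward::desc(prop_kind::forward_inference,
        algorithm::pooling_max, conv1_dst_memory.get_desc(), pool1_dst_md,
        pool1_strides, pool1_kernel, pool1_padding, pool1_padding);
auto pool1_prim_desc = pooling_forward::primitive_desc(pool1_desc, eng);
auto pool1_dst_memory = memory(pool1_prim_desc.dst_desc(), eng);

net.push_back(pooling_forward(pool1_prim_desc));
net_args.push_back({{DNNL_ARG_SRC, conv1_dst_memory},
        {DNNL_ARG_DST, pool1_dst_memory}});
```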
The example continues to create more layers according to the AlexNet topology.
Finally, execute the primitives. For this example, the net is executed multiple times and each execution is timed individually.
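The execution loop might be sketched as follows, timing each iteration individually (the iteration count is illustrative):

```cpp
#include <chrono>

const int times = 100;
for (int j = 0; j < times; ++j) {
    auto begin = std::chrono::steady_clock::now();
    // Execute each primitive with its matching argument map.
    for (size_t i = 0; i < net.size(); ++i)
        net.at(i).execute(s, net_args.at(i));
    s.wait(); // make sure this iteration finished before stopping the clock
    auto end = std::chrono::steady_clock::now();
    // end - begin is the duration of this single execution.
}
```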