Timing ArrayFire Code {#timing}
================

In performance-sensitive applications, it is vital to profile and measure the
execution time of operations. ArrayFire provides mechanisms to achieve this.

ArrayFire employs an asynchronous evaluation model for all of its
functions. This means that operations are queued to execute but do not
necessarily complete prior to function return. Hence, directly measuring the
time taken for an ArrayFire function could be misleading. To accurately
measure time, one must ensure the operations are evaluated and synchronize the
ArrayFire stream.

ArrayFire also employs a lazy evaluation model for its elementwise arithmetic
operations. This means these operations are not queued for execution until a
downstream operation needs the result; that operation then blocks until they
are complete.
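
The effect of asynchronous execution on timing can be illustrated without
ArrayFire at all. The following is a minimal sketch in standard C++ that uses
`std::async` as a stand-in for an asynchronous library call. The names
`enqueue_work`, `Timings`, and `measure` are hypothetical and are not part of
ArrayFire; waiting on the future plays the role that `af::sync()` plays for
the ArrayFire stream.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for an asynchronous library call: the work runs on
// another thread and the call returns a handle right away.
std::future<void> enqueue_work(std::chrono::milliseconds duration) {
    return std::async(std::launch::async,
                      [duration] { std::this_thread::sleep_for(duration); });
}

struct Timings {
    double launch_only;  // seconds until the call returned
    double completed;    // seconds until the work actually finished
};

Timings measure(std::chrono::milliseconds duration) {
    using clock = std::chrono::steady_clock;

    const auto t0 = clock::now();
    std::future<void> pending = enqueue_work(duration);
    const auto t1 = clock::now();  // misleading: only the launch was timed

    pending.wait();                // analogous to synchronizing the stream
    const auto t2 = clock::now();  // now the work itself is included

    return {std::chrono::duration<double>(t1 - t0).count(),
            std::chrono::duration<double>(t2 - t0).count()};
}
```

Calling `measure(std::chrono::milliseconds(50))` should report a
`launch_only` value far below the 50 ms of actual work, while `completed`
covers the whole duration. Stopping a timer before synchronizing ArrayFire
would be misleading in the same way.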

The following describes how to time ArrayFire code using the eval and sync
functions along with the timer and timeit functions. A final note on kernel
caching also provides helpful details about ArrayFire runtimes.

## Using ArrayFire eval and sync functions

ArrayFire provides functions to force the evaluation of lazy functions and to
block until all asynchronous operations complete.

1. The [eval](\ref af::eval) function:

    Forces the evaluation of an ArrayFire array. It ensures the execution of
    operations queued up for a specific array.

    It is only required for timing purposes if elementwise arithmetic functions
    are called on the array, since these are handled by the ArrayFire JIT.

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
    af::array A = af::randu(1000, 1000);
    af::array B = A + A; // Elementwise arithmetic operation.
    B.eval();            // Forces evaluation of B.
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    The function initiates the evaluation of the JIT-tree for that array and
    may return prior to the completion of those operations. To ensure proper
    timing, combine with a [sync](\ref af::sync) function.

2. The [sync](\ref af::sync) function:

    Synchronizes the ArrayFire stream. It waits for all the previous operations
    in the stream to finish. It is often used after [eval](\ref af::eval) to
    ensure that operations have indeed been completed.

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
    af::sync(); // Waits for all previous operations to complete.
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Using ArrayFire timer and timeit functions

ArrayFire provides simple timer functions that report elapsed time in
seconds.

1. The [timer](\ref af::timer) function:

    timer() : A platform-independent timer with microsecond accuracy:
    * [timer::start()](\ref af::timer::start) starts a timer

    * [timer::stop()](\ref af::timer::stop) returns the seconds since the last
      \ref af::timer::start "start"

    * \ref af::timer::stop(af::timer start) "timer::stop(timer start)" returns
      the seconds since 'start'

    Example: single timer

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
    // start timer
    // - be sure to use the eval and sync functions so that previous code
    //   does not get timed as part of the execution segment being measured
    timer::start();
    // run a code segment
    // - be sure to use the eval and sync functions to ensure the code
    //   segment operations have been completed
    // stop timer
    printf("elapsed seconds: %g\n", timer::stop());
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Example: multiple timers

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
    // start timers
    // - be sure to use the eval and sync functions so that previous code
    //   does not get timed as part of the execution segment being measured
    timer start1 = timer::start();
    timer start2 = timer::start();
    // run a code segment
    // - be sure to use the eval and sync functions to ensure the code
    //   segment operations have been completed
    // stop timer1
    printf("elapsed seconds: %g\n", timer::stop(start1));
    // run another code segment
    // - be sure to use the eval and sync functions to ensure the code
    //   segment operations have been completed
    // stop timer2
    printf("elapsed seconds: %g\n", timer::stop(start2));
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Accurate and reliable measurement of performance involves several factors:
    * Executing enough iterations to achieve peak performance.
    * Executing enough repetitions to amortize any overhead from system timers.

2. The [timeit](\ref af::timeit) function:

    To take care of much of this boilerplate, [timeit](\ref af::timeit) provides
    accurate and reliable estimates for both CPU and GPU code.

    Here is a stripped-down example of [Monte-Carlo estimation of PI](\ref
    benchmarks/pi.cpp) making use of [timeit](\ref af::timeit). Notice how it
    expects a `void` function pointer.

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
    #include <stdio.h>
    #include <arrayfire.h>
    using namespace af;

    void pi_function() {
        int n = 20e6; // 20 million random samples
        array x = randu(n, f32), y = randu(n, f32);
        // how many fell inside unit circle?
        float pi = 4.0 * sum<float>(sqrt(x*x + y*y) < 1) / n;
    }

    int main() {
        printf("pi_function took %g seconds\n", timeit(pi_function));
        return 0;
    }
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    This produces:

        pi_function took 0.007252 seconds
        (test machine: Core i7 920 @ 2.67GHz with a Tesla C2070)


## A note on kernel caching

The first run of ArrayFire code exercises any JIT compilation in the
application, automatically saving a cache of the compilation to
disk. Subsequent runs load the cache from disk, executing without
compilation. Therefore, it is typically best to "warm up" the code with one
run to initiate the application's kernel cache. Afterwards, subsequent runs do
not include the compile time and tend to be faster than the first run.

Averaging the time taken over several runs is generally the best approach and
one reason why the [timeit](\ref af::timeit) function is helpful.
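
The warm-up and averaging described above can be sketched as a small generic
helper. This is an illustrative sketch in standard C++ only; `mean_seconds`
and its `warmup` and `reps` parameters are hypothetical names, and this is not
how [timeit](\ref af::timeit) is implemented.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: average wall-clock seconds per call of `fn`.
// `warmup` calls run first and are not timed, so one-time costs such as
// JIT compilation or cache loading stay out of the result.
double mean_seconds(const std::function<void()>& fn, int warmup, int reps) {
    using clock = std::chrono::steady_clock;
    if (reps <= 0) return 0.0;  // nothing to average

    for (int i = 0; i < warmup; ++i) fn();  // untimed warm-up runs

    const auto start = clock::now();
    for (int i = 0; i < reps; ++i) fn();    // timed repetitions
    const auto stop = clock::now();

    // Dividing by the repetition count amortizes timer overhead.
    return std::chrono::duration<double>(stop - start).count() / reps;
}
```

When timing ArrayFire code with a helper like this, the callable would
typically finish by calling `eval()` on its result and then `af::sync()`, so
that each repetition measures completed work rather than queued work.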