A high-performance general-purpose compute library
1 Release Notes {#releasenotes}
2 ==============
3 
4 v3.9.0
5 ======
6 
7 ## Improvements
8 - Add oneAPI backend \PR{3296}
9 - Add support to directly access arrays on other devices \PR{3447}
10 - Add broadcast support \PR{2871}
11 - Improve OpenCL CPU JIT performance \PR{3257} \PR{3392}
12 - Optimize thread/block calculations of several kernels \PR{3144}
13 - Add support for fast math compilation when building ArrayFire \PR{3334} \PR{3337}
14 - Optimize performance of fftconvolve when using floats \PR{3338}
15 - Add support for CUDA 12.1 and 12.2
16 - Better handling of empty arrays \PR{3398}
17 - Better handling of memory in linear algebra functions in OpenCL \PR{3423}
18 - Better logging with JIT kernels \PR{3468}
19 - Optimize memory manager/JIT interactions for small number of buffers \PR{3468}
20 - Documentation improvements \PR{3485}
21 - Optimize reorder function \PR{3488}
22 
23 ## Fixes
24 - Improve Errors when creating OpenCL contexts from devices \PR{3257}
25 - Improvements to vcpkg builds \PR{3376} \PR{3476}
26 - Fix reduce by key when NaNs are present \PR{3261}
27 - Fix error in convolve where the ndims parameter was forced to be equal to 2 \PR{3277}
28 - Make constructors that accept dim_t explicit to avoid invalid conversions \PR{3259}
29 - Fix error in randu when compiling against clang 14 \PR{3333}
30 - Fix bug in OpenCL linear algebra functions \PR{3398}
31 - Fix bug with thread local variables when device was changed \PR{3420} \PR{3421}
32 - Fix bug in qr related to uninitialized memory \PR{3422}
33 - Fix bug in shift where the array had an empty middle dimension \PR{3488}
34 
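The effect of making a constructor explicit, as in the `dim_t` fix above, can be sketched in plain C++. This is only an illustration under stated assumptions: `dim_like_t`, `ImplicitShape`, `ExplicitShape`, and `first_extent` are hypothetical names invented for the example and are not ArrayFire types or functions.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a library index type; not ArrayFire's dim_t.
using dim_like_t = long long;

// A non-explicit constructor lets a bare integer convert silently.
struct ImplicitShape {
    dim_like_t first;
    ImplicitShape(dim_like_t d0) : first(d0) {}
};

// An explicit constructor forces the caller to spell out the conversion.
struct ExplicitShape {
    dim_like_t first;
    explicit ExplicitShape(dim_like_t d0) : first(d0) {
        assert(d0 >= 0);  // extents are assumed to be non-negative here
    }
};

// Accepts only a real ExplicitShape; first_extent(5) would not compile.
inline dim_like_t first_extent(const ExplicitShape& s) { return s.first; }

// The integer converts implicitly in one case and not in the other.
static_assert(std::is_convertible<dim_like_t, ImplicitShape>::value,
              "implicit constructor allows silent conversion");
static_assert(!std::is_convertible<dim_like_t, ExplicitShape>::value,
              "explicit constructor blocks silent conversion");
```

With the explicit form, a call site has to write `first_extent(ExplicitShape(5))`, so an integer passed by mistake is rejected at compile time instead of being turned into an object.
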
35 
36 ## Contributions
37 
38 Special thanks to our contributors:
39 [Willy Born](https://github.com/willyborn)
40 [Mike Mullen](https://github.com/mfzmullen)
41 
42 v3.8.3
43 ======
44 
45 ## Improvements
46 
47 - Add support for CUDA 12 \PR{3352}
48 - Modernize documentation style and content \PR{3351}
49 - memcpy performance improvements \PR{3144}
50 - JIT performance improvements \PR{3144}
51 - join performance improvements \PR{3144}
52 - Improve support for Intel and newer Clang compilers \PR{3334}
53 - CCache support on Windows \PR{3257}
54 
55 ## Fixes
56 
57 - Fix issue with some locales with OpenCL kernel generation \PR{3294}
58 - Internal improvements
59 - Fix leak in clfft on exit.
60 - Fix some cases where ndims was incorrectly used to calculate shape \PR{3277}
61 - Fix issue when setDevice was not called in new threads \PR{3269}
62 - Restrict initializer list to just fundamental types \PR{3264}
63 
64 ## Contributions
65 
66 Special thanks to our contributors:
67 [Carlo Cabrera](https://github.com/carlocab)
68 [Guillaume Schmid](https://github.com/GuillaumeSchmid)
69 [Willy Born](https://github.com/willyborn)
70 [ktdq](https://github.com/ktdq)
71 
72 
73 v3.8.2
74 ======
75 
76 ## Improvements
77 
78 - Optimize JIT by removing some consecutive cast operations \PR{3031}
79 - Add driver checks for CUDA 11.5 and 11.6 \PR{3203}
80 - Improve the timing algorithm used for timeit \PR{3185}
81 - Dynamically link against CUDA numeric libraries by default \PR{3205}
82 - Add support for pruning CUDA binaries to reduce static binary sizes \PR{3234} \PR{3237}
83 - Remove unused cuDNN libraries from installations \PR{3235}
84 - Add support to statically link NVRTC libraries after CUDA 11.5 \PR{3236}
85 - Add support for compiling with ccache when building the CUDA backend \PR{3241}
86 - Make cuSparse an optional runtime dependency \PR{3240}
87 
88 ## Fixes
89 
90 - Fix issue with consecutive moddims operations in the CPU backend \PR{3232}
91 - Better floating point comparisons for tests \PR{3212}
92 - Fix several warnings and inconsistencies with doxygen and documentation \PR{3226}
93 - Fix issue when passing empty arrays into join \PR{3211}
94 - Fix default value for the `AF_COMPUTE_LIBRARY` when not set \PR{3228}
95 - Fix missing symbol issue when MKL is statically linked \PR{3244}
96 - Remove linking of OpenCL's library to the unified backend \PR{3244}
97 
98 ## Contributions
99 
100 Special thanks to our contributors:
101 [Jacob Kahn](https://github.com/jacobkahn)
102 [Willy Born](https://github.com/willyborn)
103 
104 v3.8.1
105 ======
106 
107 ## Improvements
108 
109 - moddims now uses JIT approach for certain special cases - \PR{3177}
110 - Embed Version Info in Windows DLLs - \PR{3025}
111 - OpenCL device max parameter is now queried from device properties - \PR{3032}
112 - JIT Performance Optimization: Unique funcName generation sped up - \PR{3040}
113 - Improved readability of log traces - \PR{3050}
114 - Use short function name in non-debug build error messages - \PR{3060}
115 - SIFT/GLOH are now available as part of website binaries - \PR{3071}
116 - Short-circuit zero elements case in detail::copyArray backend function - \PR{3059}
117 - Speedup of kernel caching mechanism - \PR{3043}
118 - Add short-circuit check for empty Arrays in JIT evalNodes - \PR{3072}
119 - Performance optimization of indexing using dynamic thread block sizes - \PR{3111}
120 - Starting with this release, ArrayFire uses the Intel MKL single dynamic library, which resolves many of the linking issues the unified library had when user applications used MKL themselves - \PR{3120}
121 - Add shortcut check for zero elements in af_write_array - \PR{3130}
122 - Speedup join by eliminating temp buffers for cascading joins - \PR{3145}
123 - Added batch support for solve - \PR{1705}
124 - Use pinned memory to copy device pointers in CUDA solve - \PR{1705}
125 - Added package manager instructions to docs - \PR{3076}
126 - CMake Build Improvements - \PR{3027} , \PR{3089} , \PR{3037} , \PR{3072} , \PR{3095} , \PR{3096} , \PR{3097} , \PR{3102} , \PR{3106} , \PR{3105} , \PR{3120} , \PR{3136} , \PR{3135} , \PR{3137} , \PR{3119} , \PR{3150} , \PR{3138} , \PR{3156} , \PR{3139} , \PR{1705} , \PR{3162}
127 - CPU backend improvements - \PR{3010} , \PR{3138} , \PR{3161}
128 - CUDA backend improvements - \PR{3066} , \PR{3091} , \PR{3093} , \PR{3125} , \PR{3143} , \PR{3161}
129 - OpenCL backend improvements - \PR{3091} , \PR{3068} , \PR{3127} , \PR{3010} , \PR{3039} , \PR{3138} , \PR{3161}
130 - General (including JIT) performance improvements across backends - \PR{3167}
131 - Testing improvements - \PR{3072} , \PR{3131} , \PR{3151} , \PR{3141} , \PR{3153} , \PR{3152} , \PR{3157} , \PR{1705} , \PR{3170} , \PR{3167}
132 - Update CLBlast to latest version - \PR{3135} , \PR{3179}
133 - Improved Otsu threshold computation helper in canny algorithm - \PR{3169}
134 - Modified default parameters for fftR2C and fftC2R C++ API from 0 to 1.0 - \PR{3178}
135 - Use appropriate MKL getrs_batch_strided API based on MKL Versions - \PR{3181}
136 
137 ## Fixes
138 
139 - Fixed a bug in JIT kernel disk caching - \PR{3182}
140 - Fixed stream used by thrust (CUDA backend) functions - \PR{3029}
141 - Added workaround for new cuSparse API that was added by CUDA amid fix releases - \PR{3057}
142 - Fixed `const` array indexing inside `gfor` - \PR{3078}
143 - Handle zero elements in copyData to host - \PR{3059}
144 - Fixed double free regression in OpenCL backend - \PR{3091}
145 - Fixed an infinite recursion bug in NaryNode JIT Node - \PR{3072}
146 - Added missing input validation check in sparse-dense arithmetic operations - \PR{3129}
147 - Fixed bug in `getMappedPtr` in OpenCL due to invalid lambda capture - \PR{3163}
148 - Fixed bug in `getMappedPtr` on Arrays that are not ready - \PR{3163}
149 - Fixed edgeTraceKernel for CPU devices on OpenCL backend - \PR{3164}
150 - Fixed Windows build issue(s) with VS2019 - \PR{3048}
151 - API documentation fixes - \PR{3075} , \PR{3076} , \PR{3143} , \PR{3161}
152 - CMake Build Fixes - \PR{3088}
153 - Fixed the tutorial link in README - \PR{3033}
154 - Fixed function name typo in timing tutorial - \PR{3028}
155 - Fixed a couple of bugs in CPU backend canny implementation - \PR{3169}
156 - Fixed the reference count of array(s) used in JIT operations. This relates to ArrayFire's internal memory bookkeeping; the behavior and accuracy of ArrayFire code were not broken earlier. The fix brings the reference count to its optimal value in these scenarios, which may reduce memory usage in some narrow cases - \PR{3167}
157 - Added assert that checks if topk is called with a negative value for k - \PR{3176}
158 - Fixed an issue where countByKey would give incorrect results for any n > 128 - \PR{3175}
159 
160 ## Contributions
161 
162 Special thanks to our contributors:
163 [HO-COOH](https://github.com/HO-COOH)
164 [Willy Born](https://github.com/willyborn)
165 [Gilad Avidov](https://github.com/avidov)
166 [Pavan Yalamanchili](https://github.com/pavanky)
167 
168 v3.8.0
169 ======
170 
171 Major Updates
172 --------
173 - Non-uniform(ragged) reductions \PR{2786}
174 - Bit-wise not operator support for array and C API (af\_bitnot) \PR{2865}
175 - Initialization list constructor for array class \PR{2829} \PR{2987}
176 
177 Improvements
178 ------------
179 - New API for the following statistics functions: cov, var and stdev - \PR{2986}
180 - allocV2 and freeV2 which return cl\_mem on OpenCL backend \PR{2911}
181 - Move constructor and move assignment operator for Dim4 class \PR{2946}
182 - Support for CUDA 11.1 and Compute 8.6 \PR{3023}
183 - Fix af::feature copy constructor for multi-threaded scenarios \PR{3022}
184 
185 v3.7.3
186 ======
187 
188 Improvements
189 ------------
190 - Add f16 support for histogram - \PR{2984}
191 - Update confidence connected components example with better illustration - \PR{2968}
192 - Enable disk caching of OpenCL kernel binaries - \PR{2970}
193 - Refactor extension of kernel binaries stored to disk `.bin` - \PR{2970}
194 - Add minimum driver versions for CUDA toolkit 11 in internal map - \PR{2982}
195 - Improve warning messages from run-time kernel compilation functions - \PR{2996}
196 
197 Fixes
198 -----
199 - Fix bias factor of variance in var_all and cov functions - \PR{2986}
200 - Fix a race condition in confidence connected components function for OpenCL backend - \PR{2969}
201 - Safely ignore disk cache failures in CUDA backend for compiled kernel binaries - \PR{2970}
202 - Fix randn by passing in correct values to Box-Muller - \PR{2980}
203 - Fix rounding issues in Box-Muller function used for RNG - \PR{2980}
204 - Fix problems in RNG for older compute architectures with fp16 - \PR{2980} \PR{2996}
205 - Fix performance regression of approx functions - \PR{2977}
206 - Remove assert that checks that signal/filter types have to be the same - \PR{2993}
207 - Fix `checkAndSetDevMaxCompute` when the device cc is greater than max - \PR{2996}
208 - Fix documentation errors and warnings - \PR{2973} , \PR{2987}
209 - Add missing opencl-arrayfire interoperability functions in unified backend - \PR{2981}
210 
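The "bias factor of variance" mentioned in the fixes above refers to the divisor used when averaging squared deviations. As a minimal sketch in plain C++ (not ArrayFire's implementation; the `Bias` enum and `variance` function are hypothetical names chosen for this example), the two conventions differ only in dividing by N or by N - 1:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical selector for the divisor; not an ArrayFire enum.
enum class Bias { Population, Sample };

// Two-pass variance of x. Population divides by N, Sample by N - 1.
// Requires at least two elements so that both divisors are non-zero.
inline double variance(const std::vector<double>& x, Bias bias) {
    const std::size_t n = x.size();
    assert(n > 1);

    // First pass: arithmetic mean.
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(n);

    // Second pass: sum of squared deviations from the mean.
    double ss = 0.0;
    for (double v : x) ss += (v - mean) * (v - mean);

    const double divisor = (bias == Bias::Population)
                               ? static_cast<double>(n)
                               : static_cast<double>(n - 1);
    return ss / divisor;
}
```

For the input {1, 2, 3, 4} the sum of squared deviations is 5, so the population variance is 5/4 and the sample variance is 5/3. Mixing up the two divisors gives results that are off by a factor of N / (N - 1).
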
211 Contributions
212 -------------
213 Special thanks to our contributors:
214 [P. J. Reed](https://github.com/pjreed)
215 
216 v3.7.2
217 ======
218 
219 Improvements
220 ------------
221 - Cache CUDA kernels to disk to improve load times (thanks to \@cschreib-ibex) \PR{2848}
222 - Statically link against CUDA libraries \PR{2785}
223 - Make cuDNN an optional build dependency \PR{2836}
224 - Improve support for different compilers and OS \PR{2876} \PR{2945} \PR{2925} \PR{2942} \PR{2943} \PR{2958}
225 - Improve performance of join and transpose on CPU \PR{2849}
226 - Improve documentation \PR{2816} \PR{2821} \PR{2846} \PR{2918} \PR{2928} \PR{2947}
227 - Reduce binary size by using NVRTC and reducing template instantiations \PR{2849} \PR{2861} \PR{2890} \PR{2957}
228 - reduceByKey performance improvements \PR{2851} \PR{2957}
229 - Improve support for Intel OpenCL GPUs \PR{2855}
230 - Allow statically linking against MKL \PR{2877} (Sponsored by SDL)
231 - Better support for older CUDA toolkits \PR{2923}
232 - Add support for CUDA 11 \PR{2939}
233 - Add support for ccache for faster builds \PR{2931}
234 - Add support for the conan package manager on linux \PR{2875}
235 - Propagate build errors up the stack in AFError exceptions \PR{2948} \PR{2957}
236 - Improve runtime dependency library loading \PR{2954}
237 - Improved cuDNN runtime checks and warnings \PR{2960}
238 - Document af\_memory\_manager\_* native memory return values \PR{2911}
239 
240 Fixes
241 -----
242 - Fix crash when allocating large arrays \PR{2827}
243 - Fix various compiler warnings \PR{2827} \PR{2849} \PR{2872} \PR{2876}
244 - Fix minor leaks in OpenCL functions \PR{2913}
245 - Various continuous integration related fixes \PR{2819}
246 - Fix zero padding with convolve2NN \PR{2820}
247 - Fix af_get_memory_pressure_threshold return value \PR{2831}
248 - Increased the max filter length for morph
249 - Handle empty array inputs for LU, QR, and Rank functions \PR{2838}
250 - Fix FindMKL.cmake script for sequential threading library \PR{2840} \PR{2952}
251 - Various internal refactoring \PR{2839} \PR{2861} \PR{2864} \PR{2873} \PR{2890} \PR{2891} \PR{2913} \PR{2959}
252 - Fix OpenCL 2.0 builtin function name conflict \PR{2851}
253 - Fix error caused when releasing memory with multiple devices \PR{2867}
254 - Fix missing set stacktrace symbol from unified API \PR{2915}
255 - Fix zero padding issue in convolve2NN \PR{2820}
256 - Fixed bugs in ReduceByKey \PR{2957}
257 
258 Contributions
259 -------------
260 Special thanks to our contributors:
261 [Corentin Schreiber](https://github.com/cschreib-ibex)
262 [Jacob Kahn](https://github.com/jacobkahn)
263 [Paul Jurczak](https://github.com/pauljurczak)
264 [Christoph Junghans](https://github.com/junghans)
265 
266 v3.7.1
267 ======
268 
269 Improvements
270 ------------
271 
272 - Improve mtx download for test data \PR{2742}
273 - Documentation improvements \PR{2754} \PR{2792} \PR{2797}
274 - Remove verbose messages in older CMake versions \PR{2773}
275 - Reduce binary size with the use of nvrtc \PR{2790}
276 - Use texture memory to load LUT in orb and fast \PR{2791}
277 - Add missing print function for f16 \PR{2784}
278 - Add checks for f16 support in the CUDA backend \PR{2784}
279 - Create a thrust policy to intercept tmp buffer allocations \PR{2806}
280 
281 Fixes
282 -----
283 
284 - Fix segfault on exit when ArrayFire is not initialized in the main thread
285 - Fix support for CMake 3.5.1 \PR{2771} \PR{2772} \PR{2760}
286 - Fix evalMultiple if the input array sizes aren't the same \PR{2766}
287 - Fix error when AF_BACKEND_DEFAULT is passed directly to backend \PR{2769}
288 - Workaround name collision with AMD OpenCL implementation \PR{2802}
289 - Fix on-exit errors with the unified backend \PR{2769}
290 - Fix check for f16 compatibility in OpenCL \PR{2773}
291 - Fix matmul on Intel OpenCL when passing same array as input \PR{2774}
292 - Fix CPU OpenCL blas batching \PR{2774}
293 - Fix memory pressure in the default memory manager \PR{2801}
294 
295 Contributions
296 -------------
297 Special thanks to our contributors:
298 [padentomasello](https://github.com/padentomasello)
299 [glavaux2](https://github.com/glavaux2)
300 
301 v3.7.0
302 ======
303 
304 Major Updates
305 -------------
306 
307 - Added the ability to customize the memory manager (thanks to jacobkahn and flashlight) \PR{2461}
308 - Added 16-bit floating point support for several functions \PR{2413} \PR{2587} \PR{2585} \PR{2583}
309 - Added sumByKey, productByKey, minByKey, maxByKey, allTrueByKey, anyTrueByKey, countByKey \PR{2254}
310 - Added confidence connected components \PR{2748}
311 - Added neural network based convolution and gradient functions \PR{2359}
312 - Added a padding function \PR{2682}
313 - Added pinverse for pseudo inverse \PR{2279}
314 - Added support for uniform ranges in approx1 and approx2 functions. \PR{2297}
315 - Added support to write to preallocated arrays for some functions \PR{2599} \PR{2481} \PR{2328} \PR{2327}
316 - Added meanvar function \PR{2258}
317 - Added support for sparse-sparse arithmetic
318 - Added rsqrt function for reciprocal square root
319 - Added a lower level af_gemm function for general matrix multiplication \PR{2481}
320 - Added a function to set the cuBLAS math mode for the CUDA backend \PR{2584}
321 - Separate debug symbols into separate files \PR{2535}
322 - Print stacktraces on errors \PR{2632}
323 - Support move constructor for af::array \PR{2595}
324 - Expose events in the public API \PR{2461}
325 - Add setAxesLabelFormat to format labels on graphs \PR{2495}
326 
327 Improvements
328 ------------
329 
330 - Better error messages for systems with driver or device incompatibilities \PR{2678} \PR{2448}
331 - Optimized unified backend function calls
332 - Optimized anisotropic smoothing \PR{2713}
333 - Optimized canny filter for CUDA and OpenCL
334 - Better MKL search script
335 - Better logging of different submodules in ArrayFire \PR{2670} \PR{2669}
336 - Improve documentation \PR{2665} \PR{2620} \PR{2615} \PR{2639} \PR{2628} \PR{2633} \PR{2622} \PR{2617} \PR{2558} \PR{2326} \PR{2515}
337 - Optimized af::array assignment \PR{2575}
338 - Update the k-means example to display the result \PR{2521}
339 
340 
341 Fixes
342 -----
343 
344 - Fix multi-config generators
345 - Fix access errors in canny
346 - Fix segfault in the unified backend if no backends are available
347 - Fix access errors in scan-by-key
348 - Fix sobel operator
349 - Fix an issue with the random number generator and s16
350 - Fix issue with boolean product reduction
351 - Fix array_proxy move constructor
352 - Fix convolve3 launch configuration
353 - Fix an issue where the fft function modified the input array \PR{2520}
354 
355 Contributions
356 -------------
357 Special thanks to our contributors:
358 [Jacob Kahn](https://github.com/jacobkahn)
359 [William Tambellini](https://github.com/WilliamTambellini)
360 [Alexey Kuleshevich](https://github.com/lehins)
361 [Richard Barnes](https://github.com/r-barnes)
362 [Gaika](https://github.com/gaika)
363 [ShalokShalom](https://github.com/ShalokShalom)
364 
365 
366 v3.6.4
367 ======
368 
369 Bug Fixes
370 ---------
371 - Address a JIT performance regression due to moving kernel arguments to shared memory \PR{2501}
372 - Fix the default parameter for setAxisTitle \PR{2491}
373 
374 v3.6.3
375 ======
376 
377 Improvements
378 ------------
379 - Graphics are now a runtime dependency instead of a link time dependency \PR{2365}
380 - Reduce the CUDA backend binary size using runtime compilation of kernels \PR{2437}
381 - Improved batched matrix multiplication on the CPU backend by using Intel MKL's
382  `cblas_Xgemm_batched`\PR{2206}
383 - Print JIT kernels to disk or stream using the `AF_JIT_KERNEL_TRACE`
384  environment variable \PR{2404}
385 - `void*` pointers are now allowed as arguments to `af::array::write()` \PR{2367}
386 - Slightly improve the efficiency of JITed tile operations \PR{2472}
387 - Make random number generation on the CPU backend consistent with
388  CUDA and OpenCL \PR{2435}
389 - Handled very large JIT tree generations \PR{2484} \PR{2487}
390 
391 Bug Fixes
392 ---------
393 - Fixed `af::array::array_proxy` move assignment operator \PR{2479}
394 - Fixed input array dimensions validation in svdInplace() \PR{2331}
395 - Fixed the typedef declaration for window resource handle \PR{2357}.
396 - Increase compatibility with GCC 8 \PR{2379}
397 - Fixed `af::write` tests \PR{2380}
398 - Fixed a bug in broadcast step of 1D exclusive scan \PR{2366}
399 - Fixed OpenGL related build errors on OSX \PR{2382}
400 - Fixed multiple array evaluation. Performance improvement. \PR{2384}
401 - Fixed buffer overflow and expected output of kNN SSD small test \PR{2445}
402 - Fixed MKL linking order to enable threaded BLAS \PR{2444}
403 - Added validations for forge module plugin availability before calling
404  resource cleanup \PR{2443}
405 - Improve compatibility on MSVC toolchain(_MSC_VER > 1914) with the CUDA
406  backend \PR{2443}
407 - Fixed BLAS gemm func generators for newest MSVC 19 on VS 2017 \PR{2464}
408 - Fix errors on exits when using the cuda backend with unified \PR{2470}
409 
410 Documentation
411 -------------
412 - Updated svdInplace() documentation following a bugfix \PR{2331}
413 - Fixed a typo in matrix multiplication documentation \PR{2358}
414 - Fixed a code snippet demonstrating C-API use \PR{2406}
415 - Updated hamming matcher implementation limitation \PR{2434}
416 - Added illustration for the rotate function \PR{2453}
417 
418 Misc
419 ----
420 - Use cudaMemcpyAsync instead of cudaMemcpy throughout the codebase \PR{2362}
421 - Display a more informative error message if CUDA driver is incompatible
422  \PR{2421} \PR{2448}
423 - Changed forge resource management to use smart pointers \PR{2452}
424 - Deprecated intl and uintl typedefs in API \PR{2360}
425 - Enabled graphics by default for all builds starting with v3.6.3 \PR{2365}
426 - Fixed several warnings \PR{2344} \PR{2356} \PR{2361}
427 - Refactored initArray() calls to use createEmptyArray(). initArray() is for
428  internal use only by Array class. \PR{2361}
429 - Refactored `void*` memory allocations to use unsigned char type \PR{2459}
430 - Replaced deprecated MKL API with in-house implementations for sparse
431  to sparse/dense conversions \PR{2312}
432 - Reorganized and fixed some internal backend API \PR{2356}
433 - Updated compilation order of cuda files to speed up compile time \PR{2368}
434 - Removed conditional graphics support builds after enabling runtime
435  loading of graphics dependencies \PR{2365}
436 - Marked graphics dependencies as optional in CPack RPM config \PR{2365}
437 - Refactored a sparse arithmetic backend API \PR{2379}
438 - Fixed const correctness of `af_device_array` API \PR{2396}
439 - Update Forge to v1.0.4 \PR{2466}
440 - Manage Forge resources from the DeviceManager class \PR{2381}
441 - Fixed non-mkl & non-batch blas upstream call arguments \PR{2401}
442 - Link MKL with OpenMP instead of TBB by default
443 - Use clang-format to format source code
444 
445 Contributions
446 -------------
447 Special thanks to our contributors:
448 [Alessandro Bessi](https://github.com/alessandrobessi)
449 [zhihaoy](https://github.com/zhihaoy)
450 [Jacob Kahn](https://github.com/jacobkahn)
451 [William Tambellini](https://github.com/WilliamTambellini)
452 
453 v3.6.2
454 ======
455 
456 Features
457 --------
458 - Added support for batching on the `cond` argument in select() \PR{2243}
459 - Added support for broadcasting batched matmul() \PR{2315}
460 - Added support for multiple nearest neighbors in nearestNeighbour() \PR{2280}
461 - Added support for clamp-to-edge padding as an `af_border_type` option \PR{2333}
462 
463 Improvements
464 ------------
465 - Improved performance of morphological operations \PR{2238}
466 - Fixed linking errors when compiling without Freeimage/Graphics \PR{2248}
467 - Improved the usage of ArrayFire as a CMake subproject \PR{2290}
468 - Enabled configuration of custom library path for loading dynamic backend
469  libraries \PR{2302}
470 
471 Bug Fixes
472 ---------
473 - Fixed LAPACK definitions and linking errors \PR{2239}
474 - Fixed overflow in dim4::ndims() \PR{2289}
475 - Fixed pow() precision for integral types \PR{2305}
476 - Fixed issues with tile() with a large repeat dimension \PR{2307}
477 - Fixed svd() sub-array output on OpenCL \PR{2279}
478 - Fixed grid-based indexing calculation in histogram() \PR{2230}
479 - Fixed bug in indexing when used after reorder \PR{2311}
480 - Fixed errors when exiting on Windows when using
481  [CLBlast](https://github.com/CNugteren/CLBlast) \PR{2222}
482 - Fixed fallthrough error in medfilt1 \PR{2349}
483 
484 Documentation
485 -------------
486 - Improved unwrap() documentation \PR{2301}
487 - Improved wrap() documentation \PR{2320}
488 - Improved accum() documentation \PR{2298}
489 - Improved tile() documentation \PR{2293}
490 - Clarified approx1() and approx2() indexing in documentation \PR{2287}
491 - Updated examples of [select()](@ref data_func_select) in detailed documentation
492  \PR{2277}
493 - Updated lookup() examples \PR{2288}
494 - Updated set operations' documentation \PR{2299}
495 
496 Misc
497 ----
498 - `af*` libraries and dependencies directory changed to `lib64` \PR{2186}
499 - Added new arrayfire ASSERT utility functions \PR{2249} \PR{2256} \PR{2257} \PR{2263}
500 - Improved error messages in JIT \PR{2309}
501 
502 Contributions
503 -------------
504 Special thanks to our contributors: [Jacob Kahn](https://github.com/jacobkahn),
505 [Vardan Akopian](https://github.com/vakopian)
506 
507 v3.6.1
508 ======
509 
510 Improvements
511 ------------
512 - FreeImage is now a run-time dependency [#2164]
513 - Reduced binary size by setting the symbol visibility to hidden [#2168]
514 - Add memory manager logging using the AF_TRACE=mem environment variable [#2169]
515 - Improved CPU Anisotropic Diffusion performance [#2174]
516 - Perform normalization after FFT for improved accuracy [#2185][#2192]
517 - Updated CLBlast to v1.4.0 [#2178]
518 - Added additional validation when using af::seq for indexing [#2153]
519 - Perform checks for unsupported cards by the CUDA implementation [#2182]
520 
521 Bug Fixes
522 ---------
523 - Fixed region when all pixels were the foreground or background [#2152]
524 - Fixed several memory leaks [#2202][#2201][#2180][#2179][#2177][#2175]
525 - Fixed bug in setDevice which didn't allow you to select the last device [#2189]
526 - Fixed bug in min/max where the first element of the array was a NaN value [#2155]
527 - Fixed window cell indexing for graphics [#2207]
528 
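The min/max fix above concerns a NaN in the first position of the input. The general class of bug can be sketched in plain C++. This illustration is an assumption about the failure mode and is not ArrayFire's code; `naive_min` and `nan_ignoring_min` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Naive reduction seeded with the first element. If that element is NaN,
// every comparison `x[i] < acc` is false, so the NaN is never replaced.
inline double naive_min(const std::vector<double>& x) {
    assert(!x.empty());
    double acc = x.front();
    for (std::size_t i = 1; i < x.size(); ++i) {
        if (x[i] < acc) acc = x[i];
    }
    return acc;
}

// NaN-ignoring reduction: seed with +infinity and skip NaN inputs.
// Returns NaN only when no non-NaN element was seen.
inline double nan_ignoring_min(const std::vector<double>& x) {
    double acc = std::numeric_limits<double>::infinity();
    bool seen = false;
    for (double v : x) {
        if (std::isnan(v)) continue;
        seen = true;
        if (v < acc) acc = v;
    }
    return seen ? acc : std::numeric_limits<double>::quiet_NaN();
}
```

For the input {NaN, 3, 1, 2}, the naive version returns NaN while the NaN-ignoring version returns 1. Seeding the accumulator with a neutral value instead of the first element removes the dependence on element order.
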
529 v3.6.0
530 ======
531 
532 The source code with submodules can be downloaded directly from the following link:
533 http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.0.tar.bz2
534 
535 Major Updates
536 -------------
537 
538 - Added the `topk()` function
539  [Documentation](http://arrayfire.org/docs/group__stat__func__topk.htm).
540  <sup>[1](https://github.com/arrayfire/arrayfire/pull/2061)</sup>
541 - Added batched matrix multiply support.
542  <sup>[2](https://github.com/arrayfire/arrayfire/pull/1898)</sup>
543  <sup>[3](https://github.com/arrayfire/arrayfire/pull/2059)</sup>
544 - Added anisotropic diffusion, `anisotropicDiffusion()`.
545  [Documentation](http://arrayfire.org/docs/group__image__func__anisotropic__diffusion.htm)
546  <sup>[4](https://github.com/arrayfire/arrayfire/pull/1850)</sup>.
547 
548 Features
549 --------
550 
551 - Added support for batched matrix multiply.
552  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1898)</sup>
553  <sup>[2](https://github.com/arrayfire/arrayfire/pull/2059)</sup>
554 - New anisotropic diffusion function, `anisotropicDiffusion()`.
555  [Documentation](http://arrayfire.org/docs/group__image__func__anisotropic__diffusion.htm)
556  <sup>[3](https://github.com/arrayfire/arrayfire/pull/1850)</sup>.
557 - New `topk()` function, which returns the top k elements along a given
558  dimension of the input.
559  [Documentation](http://arrayfire.org/docs/group__stat__func__topk.htm).
560  <sup>[4](https://github.com/arrayfire/arrayfire/pull/2061)</sup>
561 - New gradient diffusion
562  [example](https://github.com/arrayfire/arrayfire/blob/master/examples/image_processing/gradient_diffusion.cpp).
563 
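To make the "top k elements" idea concrete, here is a minimal plain-C++ sketch for a one-dimensional input. It is not the library's implementation and does not use the ArrayFire API; `TopK` and `top_k_largest` are hypothetical names, and the tie-breaking rule (lower index first) is an assumption made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Result pair: the k largest values and their positions in the input.
struct TopK {
    std::vector<double> values;
    std::vector<std::size_t> indices;
};

// Returns the k largest elements of x in descending order, with indices.
// Assumes x contains no NaN values; ties are broken by lower index.
inline TopK top_k_largest(const std::vector<double>& x, std::size_t k) {
    k = std::min(k, x.size());

    // Sort indices rather than values so positions are preserved.
    std::vector<std::size_t> idx(x.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    // Only the first k positions need to be ordered.
    std::partial_sort(
        idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k), idx.end(),
        [&x](std::size_t a, std::size_t b) {
            return x[a] > x[b] || (x[a] == x[b] && a < b);
        });

    TopK out;
    for (std::size_t i = 0; i < k; ++i) {
        out.indices.push_back(idx[i]);
        out.values.push_back(x[idx[i]]);
    }
    return out;
}
```

For the input {3, 1, 4, 1, 5, 9, 2, 6} with k = 3, the values are {9, 6, 5} at indices {5, 7, 4}. A partial sort avoids ordering the whole input when k is much smaller than the input length.
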
564 Improvements
565 ------------
566 
567 - JITted `select()` and `shift()` functions for CUDA and OpenCL backends.
568  <sup>[1](https://github.com/arrayfire/arrayfire/pull/2047)</sup>
569 - Significant CMake improvements.
570  <sup>[2](https://github.com/arrayfire/arrayfire/pull/1861)</sup>
571  <sup>[3](https://github.com/arrayfire/arrayfire/pull/2070)</sup>
572  <sup>[4](https://github.com/arrayfire/arrayfire/pull/2018)</sup>
573 - Improved the quality of the random number generator, thanks to Ralf Stubner.
574  <sup>[5](https://github.com/arrayfire/arrayfire/pull/2122)</sup>
575 - Modified `af_colormap` struct to match forge's definition.
576  <sup>[6](https://github.com/arrayfire/arrayfire/pull/2082)</sup>
577 - Improved Black Scholes example.
578  <sup>[7](https://github.com/arrayfire/arrayfire/pull/2079)</sup>
579 - Using CPack to generate installers.
580  <sup>[8](https://github.com/arrayfire/arrayfire/pull/1861)</sup>
581 - Refactored
582  [black_scholes_options](https://github.com/arrayfire/arrayfire/blob/master/examples/financial/black_scholes_options.cpp)
583  example to use built-in `af::erfc` function for cumulative normal
584  distribution.<sup>[9](https://github.com/arrayfire/arrayfire/pull/2079)</sup>.
585 - Reduced the scope of mutexes in memory manager
586  <sup>[10](https://github.com/arrayfire/arrayfire/pull/2125)</sup>
587 - Official installers do not require the CUDA toolkit to be installed
588 - Significant CMake improvements have been made. Using CPack to generate
589  installers. <sup>[11](https://github.com/arrayfire/arrayfire/pull/1861)</sup>
590  <sup>[12](https://github.com/arrayfire/arrayfire/pull/2070)</sup>
591  <sup>[13](https://github.com/arrayfire/arrayfire/pull/2018)</sup>
592 - Corrected assert function calls in select() tests.
593  <sup>[14](https://github.com/arrayfire/arrayfire/pull/2058)</sup>
594 
595 Bug fixes
596 -----------
597 
598 - Fixed `shfl_down()` warnings with CUDA 9.
599  <sup>[1](https://github.com/arrayfire/arrayfire/pull/2040)</sup>
600 - Disabled CUDA JIT debug flags on ARM
601  architecture.<sup>[2](https://github.com/arrayfire/arrayfire/pull/2037)</sup>
602 - Fixed CLBlast install lib dir for Linux platform where `lib` directory has
603  arch(64) suffix.<sup>[3](https://github.com/arrayfire/arrayfire/pull/2094)</sup>
604 - Fixed assert condition in 3d morph opencl
605  kernel.<sup>[4](https://github.com/arrayfire/arrayfire/pull/2033)</sup>
606 - Fix JIT errors with large non-linear
607  kernels<sup>[5](https://github.com/arrayfire/arrayfire/pull/2127)</sup>
608 - Fix bug in CPU jit after moddims was called
609  <sup>[5](https://github.com/arrayfire/arrayfire/pull/2127)</sup>
610 - Fixed deadlock caused by calls to from the worker thread
611  <sup>[6](https://github.com/arrayfire/arrayfire/pull/2124)</sup>
612 
613 Documentation
614 -------------
615 
616 - Fixed variable name typo in `vectorization.md`.
617  <sup>[1](https://github.com/arrayfire/arrayfire/pull/2032)</sup>
618 - Fixed `AF_API_VERSION` value in Doxygen config file.
619  <sup>[2](https://github.com/arrayfire/arrayfire/pull/2053)</sup>
620 
621 Known issues
622 ------------
623 
624 - Several OpenCL tests failing on OSX:
625  - `canny_opencl, fft_opencl, gen_assign_opencl, homography_opencl,
626  reduce_opencl, scan_by_key_opencl, solve_dense_opencl,
627  sparse_arith_opencl, sparse_convert_opencl, where_opencl`
628 
629 Community contributions
630 -----------------------
631 
632 Special thanks to our contributors:
633 [Adrien F. Vincent](https://github.com/afvincent), [Cedric
634 Nugteren](https://github.com/CNugteren),
635 [Felix](https://github.com/fzimmermann89), [Filip
636 Matzner](https://github.com/FloopCZ),
637 [HoneyPatouceul](https://github.com/HoneyPatouceul), [Patrick
638 Lavin](https://github.com/plavin), [Ralf Stubner](https://github.com/rstub),
639 [William Tambellini](https://github.com/WilliamTambellini)
640 
641 
642 v3.5.1
643 ======
644 
645 The source code with submodules can be downloaded directly from the following
646 link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.5.1.tar.bz2
647 
648 Installer CUDA Version: 8.0 (Required) Installer OpenCL Version: 1.2 (Minimum)
649 
650 Improvements
651 ------------
652 - Relaxed `af::unwrap()` function's arguments.
653  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1853)</sup>
654 - Changed behavior of af::array::allocated() to specify memory allocated.
655  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1877)</sup>
656 - Removed restriction on the number of bins for `af::histogram()` on CUDA and
657  OpenCL kernels. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1895)</sup>
658 
659 
660 Performance
661 -----------
662 
663 - Improved JIT performance.
664  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1864)</sup>
665 - Improved CPU element-wise operation performance.
666  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1890)</sup>
667 - Improved regions performance using texture objects. <sup>
668  [1](https://github.com/arrayfire/arrayfire/pull/1903)</sup>
669 
670 
671 Bug fixes
672 ---------
673 - Fixed overflow issues in mean.
674  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1849)</sup>
675 - Fixed memory leak when chaining indexing operations.
676  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1879)</sup>
677 - Fixed bug in array assignment when using an empty array to index.
678  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1897)</sup>
679 - Fixed bug with `af::matmul()` which occurred when its RHS argument was an
680  indexed vector.
681  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1883)</sup>
682 - Fixed deadlock bug when a sparse array was used with a JIT Array.
683  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1889)</sup>
684 - Fixed pixel tests for FAST kernels.
685  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1891)</sup>
686 - Fixed `af::replace` so that it is now copy-on-write.
687  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1892)</sup>
688 - Fixed launch configuration issues in CUDA JIT.
689  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1893)</sup>
690 - Fixed segfaults and "Pure Virtual Call" error warnings when exiting on
691  Windows. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1899)
692  [2](https://github.com/arrayfire/arrayfire/pull/1924)</sup>
693 - Workaround for `clEnqueueReadBuffer` bug on OSX.
694  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1888)</sup>
695 
696 Build
697 -----
698 
699 - Fixed issues when compiling with GCC 7.1.
700  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1872)</sup>
701  <sup>[2](https://github.com/arrayfire/arrayfire/pull/1876)</sup>
702 - Eliminated unnecessary Boost dependency from CPU and CUDA backends.
703  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1857)</sup>
704 
705 Misc
706 ----
707 
708 - Updated support links to point to Slack instead of Gitter.
709  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1905)</sup>
710 
711 
712 
713 v3.5.0
714 ==============
715 
716 Major Updates
717 -------------
718 
719 * ArrayFire now supports threaded applications.
720  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1706)</sup>
721 * Added Canny edge detector.
722  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1743)</sup>
723 * Added Sparse-Dense arithmetic operations.
724  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1696)</sup>
725 
726 Features
727 --------
728 
729 * ArrayFire Threading
730  * \ref af::array can be read by multiple threads
731  * All ArrayFire functions can be executed concurrently by multiple threads
732  * Threads can operate on different devices to simplify multi-device workloads
733 * New Canny edge detector function, \ref af::canny().
734  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1743)</sup>
735  * Can automatically calculate high threshold with `AF_CANNY_THRESHOLD_AUTO_OTSU`
736  * Supports both L1 and L2 Norms to calculate gradients
737 * New tuned OpenCL BLAS backend,
738  [CLBlast](https://github.com/arrayfire/arrayfire/pull/1727).
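
The L1 and L2 gradient norms mentioned for the Canny detector differ only in how the horizontal and vertical derivatives are combined. The following is a minimal sketch in plain C++, not ArrayFire's kernel code; `GradientNorm` and `gradientMagnitude` are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical names for illustration; these are not ArrayFire identifiers.
enum class GradientNorm { L1, L2 };

// Combine the horizontal (gx) and vertical (gy) derivatives of one pixel
// into a single gradient magnitude.
inline double gradientMagnitude(double gx, double gy, GradientNorm norm) {
    assert(std::isfinite(gx) && std::isfinite(gy));
    if (norm == GradientNorm::L1) {
        // L1 norm: sum of absolute values, cheaper to compute.
        return std::fabs(gx) + std::fabs(gy);
    }
    // L2 norm: Euclidean length of the gradient vector.
    return std::hypot(gx, gy);
}
```

For a pixel with `gx = 3` and `gy = -4`, the L1 magnitude is 7 and the L2 magnitude is 5, so thresholds generally need to be chosen with the selected norm in mind.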
739 
740 Improvements
741 ------------
742 
743 * Converted CUDA JIT to use
744  [NVRTC](http://docs.nvidia.com/cuda/nvrtc/index.html) instead of
745  [NVVM](http://docs.nvidia.com/cuda/nvvm-ir-spec/index.html).
746 * Performance improvements in \ref af::reorder().
747  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1766)</sup>
748 * Performance improvements in \ref af::array::scalar<T>().
749  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1809)</sup>
750 * Improved unified backend performance.
751  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1770)</sup>
752 * ArrayFire now depends on Forge
753  v1.0. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1800)</sup>
754 * Can now specify the FFT plan cache size using the
755  \ref af::setFFTPlanCacheSize() function.
756 * Get the number of physical bytes allocated by the memory manager
757  \ref af_get_allocated_bytes(). <sup>[1](https://github.com/arrayfire/arrayfire/pull/1630)</sup>
758 * \ref af::dot() can now return a scalar value to the
759  host. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1628)</sup>
760 
761 Bug Fixes
762 ---------
763 
764 * Fixed improper release of default Mersenne random
765  engine. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1716)</sup>
766 * Fixed \ref af::randu() and \ref af::randn() ranges for floating point
767  types. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1784)</sup>
768 * Fixed assignment bug in CPU
769  backend. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1765)</sup>
770 * Fixed complex (`c32`,`c64`) multiplication in OpenCL convolution
771  kernels. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1816)</sup>
772 * Fixed inconsistent behavior with \ref af::replace() and \ref
773  af_replace_scalar(). <sup>[1](https://github.com/arrayfire/arrayfire/pull/1773)</sup>
774 * Fixed memory leak in \ref
775  af_fir(). <sup>[1](https://github.com/arrayfire/arrayfire/pull/1765)</sup>
776 * Fixed memory leaks in \ref af_cast for sparse arrays.
777  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1826)</sup>
778 * Fixed correctness of \ref af_pow for complex numbers by using Cartesian
779  form. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1765)</sup>
780 * Corrected \ref af::select() with indexing in CUDA and OpenCL
781  backends. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1731)</sup>
782 * Workaround for VS2015 compiler ternary
783  bug. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1771)</sup>
784 * Fixed memory corruption in
785  `cuda::findPlan()`. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1793)</sup>
786 * Argument checks in \ref af_create_sparse_array avoid inputs of type
787  int64. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1747)</sup>
788 * Fixed issue with indexing an array with a step size != 1. <sup>[1](https://github.com/arrayfire/arrayfire/issues/1846)</sup>
789 
790 Build fixes
791 -----------
792 
793 * On OSX, utilize new GLFW package from the brew package
794  manager. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1720)</sup>
795  <sup>[2](https://github.com/arrayfire/arrayfire/pull/1775)</sup>
796 * Fixed CUDA PTX names generated by CMake
797  v3.7. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1689)</sup>
798 * Support `gcc` > 5.x for
799  CUDA. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1708)</sup>
800 
801 Examples
802 --------
803 
804 * New genetic algorithm example.
805  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1695)</sup>
806 
807 Documentation
808 -------------
809 
810 * Updated `README.md` to improve readability and
811  formatting. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1726)</sup>
812 * Updated `README.md` to mention Julia and Nim
813  wrappers. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1714)</sup>
814 * Improved installation instructions -
815  `docs/pages/install.md`. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1740)</sup>
816 
817 Miscellaneous
818 -------------
819 
820 * A few improvements for ROCm
821  support. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1710)</sup>
822 * Removed CUDA 6.5 support.
823  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1687)</sup>
824 
825 Known issues
826 ------------
827 
828 * Windows
829  * The Windows NVIDIA driver version `37x.xx` contains a bug which causes
830  `fftconvolve_opencl` to fail. Upgrade or downgrade to a different version of
831  the driver to avoid this failure.
832  * The following tests fail on Windows with NVIDIA hardware:
833  `threading_cuda`,`qr_dense_opencl`, `solve_dense_opencl`.
834 * macOS
835  * The Accelerate framework, used by the CPU backend on macOS, leverages Intel
836  graphics cards (Iris) when there are no discrete GPUs available. This OpenCL
837  implementation is known to give incorrect results on the following tests:
838  `lu_dense_{cpu,opencl}`, `solve_dense_{cpu,opencl}`,
839  `inverse_dense_{cpu,opencl}`.
840  * Certain tests intermittently fail on macOS with NVIDIA GPUs apparently due
841  to inconsistent driver behavior: `fft_large_cuda` and `svd_dense_cuda`.
842  * The following tests are currently failing on macOS with AMD GPUs:
843  `cholesky_dense_opencl` and `scan_by_key_opencl`.
844 
845 
846 v3.4.2
847 ==============
848 
849 Deprecation Announcement
850 ------------------------
851 
852 This release supports CUDA 6.5 and higher. The next ArrayFire release will
853 support CUDA 7.0 and higher, dropping support for CUDA 6.5. Reasons for no
854 longer supporting CUDA 6.5 include:
855 
856 * CUDA 7.0 NVCC supports the C++11 standard (whereas CUDA 6.5 does not), which
857  is used by ArrayFire's CPU and OpenCL backends.
858 * Very few ArrayFire users still use CUDA 6.5.
859 
860 As a result, the older Jetson TK1 / Tegra K1 will no longer be supported in
861 the next ArrayFire release. The newer Jetson TX1 / Tegra X1 will continue to
862 have full capability with ArrayFire.
863 
864 Docker
865 ------
866 * [ArrayFire has been Dockerized](https://github.com/arrayfire/arrayfire-docker).
867 
868 Improvements
869 ------------
870 * Implemented sparse storage format conversions between \ref AF_STORAGE_CSR
871  and \ref AF_STORAGE_COO.
872  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1642)</sup>
873  * Directly convert between \ref AF_STORAGE_COO <--> \ref AF_STORAGE_CSR
874  using the af::sparseConvertTo() function.
875  * af::sparseConvertTo() now also supports converting to dense.
876 * Added cast support for [sparse arrays](\ref sparse_func).
877  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1653)</sup>
878  * Casting only changes the values array and the type. The row and column
879  index arrays are not changed.
880 * Reintroduced automated computation of chart axes limits for graphics functions.
881  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1639)</sup>
882  * The axes limits will always be the minimum/maximum of the current and new
883  limit.
884  * The user can still set limits from API calls. If the user sets a limit
885  from the API call, then the automatic limit setting will be disabled.
886 * Using `boost::scoped_array` instead of `boost::scoped_ptr` when managing
887  array resources.
888  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1637)</sup>
889 * Internal performance improvements to getInfo() by using `const` references
890  to avoid unnecessary copying of `ArrayInfo` objects.
891  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1665)</sup>
892 * Added support for scalar af::array inputs for af::convolve() and
893  [set functions](\ref set_mat).
894  <sup>[1](https://github.com/arrayfire/arrayfire/issues/1660)</sup>
895  <sup>[2](https://github.com/arrayfire/arrayfire/issues/1675)</sup>
896  <sup>[3](https://github.com/arrayfire/arrayfire/pull/1668)</sup>
897 * Performance fixes in af::fftConvolve() kernels.
898  <sup>[1](https://github.com/arrayfire/arrayfire/issues/1679)</sup>
899  <sup>[2](https://github.com/arrayfire/arrayfire/pull/1680)</sup>
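
To make the COO and CSR storage formats above concrete, here is a minimal host-side sketch of a COO to CSR conversion in plain C++. It is an illustration only and does not reflect how af::sparseConvertTo() is implemented; the `Coo` and `Csr` structs and the `cooToCsr` function are hypothetical names, and the sketch uses a counting pass rather than a sort.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical containers for illustration; not ArrayFire types.
struct Coo {
    int numRows;
    std::vector<int> row, col;   // one (row, col) pair per non-zero
    std::vector<float> val;
};

struct Csr {
    std::vector<int> rowPtr;     // numRows + 1 offsets into col/val
    std::vector<int> col;
    std::vector<float> val;
};

Csr cooToCsr(const Coo& a) {
    assert(a.row.size() == a.col.size() && a.col.size() == a.val.size());
    const std::size_t nnz = a.val.size();

    Csr out;
    out.rowPtr.assign(static_cast<std::size_t>(a.numRows) + 1, 0);
    out.col.resize(nnz);
    out.val.resize(nnz);

    // Pass 1: count the non-zeros in each row.
    for (int r : a.row) {
        assert(r >= 0 && r < a.numRows);
        ++out.rowPtr[r + 1];
    }
    // Pass 2: prefix sum turns the counts into row start offsets.
    for (int r = 0; r < a.numRows; ++r) out.rowPtr[r + 1] += out.rowPtr[r];

    // Pass 3: scatter each entry into its row segment, keeping input order.
    std::vector<int> next(out.rowPtr.begin(), out.rowPtr.end() - 1);
    for (std::size_t i = 0; i < nnz; ++i) {
        const int dst = next[a.row[i]]++;
        out.col[dst] = a.col[i];
        out.val[dst] = a.val[i];
    }
    return out;
}
```

Note that only the index layout changes; the values are copied unmodified. Column indices within a row keep the order they had in the COO input, so a caller that needs sorted columns would have to sort each row segment afterwards.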
900 
901 Build
902 -----
903 * Support for Visual Studio 2015 compilation.
904  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1632)</sup>
905  <sup>[2](https://github.com/arrayfire/arrayfire/pull/1640)</sup>
906 * Fixed `FindCBLAS.cmake` when PkgConfig is used.
907  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1657)</sup>
908 
909 Bug fixes
910 ---------
911 * Fixes to JIT when tree is large.
912  <sup>[1](https://github.com/arrayfire/arrayfire/issues/1646)</sup>
913  <sup>[2](https://github.com/arrayfire/arrayfire/pull/1638)</sup>
914 * Fixed indexing bug when converting dense to sparse af::array as \ref
915  AF_STORAGE_COO.
916  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1642)</sup>
917 * Fixed af::bilateral() OpenCL kernel compilation on OS X.
918  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1638)</sup>
919 * Fixed memory leak in af::regions() (CPU) and af::rgb2ycbcr().
920  <sup>[1](https://github.com/arrayfire/arrayfire/issues/1664)</sup>
921  <sup>[2](https://github.com/arrayfire/arrayfire/issues/1664)</sup>
922  <sup>[3](https://github.com/arrayfire/arrayfire/pull/1666)</sup>
923 
924 Installers
925 ----------
926 * Major OS X installer fixes.
927  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1629)</sup>
928  * Fixed installation scripts.
929  * Fixed installation symlinks for libraries.
930 * Windows installer now ships with more pre-built examples.
931 
932 Examples
933 --------
934 * Added af::choleskyInPlace() calls to `cholesky.cpp` example.
935  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1671)</sup>
936 
937 Documentation
938 -------------
939 * Added `u8` as supported data type in `getting_started.md`.
940  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1661)</sup>
941 * Fixed typos.
942  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1652)</sup>
943 
944 CUDA 8 on OSX
945 -------------
946 * [CUDA 8.0.55](https://developer.nvidia.com/cuda-toolkit) supports Xcode 8.
947  <sup>[1](https://github.com/arrayfire/arrayfire/issues/1664)</sup>
948 
949 Known Issues
950 ------------
951 * Known failures with CUDA 6.5. These include all functions that use
952  sorting. As a result, sparse storage format conversion between \ref
953  AF_STORAGE_COO and \ref AF_STORAGE_CSR has been disabled for CUDA 6.5.
954 
955 v3.4.1
956 ==============
957 
958 Installers
959 ----------
960 * Installers for Linux, OS X and Windows
961  * CUDA backend now uses [CUDA 8.0](https://developer.nvidia.com/cuda-toolkit).
962  * Uses [Intel MKL 2017](https://software.intel.com/en-us/intel-mkl).
963  * CUDA Compute 2.x (Fermi) is no longer compiled into the library.
964 * Installer for OS X
965  * The libraries shipping in the OS X Installer are now compiled with Apple
966  Clang v7.3.1 (previously v6.1.0).
967  * The OS X version used is 10.11.6 (previously 10.10.5).
968 * Installer for Jetson TX1 / Tegra X1
969  * Requires [JetPack for L4T 2.3](https://developer.nvidia.com/embedded/jetpack)
970  (containing Linux for Tegra r24.2 for TX1).
971  * CUDA backend now uses [CUDA 8.0](https://developer.nvidia.com/cuda-toolkit) 64-bit.
972  * Using CUDA's cusolver instead of CPU fallback.
973  * Uses OpenBLAS for CPU BLAS.
974  * All ArrayFire libraries are now 64-bit.
975 
976 Improvements
977 ------------
978 * Add [sparse array](\ref sparse_func) support to \ref af::eval().
979  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1598)</sup>
980 * Add OpenCL-CPU fallback support for sparse \ref af::matmul() when running on
981  a unified memory device. Uses MKL Sparse BLAS.
982 * When using CUDA libdevice, pick the correct compute version based on device.
983  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1612)</sup>
984 * OpenCL FFT now also supports prime factors 7, 11 and 13.
985  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1383)</sup>
986  <sup>[2](https://github.com/arrayfire/arrayfire/pull/1619)</sup>
987 
988 Bug Fixes
989 ---------
990 * Allow CUDA libdevice to be detected from custom directory.
991 * Fix `aarch64` detection on Jetson TX1 64-bit OS.
992  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1593)</sup>
993 * Add missing definition of `af_set_fft_plan_cache_size` in unified backend.
994  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1591)</sup>
995 * Fix initial values for \ref af::min() and \ref af::max() operations.
996  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1594)</sup>
997  <sup>[2](https://github.com/arrayfire/arrayfire/pull/1595)</sup>
998 * Fix distance calculation in \ref af::nearestNeighbour for CUDA and OpenCL backend.
999  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1596)</sup>
1000  <sup>[2](https://github.com/arrayfire/arrayfire/pull/1595)</sup>
1001 * Fix OpenCL bug where scalars were passed incorrectly to compile options.
1002  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1595)</sup>
1003 * Fix bug in \ref af::Window::surface() with respect to dimensions and ranges.
1004  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1604)</sup>
1005 * Fix possible double free corruption in \ref af_assign_seq().
1006  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1605)</sup>
1007 * Add missing eval for key in \ref af::scanByKey in CPU backend.
1008  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1605)</sup>
1009 * Fixed creation of sparse values array using \ref AF_STORAGE_COO.
1010  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1620)</sup>
1011  <sup>[2](https://github.com/arrayfire/arrayfire/pull/1621)</sup>
1012 
1013 Examples
1014 --------
1015 * Add a [Conjugate Gradient solver example](\ref benchmarks/cg.cpp)
1016  to demonstrate sparse and dense matrix operations.
1017  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1599)</sup>
1018 
1019 CUDA Backend
1020 ------------
1021 * When using [CUDA 8.0](https://developer.nvidia.com/cuda-toolkit),
1022  compute 2.x are no longer in default compute list.
1023  * This follows [CUDA 8.0](https://developer.nvidia.com/cuda-toolkit)
1024  deprecating computes 2.x.
1025  * Default computes for CUDA 8.0 will be 30, 50, 60.
1026 * When using CUDA pre-8.0, the default selection remains 20, 30, 50.
1027 * CUDA backend now uses `-arch=sm_30` for PTX compilation as default.
1028  * Unless compute 2.0 is enabled.
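
The default compute selection described above can be summarized as a small lookup. This is a sketch in plain C++ that only restates the defaults listed in these notes; the actual selection happens in the build system, and `defaultComputes` is a hypothetical helper name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper; restates the defaults listed in the release notes.
// The real selection is made by the build configuration, not by this code.
std::vector<int> defaultComputes(int cudaMajorVersion) {
    assert(cudaMajorVersion > 0);
    if (cudaMajorVersion >= 8) {
        return {30, 50, 60};   // compute 2.x dropped from the default list
    }
    return {20, 30, 50};       // pre-8.0 toolkits keep compute 2.0
}
```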
1029 
1030 Known Issues
1031 ------------
1032 * \ref af::lu() on CPU is known to give incorrect results when run on
1033  OS X 10.11 or 10.12 and compiled with Accelerate Framework.
1034  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1617)</sup>
1035  * Since the OS X Installer libraries use MKL rather than Accelerate
1036  Framework, this issue does not affect those libraries.
1037 
1038 
1039 v3.4.0
1040 ==============
1041 
1042 Major Updates
1043 -------------
1044 * [Sparse Matrix and BLAS](\ref sparse_func). <sup>[1](https://github.com/arrayfire/arrayfire/issues/821)
1045  [2](https://github.com/arrayfire/arrayfire/pull/1319)</sup>
1046 * Faster JIT for CUDA and OpenCL. <sup>[1](https://github.com/arrayfire/arrayfire/issues/1472)
1047  [2](https://github.com/arrayfire/arrayfire/pull/1462)</sup>
1048 * Support for [random number generator engines](\ref af::randomEngine).
1049  <sup>[1](https://github.com/arrayfire/arrayfire/issues/868)
1050  [2](https://github.com/arrayfire/arrayfire/pull/1551)</sup>
1051 * Improvements to graphics. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1555)
1052  [2](https://github.com/arrayfire/arrayfire/pull/1566)</sup>
1053 
1054 Features
1055 ----------
1056 * **[Sparse Matrix and BLAS](\ref sparse_func)** <sup>[1](https://github.com/arrayfire/arrayfire/issues/821)
1057 [2](https://github.com/arrayfire/arrayfire/pull/1319)</sup>
1058  * Support for [CSR](\ref AF_STORAGE_CSR) and [COO](\ref AF_STORAGE_COO)
1059  [storage types](\ref af_storage).
1060  * Sparse-Dense Matrix Multiplication and Matrix-Vector Multiplication as a
1061  part of af::matmul() using \ref AF_STORAGE_CSR format for sparse.
1062  * Conversion to and from [dense](\ref AF_STORAGE_DENSE) matrix to [CSR](\ref AF_STORAGE_CSR)
1063  and [COO](\ref AF_STORAGE_COO) [storage types](\ref af_storage).
1064 * **Faster JIT** <sup>[1](https://github.com/arrayfire/arrayfire/issues/1472)
1065  [2](https://github.com/arrayfire/arrayfire/pull/1462)</sup>
1066  * Performance improvements for CUDA and OpenCL JIT functions.
1067  * Support for evaluating multiple outputs in a single kernel. See af::array::eval() for more.
1068 * **[Random Number Generation](\ref af::randomEngine)**
1069  <sup>[1](https://github.com/arrayfire/arrayfire/issues/868)
1070  [2](https://github.com/arrayfire/arrayfire/pull/1551)</sup>
1071  * af::randomEngine(): A random engine class to handle setting the [type](af_random_type) and seed
1072  for random number generator engines.
1073  * Supported engine types are (\ref af_random_engine_type):
1074  * [Philox](http://www.thesalmons.org/john/random123/)
1075  * [Threefry](http://www.thesalmons.org/john/random123/)
1076  * [Mersenne Twister](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MTGP/)
1077 * **Graphics** <sup>[1](https://github.com/arrayfire/arrayfire/pull/1555)
1078  [2](https://github.com/arrayfire/arrayfire/pull/1566)</sup>
1079  * Using [Forge v0.9.0](https://github.com/arrayfire/forge/releases/tag/v0.9.0)
1080  * [Vector Field](\ref af::Window::vectorField) plotting functionality.
1081  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1566)</sup>
1082  * Removed [GLEW](http://glew.sourceforge.net/) and replaced with [glbinding](https://github.com/cginternals/glbinding).
1083  * Removed usage of GLEW after support for MX (multithreaded) was dropped in v2.0.
1084  <sup>[1](https://github.com/arrayfire/arrayfire/issues/1540)</sup>
1085  * Multiple overlays on the same window are now possible.
1086  * Overlays support for same type of object (2D/3D)
1087  * Supported by af::Window::plot, af::Window::hist, af::Window::surface,
1088  af::Window::vectorField.
1089  * New API to set axes limits for graphs.
1090  * Draw calls do not automatically compute the limits. This is now under user control.
1091  * af::Window::setAxesLimits can be used to set axes limits automatically or manually.
1092  * af::Window::setAxesTitles can be used to set axes titles.
1093  * New API for plot and scatter:
1094  * af::Window::plot() and af::Window::scatter() now can handle 2D and 3D and determine appropriate order.
1095  * af_draw_plot_nd()
1096  * af_draw_plot_2d()
1097  * af_draw_plot_3d()
1098  * af_draw_scatter_nd()
1099  * af_draw_scatter_2d()
1100  * af_draw_scatter_3d()
1101 * **New [interpolation methods](\ref af_interp_type)**
1102 <sup>[1](https://github.com/arrayfire/arrayfire/issues/1562)</sup>
1103  * Applies to
1104  * \ref af::resize()
1105  * \ref af::transform()
1106  * \ref af::approx1()
1107  * \ref af::approx2()
1108 * **Support for [complex mathematical functions](\ref mathfunc_mat)**
1109  <sup>[1](https://github.com/arrayfire/arrayfire/issues/1507)</sup>
1110  * Add complex support for \ref trig_mat, \ref af::sqrt(), \ref af::log().
1111 * **af::medfilt1(): Median filter for 1-d signals** <sup>[1](https://github.com/arrayfire/arrayfire/pull/1479)</sup>
1112 * <b>Generalized scan functions: \ref scan_func_scan and \ref scan_func_scanbykey</b>
1113  * Now supports inclusive or exclusive scans
1114  * Supports binary operations defined by \ref af_binary_op.
1115  <sup>[1](https://github.com/arrayfire/arrayfire/issues/388)</sup>
1116 * **[Image Moments](\ref moments_mat) functions**
1117  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1453)</sup>
1118 * <b>Add af::getSizeOf() function for \ref af_dtype</b>
1119  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1404)</sup>
1120 * <b>Explicitly instantiate \ref af::array::device() for `void *`</b>
1121  <sup>[1](https://github.com/arrayfire/arrayfire/issues/1503)</sup>
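
The difference between inclusive and exclusive scans, and the effect of scanning by key, can be shown with a short sequential sketch in plain C++. It is not ArrayFire's implementation; `scanSketch` and `scanByKeySketch` are hypothetical names, and the by-key variant assumes that a segment is a run of consecutive equal keys.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helpers for illustration; not part of the ArrayFire API.

// Scan with a user-supplied binary operation and its identity element.
template <typename T, typename Op>
std::vector<T> scanSketch(const std::vector<T>& in, Op op, T identity,
                          bool inclusive) {
    std::vector<T> out(in.size());
    T acc = identity;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (inclusive) {
            acc = op(acc, in[i]);   // output includes the current element
            out[i] = acc;
        } else {
            out[i] = acc;           // output excludes the current element
            acc = op(acc, in[i]);
        }
    }
    return out;
}

// Same scan, but the accumulator restarts whenever the key changes
// (assumption: segments are runs of consecutive equal keys).
template <typename K, typename T, typename Op>
std::vector<T> scanByKeySketch(const std::vector<K>& keys,
                               const std::vector<T>& in, Op op, T identity,
                               bool inclusive) {
    assert(keys.size() == in.size());
    std::vector<T> out(in.size());
    T acc = identity;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (i > 0 && keys[i] != keys[i - 1]) acc = identity;
        if (inclusive) {
            acc = op(acc, in[i]);
            out[i] = acc;
        } else {
            out[i] = acc;
            acc = op(acc, in[i]);
        }
    }
    return out;
}
```

With addition over `{1, 2, 3, 4}`, the inclusive scan yields `{1, 3, 6, 10}` and the exclusive scan yields `{0, 1, 3, 6}`. With keys `{0, 0, 1, 1}`, the inclusive scan by key yields `{1, 3, 3, 7}`.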
1122 
1123 Bug Fixes
1124 --------------
1125 * Fixes to edge-cases in \ref morph_mat. <sup>[1](https://github.com/arrayfire/arrayfire/issues/1564)</sup>
1126 * Makes JIT tree size consistent between devices. <sup>[1](https://github.com/arrayfire/arrayfire/issues/1457)</sup>
1127 * Delegate higher-dimension in \ref convolve_mat to correct dimensions. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1445)</sup>
1128 * Indexing fixes with C++11. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1426) [2](https://github.com/arrayfire/arrayfire/pull/1426)</sup>
1129 * Handle empty arrays as inputs in various functions. <sup>[1](https://github.com/arrayfire/arrayfire/issues/799)</sup>
1130 * Fix bug when single element input to af::median. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1423)</sup>
1131 * Fix bug in calculation of time from af::timeit(). <sup>[1](https://github.com/arrayfire/arrayfire/pull/1414)</sup>
1132 * Fix bug in floating point numbers in af::seq. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1404)</sup>
1133 * Fixes for OpenCL graphics interop on NVIDIA devices.
1134  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1408/commits/e1f16e6)</sup>
1135 * Fix bug when compiling large kernels for AMD devices.
1136  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1465)</sup>
1137 * Fix bug in af::bilateral when shared memory is over the limit.
1138  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1478)</sup>
1139 * Fix bug in kernel header compilation tool `bin2cpp`.
1140  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1544)</sup>
1141 * Fix initial values for \ref morph_mat functions.
1142  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1547)</sup>
1143 * Fix bugs in af::homography() CPU and OpenCL kernels.
1144  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1584)</sup>
1145 * Fix bug in CPU TNJ.
1146  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1587)</sup>
1147 
1148 
1149 Improvements
1150 ------------
1151 * CUDA 8 and compute 6.x (Pascal) support; the current installer ships with CUDA 7.5. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1432) [2](https://github.com/arrayfire/arrayfire/pull/1487) [3](https://github.com/arrayfire/arrayfire/pull/1539)</sup>
1152 * User controlled FFT plan caching. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1448)</sup>
1153 * CUDA performance improvements for \ref image_func_wrap, \ref image_func_unwrap and \ref approx_mat.
1154  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1411)</sup>
1155 * Fallback for CUDA-OpenGL interop when no device supports OpenGL.
1156  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1415)</sup>
1157 * Additional forms of batching with the \ref transform_func_transform functions.
1158  [New behavior defined here](https://github.com/arrayfire/arrayfire/pull/1412).
1159  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1412)</sup>
1160 * Update to OpenCL2 headers. <sup>[1](https://github.com/arrayfire/arrayfire/issues/1344)</sup>
1161 * Support for integration with external OpenCL contexts. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1140)</sup>
1162 * Performance improvements to internal copy in CPU Backend.
1163  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1440)</sup>
1164 * Performance improvements to af::select and af::replace CUDA kernels.
1165  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1587)</sup>
1166 * Enable OpenCL-CPU offload by default for devices with Unified Host Memory.
1167  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1521)</sup>
1168  * To disable, use the environment variable `AF_OPENCL_CPU_OFFLOAD=0`.
1169 
1170 Build
1171 ------
1172 * Compilation speedups. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1526)</sup>
1173 * Build fixes with MKL. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1526)</sup>
1174 * Error message when CMake CUDA Compute Detection fails. <sup>[1](https://github.com/arrayfire/arrayfire/issues/1535)</sup>
1175 * Several CMake build issues with Xcode generator fixed.
1176  <sup>[1](https://github.com/arrayfire/arrayfire/pull/1493) [2](https://github.com/arrayfire/arrayfire/pull/1499)</sup>
1177 * Fix multiple OpenCL definitions at link time. <sup>[1](https://github.com/arrayfire/arrayfire/issues/1429)</sup>
1178 * Fix lapacke detection in CMake. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1423)</sup>
1179 * Update build tags of
1180  * [clBLAS](https://github.com/clMathLibraries/clBLAS)
1181  * [clFFT](https://github.com/clMathLibraries/clFFT)
1182  * [Boost.Compute](https://github.com/boostorg/compute)
1183  * [Forge](https://github.com/arrayfire/forge)
1184  * [glbinding](https://github.com/cginternals/glbinding)
1185 * Fix builds with GCC 6.1.1 and GCC 5.3.0. <sup>[1](https://github.com/arrayfire/arrayfire/pull/1409)</sup>
1186 
1187 Installers
1188 ----------
1189 * All installers now ship with ArrayFire libraries built with MKL 2016.
1190 * All installers now ship with Forge development files and examples included.
1191 * CUDA Compute 2.0 has been removed from the installers. Please contact us
1192  directly if you have a special need.
1193 
1194 Examples
1195 -------------
1196 * Added [example simulating gravity](\ref graphics/field.cpp) for
1197  demonstration of vector field.
1198 * Improvements to \ref financial/black_scholes_options.cpp example.
1199 * Improvements to \ref graphics/gravity_sim.cpp example.
1200 * Fix graphics examples to use af::Window::setAxesLimits and
1201  af::Window::setAxesTitles functions.
1202 
1203 Documentation & Licensing
1204 -------------------------
1205 * [ArrayFire copyright and trademark policy](http://arrayfire.com/trademark-policy)
1206 * Fixed grammar in license.
1207 * Add license information for glbinding.
1208 * Remove license information for GLEW.
1209 * Random123 now applies to all backends.
1210 * Random number functions are now under \ref random_mat.
1211 
1212 Deprecations
1213 ------------
1214 The following functions have been deprecated and may be modified or removed
1215 permanently from future versions of ArrayFire.
1216 * \ref af::Window::plot3(): Use \ref af::Window::plot instead.
1217 * \ref af_draw_plot(): Use \ref af_draw_plot_nd or \ref af_draw_plot_2d instead.
1218 * \ref af_draw_plot3(): Use \ref af_draw_plot_nd or \ref af_draw_plot_3d instead.
1219 * \ref af::Window::scatter3(): Use \ref af::Window::scatter instead.
1220 * \ref af_draw_scatter(): Use \ref af_draw_scatter_nd or \ref af_draw_scatter_2d instead.
1221 * \ref af_draw_scatter3(): Use \ref af_draw_scatter_nd or \ref af_draw_scatter_3d instead.
1222 
1223 Known Issues
1224 -------------
1225 Certain CUDA functions are known to be broken on Tegra K1. The following ArrayFire tests are currently failing:
1226 * assign_cuda
1227 * harris_cuda
1228 * homography_cuda
1229 * median_cuda
1230 * orb_cuda, sort_cuda
1231 * sort_by_key_cuda
1232 * sort_index_cuda
1233 
1234 
1235 v3.3.2
1236 ==============
1237 
1238 Improvements
1239 ------------
1240 * Family of [Sort](\ref sort_mat) functions now support
1241  [higher order dimensions](https://github.com/arrayfire/arrayfire/pull/1373).
1242 * Improved performance of batched sort on dim 0 for all [Sort](\ref sort_mat) functions.
1243 * [Median](\ref stat_func_median) now also supports higher order dimensions.
1244 
1245 Bug Fixes
1246 --------------
1247 
1248 * Fixes to [error handling](https://github.com/arrayfire/arrayfire/issues/1352) in C++ API for binary functions.
1249 * Fixes to [external OpenCL context management](https://github.com/arrayfire/arrayfire/issues/1350).
1250 * Fixes to [JPEG_GREYSCALE](https://github.com/arrayfire/arrayfire/issues/1360) for FreeImage versions <= 3.154.
1251 * Fixes for [non-float inputs](https://github.com/arrayfire/arrayfire/issues/1386) to \ref af::rgb2gray().
1252 
1253 Build
1254 ------
1255 * [Disable CPU Async](https://github.com/arrayfire/arrayfire/issues/1378) when building with GCC < 4.8.4.
1256 * Add option to [disable CPUID](https://github.com/arrayfire/arrayfire/issues/1369) from CMake.
1257 * More verbose message when [CUDA Compute Detection fails](https://github.com/arrayfire/arrayfire/issues/1362).
1258 * Print message to use [CUDA library stub](https://github.com/arrayfire/arrayfire/issues/1363)
1259  from CUDA Toolkit if CUDA Library is not found from default paths.
1260 * [Build Fixes](https://github.com/arrayfire/arrayfire/pull/1385) on Windows.
1261  * For compiling tests out of source.
1262  * For compiling ArrayFire with static MKL.
1263 * [Exclude <sys/sysctl.h>](https://github.com/arrayfire/arrayfire/pull/1368) when building on GNU Hurd.
1264 * Add [manual CMake options](https://github.com/arrayfire/arrayfire/pull/1389) to build DEB and RPM packages.
1265 
1266 Documentation
1267 -------------
1268 * Fixed documentation for \ref af::replace().
1269 * Fixed images in [Using on OSX](\ref using_on_osx) page.
1270 
1271 Installer
1272 ---------
1273 * Linux x64 installers will now be compiled with GCC 4.9.2.
1274 * OSX installer gives better error messages on brew failures and
1275  now includes a link to [Fixing OS X Installer Failures](https://github.com/arrayfire/arrayfire/wiki/Fixing-Common-OS-X-Installer-Failures)
1276  for brew installation failures.
1277 
1278 v3.3.1
1279 ==============
1280 
1281 Bug Fixes
1282 --------------
1283 
1284 * Fixes to \ref af::array::device()
1285  * CPU Backend: [evaluate arrays](https://github.com/arrayfire/arrayfire/issues/1316)
1286  before returning pointer with asynchronous calls in CPU backend.
1287  * OpenCL Backend: [fix segfaults](https://github.com/arrayfire/arrayfire/issues/1324)
1288  when device pointers are requested on empty arrays.
1289 * Changed \ref af::operator%() [from using rem to using mod](https://github.com/arrayfire/arrayfire/issues/1318).
1290 * Fixed [array destruction](https://github.com/arrayfire/arrayfire/issues/1321)
1291  when backends are switched in Unified API.
1292 * Fixed [indexing](https://github.com/arrayfire/arrayfire/issues/1331) after
1293  \ref af::moddims() is called.
1294 * Fixed FFT calls for CUDA and OpenCL backends when used on
1295  [multiple devices](https://github.com/arrayfire/arrayfire/issues/1332).
1296 * Fixed [unresolved external](https://github.com/arrayfire/arrayfire/commit/32965ef)
1297  for some functions from \ref af::array::array_proxy class.
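
The `operator%` entry above does not say how "rem" and "mod" differ. One distinction that exists in standard C++ is between `std::remainder` (quotient rounded to the nearest integer) and `std::fmod` (quotient truncated toward zero). The sketch below only illustrates that distinction. It does not call ArrayFire, the helper names `modLike` and `remLike` are hypothetical, and the mapping to ArrayFire's internals is an assumption that this page does not confirm.

```cpp
#include <cassert>
#include <cmath>

// fmod-style result: the quotient is truncated toward zero,
// so the result carries the sign of x.
double modLike(double x, double y) {
    assert(y != 0.0);  // division by zero is out of scope for this sketch
    return std::fmod(x, y);
}

// IEEE-style remainder: the quotient is rounded to the nearest integer,
// so the result can be negative even when both inputs are positive.
double remLike(double x, double y) {
    assert(y != 0.0);
    return std::remainder(x, y);
}
```

For example, `modLike(7.0, 4.0)` and `remLike(7.0, 4.0)` disagree because 7/4 truncates to 1 but rounds to 2, which is the kind of difference that makes the choice between the two operations visible to users.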
1298 
1299 Build
1300 ------
1301 * CMake compiles files in alphabetical order.
1302 * CMake fixes for BLAS and LAPACK on some Linux distributions.
1303 
1304 Improvements
1305 ------------
1306 * Fixed [OpenCL FFT performance](https://github.com/arrayfire/arrayfire/issues/1323) regression.
1307 * \ref af::array::device() on OpenCL backend [returns](https://github.com/arrayfire/arrayfire/issues/1311)
1308  `cl_mem` instead of `(void*)cl::Buffer*`.
1309 * In Unified backend, [load versioned libraries](https://github.com/arrayfire/arrayfire/issues/1312)
1310  at runtime.
1311 
1312 Documentation
1313 ------
1314 * Reorganized, cleaner README file.
1315 * Replaced non-free lena image in assets with free-to-distribute lena image.
1316 
1317 v3.3.0
1318 ==============
1319 
1320 Major Updates
1321 -------------
1322 
1323 * CPU backend supports asynchronous execution.
1324 * Performance improvements to OpenCL BLAS and FFT functions.
1325 * Improved performance of memory manager.
1326 * Improvements to visualization functions.
1327 * Improved sorted order for OpenCL devices.
1328 * Integration with external OpenCL projects.
1329 
1330 Features
1331 ----------
1332 
1333 * \ref af::getActiveBackend(): Returns the current backend being used.
1334 * [Scatter plot](https://github.com/arrayfire/arrayfire/pull/1116) added to graphics.
1335 * \ref af::transform() now supports perspective transformation matrices.
1336 * \ref af::infoString(): Returns `af::info()` as a string.
1337 * \ref af::printMemInfo(): Prints a table showing information about buffers from the memory manager.
1338  * The \ref AF_MEM_INFO macro prints numbers and total sizes of all buffers (requires including af/macros.h)
1339 * \ref af::allocHost(): Allocates memory on host.
1340 * \ref af::freeHost(): Frees host side memory allocated by ArrayFire.
1341 * OpenCL functions can now use CPU implementation.
1342  * Currently limited to Unified Memory devices (CPU and On-board Graphics).
1343  * Functions: af::matmul() and all [LAPACK](\ref linalg_mat) functions.
1344  * Takes advantage of optimized libraries such as MKL without doing memory copies.
1345  * Use the environment variable `AF_OPENCL_CPU_OFFLOAD=1` to take advantage of this feature.
1346 * Functions specific to OpenCL backend.
1347  * \ref afcl::addDevice(): Adds an external device and context to ArrayFire's device manager.
1348  * \ref afcl::deleteDevice(): Removes an external device and context from ArrayFire's device manager.
1349  * \ref afcl::setDevice(): Sets an external device and context from ArrayFire's device manager as active.
1350  * \ref afcl::getDeviceType(): Gets the device type of the current device.
1351  * \ref afcl::getPlatform(): Gets the platform of the current device.
1352 * \ref af::createStridedArray() allows [array creation with user-defined strides](https://github.com/arrayfire/arrayfire/issues/1177) and a device pointer.
1353 * [Expose functions](https://github.com/arrayfire/arrayfire/issues/1131) that provide information
1354  about memory layout of Arrays.
1355  * \ref af::getStrides(): Gets the strides for each dimension of the array.
1356  * \ref af::getOffset(): Gets the offsets for each dimension of the array.
1357  * \ref af::getRawPtr(): Gets raw pointer to the location of the array on device.
1358  * \ref af::isLinear(): Returns true if all elements in the array are contiguous.
1359  * \ref af::isOwner(): Returns true if the array owns the raw pointer, false if it is a sub-array.
1362 * \ref af::getDeviceId(): Gets the device id on which the array resides.
1363 * \ref af::isImageIOAvailable(): Returns true if ArrayFire was compiled with FreeImage enabled.
1364 * \ref af::isLAPACKAvailable(): Returns true if ArrayFire was compiled with LAPACK functions enabled.
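
The memory-layout functions listed above (strides, offset, linearity) can be pictured with a small standalone model. The sketch below does not use ArrayFire. `Dim4`, `contiguousStrides`, `linearIndex` and `isDense` are hypothetical names, and the only library fact assumed is that ArrayFire stores arrays in column-major order. `isDense` is a simplified stand-in for the kind of check that `isLinear()` describes, not its actual implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 4-D shape or stride tuple.
using Dim4 = std::array<long long, 4>;

// Strides (in elements) of a densely packed column-major array:
// the first dimension varies fastest.
Dim4 contiguousStrides(const Dim4& dims) {
    Dim4 strides{1, 1, 1, 1};
    for (std::size_t i = 1; i < 4; ++i) {
        assert(dims[i - 1] >= 0);  // negative extents make no sense here
        strides[i] = strides[i - 1] * dims[i - 1];
    }
    return strides;
}

// Position of element idx in the underlying buffer, given the view's
// strides and its starting offset.
long long linearIndex(const Dim4& idx, const Dim4& strides, long long offset) {
    long long pos = offset;
    for (std::size_t i = 0; i < 4; ++i) { pos += idx[i] * strides[i]; }
    return pos;
}

// True when the strides match the dense layout for these dimensions.
bool isDense(const Dim4& dims, const Dim4& strides) {
    return strides == contiguousStrides(dims);
}
```

Under this model a 3x4x2 array has strides {1, 3, 12, 24}. A view that keeps only the first two rows still walks the buffer with those strides, so its dimensions {2, 4, 2, 1} no longer match a dense layout and `isDense` returns false.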
1365 
1366 Bug Fixes
1367 --------------
1368 
1369 * Fixed [errors when using 3D / 4D arrays](https://github.com/arrayfire/arrayfire/pull/1251) in select and replace
1370 * Fixed [JIT errors on AMD devices](https://github.com/arrayfire/arrayfire/pull/1238) for OpenCL backend.
1371 * Fixed [imageio bugs](https://github.com/arrayfire/arrayfire/pull/1229) for 16 bit images.
1372 * Fixed [bugs when loading and storing images](https://github.com/arrayfire/arrayfire/pull/1228) natively.
1373 * Fixed [bug in FFT for NVIDIA GPUs](https://github.com/arrayfire/arrayfire/issues/615) when using OpenCL backend.
1374 * Fixed [bug when using external context](https://github.com/arrayfire/arrayfire/pull/1241) with OpenCL backend.
1375 * Fixed [memory leak](https://github.com/arrayfire/arrayfire/issues/1269) in \ref af_median_all().
1376 * Fixed [memory leaks and performance](https://github.com/arrayfire/arrayfire/pull/1274) in graphics functions.
1377 * Fixed [bugs when indexing followed by moddims](https://github.com/arrayfire/arrayfire/issues/1275).
1378 * \ref af_get_revision() now returns actual commit rather than AF_REVISION.
1379 * Fixed [releasing arrays](https://github.com/arrayfire/arrayfire/issues/1282) when using different backends.
1380 * OS X OpenCL: [LAPACK functions](\ref linalg_mat) on CPU devices use OpenCL offload (previously threw errors).
1381 * [Add support for 32-bit integer image types](https://github.com/arrayfire/arrayfire/pull/1287) in Image IO.
1382 * Fixed [set operations for row vectors](https://github.com/arrayfire/arrayfire/issues/1300)
1383 * Fixed [bugs](https://github.com/arrayfire/arrayfire/issues/1243) in \ref af::meanShift() and af::orb().
1384 
1385 Improvements
1386 --------------
1387 
1388 * Optionally [offload BLAS and LAPACK](https://github.com/arrayfire/arrayfire/pull/1221) functions to CPU implementations to improve performance.
1389 * Performance improvements to the memory manager.
1390 * Error messages are now more detailed.
1391 * Improved sorted order for OpenCL devices.
1392 * JIT heuristics can now be tweaked using environment variables. See
1393  [Environment Variables](\ref configuring_environment) tutorial.
1394 * Add `BUILD_<BACKEND>` [options to examples and tests](https://github.com/arrayfire/arrayfire/issues/1286)
1395  to toggle backends when compiling independently.
1396 
1397 Examples
1398 ----------
1399 
1400 * New visualization [example simulating gravity](\ref graphics/gravity_sim.cpp).
1401 
1402 Build
1403 ----------
1404 
1405 * Support for Intel `icc` compiler
1406 * Support to compile with Intel MKL as a BLAS and LAPACK provider
1407 * Tests are now available for building as standalone (like examples)
1408 * Tests can now be built as a single file for each backend
1409 * Better handling of NONFREE build options
1410 * [Searching for GLEW in CMake default paths](https://github.com/arrayfire/arrayfire/pull/1292)
1411 * Fixes for compiling with MKL on OSX.
1412 
1413 Installers
1414 ----------
1415 * Improvements to OSX Installer
1416  * CMake config files are now installed with libraries
1417  * Independent options for installing examples and documentation components
1418 
1419 Deprecations
1420 -----------
1421 
1422 * `af_lock_device_arr` is now deprecated and will be removed in v4.0.0. Use \ref af_lock_array() instead.
1423 * `af_unlock_device_arr` is now deprecated and will be removed in v4.0.0. Use \ref af_unlock_array() instead.
1424 
1425 Documentation
1426 --------------
1427 
1428 * Fixes to documentation for \ref af::matchTemplate().
1429 * Improved documentation for deviceInfo.
1430 * Fixes to documentation for \ref af::exp().
1431 
1432 Known Issues
1433 ------------
1434 
1435 * [Solve OpenCL fails on NVIDIA Maxwell devices](https://github.com/arrayfire/arrayfire/issues/1246)
1436  for f32 and c32 when M > N and K % 4 is 1 or 2.
1437 
1438 
1439 v3.2.2
1440 ==============
1441 
1442 Bug Fixes
1443 --------------
1444 
1445 * Fixed [memory leak](https://github.com/arrayfire/arrayfire/pull/1145) in
1446  CUDA Random number generators
1447 * Fixed [bug](https://github.com/arrayfire/arrayfire/issues/1157) in
1448  af::select() and af::replace() tests
1449 * Fixed [exception](https://github.com/arrayfire/arrayfire/issues/1164)
1450  thrown when printing empty arrays with af::print()
1451 * Fixed [bug](https://github.com/arrayfire/arrayfire/issues/1170) in CPU
1452  random number generation. Changed the generator to
1453  [mt19937](http://en.cppreference.com/w/cpp/numeric/random)
1454 * Fixed exception handling (internal)
1455  * [Exceptions](https://github.com/arrayfire/arrayfire/issues/1188)
1456  now show function, short file name and line number
1457  * Added [AF_RETURN_ERROR](https://github.com/arrayfire/arrayfire/issues/1186)
1458  macro to handle returning errors.
1459  * Removed THROW macro, and renamed AF_THROW_MSG to AF_THROW_ERR.
1460 * Fixed [bug](https://github.com/arrayfire/arrayfire/commit/9459c6)
1461  in \ref af::identity() that may have affected CUDA Compute 5.2 cards
1462 
1463 
1464 Build
1465 ------
1466 * Added a [MIN_BUILD_TIME](https://github.com/arrayfire/arrayfire/issues/1193)
1467  option to build with minimum optimization compiler flags resulting in faster
1468  compile times
1469 * Fixed [issue](https://github.com/arrayfire/arrayfire/issues/1143) in CBLAS
1470  detection by CMake
1471 * Fixed tests failing for builds without optional components
1472  [FreeImage](https://github.com/arrayfire/arrayfire/issues/1143) and
1473  [LAPACK](https://github.com/arrayfire/arrayfire/issues/1167)
1474 * Added a [test](https://github.com/arrayfire/arrayfire/issues/1192)
1475  for unified backend
1476 * Only [info and backend tests](https://github.com/arrayfire/arrayfire/issues/1192)
1477  are now built for unified backend
1478 * [Tests](https://github.com/arrayfire/arrayfire/issues/1199)
1479  now execute in alphabetical order
1480 * Fixed compilation flags and errors in tests and examples
1481 * [Moved AF_REVISION and AF_COMPILER_STR](https://github.com/arrayfire/arrayfire/commit/2287c5)
1482  into src/backend. Because the revision is updated with every commit,
1483  the previous layout forced a rebuild of all of ArrayFire.
1484  * v3.3 will add an af_get_revision() function to get the revision string.
1485 * [Clean up examples](https://github.com/arrayfire/arrayfire/pull/1158)
1486  * Remove getchar for Windows (this will be handled by the installer)
1487  * Other miscellaneous code cleanup
1488  * Fixed bug in [plot3.cpp](\ref graphics/plot3.cpp) example
1489 * [Rename](https://github.com/arrayfire/arrayfire/commit/35f0fc2) clBLAS/clFFT
1490  external project suffix from external -> ext
1491 * [Add OpenBLAS](https://github.com/arrayfire/arrayfire/pull/1197) as a
1492  lapack/lapacke alternative
1493 
1494 Improvements
1495 ------------
1496 * Added \ref AF_MEM_INFO macro to print memory info from ArrayFire's memory
1497  manager ([cross issue](https://github.com/arrayfire/arrayfire/issues/1172))
1498 * Added [additional paths](https://github.com/arrayfire/arrayfire/issues/1184)
1499  for searching for `libaf*` for Unified backend on unix-style OS.
1500  * Note: This still requires dependencies such as forge, CUDA, NVVM etc to be
1501  in `LD_LIBRARY_PATH` as described in [Unified Backend](\ref unifiedbackend)
1502 * [Create streams](https://github.com/arrayfire/arrayfire/commit/ed0373f)
1503  for devices only when required in CUDA Backend
1504 
1505 Documentation
1506 ------
1507 * [Hide scrollbars](https://github.com/arrayfire/arrayfire/commit/9d218a5)
1508  appearing for pre and code styles
1509 * Fix [documentation](https://github.com/arrayfire/arrayfire/commit/ac09f91) for af::replace
1510 * Add [code sample](https://github.com/arrayfire/arrayfire/commit/4e06483)
1511  for converting the output of af::getAvailableBackends() into bools
1512 * Minor fixes in documentation
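
The code sample mentioned above, converting the output of af::getAvailableBackends() into bools, is not reproduced on this page. A minimal sketch of the idea follows, assuming the function returns an integer bitmask with one bit per backend. The flag values and the names `BackendFlag`, `Backends` and `decodeBackends` are defined locally for illustration and are not taken from ArrayFire headers, so check `af/defines.h` for the real constants.

```cpp
#include <cassert>

// Assumed flag values, defined locally for illustration only.
enum BackendFlag : int { kCpu = 1, kCuda = 2, kOpenCL = 4 };

// One boolean per backend.
struct Backends {
    bool cpu;
    bool cuda;
    bool opencl;
};

// Test each bit of the mask and report it as a bool.
Backends decodeBackends(int mask) {
    assert(mask >= 0);  // a negative mask is not a valid combination of flags
    return Backends{(mask & kCpu) != 0, (mask & kCuda) != 0,
                    (mask & kOpenCL) != 0};
}
```

With these assumed values, a mask of 5 decodes to CPU and OpenCL available and CUDA unavailable.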
1513 
1514 v3.2.1
1515 ==============
1516 
1517 Bug Fixes
1518 --------------
1519 
1520 * Fixed [bug](https://github.com/arrayfire/arrayfire/pull/1136) in homography()
1521 * Fixed [bug](https://github.com/arrayfire/arrayfire/issues/1135) in behavior
1522  of af::array::device()
1523 * Fixed [bug](https://github.com/arrayfire/arrayfire/issues/1129) when
1524  indexing with span along trailing dimension
1525 * Fixed [bug](https://github.com/arrayfire/arrayfire/issues/1127) when
1526  indexing in [GFor](\ref gfor)
1527 * Fixed [bug](https://github.com/arrayfire/arrayfire/issues/1122) in CPU
1528  information fetching
1529 * Fixed compilation [bug](https://github.com/arrayfire/arrayfire/issues/1117)
1530  in unified backend caused by missing link library
1531 * Add [missing symbol](https://github.com/arrayfire/arrayfire/pull/1114) for
1532  af_draw_surface()
1533 
1534 Build
1535 ------
1536 * Tests can now be used as a [standalone project](https://github.com/arrayfire/arrayfire/pull/1120)
1537  * Tests can now be built using pre-compiled libraries
1538  * Similar to how the examples are built
1539 * The install target now installs the examples source irrespective of the
1540  BUILD_EXAMPLES value
1541  * Examples are not built if BUILD_EXAMPLES is off
1542 
1543 Documentation
1544 ------
1545 * HTML documentation is now [built and installed](https://github.com/arrayfire/arrayfire/pull/1109)
1546  in docs/html
1547 * Added documentation for \ref af::seq class
1548 * Updated [Matrix Manipulation](\ref matrixmanipulation) tutorial
1549 * Examples list is now generated by CMake
1550  * <a href="examples.htm">Examples</a> are now listed as dir/example.cpp
1551 * Removed dummy groups used for indexing documentation (affected doxygen < 1.8.9)
1552 
1553 v3.2.0
1554 =================
1555 
1556 Major Updates
1557 -------------
1558 
1559 * Added Unified backend
1560  * Allows switching backends at runtime
1561  * Read [Unified Backend](\ref unifiedbackend) for more.
1562 * Support for 16-bit integers (\ref s16 and \ref u16)
1563  * All functions that support 32-bit integer types (\ref s32, \ref u32),
1564  now also support 16-bit integer types
1565 
1566 Function Additions
1567 ------------------
1568 * Unified Backend
1569  * \ref af::setBackend() - Sets a backend as active
1570  * \ref af::getBackendCount() - Gets the number of backends available for use
1571  * \ref af::getAvailableBackends() - Returns information about available backends
1572  * \ref af::getBackendId() - Gets the backend enum for an array
1573 
1574 * Vision
1575  * \ref af::homography() - Homography estimation
1576  * \ref af::gloh() - GLOH Descriptor for SIFT
1577 
1578 * Image Processing
1579  * \ref af::loadImageNative() - Load an image as native data without modification
1580  * \ref af::saveImageNative() - Save an image without modifying data or type
1581 
1582 * Graphics
1583  * \ref af::Window::plot3() - 3-dimensional line plot
1584  * \ref af::Window::surface() - 3-dimensional curve plot
1585 
1586 * Indexing
1587  * \ref af_create_indexers()
1588  * \ref af_set_array_indexer()
1589  * \ref af_set_seq_indexer()
1590  * \ref af_set_seq_param_indexer()
1591  * \ref af_release_indexers()
1592 
1593 * CUDA Backend Specific
1594  * \ref afcu::setNativeId() - Set the CUDA device with given native id as active
1595  * ArrayFire uses a modified order for devices. The native id for a
1596  device can be retrieved using `nvidia-smi`
1597 
1598 * OpenCL Backend Specific
1599  * \ref afcl::setDeviceId() - Set the OpenCL device using the `clDeviceId`
1600 
1601 Other Improvements
1602 ------------------------
1603 * Added \ref c32 and \ref c64 support for \ref af::isNaN(), \ref af::isInf() and \ref af::iszero()
1604 * Added CPU information for `x86` and `x86_64` architectures in CPU backend's \ref af::info()
1605 * Batch support for \ref af::approx1() and \ref af::approx2()
1606  * Now can be used with gfor as well
1607 * Added \ref s64 and \ref u64 support to:
1608  * \ref af::sort() (along with sort index and sort by key)
1609  * \ref af::setUnique(), \ref af::setUnion(), \ref af::setIntersect()
1610  * \ref af::convolve() and \ref af::fftConvolve()
1611  * \ref af::histogram() and \ref af::histEqual()
1612  * \ref af::lookup()
1613  * \ref af::mean()
1614 * Added \ref AF_MSG macro
1615 
1616 Build Improvements
1617 ------------------
1618 * Submodules update is now automatically called if not cloned recursively
1619 * [Fixes for compilation](https://github.com/arrayfire/arrayfire/issues/766) on Visual Studio 2015
1620 * Option to use [fallback to CPU LAPACK](https://github.com/arrayfire/arrayfire/pull/1053)
1621  for linear algebra functions in case of CUDA 6.5 or older versions.
1622 
1623 Bug Fixes
1624 --------------
1625 * Fixed [memory leak](https://github.com/arrayfire/arrayfire/pull/1096) in \ref af::susan()
1626 * Fixed [failing test](https://github.com/arrayfire/arrayfire/commit/144a2db)
1627  in \ref af::lower() and \ref af::upper() for CUDA compute 53
1628 * Fixed [bug](https://github.com/arrayfire/arrayfire/issues/1092) in CUDA for indexing out of bounds
1629 * Fixed [dims check](https://github.com/arrayfire/arrayfire/commit/6975da8) in \ref af::iota()
1630 * Fixed [out-of-bounds access](https://github.com/arrayfire/arrayfire/commit/7fc3856) in \ref af::sift()
1631 * Fixed [memory allocation](https://github.com/arrayfire/arrayfire/commit/5e88e4a) in \ref af::fast() OpenCL
1632 * Fixed [memory leak](https://github.com/arrayfire/arrayfire/pull/994) in image I/O functions
1633 * \ref af::dog() now returns floating-point type arrays
1634 
1635 Documentation Updates
1636 ---------------------
1637 * Improved tutorials documentation
1638  * More detailed Using on [Linux](\ref using_on_linux), [OSX](\ref using_on_osx),
1639  [Windows](\ref using_on_windows) pages.
1640 * Added return type information for functions that return arrays of a
1641  different type
1642 
1643 New Examples
1644 ------------
1645 * Graphics
1646  * [Plot3](\ref graphics/plot3.cpp)
1647  * [Surface](\ref graphics/surface.cpp)
1648 * [Shallow Water Equation](\ref pde/swe.cpp)
1649 * [Basic](\ref unified/basic.cpp) as a Unified backend example
1650 
1651 Installers
1652 -----------
1653 * All installers now include the Unified backend and corresponding CMake files
1654 * Visual Studio projects include Unified in the Platform Configurations
1655 * Added installer for Jetson TX1
1656 * SIFT and GLOH do not ship with the installers as SIFT is protected by
1657  patents that do not allow commercial distribution without licensing.
1658 
1659 v3.1.3
1660 ==============
1661 
1662 Bug Fixes
1663 ---------
1664 
1665 * Fixed [bugs](https://github.com/arrayfire/arrayfire/issues/1042) in various OpenCL kernels without offset additions
1666 * Remove ARCH_32 and ARCH_64 flags
1667 * Fix [missing symbols](https://github.com/arrayfire/arrayfire/issues/1040) when freeimage is not found
1668 * Use CUDA driver version for Windows
1669 * Improvements to SIFT
1670 * Fixed [memory leak](https://github.com/arrayfire/arrayfire/issues/1045) in median
1671 * Fixes for Windows compilation when not using MKL [#1047](https://github.com/arrayfire/arrayfire/issues/1047)
1672 * Fixes for building without LAPACK
1673 
1674 Other
1675 -------
1676 
1677 * Documentation: Fixed documentation for select and replace
1678 * Documentation: Fixed documentation for af_isnan
1679 
1680 v3.1.2
1681 ==============
1682 
1683 Bug Fixes
1684 ---------
1685 
1686 * Fixed [bug](https://github.com/arrayfire/arrayfire/commit/4698f12) in assign that was causing test to fail
1687 * Fixed bug in convolve. Frequency condition now depends on kernel size only
1688 * Fixed [bug](https://github.com/arrayfire/arrayfire/issues/1005) in indexed reductions for complex type in OpenCL backend
1689 * Fixed [bug](https://github.com/arrayfire/arrayfire/issues/1006) in kernel name generation in ireduce for OpenCL backend
1690 * Fixed non-linear to linear indices in ireduce
1691 * Fixed [bug](https://github.com/arrayfire/arrayfire/issues/1011) in reductions for small arrays
1692 * Fixed [bug](https://github.com/arrayfire/arrayfire/issues/1010) in histogram for indexed arrays
1693 * Fixed [compiler error](https://github.com/arrayfire/arrayfire/issues/1015) in CPUID for non-compliant devices
1694 * Fixed [failing tests](https://github.com/arrayfire/arrayfire/issues/1008) on i386 platforms
1695 * Add missing AFAPI
1696 
1697 Other
1698 -------
1699 
1700 * Documentation: Added missing examples and other corrections
1701 * Documentation: Fixed warnings in documentation building
1702 * Installers: Send error messages to log file in OSX Installer
1703 
1704 v3.1.1
1705 ==============
1706 
1707 Installers
1708 -----------
1709 
1710 * CUDA backend now depends on CUDA 7.5 toolkit
1711 * OpenCL backend now requires OpenCL 1.2 or greater
1712 
1713 Bug Fixes
1714 --------------
1715 
1716 * Fixed [bug](https://github.com/arrayfire/arrayfire/issues/981) in reductions after indexing
1717 * Fixed [bug](https://github.com/arrayfire/arrayfire/issues/976) in indexing when using reverse indices
1718 
1719 Build
1720 ------
1721 
1722 * `cmake` now includes `PKG_CONFIG` in the search path for CBLAS and LAPACKE libraries
1723 * [heston_model.cpp](\ref financial/heston_model.cpp) example now builds with the default ArrayFire cmake files after installation
1724 
1725 Other
1726 ------
1727 
1728 * Fixed bug in [image_editing.cpp](\ref image_processing/image_editing.cpp)
1729 
1730 v3.1.0
1731 ==============
1732 
1733 Function Additions
1734 ------------------
1735 * Computer Vision Functions
1736  * \ref af::nearestNeighbour() - Nearest Neighbour with SAD, SSD and SHD distances
1737  * \ref af::harris() - Harris Corner Detector
1738  * \ref af::susan() - Susan Corner Detector
1739  * \ref af::sift() - Scale Invariant Feature Transform (SIFT)
1740  * "Method and apparatus for identifying scale invariant features
1741  in an image and use of same for locating an object in an image," David
1742  G. Lowe, US Patent 6,711,293 (March 23, 2004). Provisional application
1743  filed March 8, 1999. Assignee: The University of British Columbia. For
1744  further details, contact David Lowe (lowe@cs.ubc.ca) or the
1745  University-Industry Liaison Office of the University of British
1746  Columbia.
1747  * SIFT is available for compiling but does not ship with ArrayFire
1748  hosted installers/pre-built libraries
1749  * \ref af::dog() - Difference of Gaussians
1750 
1751 * Image Processing Functions
1752  * \ref ycbcr2rgb() and \ref rgb2ycbcr() - RGB <-> YCbCr color space conversion
1753  * \ref wrap() and \ref unwrap() - Wrap and Unwrap
1754  * \ref sat() - Summed Area Tables
1755  * \ref loadImageMem() and \ref saveImageMem() - Load and Save images to/from memory
1756  * \ref af_image_format - Added imageFormat (af_image_format) enum
1757 
1758 * Array & Data Handling
1759  * \ref copy() - Copy
1760  * array::lock() and array::unlock() - Lock and Unlock
1761  * \ref select() and \ref replace() - Select and Replace
1762  * Get array reference count (af_get_data_ref_count)
1763 
1764 * Signal Processing
1765  * \ref fftInPlace() - 1D in place FFT
1766  * \ref fft2InPlace() - 2D in place FFT
1767  * \ref fft3InPlace() - 3D in place FFT
1768  * \ref ifftInPlace() - 1D in place Inverse FFT
1769  * \ref ifft2InPlace() - 2D in place Inverse FFT
1770  * \ref ifft3InPlace() - 3D in place Inverse FFT
1771  * \ref fftR2C() - Real to complex FFT
1772  * \ref fftC2R() - Complex to Real FFT
1773 
1774 * Linear Algebra
1775  * \ref svd() and \ref svdInPlace() - Singular Value Decomposition
1776 
1777 * Other operations
1778  * \ref sigmoid() - Sigmoid
1779  * Sum (with option to replace NaN values)
1780  * Product (with option to replace NaN values)
1781 
1782 * Graphics
1783  * Window::setSize() - Window resizing using Forge API
1784 
1785 * Utility
1786  * Allow users to set print precision (print, af_print_array_gen)
1787  * \ref saveArray() and \ref readArray() - Stream arrays to binary files
1788  * \ref toString() - toString function returns the array and data as a string
1789 
1790 * CUDA specific functionality
1791  * \ref getStream() - Returns default CUDA stream ArrayFire uses for the current device
1792  * \ref getNativeId() - Returns native id of the CUDA device
1793 
1794 Improvements
1795 ------------
1796 * dot
1797  * Allow complex inputs with conjugate option
1798 * AF_INTERP_LOWER interpolation
1799  * For resize, rotate and transform based functions
1800 * 64-bit integer support
1801  * For reductions, random, iota, range, diff1, diff2, accum, join, shift
1802  and tile
1803 * convolve
1804  * Support for non-overlapping batched convolutions
1805 * Complex Arrays
1806  * Fix binary ops on complex inputs of mixed types
1807  * Complex type support for exp
1808 * tile
1809  * Performance improvements by using JIT when possible.
1810 * Add AF_API_VERSION macro
1811  * Allows disabling of API to maintain consistency with previous versions
1812 * Other Performance Improvements
1813  * Use reference counting to reduce unnecessary copies
1814 * CPU Backend
1815  * Device properties for CPU
1816  * Improved performance when all buffers are indexed linearly
1817 * CUDA Backend
1818  * Use streams in CUDA (no longer using default stream)
1819  * Using async cudaMem ops
1820  * Add 64-bit integer support for JIT functions
1821  * Performance improvements for CUDA JIT for non-linear 3D and 4D arrays
1822 * OpenCL Backend
1823  * Improve compilation times for OpenCL backend
1824  * Performance improvements for non-linear JIT kernels on OpenCL
1825  * Improved shared memory load/store in many OpenCL kernels (PR 933)
1826  * Using cl.hpp v1.2.7
1827 
1828 Bug Fixes
1829 ---------
1830 * Common
1831  * Fix compatibility of c32/c64 arrays when operating with scalars
1832  * Fix median for all values of an array
1833  * Fix double free issue when indexing (30cbbc7)
1834  * Fix [bug](https://github.com/arrayfire/arrayfire/issues/901) in rank
1835  * Fix default values for scale throwing exception
1836  * Fix conjg raising exception on real input
1837  * Fix bug when using conjugate transpose for vector input
1838  * Fix issue with const input for array_proxy::get()
1839 * CPU Backend
1840  * Fix randn generating same sequence for multiple calls
1841  * Fix setSeed for randu
1842  * Fix casting to and from complex
1843  * Check NULL values when allocating memory
1844  * Fix [offset issue](https://github.com/arrayfire/arrayfire/issues/923) for CPU element-wise operations
1845 
1846 New Examples
1847 ------------
1848 * Match Template
1849 * Susan
1850 * Heston Model (contributed by Michael Nowotny)
1851 
1852 Installer
1853 ----------
1854 * Fixed bug in automatic detection of ArrayFire when using with CMake in Windows
1855 * The Linux libraries are now compiled with static version of FreeImage
1856 
1857 Known Issues
1858 ------------
1859 * OpenBLAS can cause issues with QR factorization in CPU backend
1860 * FreeImage older than 3.10 can cause issues with loadImageMem and
1861  saveImageMem
1862 * OpenCL backend issues on OSX
1863  * AMD GPUs not supported because of driver issues
1864  * Intel CPUs not supported
1865  * Linear algebra functions do not work on Intel GPUs.
1866 * Stability and correctness issues with open source OpenCL implementations such as Beignet, GalliumCompute.
1867 
1868 v3.0.2
1869 ==============
1870 
1871 Bug Fixes
1872 --------------
1873 
1874 * Added missing symbols from the compatible API
1875 * Fixed a bug affecting corner rows and elements in \ref af::grad()
1876 * Fixed linear interpolation bugs affecting large images in the following:
1877  - \ref af::approx1()
1878  - \ref af::approx2()
1879  - \ref af::resize()
1880  - \ref af::rotate()
1881  - \ref af::scale()
1882  - \ref af::skew()
1883  - \ref af::transform()
1884 
1885 Documentation
1886 -----------------
1887 
1888 * Added missing documentation for \ref af::constant()
1889 * Added missing documentation for `array::scalar()`
1890 * Added supported input types for functions in `arith.h`
1891 
1892 v3.0.1
1893 ==============
1894 
1895 Bug Fixes
1896 --------------
1897 
1898 * Fixed header to work in Visual Studio 2015
1899 * Fixed a bug in batched mode for FFT based convolutions
1900 * Fixed graphics issues on OSX
1901 * Fixed various bugs in visualization functions
1902 
1903 Other improvements
1904 ---------------
1905 
1906 * Improved fractal example
1907 * New OSX installer
1908 * Improved Windows installer
1909  * Default install path has been changed
1910 * Fixed bug in machine learning examples
1911 
1912 <br>
1913 
1914 v3.0.0
1915 =================
1916 
1917 Major Updates
1918 -------------
1919 
1920 * ArrayFire is now open source
1921 * Major changes to the visualization library
1922 * Introducing handle based C API
1923 * New backend: CPU fallback available for systems without GPUs
1924 * Dense linear algebra functions available for all backends
1925 * Support for 64 bit integers
1926 
1927 Function Additions
1928 ------------------
1929 * Data generation functions
1930  * range()
1931  * iota()
1932 
1933 * Computer Vision Algorithms
1934  * features()
1935  * A data structure to hold features
1936  * fast()
1937  * FAST feature detector
1938  * orb()
1939  * ORB feature descriptor extractor
1940 
1941 * Image Processing
1942  * convolve1(), convolve2(), convolve3()
1943  * Specialized versions of convolve() to enable better batch support
1944  * fftconvolve1(), fftconvolve2(), fftconvolve3()
1945  * Convolutions in frequency domain to support larger kernel sizes
1946  * dft(), idft()
1947  * Unified functions for calling multidimensional FFTs.
1948  * matchTemplate()
1949  * Match a kernel in an image
1950  * sobel()
1951  * Get sobel gradients of an image
1952  * rgb2hsv(), hsv2rgb(), rgb2gray(), gray2rgb()
1953  * Explicit function calls to colorspace conversions
1954  * erode3d(), dilate3d()
1955  * Explicit erode and dilate calls for image morphing
1956 
1957 * Linear Algebra
1958  * matmulNT(), matmulTN(), matmulTT()
1959  * Specialized versions of matmul() for transposed inputs
1960  * luInPlace(), choleskyInPlace(), qrInPlace()
1961  * In place factorizations to improve memory requirements
1962  * solveLU()
1963  * Specialized solve routines to improve performance
1964  * OpenCL backend now supports Linear Algebra functions
1965 
1966 * Other functions
1967  * lookup() - lookup indices from a table
1968  * batchFunc() - helper function to perform batch operations
1969 
1970 * Visualization functions
1971  * Support for multiple windows
1972  * window.hist()
1973  * Visualize the output of the histogram
1974 
1975 * C API
1976  * Removed old pointer based C API
1977  * Introducing handle based C API
1978  * Just In Time compilation available in C API
1979  * C API has feature parity with C++ API
1980  * bessel functions removed
1981  * cross product functions removed
1982  * Kronecker product functions removed
1983 
1984 Performance Improvements
1985 ------------------------
1986 * Improvements across the board for OpenCL backend
1987 
1988 API Changes
1989 ---------------------
1990 * `print` is now af_print()
1991 * seq(): The step parameter is now the third input
1992  * seq(start, step, end) changed to seq(start, end, step)
1993 * gfor(): The iterator now needs to be seq()
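
The seq() reordering above is easy to miss because both orders take three numbers and still compile. The sketch below is a standalone model of the new (start, end, step) order. `expandSeq` is a hypothetical helper, not ArrayFire API, and it assumes the end value is inclusive.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: lists the values described by (start, end, step),
// treating end as inclusive.
std::vector<double> expandSeq(double start, double end, double step) {
    assert(step != 0.0);  // a zero step would never reach the end
    std::vector<double> out;
    const double span = (end - start) / step;
    if (span < 0) { return out; }  // the step points away from the end
    // Count the elements first so rounding errors do not accumulate.
    const long long count = static_cast<long long>(std::floor(span)) + 1;
    for (long long i = 0; i < count; ++i) { out.push_back(start + i * step); }
    return out;
}
```

Code written for the old order, such as (0, 2, 10) meaning start 0, step 2, end 10, is read under the new order as start 0, end 2, step 10. In this model that yields a single element instead of six, so calls carried over from earlier versions need their last two arguments swapped.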
1994 
1995 Deprecated Function APIs
1996 ------------------------
1997 Deprecated APIs are in af/compatible.h
1998 
1999 * devicecount() changed to getDeviceCount()
2000 * deviceset() changed to setDevice()
2001 * deviceget() changed to getDevice()
2002 * loadimage() changed to loadImage()
2003 * saveimage() changed to saveImage()
2004 * gaussiankernel() changed to gaussianKernel()
2005 * alltrue() changed to allTrue()
2006 * anytrue() changed to anyTrue()
2007 * setunique() changed to setUnique()
2008 * setunion() changed to setUnion()
2009 * setintersect() changed to setIntersect()
2010 * histequal() changed to histEqual()
2011 * colorspace() changed to colorSpace()
2012 * filter() deprecated. Use convolve1() and convolve2()
2013 * mul() changed to product()
2014 * deviceprop() changed to deviceProp()
2015 
2016 Known Issues
2017 ----------------------
2018 * OpenCL backend issues on OSX
2019  * AMD GPUs not supported because of driver issues
2020  * Intel CPUs not supported
2021  * Linear algebra functions do not work on Intel GPUs.
2022 * Stability and correctness issues with open source OpenCL implementations such as Beignet, GalliumCompute.