TBCI Numerical high perf. C++ Library  2.8.0
perf_opt.h File Reference

Macros

#define USE_PLAIN_VEC_KERNELS
 Default settings for optimum performance on different architectures and compilers.
 
#define DEF_UNROLL_DEPTH   4
 
#define DEF_PREFETCH_AHEAD   4
 
#define DEF_CACHELINE_SZ   32
 
#define DEF_CACHE_LOC_READ   2
 Optimized for small objects; for large ones, read/write values of 0,1 or 0,0 may be best.
 
#define DEF_CACHE_LOC_WRITE   3
 
#define PREFETCH_AHEAD   DEF_PREFETCH_AHEAD
 Number of cache lines (not elements) to prefetch ahead of use; the best value depends on your memory latency.
 
#define UNROLL_DEPTH   DEF_UNROLL_DEPTH
 Number of iterations per unrolled loop pass; trades code size against speed.
 
#define CACHELINE_SZ   DEF_CACHELINE_SZ
 (L1) Cache line size in bytes.
 
#define CACHE_LOC_READ   DEF_CACHE_LOC_READ
 Cache locality hint for pointers that are read from and written to. 0: don't cache (streaming data, accessed only once).
 
#define CACHE_LOC_WRITE   DEF_CACHE_LOC_WRITE
 
#define EL_PER_CL(T)   (signed)((CACHELINE_SZ/sizeof( T ))?(CACHELINE_SZ/sizeof( T )):1)
 
#define PREF_OFFS(T)   (EL_PER_CL(T)*PREFETCH_AHEAD)
 

Macro Definition Documentation

#define CACHE_LOC_READ   DEF_CACHE_LOC_READ

Cache locality hint for pointers that are read from and written to.

0: don't cache (streaming data, accessed only once).
1, 2: intermediate values.
3: cache in all caches (likely to be reaccessed soon).

See the GCC documentation on __builtin_prefetch.

Advice: use lower values for reading than for writing, since you are more likely to need the result again than the arguments. For large objects (larger than the L2/L3 cache), CACHE_LOC_READ=0 is best; CACHE_LOC_WRITE matters less, but 0 also seems best.

Definition at line 165 of file perf_opt.h.
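
As a rough illustration of how the two locality values map onto GCC's __builtin_prefetch(addr, rw, locality), the sketch below passes CACHE_LOC_READ for a pointer that is read and CACHE_LOC_WRITE for a pointer that is written. The function scale_copy and the distance of 16 elements are hypothetical and not taken from TBCI; the macro values are the defaults listed on this page. For brevity it issues a hint on every iteration rather than once per cache line.

```cpp
#include <cstddef>

// Default values as documented on this page.
#define CACHE_LOC_READ  2
#define CACHE_LOC_WRITE 3

// Hypothetical helper (not part of TBCI): dst[i] = factor * src[i].
inline void scale_copy(double* dst, const double* src,
                       std::size_t n, double factor)
{
    const std::size_t ahead = 16;  // illustrative prefetch distance in elements
    for (std::size_t i = 0; i < n; ++i) {
#if defined(__GNUC__)  // GCC and Clang provide __builtin_prefetch
        if (i + ahead < n) {
            // 2nd argument: 0 = prefetch for reading, 1 = for writing.
            // 3rd argument: locality, 0 (don't keep) to 3 (keep in all caches);
            // it must be a compile-time constant.
            __builtin_prefetch(src + i + ahead, 0, CACHE_LOC_READ);
            __builtin_prefetch(dst + i + ahead, 1, CACHE_LOC_WRITE);
        }
#endif
        dst[i] = factor * src[i];
    }
}
```

On compilers without the builtin the hints are compiled out and only the plain loop remains, so the result does not depend on them.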

#define CACHE_LOC_WRITE   DEF_CACHE_LOC_WRITE

Definition at line 168 of file perf_opt.h.

#define CACHELINE_SZ   DEF_CACHELINE_SZ

(L1) Cache line size in bytes.

32 or 64 bytes on many architectures. Used to issue only one prefetch per cache line and to scale the offset when prefetching ahead.

Definition at line 152 of file perf_opt.h.

#define DEF_CACHE_LOC_READ   2

Optimized for small objects; for large ones, read/write values of 0,1 or 0,0 may be best.

Definition at line 132 of file perf_opt.h.

#define DEF_CACHE_LOC_WRITE   3

Definition at line 133 of file perf_opt.h.

#define DEF_CACHELINE_SZ   32

Definition at line 126 of file perf_opt.h.

#define DEF_PREFETCH_AHEAD   4

Definition at line 120 of file perf_opt.h.

#define DEF_UNROLL_DEPTH   4

Definition at line 117 of file perf_opt.h.

#define EL_PER_CL(T)   (signed)((CACHELINE_SZ/sizeof( T ))?(CACHELINE_SZ/sizeof( T )):1)

Definition at line 172 of file perf_opt.h.

Referenced by lu_decomp().
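
To make the arithmetic concrete: the macro yields the number of elements of type T that fit in one cache line, and falls back to 1 when T is larger than a cache line. The sketch below copies the macro and the default CACHELINE_SZ of 32 from this page; the struct Big is a hypothetical type added for illustration, and the double case assumes sizeof(double) == 8.

```cpp
// Copied from this page (default cache line size and macro body).
#define CACHELINE_SZ 32
#define EL_PER_CL(T) (signed)((CACHELINE_SZ/sizeof( T ))?(CACHELINE_SZ/sizeof( T )):1)

// Hypothetical type that is larger than one 32-byte cache line.
struct Big { char bytes[48]; };

// 32 / 1 = 32 chars per cache line.
static_assert(EL_PER_CL(char) == 32, "32 chars per 32-byte line");
// 32 / 8 = 4 doubles per cache line (assumes an 8-byte double).
static_assert(sizeof(double) != 8 || EL_PER_CL(double) == 4,
              "4 doubles per 32-byte line");
// 32 / 48 = 0 in integer division, so the macro falls back to 1.
static_assert(EL_PER_CL(Big) == 1, "oversized types count as 1 per line");
```

The fallback to 1 keeps a stride based on EL_PER_CL from ever becoming zero.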

#define PREF_OFFS(T)   (EL_PER_CL(T)*PREFETCH_AHEAD)

Definition at line 173 of file perf_opt.h.
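
PREF_OFFS(T) is the prefetch distance expressed in elements of T: elements per cache line times the number of lines to look ahead. With the defaults on this page and an 8-byte double, that is 4 * 4 = 16 elements. The sketch below shows one way such an offset might be used, issuing a single prefetch per cache line. The function sum_prefetched is hypothetical and not a TBCI kernel, and "once per line" is exact only if the array starts on a cache line boundary.

```cpp
#include <cstddef>

// Copied from this page (defaults and macro bodies).
#define CACHELINE_SZ   32
#define PREFETCH_AHEAD 4
#define EL_PER_CL(T) (signed)((CACHELINE_SZ/sizeof( T ))?(CACHELINE_SZ/sizeof( T )):1)
#define PREF_OFFS(T) (EL_PER_CL(T)*PREFETCH_AHEAD)

// Hypothetical kernel (not part of TBCI): sum of v[0..n).
inline double sum_prefetched(const double* v, std::size_t n)
{
    const std::size_t per_line = static_cast<std::size_t>(EL_PER_CL(double));
    const std::size_t offs     = static_cast<std::size_t>(PREF_OFFS(double));
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
#if defined(__GNUC__)  // GCC and Clang provide __builtin_prefetch
        // One hint per cache line, PREFETCH_AHEAD lines ahead of the
        // element currently being read; never past the end of the array.
        if (i % per_line == 0 && i + offs < n)
            __builtin_prefetch(v + i + offs, 0, 0);  // read, streaming data
#endif
        s += v[i];
    }
    return s;
}
```

Locality 0 is used here because each element is read exactly once, which matches the streaming case described for CACHE_LOC_READ.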

#define PREFETCH_AHEAD   DEF_PREFETCH_AHEAD

Number of cache lines (not elements) to prefetch ahead of use; the best value depends on your memory latency.

4 or 8 seem to be good choices.

Definition at line 140 of file perf_opt.h.

#define UNROLL_DEPTH   DEF_UNROLL_DEPTH

Number of iterations per unrolled loop pass; trades code size against speed.

4 or 8 are good values. However, 1 might be best if your compiler is better at unrolling than the library's hand-written loops.

Definition at line 146 of file perf_opt.h.
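
For readers unfamiliar with manual unrolling, the sketch below shows what an unroll depth of 4 means in practice: the main loop handles four elements per pass, and a short remainder loop handles what is left. The function axpy_unrolled is hypothetical and is not how TBCI generates its kernels; it only illustrates the trade-off between code size and loop overhead.

```cpp
#include <cstddef>

#define UNROLL_DEPTH 4  // default value documented on this page

// The loop body below is written out by hand for a depth of exactly 4.
static_assert(UNROLL_DEPTH == 4, "body is written for depth 4");

// Hypothetical kernel (not part of TBCI): y[i] += a * x[i].
inline void axpy_unrolled(double* y, const double* x,
                          double a, std::size_t n)
{
    std::size_t i = 0;
    // Main loop: UNROLL_DEPTH elements per pass, one loop test per pass.
    for (; i + UNROLL_DEPTH <= n; i += UNROLL_DEPTH) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
    // Remainder loop: the last n % UNROLL_DEPTH elements.
    for (; i < n; ++i)
        y[i] += a * x[i];
}
```

With an unroll depth of 1 the main loop degenerates to the remainder loop, which leaves any unrolling to the compiler.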

#define USE_PLAIN_VEC_KERNELS

Default settings for optimum performance on different architectures and compilers.

(c) Kurt Garloff <kurt@garloff.de>, 2002-07-30

Id: perf_opt.h,v 1.1.2.17 2019/06/17 09:51:34 garloff Exp

Definition at line 114 of file perf_opt.h.