TBCI Numerical high perf. C++ Library 2.8.0
perf_opt.h File Reference

Macros

#define USE_PLAIN_VEC_KERNELS
 Default settings for optimum performance on different architectures and compilers.
#define DEF_UNROLL_DEPTH   4
#define DEF_PREFETCH_AHEAD   4
#define DEF_CACHELINE_SZ   32
#define DEF_CACHE_LOC_READ   2
 Optimized for small objects; for large ones, (CACHE_LOC_READ, CACHE_LOC_WRITE) = (0, 1) or (0, 0) may be best.
#define DEF_CACHE_LOC_WRITE   3
#define PREFETCH_AHEAD   DEF_PREFETCH_AHEAD
 Number of cache lines (not elements) to prefetch ahead of use; the best value depends on your memory latency.
#define UNROLL_DEPTH   DEF_UNROLL_DEPTH
 Number of loop iterations per unrolled loop body; trades code size against speed.
#define CACHELINE_SZ   DEF_CACHELINE_SZ
 (L1) Cache line size in bytes.
#define CACHE_LOC_READ   DEF_CACHE_LOC_READ
 Cache locality hints for pointers that are read from and written to; 0: don't cache (streaming data, accessed only once).
#define CACHE_LOC_WRITE   DEF_CACHE_LOC_WRITE
#define EL_PER_CL(T)
#define PREF_OFFS(T)

Macro Definition Documentation

◆ CACHE_LOC_READ

#define CACHE_LOC_READ   DEF_CACHE_LOC_READ

Cache locality hints for pointers that are read from (CACHE_LOC_READ) and written to (CACHE_LOC_WRITE). 0: don't cache (streaming data, accessed only once).

3: keep in all cache levels (likely to be reaccessed soon); 1 and 2 are intermediate values. See the GCC documentation on __builtin_prefetch.

Advice: use lower values for reading than for writing (you are more likely to need the result again than the arguments). For large objects (larger than the L2/L3 cache), CACHE_LOC_READ=0 is best; CACHE_LOC_WRITE is less important, but 0 also seems best.

Definition at line 165 of file perf_opt.h.
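As a minimal sketch, assuming GCC or Clang (which provide __builtin_prefetch), the two hints map onto the builtin's third argument: source operands are prefetched for reading with CACHE_LOC_READ, the destination for writing with CACHE_LOC_WRITE. The kernel names add_prefetch and add_vectors and the element distance ahead are hypothetical, not TBCI's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fallbacks so the sketch compiles on its own; the values mirror the
// documented defaults (DEF_CACHE_LOC_READ = 2, DEF_CACHE_LOC_WRITE = 3).
#ifndef CACHE_LOC_READ
#define CACHE_LOC_READ 2
#endif
#ifndef CACHE_LOC_WRITE
#define CACHE_LOC_WRITE 3
#endif

// Hypothetical kernel: c[i] = a[i] + b[i], prefetching `ahead` elements in
// advance. __builtin_prefetch(addr, rw, locality): rw is 0 for read and 1
// for write; locality (0..3) must be a compile-time constant.
void add_prefetch(const double* a, const double* b, double* c,
                  std::size_t n, std::size_t ahead)
{
    for (std::size_t i = 0; i < n; ++i) {
        if (i + ahead < n) {  // stay inside the arrays
            __builtin_prefetch(a + i + ahead, 0, CACHE_LOC_READ);
            __builtin_prefetch(b + i + ahead, 0, CACHE_LOC_READ);
            __builtin_prefetch(c + i + ahead, 1, CACHE_LOC_WRITE);
        }
        c[i] = a[i] + b[i];
    }
}

// Convenience wrapper (hypothetical) that checks the sizes match.
std::vector<double> add_vectors(const std::vector<double>& a,
                                const std::vector<double>& b,
                                std::size_t ahead = 16)
{
    assert(a.size() == b.size());
    std::vector<double> c(a.size());
    add_prefetch(a.data(), b.data(), c.data(), a.size(), ahead);
    return c;
}
```

Prefetches are only hints, so the result is identical whatever locality values are chosen; only the timing changes.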

◆ CACHE_LOC_WRITE

#define CACHE_LOC_WRITE   DEF_CACHE_LOC_WRITE

Definition at line 168 of file perf_opt.h.

◆ CACHELINE_SZ

#define CACHELINE_SZ   DEF_CACHELINE_SZ

(L1) Cache line size in bytes.

Typically 32 or 64 bytes, depending on the architecture. Used to issue only one prefetch per cache line and to scale the offset when prefetching ahead.

Definition at line 152 of file perf_opt.h.
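The once-per-line use can be sketched as follows, again assuming GCC/Clang; sum_prefetch is a hypothetical kernel and ahead a hypothetical prefetch distance in elements. Gating on i % per_line means that consecutive elements sharing a cache line trigger only one prefetch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifndef CACHELINE_SZ
#define CACHELINE_SZ 32   // documented default (DEF_CACHELINE_SZ)
#endif

// Hypothetical sum kernel issuing at most one prefetch per cache line.
double sum_prefetch(const double* a, std::size_t n, std::size_t ahead)
{
    // Elements per line, at least 1 (same idea as EL_PER_CL).
    constexpr std::size_t per_line =
        CACHELINE_SZ / sizeof(double) ? CACHELINE_SZ / sizeof(double) : 1;
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i % per_line == 0 && i + ahead < n)
            __builtin_prefetch(a + i + ahead, 0, 2);  // read, locality 2
        s += a[i];
    }
    return s;
}

double sum_prefetch(const std::vector<double>& v, std::size_t ahead)
{
    return sum_prefetch(v.data(), v.size(), ahead);
}
```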

◆ DEF_CACHE_LOC_READ

#define DEF_CACHE_LOC_READ   2

Optimized for small objects; for large ones, (CACHE_LOC_READ, CACHE_LOC_WRITE) = (0, 1) or (0, 0) may be best.

Definition at line 132 of file perf_opt.h.
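In TBCI the locality is a build-time macro. If you wanted to follow the small/large advice per call instead, one hypothetical approach is to pick the read locality from the object size at run time. pick_read_locality, prefetch_read, and the cache_bytes parameter are illustrative assumptions; since __builtin_prefetch (GCC/Clang) requires a constant locality, the helper dispatches through a switch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical: 0 (streaming) for objects larger than the cache, otherwise
// the small-object default of 2. The caller supplies the cache size.
int pick_read_locality(std::size_t object_bytes, std::size_t cache_bytes)
{
    return object_bytes > cache_bytes ? 0 : 2;
}

// The locality argument of __builtin_prefetch must be a constant,
// so each value gets its own call.
inline void prefetch_read(const void* p, int locality)
{
    switch (locality) {
    case 0:  __builtin_prefetch(p, 0, 0); break;
    case 1:  __builtin_prefetch(p, 0, 1); break;
    case 2:  __builtin_prefetch(p, 0, 2); break;
    default: __builtin_prefetch(p, 0, 3); break;
    }
}
```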

◆ DEF_CACHE_LOC_WRITE

#define DEF_CACHE_LOC_WRITE   3

Definition at line 133 of file perf_opt.h.

◆ DEF_CACHELINE_SZ

#define DEF_CACHELINE_SZ   32

Definition at line 126 of file perf_opt.h.

◆ DEF_PREFETCH_AHEAD

#define DEF_PREFETCH_AHEAD   4

Definition at line 120 of file perf_opt.h.

◆ DEF_UNROLL_DEPTH

#define DEF_UNROLL_DEPTH   4

Definition at line 117 of file perf_opt.h.

◆ EL_PER_CL

#define EL_PER_CL(T)
Value:
(signed)((CACHELINE_SZ/sizeof( T ))?(CACHELINE_SZ/sizeof( T )):1)

Number of elements of type T per cache line, falling back to 1 when sizeof(T) exceeds CACHELINE_SZ.

Definition at line 172 of file perf_opt.h.

Referenced by lu_decomp().
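To see what the macro yields, here is a self-contained check assuming the default 32-byte line and an 8-byte double; el_per_cl is a hypothetical constexpr equivalent and Big an illustrative 48-byte type.

```cpp
#include <cassert>
#include <cstddef>

#ifndef CACHELINE_SZ
#define CACHELINE_SZ 32   // documented default (DEF_CACHELINE_SZ)
#endif

// The macro value as documented above.
#define EL_PER_CL(T) (signed)((CACHELINE_SZ/sizeof( T ))?(CACHELINE_SZ/sizeof( T )):1)

struct Big { double v[6]; };  // 48 bytes, larger than a 32-byte line

// Hypothetical constexpr equivalent, checkable at compile time.
template <typename T>
constexpr int el_per_cl()
{
    return static_cast<int>(CACHELINE_SZ / sizeof(T) ? CACHELINE_SZ / sizeof(T) : 1);
}

static_assert(el_per_cl<double>() == 4, "8-byte elements: 4 per 32-byte line");
static_assert(el_per_cl<char>() == 32, "1-byte elements: 32 per line");
static_assert(el_per_cl<Big>() == 1, "oversized elements fall back to 1");
```

Without the fallback, CACHELINE_SZ/sizeof(Big) would be 0, and any loop stepping by it would never advance.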

◆ PREF_OFFS

#define PREF_OFFS(T)
Value: not preserved in this copy of the page; the definition refers to PREFETCH_AHEAD and EL_PER_CL(T).

Definition at line 173 of file perf_opt.h.
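The value of PREF_OFFS is not shown here, so the following is only an assumption based on the descriptions of PREFETCH_AHEAD (a distance in cache lines) and CACHELINE_SZ (used to scale the prefetch offset): the lines are converted to elements by multiplying with EL_PER_CL(T). pref_offs_sketch is a hypothetical name, not the library's macro.

```cpp
#include <cassert>
#include <cstddef>

#ifndef CACHELINE_SZ
#define CACHELINE_SZ 32   // DEF_CACHELINE_SZ
#endif
#ifndef PREFETCH_AHEAD
#define PREFETCH_AHEAD 4  // DEF_PREFETCH_AHEAD
#endif

template <typename T>
constexpr int el_per_cl()  // same formula as EL_PER_CL(T)
{
    return static_cast<int>(CACHELINE_SZ / sizeof(T) ? CACHELINE_SZ / sizeof(T) : 1);
}

// ASSUMPTION: prefetch offset in elements = lines ahead * elements per line.
template <typename T>
constexpr int pref_offs_sketch()
{
    return PREFETCH_AHEAD * el_per_cl<T>();
}

static_assert(pref_offs_sketch<double>() == 16, "4 lines * 4 doubles");
static_assert(pref_offs_sketch<float>() == 32, "4 lines * 8 floats");
```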

◆ PREFETCH_AHEAD

#define PREFETCH_AHEAD   DEF_PREFETCH_AHEAD

Number of cache lines (not elements) to prefetch ahead of use; the best value depends on your memory latency.

4 or 8 seem to be good choices.

Definition at line 140 of file perf_opt.h.
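How far ahead is enough depends on how long a memory access takes compared with the work done per cache line. A common rule of thumb, not taken from TBCI, is to prefetch at least ceil(latency / time per line) lines ahead. The function lines_ahead and its inputs are hypothetical; the nanosecond figures would have to be measured on your machine.

```cpp
#include <cassert>

// Rule-of-thumb sketch: enough lines ahead that processing them takes at
// least as long as one memory access. Returns at least 1.
constexpr int lines_ahead(double latency_ns, double ns_per_line)
{
    int n = static_cast<int>(latency_ns / ns_per_line);
    if (n * ns_per_line < latency_ns)  // round up
        ++n;
    return n < 1 ? 1 : n;
}
```

With a hypothetical 100 ns latency and 25 ns of work per line this gives 4; halving the work per line to 12.5 ns gives 8, which matches the range suggested above.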

◆ UNROLL_DEPTH

#define UNROLL_DEPTH   DEF_UNROLL_DEPTH

Number of loop iterations per unrolled loop body; trades code size against speed.

4 or 8 are good values. However, 1 might be best if your compiler unrolls loops better than we do.

Definition at line 146 of file perf_opt.h.
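A sketch of what an unroll depth of 4 means in practice, written as a hypothetical dot-product kernel (not TBCI's own code): the main loop handles four elements per iteration with independent accumulators, and a remainder loop handles the last n % 4 elements.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical dot product unrolled by hand with depth 4.
double dot_unrolled(const double* a, const double* b, std::size_t n)
{
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {      // four independent accumulators
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    double s = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)                // remainder: n % 4 elements
        s += a[i] * b[i];
    return s;
}
```

Because the partial sums are combined in a different order, floating-point results can differ in the last bits from a plain loop; the larger body is the code-size cost mentioned above.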

◆ USE_PLAIN_VEC_KERNELS

#define USE_PLAIN_VEC_KERNELS

perf_opt.h: Default settings for optimum performance on different architectures and compilers.

(c) Kurt Garloff <kurt@garloff.de>, 2002-07-30

$Id: perf_opt.h,v 1.1.2.17 2019/06/17 09:51:34 garloff Exp $

Definition at line 114 of file perf_opt.h.
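The split into DEF_* defaults and the macros actually used suggests that the defaults can be overridden at build time. A minimal sketch of that pattern, assuming #ifndef guards (the exact guards in perf_opt.h are not shown on this page); prefetch_bytes_ahead is a hypothetical name.

```cpp
// Override example (assumed): g++ -DPREFETCH_AHEAD=8 -DCACHELINE_SZ=64 ...
#include <cassert>

#define DEF_PREFETCH_AHEAD 4
#define DEF_CACHELINE_SZ   32

// Use the default only if the tunable was not already defined,
// e.g. with -D on the compiler command line.
#ifndef PREFETCH_AHEAD
#define PREFETCH_AHEAD DEF_PREFETCH_AHEAD
#endif
#ifndef CACHELINE_SZ
#define CACHELINE_SZ DEF_CACHELINE_SZ
#endif

// Prefetch distance in bytes implied by the two settings.
constexpr int prefetch_bytes_ahead = PREFETCH_AHEAD * CACHELINE_SZ;
```

Built without -D flags, prefetch_bytes_ahead is 4 * 32 = 128; compiling with -DPREFETCH_AHEAD=8 -DCACHELINE_SZ=64 would change it to 512.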