|
TBCI Numerical high perf. C++ Library
2.8.0
|

Macros
| #define | USE_PLAIN_VEC_KERNELS |
| File perf_opt.h: default settings for optimum performance on different architectures and compilers. | |
| #define | DEF_UNROLL_DEPTH 4 |
| #define | DEF_PREFETCH_AHEAD 4 |
| #define | DEF_CACHELINE_SZ 32 |
| #define | DEF_CACHE_LOC_READ 2 |
| Optimized for small objects; for large ones 0,1 or 0,0 may be best. | |
| #define | DEF_CACHE_LOC_WRITE 3 |
| #define | PREFETCH_AHEAD DEF_PREFETCH_AHEAD |
| How many cache lines (not elements) to prefetch ahead of use; depends on your memory latency. | |
| #define | UNROLL_DEPTH DEF_UNROLL_DEPTH |
| How many iterations per loop pass (unrolling); trades code bloat against speed. | |
| #define | CACHELINE_SZ DEF_CACHELINE_SZ |
| (L1) cache line size in bytes. | |
| #define | CACHE_LOC_READ DEF_CACHE_LOC_READ |
| Cache locality for pointers that are read from and written to. 0: don't cache (streaming data, accessed only once). | |
| #define | CACHE_LOC_WRITE DEF_CACHE_LOC_WRITE |
| #define | EL_PER_CL(T) (signed)((CACHELINE_SZ/sizeof( T ))?(CACHELINE_SZ/sizeof( T )):1) |
| #define | PREF_OFFS(T) (EL_PER_CL(T)*PREFETCH_AHEAD) |
| #define CACHE_LOC_READ DEF_CACHE_LOC_READ |
Cache locality hint for pointers that are read from (CACHE_LOC_READ) and written to (CACHE_LOC_WRITE).
0: don't cache (streaming data, accessed only once).
1, 2: intermediate values.
3: cache in all caches (data likely to be reaccessed soon).
See the GCC documentation on __builtin_prefetch.
Advice: use lower values for reading than for writing, since you are more likely to need the result again than the arguments. For large objects (larger than the L2/L3 cache), CACHE_LOC_READ=0 is best; CACHE_LOC_WRITE is less important, but 0 also seems best.
Definition at line 165 of file perf_opt.h.
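As an illustration of how the two locality values could be passed on to the compiler, here is a minimal sketch, assuming GCC or Clang, where `__builtin_prefetch(addr, rw, locality)` takes `rw` = 0 for a read and 1 for a write, and a `locality` of 0 to 3. The kernel `scale_copy` and its fixed look-ahead of 16 elements are hypothetical and are not taken from TBCI; the macro values simply repeat the defaults documented on this page.

```cpp
#include <cstddef>

// Stand-ins for the perf_opt.h settings (assumed here to be the documented defaults).
#ifndef CACHE_LOC_READ
# define CACHE_LOC_READ 2
#endif
#ifndef CACHE_LOC_WRITE
# define CACHE_LOC_WRITE 3
#endif

// Hypothetical kernel (not part of TBCI): y[i] = a * x[i].
inline void scale_copy(const double* x, double* y, std::size_t n, double a)
{
    const std::size_t ahead = 16;  // illustrative look-ahead in elements
    for (std::size_t i = 0; i < n; ++i) {
#if defined(__GNUC__)
        if (i + ahead < n) {
            // Source operand: read access, lower locality.
            __builtin_prefetch(x + i + ahead, 0, CACHE_LOC_READ);
            // Destination: write access, higher locality.
            __builtin_prefetch(y + i + ahead, 1, CACHE_LOC_WRITE);
        }
#endif
        y[i] = a * x[i];
    }
}
```

For operands much larger than the L2/L3 cache, the advice above would translate into compiling with both values set to 0.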
| #define CACHE_LOC_WRITE DEF_CACHE_LOC_WRITE |
Definition at line 168 of file perf_opt.h.
| #define CACHELINE_SZ DEF_CACHELINE_SZ |
(L1) cache line size in bytes.
32 or 64 bytes on many architectures. Used to issue a prefetch only once per cache line and to scale the offset when prefetching ahead.
Definition at line 152 of file perf_opt.h.
| #define DEF_CACHE_LOC_READ 2 |
The defaults are optimized for small objects; for large ones, CACHE_LOC_READ, CACHE_LOC_WRITE values of 0,1 or 0,0 may be best.
Definition at line 132 of file perf_opt.h.
| #define DEF_CACHE_LOC_WRITE 3 |
Definition at line 133 of file perf_opt.h.
| #define DEF_CACHELINE_SZ 32 |
Definition at line 126 of file perf_opt.h.
| #define DEF_PREFETCH_AHEAD 4 |
Definition at line 120 of file perf_opt.h.
| #define DEF_UNROLL_DEPTH 4 |
Definition at line 117 of file perf_opt.h.
| #define EL_PER_CL(T) (signed)((CACHELINE_SZ/sizeof( T ))?(CACHELINE_SZ/sizeof( T )):1) |
Definition at line 172 of file perf_opt.h.
Referenced by lu_decomp().
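The macro yields the number of elements of type T that fit into one cache line, clamped to 1 when T is larger than the line. The following sketch checks this at compile time; it assumes the default CACHELINE_SZ of 32 and the common sizes of 4 bytes for float and 8 bytes for double, and the type `Big` is a hypothetical example that does not exist in TBCI.

```cpp
#include <cstddef>

// Stand-ins for the perf_opt.h definitions (assumed default cache line size).
#ifndef CACHELINE_SZ
# define CACHELINE_SZ 32
#endif
#ifndef EL_PER_CL
# define EL_PER_CL(T) (signed)((CACHELINE_SZ/sizeof( T ))?(CACHELINE_SZ/sizeof( T )):1)
#endif

// Hypothetical type that is larger than one 32-byte cache line.
struct Big { char bytes[48]; };

static_assert(EL_PER_CL(char) == 32, "one element per byte of the line");
static_assert(sizeof(float)  != 4 || EL_PER_CL(float)  == 8, "8 floats per line");
static_assert(sizeof(double) != 8 || EL_PER_CL(double) == 4, "4 doubles per line");
static_assert(EL_PER_CL(Big) == 1, "integer division gives 0, so the macro falls back to 1");
```

The fallback to 1 keeps the result usable as a loop stride even for element types that span several cache lines.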
| #define PREF_OFFS(T) (EL_PER_CL(T)*PREFETCH_AHEAD) |
Definition at line 173 of file perf_opt.h.
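PREF_OFFS(T) converts the look-ahead from cache lines into elements of type T; with the defaults and an 8-byte double this is 4 * 4 = 16 elements. The sketch below shows one way such an offset might be used, issuing one prefetch per cache line. The function `sum_prefetched` is hypothetical and is not a TBCI kernel; the macro definitions repeat the documented defaults, and the prefetch itself assumes GCC or Clang.

```cpp
#include <cstddef>

// Stand-ins for the perf_opt.h definitions (assumed documented defaults).
#ifndef CACHELINE_SZ
# define CACHELINE_SZ 32
#endif
#ifndef PREFETCH_AHEAD
# define PREFETCH_AHEAD 4
#endif
#ifndef EL_PER_CL
# define EL_PER_CL(T) (signed)((CACHELINE_SZ/sizeof( T ))?(CACHELINE_SZ/sizeof( T )):1)
#endif
#ifndef PREF_OFFS
# define PREF_OFFS(T) (EL_PER_CL(T)*PREFETCH_AHEAD)
#endif

// Hypothetical reduction (not part of TBCI): sums n doubles.
inline double sum_prefetched(const double* x, long n)
{
    const long per_line = EL_PER_CL(double);  // elements per cache line
    const long offs     = PREF_OFFS(double);  // look-ahead in elements
    double s = 0.0;
    for (long i = 0; i < n; ++i) {
#if defined(__GNUC__)
        // Prefetch only at the first element of each cache-line-sized group,
        // and only while the target is still inside the array.
        if (i % per_line == 0 && i + offs < n)
            __builtin_prefetch(x + i + offs, 0, 0);  // read, streaming
#endif
        s += x[i];
    }
    return s;
}
```

Raising PREFETCH_AHEAD moves the prefetch target further from the current element, which may help on systems with higher memory latency.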
| #define PREFETCH_AHEAD DEF_PREFETCH_AHEAD |
How many cache lines (not elements) to prefetch ahead of use; the best value depends on your memory latency.
4 or 8 seem to be good choices.
Definition at line 140 of file perf_opt.h.
| #define UNROLL_DEPTH DEF_UNROLL_DEPTH |
How many iterations to execute per loop pass (unrolling); trades code bloat against speed.
4 or 8 are good values. However, 1 might be best if your compiler is better at unrolling than we are.
Definition at line 146 of file perf_opt.h.
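To make the trade-off concrete, here is a minimal sketch of manual unrolling by a factor of 4 with a cleanup loop for the remaining elements. The function `axpy_unrolled` is hypothetical and only illustrates the technique; it is not claimed to match how TBCI's own kernels consume UNROLL_DEPTH.

```cpp
#include <cstddef>

// Stand-in for the perf_opt.h setting (assumed documented default).
#ifndef UNROLL_DEPTH
# define UNROLL_DEPTH 4
#endif

// Hypothetical kernel (not part of TBCI): y[i] += a * x[i].
inline void axpy_unrolled(double a, const double* x, double* y, std::size_t n)
{
    std::size_t i = 0;
#if UNROLL_DEPTH >= 4
    // Main loop: four iterations per pass, so less loop overhead but more code.
    for (; i + 4 <= n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
#endif
    // Cleanup loop: handles the tail, or everything when unrolling is disabled.
    for (; i < n; ++i)
        y[i] += a * x[i];
}
```

With UNROLL_DEPTH set to 1 only the plain loop remains, which leaves the unrolling decision to the compiler.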
| #define USE_PLAIN_VEC_KERNELS |
File perf_opt.h: default settings for optimum performance on different architectures and compilers.
(c) Kurt Garloff kurt@garloff.de, 2002-07-30
Definition at line 114 of file perf_opt.h.