TBCI Numerical high perf. C++ Library 2.8.0
Macros | |
| #define | USE_PLAIN_VEC_KERNELS |
| File perf_opt.h: default settings for optimum performance on different architectures and compilers. | |
| #define | DEF_UNROLL_DEPTH 4 |
| #define | DEF_PREFETCH_AHEAD 4 |
| #define | DEF_CACHELINE_SZ 32 |
| #define | DEF_CACHE_LOC_READ 2 |
| Optimized for small objects; for large ones, the read/write pair 0,1 or 0,0 may be best. | |
| #define | DEF_CACHE_LOC_WRITE 3 |
| #define | PREFETCH_AHEAD DEF_PREFETCH_AHEAD |
| How many cache lines (!) to prefetch ahead of use; depends on your memory latency. | |
| #define | UNROLL_DEPTH DEF_UNROLL_DEPTH |
| How many iterations per loop (unrolling). Trade code bloat against speed. | |
| #define | CACHELINE_SZ DEF_CACHELINE_SZ |
| (L1) Cache line size in bytes. | |
| #define | CACHE_LOC_READ DEF_CACHE_LOC_READ |
| Cache locality for pointers read from and written to. 0: don't cache (streaming data, only accessed once). | |
| #define | CACHE_LOC_WRITE DEF_CACHE_LOC_WRITE |
| #define | EL_PER_CL(T) |
| #define | PREF_OFFS(T) |
| #define CACHE_LOC_READ DEF_CACHE_LOC_READ |
Cache locality for pointers read from and written to.
0: don't cache (streaming data, only accessed once).
3: cache in all caches (likely to be reaccessed soon).
1, 2: intermediate values.
See the GCC documentation on __builtin_prefetch.
Advice: Use lower values for reading than for writing (you are more likely to need the result again than the arguments). For large objects (larger than the L2/L3 cache), CACHE_LOC_READ=0 is best; CACHE_LOC_WRITE is less important, but 0 also seems best.
Definition at line 165 of file perf_opt.h.
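As an illustration of how the two locality values map onto the compiler builtin, here is a minimal sketch. It assumes GCC or Clang, where `__builtin_prefetch(addr, rw, locality)` takes `rw` = 0 for read or 1 for write and `locality` in 0..3. The macro names `MY_CACHE_LOC_READ` and `MY_CACHE_LOC_WRITE`, the function `axpy_prefetch` and the look-ahead distance are hypothetical and are not taken from perf_opt.h; the values only mirror the documented defaults.

```cpp
#include <cstddef>

// Hypothetical stand-ins for CACHE_LOC_READ / CACHE_LOC_WRITE.
// The values mirror the documented defaults (2 for read, 3 for write).
#define MY_CACHE_LOC_READ  2
#define MY_CACHE_LOC_WRITE 3

// Computes y[i] += a * x[i] while hinting upcoming accesses to the cache.
// x is only read, so it gets the lower locality; y is written and more
// likely to be needed again, so it gets the higher one.
void axpy_prefetch(double* y, const double* x, double a, std::size_t n)
{
    const std::size_t ahead = 16;  // look-ahead in elements (assumed value)
    for (std::size_t i = 0; i < n; ++i) {
        if (i + ahead < n) {
            __builtin_prefetch(&x[i + ahead], 0, MY_CACHE_LOC_READ);   // read hint
            __builtin_prefetch(&y[i + ahead], 1, MY_CACHE_LOC_WRITE);  // write hint
        }
        y[i] += a * x[i];
    }
}
```

Prefetches are only hints, so the result is the same with or without them. For vectors larger than the last-level cache, the advice above suggests trying 0 for the read locality.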
| #define CACHE_LOC_WRITE DEF_CACHE_LOC_WRITE |
Definition at line 168 of file perf_opt.h.
| #define CACHELINE_SZ DEF_CACHELINE_SZ |
(L1) Cache line size in bytes.
32 or 64 bytes on many architectures. Used to issue a prefetch only once per cache line and to scale the offset when prefetching ahead.
Definition at line 152 of file perf_opt.h.
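The two uses of the cache line size described here can be sketched as follows. This is an illustration under stated assumptions, not the library's kernel: `MY_CACHELINE_SZ`, `MY_PREFETCH_AHEAD` and `sum_prefetch` are hypothetical names, the values mirror the documented defaults, and `__builtin_prefetch` is the GCC/Clang builtin.

```cpp
#include <cstddef>

// Hypothetical stand-ins mirroring the documented defaults.
#define MY_CACHELINE_SZ   32  // bytes per cache line
#define MY_PREFETCH_AHEAD 4   // cache lines to prefetch ahead

// Sums n doubles, issuing one prefetch per cache-line-sized chunk.
double sum_prefetch(const double* v, std::size_t n)
{
    // Elements that fit in one cache line (4 doubles for a 32-byte line).
    const std::size_t per_line = MY_CACHELINE_SZ / sizeof(double);
    // Look-ahead in elements: the line count scaled by elements per line.
    const std::size_t offset = MY_PREFETCH_AHEAD * per_line;

    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Counting from the start of the array; this matches real cache
        // lines exactly only if v is aligned to the line size.
        if (i % per_line == 0 && i + offset < n)
            __builtin_prefetch(&v[i + offset], 0, 0);  // read, streaming
        s += v[i];
    }
    return s;
}
```

If the line size is set too small, prefetches are issued more often than needed; if it is set too large, some lines are never prefetched.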
| #define DEF_CACHE_LOC_READ 2 |
Optimized for small objects; for large ones, the read/write pair 0,1 or 0,0 may be best.
Definition at line 132 of file perf_opt.h.
| #define DEF_CACHE_LOC_WRITE 3 |
Definition at line 133 of file perf_opt.h.
| #define DEF_CACHELINE_SZ 32 |
Definition at line 126 of file perf_opt.h.
| #define DEF_PREFETCH_AHEAD 4 |
Definition at line 120 of file perf_opt.h.
| #define DEF_UNROLL_DEPTH 4 |
Definition at line 117 of file perf_opt.h.
| #define EL_PER_CL(T) |
| #define PREF_OFFS(T) |
Definition at line 173 of file perf_opt.h.
| #define PREFETCH_AHEAD DEF_PREFETCH_AHEAD |
How many cache lines (!) to prefetch ahead of use; depends on your memory latency.
4 or 8 seem to be good choices.
Definition at line 140 of file perf_opt.h.
| #define UNROLL_DEPTH DEF_UNROLL_DEPTH |
How many iterations per loop (unrolling). Trade code bloat against speed.
4 or 8 are good values. However, 1 might be best if your compiler is better at unrolling than we are.
Definition at line 146 of file perf_opt.h.
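To make the trade-off concrete, here is a minimal sketch of a dot product unrolled by hand to a depth of 4. The function name `dot_unrolled4` is hypothetical and the code does not reproduce the library's kernels; it only shows what four iterations per loop pass look like, including the remainder loop that unrolling makes necessary.

```cpp
#include <cstddef>

// Dot product with the loop body unrolled four times.
// Four independent accumulators let the CPU overlap the multiply-adds;
// the price is a larger loop body plus a cleanup loop (the code bloat).
double dot_unrolled4(const double* a, const double* b, std::size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;

    // Main loop: four elements per pass.
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    double s = (s0 + s1) + (s2 + s3);

    // Remainder loop: the 0 to 3 elements left over.
    for (; i < n; ++i)
        s += a[i] * b[i];
    return s;
}
```

Note that the separate accumulators change the order of the floating-point additions, so results can differ in the last bits from a plain loop. With an optimizing compiler that unrolls on its own, a depth of 1 (the plain loop) may perform as well or better, as noted above.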
| #define USE_PLAIN_VEC_KERNELS |
File perf_opt.h: default settings for optimum performance on different architectures and compilers.
(c) Kurt Garloff <kurt@garloff.de>, 2002-07-30
Definition at line 114 of file perf_opt.h.