TBCI Numerical high perf. C++ Library
2.8.0
macros for composing unrolled prefetching loops over arrays.
Macros | |
| #define | LCTYPE(T) REGISTER typename tbci_traits<T>::loop_const_refval_type |
| Shortcut for loop const ref type. | |
| #define | LCTYPED(T) REGISTER tbci_traits<T>::loop_const_refval_type |
| #define | UNROLL_DEPTH 4 |
| When unrolling the loops, I had the following architectural details in mind: | |
| #define | UNROLL1_PREF_KERNEL5(OPER, T, CA0, CA1, CA2) |
| Non-unrolled kernel for 5 args with prefetching. | |
| #define | UNROLL1_KERNEL5(OPER) |
| Non-unrolled kernel for 5 args without prefetching. | |
| #define | UNROLL1_KERNEL5_PREPARE do {} while(0) |
| #define | UNROLL1_KERNEL5_FIXUP do {} while(0) |
| #define | UNROLL2_PREF_KERNEL5(OPER, T, CA0, CA1, CA2) |
| Twice unrolled kernel for 5 args with prefetching. | |
| #define | UNROLL2_KERNEL5(OPER) |
| Twice unrolled kernel for 5 args without prefetching. | |
| #define | UNROLL2_KERNEL5_PREPARE do {} while(0) |
| #define | UNROLL2_KERNEL5_FIXUP do {} while(0) |
| #define | UNROLL4_PREF_KERNEL5(OPER, T, CA0, CA1, CA2) |
| Four times unrolled kernel for 5 args with prefetching. | |
| #define | UNROLL4_KERNEL5(OPER) |
| Four times unrolled kernel for 5 args without prefetching. | |
| #define | UNROLL4_KERNEL5_PREPARE do {} while(0) |
| #define | UNROLL4_KERNEL5_FIXUP do {} while(0) |
| #define | UNROLL8_PREF_KERNEL5(OPER, T, CA0, CA1, CA2) |
| Eight times unrolled kernel for 5 args with prefetching. | |
| #define | UNROLL8_KERNEL5(OPER) |
| Eight times unrolled kernel for 5 args without prefetching. | |
| #define | UNROLL8_KERNEL5_PREPARE do {} while(0) |
| #define | UNROLL8_KERNEL5_FIXUP do {} while(0) |
| #define | PREF_AHEAD3(T, CA0, CA1, CA2) |
| Initial prefetch ahead (3 pointers) | |
| #define | UNROLL1_PREF_KERNEL4(OPER, T, PREFETCH_X, CA0, CA1) |
| Non-unrolled kernel for 4 args with prefetching. | |
| #define | UNROLL1_KERNEL4(OPER) |
| Non-unrolled kernel for 4 args without prefetching. | |
| #define | UNROLL1_KERNEL4_PREPARE do {} while(0) |
| #define | UNROLL1_KERNEL4_FIXUP do {} while(0) |
| #define | UNROLL2_PREF_KERNEL4(OPER, T, PREFETCH_X, CA0, CA1) |
| Twice unrolled kernel for 4 args with prefetching. | |
| #define | UNROLL2_KERNEL4(OPER) |
| Twice unrolled kernel for 4 args without prefetching. | |
| #define | UNROLL2_KERNEL4_PREPARE do {} while(0) |
| #define | UNROLL2_KERNEL4_FIXUP do {} while(0) |
| #define | UNROLL4_PREF_KERNEL4(OPER, T, PREFETCH_X, CA0, CA1) |
| Four times unrolled kernel for 4 args with prefetching. | |
| #define | UNROLL4_KERNEL4(OPER) |
| Four times unrolled kernel for 4 args without prefetching. | |
| #define | UNROLL4_KERNEL4_PREPARE do {} while(0) |
| #define | UNROLL4_KERNEL4_FIXUP do {} while(0) |
| #define | UNROLL8_PREF_KERNEL4(OPER, T, PREFETCH_X, CA0, CA1) |
| Eight times unrolled kernel for 4 args with prefetching. | |
| #define | UNROLL8_KERNEL4(OPER) |
| Eight times unrolled kernel for 4 args without prefetching. | |
| #define | UNROLL8_KERNEL4_PREPARE do {} while(0) |
| #define | UNROLL8_KERNEL4_FIXUP do {} while(0) |
| #define | PREF_AHEAD2(T, PREFETCH_X, CA0, CA1) |
| Initial prefetch ahead (2 pointers) | |
| #define | UNROLL1_PREF_KERNEL3(OPER, T, PREFETCH_X, CA0) |
| Non-unrolled kernel for 3 args with prefetching. | |
| #define | UNROLL1_KERNEL3(OPER) |
| Non-unrolled kernel for 3 args without prefetching. | |
| #define | UNROLL1_KERNEL3_PREPARE do {} while(0) |
| #define | UNROLL1_KERNEL3_FIXUP do {} while(0) |
| #define | UNROLL2_PREF_KERNEL3(OPER, T, PREFETCH_X, CA0) |
| Twice unrolled kernel for 3 args with prefetching. | |
| #define | UNROLL2_KERNEL3(OPER) |
| Twice unrolled kernel for 3 args without prefetching. | |
| #define | UNROLL2_KERNEL3_PREPARE do {} while(0) |
| #define | UNROLL2_KERNEL3_FIXUP do {} while(0) |
| #define | UNROLL4_PREF_KERNEL3(OPER, T, PREFETCH_X, CA0) |
| Four times unrolled kernel for 3 args with prefetching. | |
| #define | UNROLL4_KERNEL3(OPER) |
| Four times unrolled kernel for 3 args without prefetching. | |
| #define | UNROLL4_KERNEL3_PREPARE do {} while(0) |
| #define | UNROLL4_KERNEL3_FIXUP do {} while(0) |
| #define | UNROLL8_PREF_KERNEL3(OPER, T, PREFETCH_X, CA0) |
| Eight times unrolled kernel for 3 args with prefetching. | |
| #define | UNROLL8_KERNEL3(OPER) |
| Eight times unrolled kernel for 3 args without prefetching. | |
| #define | UNROLL8_KERNEL3_PREPARE do {} while(0) |
| #define | UNROLL8_KERNEL3_FIXUP do {} while(0) |
| #define | PREF_AHEAD1(T, PREFETCH_X, CA0) |
| Initial prefetch ahead (1 pointer) | |
| #define | UNR_PREF_KERNEL5 UNROLL4_PREF_KERNEL5 |
| #define | UNR_KERNEL5 UNROLL4_KERNEL5 |
| #define | UNR_KERNEL5_PREP UNROLL4_KERNEL5_PREPARE |
| #define | UNR_KERNEL5_FIX UNROLL4_KERNEL5_FIXUP |
| #define | UNR_PREF_KERNEL4 UNROLL4_PREF_KERNEL4 |
| #define | UNR_KERNEL4 UNROLL4_KERNEL4 |
| #define | UNR_KERNEL4_PREP UNROLL4_KERNEL4_PREPARE |
| #define | UNR_KERNEL4_FIX UNROLL4_KERNEL4_FIXUP |
| #define | UNR_PREF_KERNEL3 UNROLL4_PREF_KERNEL3 |
| #define | UNR_KERNEL3 UNROLL4_KERNEL3 |
| #define | UNR_KERNEL3_PREP UNROLL4_KERNEL3_PREPARE |
| #define | UNR_KERNEL3_FIX UNROLL4_KERNEL3_FIXUP |
| #define | VKERN_TEMPL_3V_PREF(OP, T) do {} while (0) |
| Fragments to be combined for different cases: 1, 2, or 3 vector fields; 0, 1, or 2 scalars to multiply with; a variable number of data elements per cache line; prefetching 1, 2, 4, 8, or 16 cache lines ahead; 1-, 2-, 4-, or 8-fold unrolling. | |
| #define | VKERN_TEMPL_2V_PREF(OP, T, PREFETCH_X, CW) do {} while (0) |
| #define | VKERN_TEMPL_1V_PREF(OP, T, PREFETCH_X, CW) do {} while (0) |
| #define | VKERN_TEMPL_3V(FNAME, OP3) |
| gcc-2.95.x seems to fail caching a const double& in a REGISTER. | |
| #define | VKERN_TEMPL_3V_C(FNAME, OP3) |
| Operations of type vec = vec OP val * vec. | |
| #define | VKERN_TEMPL_3V_CC(FNAME, OP3) |
| Operations of type vec = val * vec OP val * vec. | |
| #define | VKERN_TEMPL_2V(FNAME, OP2) |
| Operations of type vec OP= vec. | |
| #define | VKERN_TEMPL_2V_C(FNAME, OP2) |
| Operations of type VEC = VEC OP VAL or VAL OP VEC. | |
| #define | VKERN_TEMPL_2V_CC(FNAME, OP2) |
| Operations of type VEC = VEC OP VAL or VAL OP VEC. | |
| #define | VKERN_TEMPL_2V_T(FNAME, OP2, TYPE) |
| Operations of type TYPE = VEC OP VEC. | |
| #define | VKERN_TEMPL_1V(FNAME, OP1) |
| Operations of type VEC = OP self. | |
| #define | VKERN_TEMPL_1V_C(FNAME, OP1) |
| Operations of type VEC OP= VAL. | |
| #define | VKERN_TEMPL_1V_CC(FNAME, OP1) |
| Operations of type VEC *= S OP= VAL. | |
| #define | VKERN_TEMPL_1V_T(FNAME, OP1, TYPE) |
| Operations of type TYPE = OP VEC. | |
| #define | VKERN_TEMPL_1V_T_LD(FNAME, OP1, TYPE) |
| Operations of type TYPE = OP VEC (using LONG_DOUBLE internally) | |
macros for composing unrolled prefetching loops over arrays.
(c) Kurt Garloff, kurt@garloff.de, 7/2002, GNU LGPL v2
Definition in file unroll_prefetch_def2.h.
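| #define LCTYPE | ( | T | ) | REGISTER typename tbci_traits<T>::loop_const_refval_type |
Shortcut for loop const ref type.
Definition at line 14 of file unroll_prefetch_def2.h.
| #define LCTYPED | ( | T | ) | REGISTER tbci_traits<T>::loop_const_refval_type |
Definition at line 15 of file unroll_prefetch_def2.h.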
| #define PREF_AHEAD1 | ( | T, | |
| PREFETCH_X, | |||
| CA0 | |||
| ) |
Initial prefetch ahead (1 pointer)
Definition at line 824 of file unroll_prefetch_def2.h.
| #define PREF_AHEAD2 | ( | T, | |
| PREFETCH_X, | |||
| CA0, | |||
| CA1 | |||
| ) |
Initial prefetch ahead (2 pointers)
Definition at line 587 of file unroll_prefetch_def2.h.
| #define PREF_AHEAD3 | ( | T, | |
| CA0, | |||
| CA1, | |||
| CA2 | |||
| ) |
Initial prefetch ahead (3 pointers)
Definition at line 288 of file unroll_prefetch_def2.h.
| #define UNR_KERNEL3 UNROLL4_KERNEL3 |
Definition at line 907 of file unroll_prefetch_def2.h.
| #define UNR_KERNEL3_FIX UNROLL4_KERNEL3_FIXUP |
Definition at line 909 of file unroll_prefetch_def2.h.
| #define UNR_KERNEL3_PREP UNROLL4_KERNEL3_PREPARE |
Definition at line 908 of file unroll_prefetch_def2.h.
| #define UNR_KERNEL4 UNROLL4_KERNEL4 |
Definition at line 902 of file unroll_prefetch_def2.h.
| #define UNR_KERNEL4_FIX UNROLL4_KERNEL4_FIXUP |
Definition at line 904 of file unroll_prefetch_def2.h.
| #define UNR_KERNEL4_PREP UNROLL4_KERNEL4_PREPARE |
Definition at line 903 of file unroll_prefetch_def2.h.
| #define UNR_KERNEL5 UNROLL4_KERNEL5 |
Definition at line 897 of file unroll_prefetch_def2.h.
| #define UNR_KERNEL5_FIX UNROLL4_KERNEL5_FIXUP |
Definition at line 899 of file unroll_prefetch_def2.h.
| #define UNR_KERNEL5_PREP UNROLL4_KERNEL5_PREPARE |
Definition at line 898 of file unroll_prefetch_def2.h.
| #define UNR_PREF_KERNEL3 UNROLL4_PREF_KERNEL3 |
Definition at line 906 of file unroll_prefetch_def2.h.
| #define UNR_PREF_KERNEL4 UNROLL4_PREF_KERNEL4 |
Definition at line 901 of file unroll_prefetch_def2.h.
| #define UNR_PREF_KERNEL5 UNROLL4_PREF_KERNEL5 |
Definition at line 896 of file unroll_prefetch_def2.h.
| #define UNROLL1_KERNEL3 | ( | OPER | ) |
Non-unrolled kernel for 3 args without prefetching.
Definition at line 659 of file unroll_prefetch_def2.h.
| #define UNROLL1_KERNEL3_FIXUP do {} while(0) |
Definition at line 665 of file unroll_prefetch_def2.h.
| #define UNROLL1_KERNEL3_PREPARE do {} while(0) |
Definition at line 664 of file unroll_prefetch_def2.h.
| #define UNROLL1_KERNEL4 | ( | OPER | ) |
Non-unrolled kernel for 4 args without prefetching.
Definition at line 388 of file unroll_prefetch_def2.h.
| #define UNROLL1_KERNEL4_FIXUP do {} while(0) |
Definition at line 394 of file unroll_prefetch_def2.h.
| #define UNROLL1_KERNEL4_PREPARE do {} while(0) |
Definition at line 393 of file unroll_prefetch_def2.h.
| #define UNROLL1_KERNEL5 | ( | OPER | ) |
Non-unrolled kernel for 5 args without prefetching.
Definition at line 63 of file unroll_prefetch_def2.h.
| #define UNROLL1_KERNEL5_FIXUP do {} while(0) |
Definition at line 69 of file unroll_prefetch_def2.h.
| #define UNROLL1_KERNEL5_PREPARE do {} while(0) |
Definition at line 68 of file unroll_prefetch_def2.h.
| #define UNROLL1_PREF_KERNEL3 | ( | OPER, | |
| T, | |||
| PREFETCH_X, | |||
| CA0 | |||
| ) |
Non-unrolled kernel for 3 args with prefetching.
Definition at line 652 of file unroll_prefetch_def2.h.
| #define UNROLL1_PREF_KERNEL4 | ( | OPER, | |
| T, | |||
| PREFETCH_X, | |||
| CA0, | |||
| CA1 | |||
| ) |
Non-unrolled kernel for 4 args with prefetching.
Definition at line 379 of file unroll_prefetch_def2.h.
| #define UNROLL1_PREF_KERNEL5 | ( | OPER, | |
| T, | |||
| CA0, | |||
| CA1, | |||
| CA2 | |||
| ) |
Non-unrolled kernel for 5 args with prefetching.
Definition at line 52 of file unroll_prefetch_def2.h.
| #define UNROLL2_KERNEL3 | ( | OPER | ) |
Twice unrolled kernel for 3 args without prefetching.
Definition at line 687 of file unroll_prefetch_def2.h.
| #define UNROLL2_KERNEL3_FIXUP do {} while(0) |
Definition at line 694 of file unroll_prefetch_def2.h.
| #define UNROLL2_KERNEL3_PREPARE do {} while(0) |
Definition at line 693 of file unroll_prefetch_def2.h.
| #define UNROLL2_KERNEL4 | ( | OPER | ) |
Twice unrolled kernel for 4 args without prefetching.
Definition at line 421 of file unroll_prefetch_def2.h.
| #define UNROLL2_KERNEL4_FIXUP do {} while(0) |
Definition at line 428 of file unroll_prefetch_def2.h.
| #define UNROLL2_KERNEL4_PREPARE do {} while(0) |
Definition at line 427 of file unroll_prefetch_def2.h.
| #define UNROLL2_KERNEL5 | ( | OPER | ) |
Twice unrolled kernel for 5 args without prefetching.
Definition at line 99 of file unroll_prefetch_def2.h.
| #define UNROLL2_KERNEL5_FIXUP do {} while(0) |
Definition at line 106 of file unroll_prefetch_def2.h.
| #define UNROLL2_KERNEL5_PREPARE do {} while(0) |
Definition at line 105 of file unroll_prefetch_def2.h.
| #define UNROLL2_PREF_KERNEL3 | ( | OPER, | |
| T, | |||
| PREFETCH_X, | |||
| CA0 | |||
| ) |
Twice unrolled kernel for 3 args with prefetching.
Definition at line 669 of file unroll_prefetch_def2.h.
| #define UNROLL2_PREF_KERNEL4 | ( | OPER, | |
| T, | |||
| PREFETCH_X, | |||
| CA0, | |||
| CA1 | |||
| ) |
Twice unrolled kernel for 4 args with prefetching.
Definition at line 398 of file unroll_prefetch_def2.h.
| #define UNROLL2_PREF_KERNEL5 | ( | OPER, | |
| T, | |||
| CA0, | |||
| CA1, | |||
| CA2 | |||
| ) |
Twice unrolled kernel for 5 args with prefetching.
Definition at line 73 of file unroll_prefetch_def2.h.
| #define UNROLL4_KERNEL3 | ( | OPER | ) |
Four times unrolled kernel for 3 args without prefetching.
Definition at line 730 of file unroll_prefetch_def2.h.
| #define UNROLL4_KERNEL3_FIXUP do {} while(0) |
Definition at line 739 of file unroll_prefetch_def2.h.
| #define UNROLL4_KERNEL3_PREPARE do {} while(0) |
Definition at line 738 of file unroll_prefetch_def2.h.
| #define UNROLL4_KERNEL4 | ( | OPER | ) |
Four times unrolled kernel for 4 args without prefetching.
Definition at line 474 of file unroll_prefetch_def2.h.
| #define UNROLL4_KERNEL4_FIXUP do {} while(0) |
Definition at line 483 of file unroll_prefetch_def2.h.
| #define UNROLL4_KERNEL4_PREPARE do {} while(0) |
Definition at line 482 of file unroll_prefetch_def2.h.
| #define UNROLL4_KERNEL5 | ( | OPER | ) |
Four times unrolled kernel for 5 args without prefetching.
Definition at line 159 of file unroll_prefetch_def2.h.
| #define UNROLL4_KERNEL5_FIXUP do {} while(0) |
Definition at line 169 of file unroll_prefetch_def2.h.
| #define UNROLL4_KERNEL5_PREPARE do {} while(0) |
Definition at line 168 of file unroll_prefetch_def2.h.
| #define UNROLL4_PREF_KERNEL3 | ( | OPER, | |
| T, | |||
| PREFETCH_X, | |||
| CA0 | |||
| ) |
Four times unrolled kernel for 3 args with prefetching.
Definition at line 698 of file unroll_prefetch_def2.h.
| #define UNROLL4_PREF_KERNEL4 | ( | OPER, | |
| T, | |||
| PREFETCH_X, | |||
| CA0, | |||
| CA1 | |||
| ) |
Four times unrolled kernel for 4 args with prefetching.
Definition at line 432 of file unroll_prefetch_def2.h.
| #define UNROLL4_PREF_KERNEL5 | ( | OPER, | |
| T, | |||
| CA0, | |||
| CA1, | |||
| CA2 | |||
| ) |
Four times unrolled kernel for 5 args with prefetching.
Definition at line 110 of file unroll_prefetch_def2.h.
| #define UNROLL8_KERNEL3 | ( | OPER | ) |
Eight times unrolled kernel for 3 args without prefetching.
Definition at line 807 of file unroll_prefetch_def2.h.
| #define UNROLL8_KERNEL3_FIXUP do {} while(0) |
Definition at line 820 of file unroll_prefetch_def2.h.
| #define UNROLL8_KERNEL3_PREPARE do {} while(0) |
Definition at line 819 of file unroll_prefetch_def2.h.
| #define UNROLL8_KERNEL4 | ( | OPER | ) |
Eight times unrolled kernel for 4 args without prefetching.
Definition at line 570 of file unroll_prefetch_def2.h.
| #define UNROLL8_KERNEL4_FIXUP do {} while(0) |
Definition at line 583 of file unroll_prefetch_def2.h.
| #define UNROLL8_KERNEL4_PREPARE do {} while(0) |
Definition at line 582 of file unroll_prefetch_def2.h.
| #define UNROLL8_KERNEL5 | ( | OPER | ) |
Eight times unrolled kernel for 5 args without prefetching.
Definition at line 271 of file unroll_prefetch_def2.h.
| #define UNROLL8_KERNEL5_FIXUP do {} while(0) |
Definition at line 285 of file unroll_prefetch_def2.h.
| #define UNROLL8_KERNEL5_PREPARE do {} while(0) |
Definition at line 284 of file unroll_prefetch_def2.h.
| #define UNROLL8_PREF_KERNEL3 | ( | OPER, | |
| T, | |||
| PREFETCH_X, | |||
| CA0 | |||
| ) |
Eight times unrolled kernel for 3 args with prefetching.
Definition at line 743 of file unroll_prefetch_def2.h.
| #define UNROLL8_PREF_KERNEL4 | ( | OPER, | |
| T, | |||
| PREFETCH_X, | |||
| CA0, | |||
| CA1 | |||
| ) |
Eight times unrolled kernel for 4 args with prefetching.
Definition at line 487 of file unroll_prefetch_def2.h.
| #define UNROLL8_PREF_KERNEL5 | ( | OPER, | |
| T, | |||
| CA0, | |||
| CA1, | |||
| CA2 | |||
| ) |
Eight times unrolled kernel for 5 args with prefetching.
Definition at line 173 of file unroll_prefetch_def2.h.
| #define UNROLL_DEPTH 4 |
When unrolling the loops, I had the following architectural details in mind:
Funny enough, with this little knowledge, we do better than any compiler I found. Compaq cxx on alpha comes close, though. KG.
Definition at line 43 of file unroll_prefetch_def2.h.
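The UNR_* aliases documented in this file all resolve to the UNROLL4_* variants, which matches UNROLL_DEPTH 4. How the header wires this up is not shown on this page, so the following is only a minimal sketch of one way such a selection could work. All SKETCH_* names and `sketch_sum` are hypothetical and not part of TBCI.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for UNROLL_DEPTH.
#define SKETCH_UNROLL_DEPTH 4

// OPER is a statement that consumes one element and advances the index.
#define SKETCH_UNROLL1_KERNEL(OPER) do { OPER; } while (0)
#define SKETCH_UNROLL2_KERNEL(OPER) do { OPER; OPER; } while (0)
#define SKETCH_UNROLL4_KERNEL(OPER) do { OPER; OPER; OPER; OPER; } while (0)

// Pick the generic alias according to the configured depth (assumption:
// the real header may select its UNR_* aliases differently).
#if SKETCH_UNROLL_DEPTH == 4
# define SKETCH_UNR_KERNEL SKETCH_UNROLL4_KERNEL
#elif SKETCH_UNROLL_DEPTH == 2
# define SKETCH_UNR_KERNEL SKETCH_UNROLL2_KERNEL
#else
# define SKETCH_UNR_KERNEL SKETCH_UNROLL1_KERNEL
#endif

inline double sketch_sum(const double* v, std::size_t n)
{
    assert(v != nullptr || n == 0);
    double s = 0.0;
    std::size_t i = 0;
    // Unrolled part: runs while a full group of elements is left.
    while (i + SKETCH_UNROLL_DEPTH <= n)
        SKETCH_UNR_KERNEL(s += v[i++]);
    // Remainder: one element at a time.
    for (; i < n; ++i)
        s += v[i];
    return s;
}
```

Wrapping each kernel body in `do { ... } while (0)` makes the macro behave like a single statement, the same idiom the empty *_PREPARE and *_FIXUP macros in this file use.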
| #define VKERN_TEMPL_1V | ( | FNAME, | |
| OP1 | |||
| ) |
Operations of type VEC = OP self.
Definition at line 1235 of file unroll_prefetch_def2.h.
| #define VKERN_TEMPL_1V_C | ( | FNAME, | |
| OP1 | |||
| ) |
Operations of type VEC OP= VAL.
Definition at line 1261 of file unroll_prefetch_def2.h.
| #define VKERN_TEMPL_1V_CC | ( | FNAME, | |
| OP1 | |||
| ) |
Operations of type VEC *= S OP= VAL.
Definition at line 1288 of file unroll_prefetch_def2.h.
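| #define VKERN_TEMPL_1V_PREF | ( | OP, | |
| T, | |||
| PREFETCH_X, | |||
| CW | |||
| ) | do {} while (0) |
Definition at line 993 of file unroll_prefetch_def2.h.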
| #define VKERN_TEMPL_1V_T | ( | FNAME, | |
| OP1, | |||
| TYPE | |||
| ) |
Operations of type TYPE = OP VEC.
Definition at line 1317 of file unroll_prefetch_def2.h.
| #define VKERN_TEMPL_1V_T_LD | ( | FNAME, | |
| OP1, | |||
| TYPE | |||
| ) |
Operations of type TYPE = OP VEC (using LONG_DOUBLE internally)
Definition at line 1347 of file unroll_prefetch_def2.h.
| #define VKERN_TEMPL_2V | ( | FNAME, | |
| OP2 | |||
| ) |
Operations of type vec OP= vec.
Definition at line 1108 of file unroll_prefetch_def2.h.
| #define VKERN_TEMPL_2V_C | ( | FNAME, | |
| OP2 | |||
| ) |
Operations of type VEC = VEC OP VAL or VAL OP VEC.
Definition at line 1137 of file unroll_prefetch_def2.h.
| #define VKERN_TEMPL_2V_CC | ( | FNAME, | |
| OP2 | |||
| ) |
Operations of type VEC = VEC OP VAL or VAL OP VEC.
Definition at line 1168 of file unroll_prefetch_def2.h.
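| #define VKERN_TEMPL_2V_PREF | ( | OP, | |
| T, | |||
| PREFETCH_X, | |||
| CW | |||
| ) | do {} while (0) |
Definition at line 992 of file unroll_prefetch_def2.h.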
| #define VKERN_TEMPL_2V_T | ( | FNAME, | |
| OP2, | |||
| TYPE | |||
| ) |
Operations of type TYPE = VEC OP VEC.
Definition at line 1200 of file unroll_prefetch_def2.h.
| #define VKERN_TEMPL_3V | ( | FNAME, | |
| OP3 | |||
| ) |
gcc-2.95.x seems to fail caching a const double& in a REGISTER.
So we have to use a local REGISTER variable to force it to do so, for maximum performance. However, this is only beneficial if we have an elementary type that fits into a REGISTER. It would be nice to have macros that do this automatically when needed. However, sizeof(T) can't be evaluated by the preprocessor, so we can't know. Instead we use explicit specialization of our templates.
Operations of type vec = vec OP vec.
Definition at line 1013 of file unroll_prefetch_def2.h.
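The approach described for VKERN_TEMPL_3V (a local copy of a scalar operand, chosen per type by explicit specialization) can be illustrated with a small traits class. This is a sketch under assumptions: `sketch_traits`, `SketchBig` and `sketch_scale` are hypothetical names, and the real `tbci_traits<T>::loop_const_refval_type` may be defined differently.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical large type that would not fit into a register.
struct SketchBig { double data[8]; };

// Default: keep a const reference, so no copy of a large object is made.
template <typename T>
struct sketch_traits {
    typedef const T& loop_const_refval_type;
};

// Explicit specialization for an elementary type: hold the value itself,
// so the compiler can keep it in a register for the whole loop.
template <>
struct sketch_traits<double> {
    typedef const double loop_const_refval_type;
};

template <typename T>
void sketch_scale(T* v, const T& factor, std::size_t n)
{
    assert(v != nullptr || n == 0);
    // Local variable of the trait-selected type: a copy for double,
    // a reference for types without a specialization.
    typename sketch_traits<T>::loop_const_refval_type f = factor;
    for (std::size_t i = 0; i < n; ++i)
        v[i] *= f;
}
```

The preprocessor cannot evaluate sizeof(T), which is why the choice is made in the type system here rather than in a macro.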
| #define VKERN_TEMPL_3V_C | ( | FNAME, | |
| OP3 | |||
| ) |
Operations of type vec = vec OP val * vec.
Definition at line 1043 of file unroll_prefetch_def2.h.
| #define VKERN_TEMPL_3V_CC | ( | FNAME, | |
| OP3 | |||
| ) |
Operations of type vec = val * vec OP val * vec.
Definition at line 1075 of file unroll_prefetch_def2.h.
| #define VKERN_TEMPL_3V_PREF | ( | OP, | |
| T | |||
| ) | do {} while (0) |
Fragments to be combined for different cases: 1, 2, or 3 vector fields; 0, 1, or 2 scalars to multiply with; a variable number of data elements per cache line; prefetching 1, 2, 4, 8, or 16 cache lines ahead; 1-, 2-, 4-, or 8-fold unrolling.
The structure is always the same:
(1) Before anything else, start read prefetching.
(2) Unrolled loop with both read and write prefetching.
(3) Unrolled loop without prefetching, for the elements where prefetching would reach beyond the array, which could be a performance problem and, for write prefetching, maybe a real problem.
(4) Non-unrolled loop for the remaining elements.
Definition at line 991 of file unroll_prefetch_def2.h.
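The four-phase structure described above can be sketched as a plain function for res = a + b. This is an illustration only, not the code the macros expand to. `sketch_add`, `kSketchAhead` and the SKETCH_PREFETCH_* wrappers are hypothetical, the prefetch distance of 16 elements is an arbitrary assumption, and `__builtin_prefetch` is a GCC/Clang builtin (other compilers get a no-op here).

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) || defined(__clang__)
# define SKETCH_PREFETCH_R(p) __builtin_prefetch((p), 0)  // prefetch for read
# define SKETCH_PREFETCH_W(p) __builtin_prefetch((p), 1)  // prefetch for write
#else
# define SKETCH_PREFETCH_R(p) ((void)0)
# define SKETCH_PREFETCH_W(p) ((void)0)
#endif

// Assumed prefetch distance in elements.
constexpr std::size_t kSketchAhead = 16;

inline void sketch_add(double* res, const double* a, const double* b,
                       std::size_t n)
{
    assert(n == 0 || (res && a && b));
    std::size_t i = 0;

    // (1) Start read prefetching before any computation.
    SKETCH_PREFETCH_R(a);
    SKETCH_PREFETCH_R(b);

    // (2) Four-fold unrolled loop with read and write prefetching.
    //     The bound keeps every prefetched address inside the arrays.
    for (; i + 4 + kSketchAhead <= n; i += 4) {
        SKETCH_PREFETCH_R(a + i + kSketchAhead);
        SKETCH_PREFETCH_R(b + i + kSketchAhead);
        SKETCH_PREFETCH_W(res + i + kSketchAhead);
        res[i]     = a[i]     + b[i];
        res[i + 1] = a[i + 1] + b[i + 1];
        res[i + 2] = a[i + 2] + b[i + 2];
        res[i + 3] = a[i + 3] + b[i + 3];
    }

    // (3) Four-fold unrolled loop without prefetching, for the tail where
    //     prefetching would point beyond the arrays.
    for (; i + 4 <= n; i += 4) {
        res[i]     = a[i]     + b[i];
        res[i + 1] = a[i + 1] + b[i + 1];
        res[i + 2] = a[i + 2] + b[i + 2];
        res[i + 3] = a[i + 3] + b[i + 3];
    }

    // (4) Non-unrolled loop for the remaining elements.
    for (; i < n; ++i)
        res[i] = a[i] + b[i];
}
```

A real kernel would normally issue one prefetch per cache line rather than one per unrolled group; the sketch ignores that to keep the phase boundaries easy to see.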