TBCI Numerical high perf. C++ Library 2.8.0
The TBCI library supports using multiple CPUs to share the work on data sets.
To do this, the loops over data sets are split into several pieces and handed to multiple CPUs, using POSIX threads. This therefore only works on shared-memory multiprocessor (or ccNUMA) machines; to use network clusters, a message-passing mechanism such as MPI (or PVM) would be required.
However, for the kind of low-level operations that are parallelized this way in TBCI, clusters would not perform well anyway.
Despite significant effort invested in optimizing multithreaded performance (e.g. threads are started in advance and only woken up on request), parallel work across multiple CPUs only scales well for large data sets. Data that fits in the L1 cache of your CPU is definitely NOT worth spreading across multiple CPUs.
The TBCI library uses a heuristic that avoids spawning multiple threads for small data sets (those that would fit in L1); the heuristic is currently based on a fixed vector size rather than the actual data size.
If you want to use the multiprocessing support in TBCI, compile with the SMP preprocessor symbol defined (or pass --enable-smp to configure). Call init_threads(i) at the beginning of your program and free_threads() at the end. When calling init_threads() you can let the library determine how many CPUs are available, set a maximum number, or specify the number explicitly. At run time you can disable multithreading at any point by calling disable_threads() and re-enable it with reenable_threads().