Explanation of what's going on:
Threads are controlled by writing to and reading from pipes: a job is started by writing a job_input struct to the pipe, and waiting for the result means reading a job_output struct from the pipe.
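The pipe mechanism can be illustrated with a minimal, self-contained sketch. It is not the code from smp.o: the struct layouts, the demo_* names and the summing job are assumptions made up for illustration, and plain POSIX pipe (), read (), write () and pthread_create () stand in for whatever the library does internally.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <pthread.h>
#include <unistd.h>

// Hypothetical stand-ins for the job_input / job_output structs named above.
struct demo_job_input  { int first; int last; };  // half-open range [first, last)
struct demo_job_output { long sum; };

struct demo_worker {
    int job_pipe[2];     // caller writes to [1], worker reads from [0]
    int result_pipe[2];  // worker writes to [1], caller reads from [0]
    pthread_t tid;
};

// Read exactly len bytes; returns false on end-of-file or error.
static bool read_all(int fd, void* buf, std::size_t len) {
    char* p = static_cast<char*>(buf);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return false;
        p += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Write exactly len bytes; returns false on error.
static bool write_all(int fd, const void* buf, std::size_t len) {
    const char* p = static_cast<const char*>(buf);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return false;
        p += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Worker: sleeps in read() until a job arrives, sums the integers in the
// requested range, writes the result back. Ends when the job pipe is closed.
static void* demo_worker_main(void* arg) {
    demo_worker* w = static_cast<demo_worker*>(arg);
    demo_job_input in;
    while (read_all(w->job_pipe[0], &in, sizeof in)) {
        demo_job_output out = { 0 };
        for (int i = in.first; i < in.last; ++i) out.sum += i;
        if (!write_all(w->result_pipe[1], &out, sizeof out)) break;
    }
    return nullptr;
}

bool demo_worker_start(demo_worker* w) {
    if (pipe(w->job_pipe) != 0) return false;
    if (pipe(w->result_pipe) != 0) {
        close(w->job_pipe[0]); close(w->job_pipe[1]);
        return false;
    }
    if (pthread_create(&w->tid, nullptr, demo_worker_main, w) != 0) {
        close(w->job_pipe[0]); close(w->job_pipe[1]);
        close(w->result_pipe[0]); close(w->result_pipe[1]);
        return false;
    }
    return true;
}

// Starting a job = writing the input struct to the pipe.
bool demo_submit(demo_worker* w, int first, int last) {
    demo_job_input in = { first, last };
    return write_all(w->job_pipe[1], &in, sizeof in);
}

// Waiting for the result = reading the output struct from the pipe.
bool demo_wait(demo_worker* w, long* sum) {
    demo_job_output out;
    if (!read_all(w->result_pipe[0], &out, sizeof out)) return false;
    *sum = out.sum;
    return true;
}

void demo_worker_stop(demo_worker* w) {
    close(w->job_pipe[1]);          // worker sees end-of-file and returns
    pthread_join(w->tid, nullptr);
    close(w->job_pipe[0]);
    close(w->result_pipe[0]);
    close(w->result_pipe[1]);
}
```

In this sketch the caller blocks in read () while it waits, so the kernel does the synchronization and no explicit mutex or condition variable is needed.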
If you want to use multithreading, you have to do the following:
- Call init_threads (0) at the beginning and free_threads () at the end of the program.
- Create routines that each work on only one part of the job.
- Create a wrapper routine that reads the parameters from the thr_ctrl struct and then calls the routine.
- Make the caller divide the job into several parts and start them with thread_start ().
- Optimization (optional): Make the number of threads depend on the size of the object to avoid thread overhead for small objects.
- The maximum number of threads you can start is threads_avail (). This function returns 0 if threads are already working, which prevents thread recursion and the deadlock that would result.
- Make the caller wait for the job with thread_wait (). Alternatively, you can use thread_wait_useful () and pass a pointer to a useful_job_t job, so that the caller does something useful while waiting instead of rescheduling. (Not recommended.)
- Optimization (optional): Make the caller do the last part itself instead of handing it to a thread.
- Compile with SMP defined (-DSMP)
- Link with smp.o and -lpthread.
That's it.
See matrix.h: TVector<T> Matrix<T>::operator * (const Vector<T>& v) const for an example.
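The division pattern described in the steps above might look roughly like the following sketch of a matrix-vector product split by rows. Since the signatures of thread_start (), thread_wait () and threads_avail () and the layout of thr_ctrl are not shown here, the sketch uses pthread_create () and pthread_join () directly; demo_ctrl, demo_matvec_rows, demo_matvec, the max_threads parameter and the threshold of 64 rows are all assumptions for illustration, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <vector>

// Hypothetical parameter block; plays the role that thr_ctrl plays in the text.
struct demo_ctrl {
    const double* mat;   // row-major, rows x cols
    const double* vec;   // cols entries
    double* res;         // rows entries
    std::size_t cols;
    std::size_t row_begin, row_end;  // half-open row range of this part
};

// Routine that works on one part of the job only.
static void demo_matvec_rows(const double* mat, const double* vec, double* res,
                             std::size_t cols, std::size_t row_begin,
                             std::size_t row_end) {
    for (std::size_t r = row_begin; r < row_end; ++r) {
        double acc = 0.0;
        for (std::size_t c = 0; c < cols; ++c) acc += mat[r * cols + c] * vec[c];
        res[r] = acc;
    }
}

// Wrapper: unpacks the parameters from the control struct, then calls the routine.
static void* demo_matvec_wrapper(void* arg) {
    const demo_ctrl* c = static_cast<const demo_ctrl*>(arg);
    demo_matvec_rows(c->mat, c->vec, c->res, c->cols, c->row_begin, c->row_end);
    return nullptr;
}

// Caller: divides the rows into parts, starts a thread per part except the
// last, does the last part itself, then waits for the threads.
// max_threads stands in for what threads_avail () would report.
std::vector<double> demo_matvec(const std::vector<double>& mat,
                                const std::vector<double>& vec,
                                std::size_t rows, std::size_t max_threads) {
    const std::size_t cols = vec.size();
    std::vector<double> res(rows);

    // Small objects are not worth the thread overhead (threshold is arbitrary).
    std::size_t parts = (rows < 64 || max_threads == 0) ? 1 : max_threads + 1;
    if (parts > rows && rows > 0) parts = rows;

    std::vector<demo_ctrl> ctrl(parts - 1);
    std::vector<pthread_t> tids(parts - 1);
    std::vector<char> running(parts - 1, 0);

    for (std::size_t i = 0; i + 1 < parts; ++i) {
        demo_ctrl c = { mat.data(), vec.data(), res.data(), cols,
                        rows * i / parts, rows * (i + 1) / parts };
        ctrl[i] = c;
        if (pthread_create(&tids[i], nullptr, demo_matvec_wrapper, &ctrl[i]) == 0)
            running[i] = 1;
        else
            demo_matvec_wrapper(&ctrl[i]);  // could not start a thread: do it inline
    }

    // The caller does the last part instead of a thread.
    demo_matvec_rows(mat.data(), vec.data(), res.data(), cols,
                     rows * (parts - 1) / parts, rows);

    for (std::size_t i = 0; i + 1 < parts; ++i)
        if (running[i]) pthread_join(tids[i], nullptr);
    return res;
}
```

Each part writes to a disjoint range of res, so the threads need no locking among themselves. Creating threads per call, as done here for brevity, is exactly the overhead that a pool set up once at program start would avoid.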
Debugging:
- If you compile with THREAD_STAT defined, you will see a summary of the CPU time the threads used.
- If you compile with THREAD_DEBUG defined, you will get some debugging messages about CPU detection and thread setup.
- If you compile with DEBUG_THREAD defined, you will be swamped with thread synchronization debugging messages.
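All three switches are preprocessor symbols, so they are chosen at compile time (for example with -DTHREAD_STAT, in the same way as -DSMP). The following sketch only illustrates how such compile-time gating typically works; DEMO_SYNC_TRACE and demo_debug_flags are made-up names and do not appear in the library.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical trace macro: expands to a message only when DEBUG_THREAD
// is defined, and to nothing otherwise, so release builds pay no cost.
#ifdef DEBUG_THREAD
# define DEMO_SYNC_TRACE(msg) std::fprintf(stderr, "sync: %s\n", (msg))
#else
# define DEMO_SYNC_TRACE(msg) ((void)0)
#endif

// Reports which of the three debugging symbols this translation unit
// was compiled with, as a bit mask.
int demo_debug_flags() {
    int flags = 0;
#ifdef THREAD_STAT
    flags |= 1;  // CPU time summary
#endif
#ifdef THREAD_DEBUG
    flags |= 2;  // CPU detection and thread setup messages
#endif
#ifdef DEBUG_THREAD
    flags |= 4;  // thread synchronization messages
#endif
    DEMO_SYNC_TRACE("flags queried");
    return flags;
}
```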