TO DO
=====
List of things to be done on the TBCI library.

2  The Makefiles should be generated with autoconf.
   -> First part of this has been donated by Jan van Dijk.
   -> Second part finished (all the tests for compiler features)
   -> Third part (--enable-XXX options) to be done for 2.6.0
3  I'd like some C++ guru to walk through the code, understand it and clean
   it up. A lot of code has grown historically and may still carry some
   unnecessary crap within it.
   - too many friends
   - duplicated code (which could be inherited?)
   - ifdef'ed things used for testing
   - use trait classes as signatures ... (Jan?)
5  Add PVM or MPI support. Replace pthread SMP support if suitable. 
   -> 3.0/4.0?
6  SMP (or MPI/PVM) support could be done on a higher level in solvers
   -> Postponed to 2.5
7  Better tune SMP functions to avoid cache line ping-pong ...
   -> Partially done
8  Support writing of objects to disk and reloading by mmap()ing them.
   -> 3.0
9  Redesign of the exception handling structure. Use exception classes 
   indicating the type of error rather than (or maybe in addition to)
   the class in which it occurred, i.e. have DivByZeroErr, BadAllocErr,
   BoundChkErr exception classes instead of VecErr, MatErr, ...
   -> 3.0
10 Alternative parallelization support by using Active Expressions? [5]
   -> 3.0 ??
11 Optimized solvers for CRMatrix, CSCMatrix, Symm_BdMatrix
13 More example applications
   Ideally, some testsuite for regression testing.
   At this moment, the library compilation, the benchmarks and testcases
   in the lina/test dir need to do the job. 
   (I additionally test by running my EM simu apps and PLASIMO ...)
15 Better documentation ;-)
16 Finish doxygen documentation within the code.
18 Implement new TBCI design as shown in lina/test/test_tbci_scale.cc.
   (This would render 3 mostly unnecessary.) -> libtbci-3 ...
22 Use traits to describe the features of a type. This way, we could easily
   decide between memcpy() or a for() loop for copying data.
   -> 3.0
24 Iterators -> 3.0
26 While the TBCI concept proves powerful to avoid temporaries (and to avoid
   unnecessary copying), the deferring of operations may better be done using
   Expression templates? -> 3.0? 4.0?
29 Look at doxygen warnings.
30 Try to fix MIPSpro compilation (doubly emitted BVector<unsigned>
   operator == (const BVector<unsigned>&) const and cplx.h stuff).
   -> Turned out to be a problem with the MIPSpro's ii files. You may
      need to manually edit/delete them.
32 Split off the non-TBCI stuff from the TBCI dirs: mathplus, specfun.
   Put into a different namespace and library (tbcimisc). 
   Think about my_nr, LM_fit. (tbcialgo), Solvers (tbcisolver)?
33 Change dir structure to resemble the one from the installed RPM
   -> headers in a dir named tbci. Project dir also tbci?
35 Provide a smp lib apart from tbci.
36 Try to use clone() for thread creation and pipes for thread sync ...
   ... and check whether SMP performance gets better.
   (pthreads are rather heavy)
37 Store the version in some header which gets installed and included.
38 SVD breaks on complex numbers.
39 Clean up mem alloc, copying, comparison for all classes
   (malloc -> NEW, free -> FREE, memcpy -> TBCICOPY, memcmp -> TBCICOMP,
    memset -> ?) Done for (B)Vectors, Matrices, F_Matrices, and BdMatrices 
    so far.
40 Add RESTRICT macro (keyword) where appropriate
   -> partially done in 2.4
41 Look into OpenMP; add OpenMP pragmas where appropriate
42 Add the samples from PLASIMO
43 More parallelization of Vectors: unary -, conj(), emul, ==, ...
44 Scalability studies on different archs (partially DONE)
45 AZTEC interface (?)
46 Accessors as a view on data (3.0?)
48 Maybe a workaround with typedefs is better than the ugly FRIEND_TBCI__
   (and the changed position of the keyword friend).
50 Use traits to differentiate between temp, aliased and real objects.
   -> 3.0
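
   Sketch for item 22: one possible way to let a type trait choose between
   memcpy() and a for() loop. This is only an illustration using the standard
   std::is_trivially_copyable trait; the names copy_elements, copy_impl and
   Counted are made up here and are not part of TBCI.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Fast path: the trait says a bytewise copy is valid for T.
template <typename T>
void copy_impl(T* dst, const T* src, std::size_t n, std::true_type)
{
    if (n != 0)
        std::memcpy(dst, src, n * sizeof(T));
}

// Generic path: copy element by element using T's assignment operator.
template <typename T>
void copy_impl(T* dst, const T* src, std::size_t n, std::false_type)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

// Hypothetical entry point: dispatches at compile time on the trait.
// The source and destination ranges must not overlap.
template <typename T>
void copy_elements(T* dst, const T* src, std::size_t n)
{
    copy_impl(dst, src, n, std::is_trivially_copyable<T>());
}

// Example element type with a user-provided copy assignment. It is therefore
// not trivially copyable and takes the loop path; it counts its assignments.
struct Counted {
    int value;
    int copies;
    Counted() : value(0), copies(0) {}
    Counted& operator=(const Counted& o)
    {
        value = o.value;
        copies = o.copies + 1;
        return *this;
    }
};
```

   A library-specific trait could replace std::is_trivially_copyable if some
   types need to opt in or out by hand.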
   
51 Clarify operator / on vectors and matrices -> 2.4.2.
53 Nowadays compilers do support Koenig lookup. Make use of it and 
   avoid putting TBCI functions into std::
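
   Sketch for item 53: how Koenig (argument-dependent) lookup can find an
   overload that lives next to its type instead of in std::. The namespace
   demo, the type Vec2 and the function magnitude are placeholders for
   illustration, not TBCI names.

```cpp
#include <cmath>

namespace demo {   // hypothetical stand-in for the library namespace

struct Vec2 {
    double x, y;
};

// Declared in the same namespace as Vec2, not in std::
inline double abs(const Vec2& v)
{
    return std::sqrt(v.x * v.x + v.y * v.y);
}

} // namespace demo

// Generic code: the using-declaration makes std::abs visible for builtin
// types, and the unqualified call lets argument-dependent lookup add
// demo::abs to the candidates when T is demo::Vec2.
template <typename T>
double magnitude(const T& t)
{
    using std::abs;
    return abs(t);
}
```

   The important detail is that the call to abs() is unqualified; writing
   std::abs(t) would switch off the lookup in the argument's namespace.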

54 Investigate Swing-Modulo-Scheduling (-fmodulo-sched) with gcc4.
55 See if we can profit from using hidden visibility at some places.
56 The __restrict__ extension in class member functions may be used to
   indicate that this is not aliased.
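
   Sketch for item 56: the GCC extension allows __restrict__ as a qualifier
   on a member function, which promises that the this pointer is not aliased
   by the other pointers used in the function. The macro THIS_RESTRICT and
   the struct Accum are invented for this example; whether the qualifier
   gains anything needs to be measured.

```cpp
#include <cstddef>

// Only use the extension where it is known to exist; expand to nothing
// elsewhere so the code stays portable.
#if defined(__GNUC__) || defined(__clang__)
#  define THIS_RESTRICT __restrict__
#else
#  define THIS_RESTRICT
#endif

struct Accum {
    double sum;

    // The qualifier tells the compiler that data[] does not overlap *this,
    // so sum may be kept in a register for the whole loop.
    void add_all(const double* data, std::size_t n) THIS_RESTRICT
    {
        for (std::size_t i = 0; i < n; ++i)
            sum += data[i];
    }
};
```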

57 Test OpenMP support and evaluate performance compared to our explicit
   worker thread SMP model
58 Investigate finetuning for Core2 in perf_opt.h; adapt get_cpu and
   Make.Sys accordingly
59 Use -march=native / -mtune=native (available from gcc-4.2)
60 Improve SMP_*SLICE size determination (partially DONE)
61 Investigate Strassen's algo for matrix multiplication
62 Smaller struct thr_ctl -- we should be able to do with one mutex/cond pair
   and save useful cachelines that bounce around (DONE)
63 Alternative way of starting threads -- writing to pipes (DONE) 
64 SIMD instructions for compare
65 Kahan sums
66 NUMA optimizations 1: Bind threads to where the memory is ... (DONE)
67 NUMA optimizations 2: Move pages from large objects to the right node
   ...
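
   Sketch for item 65: a minimal compensated (Kahan) summation over a plain
   array. The function name kahan_sum is made up here, and a real version
   would presumably be a template over the element type. Note that
   value-unsafe floating point options such as -ffast-math may let the
   compiler optimize the compensation away.

```cpp
#include <cstddef>

// Sums x[0..n-1] while carrying the rounding error of each addition
// forward in c, so that small terms are not lost against a large sum.
double kahan_sum(const double* x, std::size_t n)
{
    double sum = 0.0;
    double c = 0.0;                 // running compensation
    for (std::size_t i = 0; i < n; ++i) {
        double y = x[i] - c;        // apply the correction to the next term
        double t = sum + y;         // low-order bits of y may be lost here
        c = (t - sum) - y;          // recover what was lost (negated)
        sum = t;
    }
    return sum;
}
```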

68 Split basics.h into a set of definitions (always needed) and helpers/utils
   that implement some basic repeatedly needed things (such as tbcicopy ...).


... and there are many more. Do whatever you think is useful. You may post
suggestions to the list before actually working on it, to make sure the work
is not done twice.

(See README file for references)
									Kurt
