$Id: BUGS,v 1.86 2013/12/20 22:30:21 ruiliu Exp $

This file documents currently-known bugs and limitations within PerfSuite as of
the date of this release.  If you discover a problem that isn't described here,
please report it to the general mailing list described in the installation
instructions.

When built with external software, PerfSuite relies on the correct behavior of
each package, so bugs may also arise from the supporting layers.  In this case,
the support mechanisms described at each project's website are additional
resources that can be helpful.  Please refer to the PerfSuite installation
instructions for pointers to these resources.

Known bugs/limitations in PerfSuite
===================================
Updated for version 1.1.4

psconfig
--------
Symptom: Error message "Can't find usable tk.tcl" appears when psconfig is
         started.
Status:  This problem does not arise from PerfSuite, but from an improper
         installation of Tcl/Tk on your system.  If you can locate the file
         "tk.tcl" using commands such as "find" or "whereis", then you can
         use the environment variable TK_LIBRARY to address this error.
         Example (assuming "tk.tcl" is in the directory "/usr/share/tk8.4"):

         % setenv TK_LIBRARY /usr/share/tk8.4 
         % psconfig
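
         For Bourne-style shells (sh, bash), the same workaround might be
         sketched as follows.  The helper name find_tk_library and the
         directories searched are illustrative assumptions, not part of
         PerfSuite or Tcl/Tk; adjust the search roots for your system.

```shell
# Print the directory containing the first tk.tcl found under the given roots.
# find_tk_library is a hypothetical helper, not a PerfSuite or Tcl/Tk command.
find_tk_library() {
    for root in "$@"; do
        [ -d "$root" ] || continue
        hit=$(find "$root" -name tk.tcl -type f 2>/dev/null | head -n 1)
        if [ -n "$hit" ]; then
            dirname "$hit"
            return 0
        fi
    done
    return 1
}

# Search a few common locations (assumed; adjust for your system).
if dir=$(find_tk_library /usr/share /usr/lib /usr/local/lib); then
    TK_LIBRARY=$dir
    export TK_LIBRARY
    echo "TK_LIBRARY set to $TK_LIBRARY"
else
    echo "tk.tcl not found; TK_LIBRARY left unchanged" >&2
fi
```

         Once TK_LIBRARY is exported, start psconfig from the same shell.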

psenv
-----
Symptom: PerfSuite tools and utilities cannot find libperfctr.so.X on x86-based
         systems.
Status:  This occurs when libpapi.so was built without recording the location
         of this dependent library.  You can either add the directory where
         your perfctr libraries are located to your LD_LIBRARY_PATH environment
         variable or arrange for it to be added to the standard runtime linker
         search path.  See the manual page for "ldconfig" for more information.
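
         A minimal sketch of the first option for Bourne-style shells.  The
         directory /usr/local/perfctr/lib is a placeholder assumption;
         substitute the directory that actually holds libperfctr.so.X on your
         system.  The helper prepend_dir is illustrative, not part of
         PerfSuite.

```shell
# Prepend a directory to a colon-separated path list unless already present.
# prepend_dir is a hypothetical helper, not part of PerfSuite.
prepend_dir() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;          # already listed
        *) printf '%s\n' "$dir${list:+:$list}" ;;     # add to the front
    esac
}

# Placeholder location of the perfctr libraries (assumption).
PERFCTR_LIBDIR=/usr/local/perfctr/lib

LD_LIBRARY_PATH=$(prepend_dir "$PERFCTR_LIBDIR" "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
```

         Checking for an existing entry keeps the variable from growing each
         time the snippet is sourced.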

Symptom: The "man" command cannot find standard manual pages after the psenv
         scripts have been sourced.
Status:  If no prior MANPATH setting exists in your environment, psenv will set
         it to the PerfSuite manual page directories only.  You can edit the
         installed psenv.{sh,csh} scripts to include additional directories
         that you need.
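
         As an alternative to editing the scripts, you might give MANPATH an
         explicit value before sourcing psenv.  This sketch assumes that psenv
         extends an existing MANPATH rather than replacing it, which is
         inferred from the description above and not verified.  The "manpath"
         command and the fallback directories are assumptions about your
         system, and the psenv.sh path is a placeholder.

```shell
# Give MANPATH an explicit value before sourcing psenv.sh, so that psenv has
# a prior setting to work with (assumed behavior; see the note above).
if [ -z "${MANPATH:-}" ]; then
    # Ask the system for its default search path, if "manpath" is available.
    MANPATH=$(manpath 2>/dev/null) || MANPATH=
    # Fall back to commonly used directories (assumption).
    [ -n "$MANPATH" ] || MANPATH=/usr/share/man:/usr/local/share/man
    export MANPATH
fi

# Then source the PerfSuite environment script; the path is a placeholder.
# . /path/to/perfsuite/bin/psenv.sh
```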

psprocess
---------
Symptom: "psprocess -H" may fail at runtime, possibly generating an error
         message from glibc about an error in memory deallocation.  This has
         been observed on an x86-64 system with tDOM 0.8.2.
Status:  Reproducing this error under gdb indicated that the problem comes from
         the <xslt:sort> transformation contained in the XSL stylesheet used
         to produce the HTML.  The workaround is to comment out the (single)
         call to <xslt:sort> in the stylesheet, which can be found after
         installation in $PREFIX/share/perfsuite/xml/pshwpc/pshwpc.xsl.  We are
         not commenting this out by default because displaying the event counts
         in order, sorted by name, is useful and works in most cases/systems.

Symptom: psprocess produces differing results when run in different filesystem
         contexts (example: AFS vs. local filesystem).  User reports filed:
         http://sf.net/mailarchive/forum.php?thread_id=10198810&forum_id=39162
         http://sf.net/mailarchive/forum.php?thread_id=10421594&forum_id=39162
Status:  Cause unknown.  The current workaround is to try an alternate
         filesystem.

Symptom: psprocess produces differing results when processing profiling reports
         and piping or redirecting the output.  The problem appears to be
         sensitive to the presence of debugging information in the executable
         (that is, compiling without -g triggers the problem more frequently).
Status:  Cause is unidentified.  In response to this, the environment variable
         PSPROCESS_MAPPER was added in PerfSuite version 0.6.2a3.  Setting
         this variable to the value "addr2line" forces the use of the addr2line
         utility instead of the PerfSuite BFD Tcl extension and may produce
         more consistent results, although execution speed will be slower.
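
         One way to apply the setting to a single command without changing
         the rest of your session is sketched below.  The wrapper name
         with_addr2line is hypothetical, and the psprocess arguments shown in
         the comment are placeholders.

```shell
# Run one command with PSPROCESS_MAPPER=addr2line set only for that command.
# with_addr2line is a hypothetical wrapper, not a PerfSuite command.
with_addr2line() {
    PSPROCESS_MAPPER=addr2line "$@"
}

# Intended use (file names are placeholders):
#   with_addr2line psprocess myprofile.xml > report.txt
```

         To make the setting persistent instead, export PSPROCESS_MAPPER from
         your shell startup file.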

Symptom: psprocess can crash, depending on the installation and software
         environment.  The crash occurs immediately after output of the counter
         data but before derived metrics/statistics are displayed.
Status:  The cause of this is unknown, but a workaround has been implemented
         where a flush of the output channel can be forced by setting the
         environment variable PSPROCESS_FLUSH to any value.  Setting this
         variable should not be necessary unless you are experiencing psprocess
         crashes at this point in your output.

Symptom: Dynamically-loaded shared libraries do not appear in psprocess
         profiling output.
Status:  This is a deficiency/limitation of the underlying library, libpshwpc.
         See the section on that library below.

psrun
-----
Symptom: Application runs to completion and XML document(s) are created, but
         an address error occurs at the very end of execution.
Status:  The address error appears to be coming from a call to strcmp() during
         finalization of dynamically loaded shared libraries.  This does not
         appear to affect the execution of the program or the performance
         monitoring output, but does result in this address error.  Error
         handling was added to psrun in version 0.6.1b4 that attempts to guard
         against these errors: first, SIGSEGV is caught and ignored; second,
         the core dump limit is set to zero in the wrapup routine.  These
         modifications are being monitored to test their effectiveness in
         dealing with this problem.

Symptom: A POSIX threads application in which the initial thread is terminated
         by calling pthread_exit() rather than exit() generates an error
         message "calling sequence not allowed" at the end of execution.  No
         output XML document is created for the main program/thread.
Status:  This is the expected behavior of the underlying library.  If possible,
         modify the application to call exit() instead of pthread_exit() within
         the initial thread.  Alternatively, libpshwpc could be modified and
         rebuilt to avoid this error within ps_hwpc_stop() (note: this has not
         been tried or tested).

Symptom: An extra XML output document is created when OpenMP programs compiled
         by the Intel compilers are used with psrun.
Status:  The Intel OpenMP implementation creates an extra "monitor thread" that
         is measured like all other threads by psrun.  This should not affect
         the execution of your program, but you may want to exclude the
         performance data corresponding to that thread for your performance
         analysis.  The monitor thread's ID is usually one of the lower
         numbers, but not necessarily thread 0.  As an alternative, you can
         insert calls directly to libpshwpc in your application to avoid
         creating the additional output document.  If you're using the Intel
         compiler on a system without the NPTL threads library, you may also be
         able to suppress XML output from the monitor thread by setting the
         environment variable PS_HWPC_TIME to a small integer (this variable
         specifies the minimum number of CPU seconds that a process must
         consume before the output document will be generated).  One way to
         tell if you're using NPTL is to examine the process ID for each
         thread - NPTL will assign the same process ID to all threads within a
         process.
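
         On glibc-based systems, "getconf GNU_LIBPTHREAD_VERSION" is another
         way to check which threads library is in use.  This is an
         alternative to the process ID check described above and depends on
         your C library, not on PerfSuite.  The helper name threads_flavor
         and the PS_HWPC_TIME value of 1 are illustrative assumptions.

```shell
# Classify a threads-library version string as reported by
# "getconf GNU_LIBPTHREAD_VERSION" on glibc systems.
# threads_flavor is a hypothetical helper, not part of PerfSuite.
threads_flavor() {
    case "$1" in
        NPTL*)          echo nptl ;;
        linuxthreads*)  echo linuxthreads ;;
        *)              echo unknown ;;
    esac
}

version=$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null) || version=
flavor=$(threads_flavor "$version")
echo "threads library: $flavor"

# Without NPTL, a small PS_HWPC_TIME may suppress the monitor thread's output.
# The value 1 (CPU second) is only an example.
if [ "$flavor" = linuxthreads ]; then
    PS_HWPC_TIME=1
    export PS_HWPC_TIME
fi
```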

Symptom: psrun does not work properly with OpenMP programs that use the
         Portland Group compilers.
Status:  This occurs because PGI OpenMP support is not based on POSIX
         threads.  A workaround is to insert calls directly to libpshwpc in
         your application.

Symptom: psrun fails when used with shell scripts.
Status:  psrun and its underlying libraries are meant to be used with compiled
         executables, not shell scripts.  You may have some success using an
         alternate XML configuration document (for example, profil()-based
         configurations), but behavior is not guaranteed.
         Further, certain shells such as tcsh exit via the _exit() call, not
         exit() - this defeats the ability of psrun to execute the wrapup code
         necessary to write out performance data before the process exits.  If
         you're interested in monitoring individual commands contained within a
         shell script, try inserting psrun within the script. Another option is
         to use the exclusion database feature added in PerfSuite 0.6.1 beta 6:
         this is done by creating a flat file containing the full path names
         of executables that should not be monitored by psrun.  The database is
         named in the environment variable PS_HWPC_EXCLUDEDB.  By setting this
         variable and using psrun -f on the script, you may be able to obtain
         performance data from the individual commands contained within the
         script (this feature is new and under testing).
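
         The exclusion database setup might be sketched as follows.  The
         one-path-per-line layout, the file location, and the choice to
         exclude the shells themselves are assumptions made for illustration;
         the script name in the comment is a placeholder.

```shell
# Build an exclusion database: a flat file of full path names of executables
# that psrun should not monitor.  One path per line is an assumption here.
PS_HWPC_EXCLUDEDB=${TMPDIR:-/tmp}/psrun_exclude.$$
export PS_HWPC_EXCLUDEDB

: > "$PS_HWPC_EXCLUDEDB"
for name in sh bash tcsh csh; do
    # Record each shell that is installed, using its full path.
    exe=$(command -v "$name" 2>/dev/null) || continue
    case "$exe" in
        /*) printf '%s\n' "$exe" >> "$PS_HWPC_EXCLUDEDB" ;;
    esac
done

# Then measure the commands run by a script (myscript.sh is a placeholder):
#   psrun -f ./myscript.sh
```

         Add any other interpreters or wrapper programs that the script
         invokes but that you do not want measured.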

Symptom: psrun fails when used with MPICH ch_p4.
Status:  The mechanisms used to launch MPI tasks change the command-line
         arguments supplied, so psrun does not have access to the name of the
         executable that is to be measured.
         There is no clean way to address this from within psrun, but you can
         relink your program with the PMPI-based libpshwpc_mpi to achieve
         similar results.

Symptom: psrun exits and an "assert" error is output indicating
         "mpx_handler: Assertion `retval == 0' failed".
Status:  This has been observed on a Red Hat 9.0 system and may be related to
         the POSIX threads library.  Try running again but first set the
         environment variable LD_ASSUME_KERNEL to the value "2.4.1".  The
         assertion error is coming from within PAPI's multiplexing support.

Symptom: Dynamically-loaded shared libraries do not appear in psrun profiling
         output.
Status:  This is a deficiency/limitation of the underlying library, libpshwpc.
         See the section on that library below.

Symptom: psrun resource reporting for a POSIX threads program shows little or
         no CPU time used.
Status:  psrun's resource monitor thread samples the initial thread created by
         the executable.  With LinuxThreads, this is a "manager thread" that
         consumes very little CPU time.  Use the thread-specific output
         documents instead, which report the CPU time consumed by individual
         threads.

Symptom: psrun writes an XML resource document if requested, even if the
         application to be measured cannot be executed.
Status:  Although infrequent, this can happen if the monitor thread samples the
         child process forked by psrun before it has been detected that the
         target program could not be executed.  The resource document contains
         meaningless data and should be ignored.

libperfsuite
------------
none

libpshwpc
---------
Symptom: Applications that use the interval timer signal ITIMER_PROF do not
         execute properly.
Status:  This is usually caused by dual use of the signal by the application
         and PAPI's multiplexing software.  There is nothing that can be done
         in this case except to limit your use of configuration files to those
         that do not require multiplexing of the hardware counters.

Symptom: Dynamically-loaded shared libraries do not appear in libpshwpc
         profiling output.
Status:  libpshwpc determines the process address ranges that are to be
         profiled at initialization (ps_hwpc_init).  These ranges stay fixed
         throughout the performance measurement.  There is no workaround for
         this limitation unless you can do one of the following:
         + relink your program, explicitly including the shared libraries that
           will be "dlopen"ed by your program at runtime.
           Example: gcc -o mydlprog mydlprog.c -llibrary_to_be_dlopened
         + arrange for static linking of the libraries involved
         + defer initialization until the required libraries have been opened
           and loaded from within your program (when using the libpshwpc API
           explicitly)
         + preload libraries that you want included in the profile (if you know
           what they are) using the LD_PRELOAD environment variable.
           Example: LD_PRELOAD=/usr/lib/libmydllib.so mydlprog
                        (or)
                    LD_PRELOAD=/usr/lib/libmydllib.so psrun mydlprog
           If you have multiple libraries you'd like to include, you should
           separate them with whitespace, e.g. LD_PRELOAD="liba.so libb.so".
           There is an example script in the PerfSuite "misc" example directory
           called "getsolibs" that might help you find out what shared 
           libraries are used by your program.
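
         The LD_PRELOAD option above might be scripted as in the sketch
         below, which builds the whitespace-separated list from the libraries
         that actually exist.  The helper name preload_list is hypothetical
         (it is not the "getsolibs" script), and the library and program
         names in the comment are placeholders.

```shell
# Build a whitespace-separated LD_PRELOAD value from the libraries that exist.
# preload_list is a hypothetical helper, not part of PerfSuite.
preload_list() {
    out=
    for lib in "$@"; do
        if [ -f "$lib" ]; then
            out="${out:+$out }$lib"
        else
            echo "skipping missing library: $lib" >&2
        fi
    done
    printf '%s\n' "$out"
}

# Intended use (library and program names are placeholders):
#   LD_PRELOAD=$(preload_list /usr/lib/liba.so /usr/lib/libb.so) psrun mydlprog
```

         Skipping missing files avoids loader warnings caused by a stale
         entry in the list.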
       
Symptom: PAPI-to-PerfSuite conversion routines report a counting domain of
         "unknown".
Status:  This is a problem that was resolved in PAPI 3.0.7.  You can upgrade
         your version of PAPI to address this problem.

Symptom: PAPI-to-PerfSuite conversion routines always report user and system
         CPU time as -1 in output XML documents.
Status:  Will be fixed in a later release so that the caller can supply the
         proper times to use for those XML elements.

libpshwpc_mpi
-------------
none

Miscellaneous
-------------
Symptom: A POSIX threads program that is measured using the profil() function
         (for example, profil.xml in the PerfSuite configuration directory)
         exits immediately with no performance data.
Status:  This is a limitation of the library function profil(), not PerfSuite.
         To perform time-based profiling in a manner similar to profil()/gprof,
         try using a configuration file that profiles based on total cycles,
         with a threshold chosen so that, at the clock speed of your computer,
         one sample is taken every 1 or 10 ms (the default sampling periods for
         profil()).  The PerfSuite distribution includes a sample
         configuration file called "papi_profile_cycles.xml".  You can replace
         the value of the XML attribute "threshold" in this file with the
         clock speed of your computer in Hz divided by the OS clock ticks per
         second.  For example, if the clock speed of your computer is 2.08 GHz
         and the OS clock ticks per second is 100, set threshold to the value
         "20800000".  There is an example program in the distribution (look in
         the "misc" subdirectory) called profile_rate.c that will suggest a
         threshold for you.
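
         The threshold arithmetic described above can also be sketched in the
         shell.  The helper name cycle_threshold is hypothetical and is not
         the profile_rate.c program; it assumes the clock speed is given in
         Hz and falls back to "getconf CLK_TCK" for the ticks per second.

```shell
# Cycle threshold that approximates one profil()-style sample per OS clock tick.
# cycle_threshold is a hypothetical helper, not part of PerfSuite.
# Usage: cycle_threshold CLOCK_HZ [TICKS_PER_SECOND]
cycle_threshold() {
    hz=$1
    ticks=${2:-$(getconf CLK_TCK)}
    echo $(( hz / ticks ))
}

# The example from the text: 2.08 GHz with 100 clock ticks per second.
cycle_threshold 2080000000 100    # prints 20800000
```

         The printed value is what you would place in the "threshold"
         attribute of the configuration file.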
