Goal: Integrate the TAU performance system with the dynamic instrumentation capabilities offered by DyninstAPI. Enable TAU performance measurement on the Compaq Alpha Cluster. Improve the PDT program analysis system for Fortran 90 instrumentation.
INSTR-1: Develop dynamic TAU performance measurement mechanisms for MPI using DyninstAPI.
Status: Complete. We implemented a technique that spawns a Dyninst mutator with each MPI-generated executable image. The mutator inserts TAU instrumentation in the executable before starting the MPI process and then waits for the child process to terminate. (This is similar to the approach used in Dynaprof.) We demonstrated this capability with the SIMPLE hydrodynamics benchmark in our PDPTA ’01 paper. TAU v2.11 ships with support for DyninstAPI and MPI.
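The launcher control flow described above (perform a pre-start step, start the child, wait for it to terminate) can be illustrated with a minimal POSIX sketch. This is not the TAU or DyninstAPI implementation: it makes no Dyninst calls, and the names `insert_instrumentation_placeholder` and `launch_and_wait` are hypothetical, introduced only for illustration. A real mutator would insert instrumentation into the process image where the placeholder sits.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cstdio>

// Hypothetical placeholder: a real mutator would insert instrumentation
// into the executable image here. This sketch only logs the step.
static void insert_instrumentation_placeholder(const char* image) {
    std::fprintf(stderr, "[launcher] pre-start step for %s\n", image);
}

// Runs the pre-start step, starts argv[0] as a child process, and waits
// for it to terminate. Returns the child's exit code, or -1 on failure
// or abnormal termination.
int launch_and_wait(char* const argv[]) {
    insert_instrumentation_placeholder(argv[0]);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;               // fork failed
    }
    if (pid == 0) {
        execvp(argv[0], argv);   // child: replace image with the target
        _exit(127);              // reached only if exec failed
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;               // wait failed
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In this sketch the launcher would be invoked once per MPI rank, with `argv` naming the application executable; how the actual mutator is hooked into the MPI job launch is not shown here.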
INSTR-2: Port the TAU performance measurement system to Compaq Alpha Cluster and demonstrate with MPI applications.
Status: Complete. TAU supports Compaq (cxx, f90) and KAI (KCC, KAP/Pro) compilers under Tru64. TAU also supports Compaq Linux clusters. This capability has been demonstrated with the SAMRAI (Andy Wissink, LLNL) and SAGE (Jack Horner, LANL) projects.
INSTR-3: Complete PDT F90 implementation.
Status: Complete. TAU’s PDT system now supports F90 as well as C99 and C++. The PDT F90 front end has been validated on F90 test suites from the University of Colorado ELI project and the PCRC HPF compiler project. A total of 309 programs were tested with no errors reported.
INSTR-4: Develop tool for automatic source-level F90 instrumentation and demonstrate on F90 application code.
Status: Complete. The PDT F90 capability has been used to build F90 instrumentation support for TAU. We have tested the instrumentor partly on the SAGE code and the POP code (Phil Jones, LANL), and more extensively in the Caltech CACR ASCI/ASAP VTF project (Julian Cummings). In addition to its use in the TAU F90 instrumentor, the PDT F90 capability is being used in the CHASM project (Craig Rasmussen, LANL).
Note: INSTR-1 applies to MPI only, not to MPI in conjunction with threads.
Status: Complete. TAU is available for use with the UPS system for both profile-based and trace-based measurements with hardware performance monitoring capabilities. A significant accomplishment that made TAU’s integration possible was the development of an automatic C instrumentor. This allows fully automatic instrumentation of UPS library source code.
UPS-2: Validate UPS/TAU performance measurement system on UPS-targeted ASCI platforms using UPS validation benchmarks.
Status: Complete. TAU’s use with UPS on a UPS validation code was demonstrated to Richard Barrett and Federico Bassetti at the LACSI Symposium 2001. Future work with Mike McKay at LANL will be aimed towards a performance study of UPS using TAU’s measurement support.
Multithreading and Hybrid Parallelism
Goal: Apply TAU in multithreaded C++ and OpenMP programming environments and develop enhancements for hybrid (“mixed-mode”) parallel execution based on MPI.
APP-1: Demonstrate TAU's ability to profile and trace example application codes developed with the Overture framework.
Status: Complete. TAU is integrated with the Overture and AMRSim frameworks (Brian Miller, CASC, LLNL); see Miller’s PDPTA ’01 paper. PDT is also being used in these projects. Our work with the Overture and AMRSim frameworks is continuing.
APP-2: Port TAU to multithreaded OpenMP environments, targeting the KAI KAP/Pro OpenMP compiler in particular, and interact with OpenMP application developers in its use.
Status: Complete. TAU supports OpenMP programming environments in two forms: OpenMP runtime system routine instrumentation and OpenMP source transformation. The latter method is implemented using the Opari OpenMP directive rewriting tool of Bernd Mohr, ZAM/FZJ, Germany. This work was reported in our EWOMP ’01 and LACSI ’01 papers. TAU supports KAI’s KAP/Pro, SGI, IBM, Compaq, and PGI OpenMP compiler suites.
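The idea behind the source transformation form can be illustrated with a small sketch: a directive rewriting tool surrounds an OpenMP construct with calls into a measurement layer. The hook `region_hook`, the event log `g_events`, and the function `instrumented_sum` below are hypothetical names chosen for illustration. They are not the actual output of Opari, nor the POMP interface, and they record only fork and join events outside the parallel region.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical measurement layer: records events in order. It is called
// only outside the parallel region, so no synchronization is needed.
static std::vector<std::string> g_events;

static void region_hook(const char* event, const char* region) {
    g_events.push_back(std::string(event) + ":" + region);
}

// What the user might have written originally:
//
//     #pragma omp parallel for reduction(+:sum)
//     for (int i = 0; i < n; ++i) sum += i;
//
// A rewritten form, with hooks placed before and after the construct.
long instrumented_sum(int n) {
    long sum = 0;
    region_hook("fork", "sum_loop");   // before threads are created
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += i;
    }
    region_hook("join", "sum_loop");   // after threads have joined
    return sum;
}
```

If the compiler is run without OpenMP enabled, the pragma is ignored and the loop runs serially, but the hook calls still bracket the region. This is one reason a source-level approach ports easily across compiler suites.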
APP-3: Specify OpenMP runtime system “hooks” that OpenMP compiler vendors might provide that could be used effectively by TAU for performance measurement.
Status: Complete. In association with Bernd Mohr, we defined the POMP performance interface for OpenMP; see our LACSI ’01 paper. POMP and Opari were demonstrated with both the TAU and EXPERT performance measurement and analysis systems. The POMP specification has been presented to the OpenMP Future Committee and ARB. Work is underway to merge the OMPI performance interface defined by the INTONE project with POMP. We have also been closely involved with KAI on the OpenMP performance interface specification for the ASCI Path Forward, Ultrascale Tools Initiative, RTS – Parallel Systems Performance project. An ASCI report on the OpenMP performance tool interface was jointly authored with KAI. KAI is working on an implementation of the POMP interface.
APP-4: Enhance TAU for use in C++/MPI and OpenMP/MPI (OpenMPI) hybrid parallel execution environments and demonstrate on selected applications.
Status: Complete. TAU now supports several hybrid execution models, including C++/MPI, OpenMP/MPI, and even Java/MPI. Multi-level instrumentation is applied (Sameer Shende, Ph.D. thesis) using PDT source instrumentation, MPI wrapper library instrumentation, and POMP/Opari for OpenMP instrumentation. We have presented this work in our SC ’01 tutorial. C++/MPI hybrid performance measurement is also used in our work with the University of Utah ASCI/ASAP C-SAFE project (Chris Johnson and Steve Parker). This work will be published in ISHPC ’02.
Status: Complete. We continued to support requests from the POOMA development team. In particular, on Jeffrey Oldham’s (CodeSourcery, LLC) recommendation, TAU’s PDT instrumentor was extended to support selective instrumentation capabilities. A –noinline option was added to suppress instrumentation of inlined procedures. TAU and PDT are available for download from the POOMA webpage.
S. Shende, A. Malony, and R. Ansell-Bell, "Instrumentation and Measurement Strategies for Flexible and Portable Empirical Performance Evaluation," Proc. Int'l. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA 2001), June 2001.
C. Rasmussen, K. Lindlan, B. Mohr, J. Striegnitz, "CHASM: Static Analysis and Automatic Code Generation for Improved Fortran 90 and C++ Interoperability," Proc. Los Alamos Computer Science (LACSI) Symp. 2001, Oct. 2001.
B. Miller, B. Phillip, D. Quinlan, and A. Wissink, "AMRSim: An Object-oriented Performance Simulator for Parallel Adaptive Mesh Refinement," Proc. Int'l. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA 2001), June 2001.
B. Mohr, A. Malony, S. Shende, and F. Wolf, "Towards a Performance Tool Interface for OpenMP: An Approach Based on Directive Rewriting," Proc. Third European Workshop on OpenMP (EWOMP 2001), Sept. 2001.
A. Malony, B. Mohr, S. Shende, and F. Wolf, "Design and Prototype of a Performance Tool Interface for OpenMP," Proc. Los Alamos Computer Science (LACSI) Symp. 2001, Oct. 2001.
B. Kuhn, A. Malony, B. Mohr, and S. Shende, "A Performance Tool Interface for OpenMP," Report for Accelerated Strategic Computing Initiative (ASCI), ASCI Path Forward program, Ultrascale Tools Initiative, RTS - Parallel System Performance, submitted by KAI Software, A Division of Intel America, Inc., Aug. 2001.
S. Shende, "The Role of Instrumentation and Mapping in Performance Measurement," Ph.D. Dissertation, University of Oregon, Aug. 2001.
A. Malony, B. Mohr, and S. Shende, "Performance Technology for Complex Parallel Systems," SC 2001 tutorial, Nov. 2001.
D. St. Germain, A. Morris, S. Parker, A. Malony, and S. Shende, "Integrating Performance Analysis in the Uintah Software Development Cycle," Int'l. Symp. on High Performance Computing (ISHPC-IV), May 2002.