3.6Parallelizing Compilers/Translators
3.6.1Baseline Languages (TR-1)
Offeror may provide fully supported implementations of Fortran 2003 (ISO/IEC 1539-1:2004, ISO/IEC TR 15580:2001(E), ISO/IEC TR 15581:2001(E), ISO/IEC TR 19767:2005(E)), see URL: http://www.nag.co.uk/sc22wg5/IS1539-1_2003.html; C (ANSI/ISO/IEC 9899:1999; ISO/IEC 9899:1999 Cor. 1:2001(E), ISO/IEC 9899:1999 Cor. 2:2004(E)), see URL: http://www.open-std.org/jtc1/sc22/wg14/www/standards; C++ (ANSI/ISO/IEC 14882:1998, ISO/IEC 9945-1:1990/IEEE POSIX 1003.1-1990; ANSI/ISO-IEC 9899-1990 C standard, with support for Amendment 1:1994), see URL: http://www.open-std.org/jtc1/sc22/wg21/docs/standards; and Python Version 3.0 or later as released by http://www.python.org. Fortran03, C, C++ and Python are referred to as the baseline languages. In addition, an assembler may be provided. Offeror may provide the fully supported capability to build programs from a mixture of the baseline languages (i.e., inter-language subprocedure invocation may be supported).
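As an illustration of inter-language subprocedure invocation between two of the baseline languages, the following minimal sketch calls a C library routine from Python through the standard ctypes module. It assumes a POSIX system on which the C library is already loaded into the interpreter process; the helper name c_strlen is hypothetical and is not part of this requirement.

```python
import ctypes

# Assumption: a POSIX system, where CDLL(None) exposes the symbols already
# loaded into the interpreter process, including the C library.
libc = ctypes.CDLL(None)

# Declare the C prototype: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Call the C library's strlen on a UTF-8 encoded Python string."""
    return libc.strlen(text.encode("utf-8"))


# Usage: the Python caller receives the value computed by the C routine.
print(c_strlen("baseline"))
```

Declaring argtypes and restype explicitly mirrors the C prototype, so that argument marshalling does not depend on ctypes defaults.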
3.6.2Baseline Language Optimizations (TR-1)
Offeror may provide baseline language compilers that perform high levels of optimization and that allow the application programmer to utilize all CN-supported hardware features, such as SIMD, vectorization, programmable memory prefetch, transactional memory, software managed memory and speculative execution, directly in the baseline languages.
3.6.3Baseline Language 64b Pointer Default (TR-1)
Offeror may provide compilers for the baseline languages that are configured with the default mode of producing 64b executables. A 64b executable is one in which all virtual memory pointers are 64b wide. All operating system calls may be available for use by 64b executables. All Offeror supplied libraries may provide 64b objects (versions of the API). Offeror’s supplied software may be fully tested with 64b executables.
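One way an acceptance check for the 64b default could look from the Python baseline language is sketched below. It reports the width of a native data pointer in the running interpreter build; the function names are hypothetical, and the check says nothing about libraries that are not loaded into the process.

```python
import ctypes
import struct


def pointer_bits():
    """Width in bits of a native data pointer in this interpreter build."""
    return ctypes.sizeof(ctypes.c_void_p) * 8


def is_64b_executable():
    """True when the running executable uses 64b virtual memory pointers."""
    return pointer_bits() == 64


# Cross-check against the struct module's native pointer size ("P" format).
print(pointer_bits(), struct.calcsize("P") * 8, is_64b_executable())
```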
3.6.4Baseline Language Standardization Tracking (TR-1)
Offeror may provide a version of the baseline languages that is standard compliant within eighteen months after ANSI or ISO/IEC standardization, whichever occurs earlier. Offeror is encouraged to adhere to the current proposed standard.
3.6.5Common Preprocessor for Baseline Languages (TR-2)
Offeror may provide the capability of preprocessing ANSI C preprocessor directives in programs written in any of the baseline languages.
3.6.6Base Language Interprocedural Analysis (TR-2)
Offeror may provide mechanisms to perform basic interprocedural analysis (e.g., variable cross-reference listing, COMMON block analysis, use/def analysis) for programs written in the baseline languages.
3.6.7Baseline Language Compiler Generated Listings (TR-2)
Offeror may provide baseline language compiler option(s) to produce source code listings that include information such as pseudo-assembly-language listings, optimizations performed and/or inhibitors to those optimizations on a line-by-line, code block-by-code block or loop-by-loop basis as appropriate, and variable types and memory layout.
3.6.8C++ Functionality (TR-2)
Offeror may provide an implementation of the ISO/IEC 14882 C++ standard compiler including: member function templates, partial specialization of classes, partial ordering of functions, namespaces including the std namespace for standard C++ libraries, and default template parameters. The implementation may also include the Standard C++ library, including the Standard Template Library and header files without “.h” extensions.
3.6.9Cray Pointer Functionality (TR-2)
Offeror may provide Cray style pointers implemented in an ANSI X3.9-1977 Fortran compliant compiler.
3.6.10Baseline Language Support for the “Livermore Model” (TR-1)
All the proposed baseline languages may support the “Livermore Model” by providing programmers the ability to produce MPI parallel programs that can exploit multiple cores and hardware threads with at least the multiple styles of single node parallelism within the MPI tasks described in the subsections below. These multiple styles of single node parallelism may nest. To efficiently support this nesting of parallel styles, the Offeror’s runtime support may repurpose a fixed number of software threads between the different styles of parallelism, with the restriction that only one master thread executes the subroutine call/return between packages written with different styles and the other helper threads call special routines indicating that they can be repurposed. Special hardware and runtime software mechanisms are required for efficient implementation of thread repurposing. The overhead associated with repurposing may be less than a subroutine call/return.
Figure 3-8: Unified Nested Node Concurrency.
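The repurposing scheme described above can be sketched, purely for illustration, as a fixed pool of helper threads that a single master thread hands from one phase of work to the next. The class HelperPool and its methods are hypothetical names, blocking on a queue stands in for the "special routines" by which helpers indicate that they can be repurposed, and the sketch makes no claim about the hardware mechanisms or overhead targets in this requirement.

```python
import queue
import threading


class HelperPool:
    """Fixed set of helper threads that one master thread repurposes."""

    def __init__(self, n_helpers):
        self._tasks = queue.Queue()
        self._threads = [
            threading.Thread(target=self._helper_loop, daemon=True)
            for _ in range(n_helpers)
        ]
        for thread in self._threads:
            thread.start()

    def _helper_loop(self):
        while True:
            # A helper blocked here is idle and available for repurposing.
            item = self._tasks.get()
            if item is None:  # shutdown sentinel
                break
            func, arg, results, index = item
            try:
                results[index] = func(arg)
            finally:
                self._tasks.task_done()

    def run(self, func, args):
        """Called by the master thread only: run one phase and wait for it."""
        args = list(args)
        results = [None] * len(args)
        for index, arg in enumerate(args):
            self._tasks.put((func, arg, results, index))
        self._tasks.join()  # master resumes once every helper task is done
        return results

    def shutdown(self):
        for _ in self._threads:
            self._tasks.put(None)
        for thread in self._threads:
            thread.join()


# Usage: the same three helpers serve two differently styled phases in turn.
pool = HelperPool(3)
squares = pool.run(lambda x: x * x, range(4))  # e.g. a loop-parallel phase
lengths = pool.run(len, ["ab", "c"])           # e.g. a task-parallel phase
pool.shutdown()
```

Only the master thread crosses the boundary between phases, so no threads are created or destroyed when the style of parallelism changes.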
3.6.10.1Baseline Language Support for OpenMP Parallelism (TR-1)
The compilers or interpreters for all the baseline languages (i.e., Fortran03, C, C++ and Python) may support node parallelism through OpenMP Version 3.0 or then-current directives or language constructs (http://www.openmp.org). As an optimization feature, all the baseline language compilers may perform automatic parallelization. The baseline language compilers may produce symbol tables and any other information required by the debugger to enable debugging of OpenMP parallelized ASC applications.
3.6.10.1.1OpenMP Performance Optimizations (TR-2)
The baseline languages and runtime library support for the CN may include optimizations that minimize the overhead of locks, critical regions and self-scheduling “do-loops” by utilizing special hardware features of the CN hardware.
3.6.10.1.2OpenMP Performance Interface (TR-3)
The baseline languages may implement the portable OpenMP performance interface as specified in the white paper adopted by the OpenMP Forum (see http://www.openmp.org/blog/resources/#White%20Papers; alternatively, a direct link to the white paper is http://www.compunity.org/futures/omp-api.html). The baseline languages may provide proper decoding and demangling of instructions and identifiers to support the mapping of results back to source code with respect to Fortran03 modules and C++ namespaces.
3.6.10.1.3OpenMP Runtime Efficiency (TR-2)
The proposed OpenMP runtime may be efficiently implemented using special hardware features that accelerate frequent OpenMP operations. The time to execute an OpenMP barrier with NCORE OpenMP threads may be less than 200 clock cycles. The overhead for OpenMP Parallel FOR with NCORE OpenMP threads may be less than 500 cycles in the case of static scheduling.
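Because the targets above are stated in clock cycles, comparing them with wall-clock measurements requires the CN clock frequency, which this section does not state. The sketch below converts the cycle budgets to nanoseconds under an assumed, purely hypothetical clock of 1.6 GHz; the function and constant names are likewise illustrative.

```python
ASSUMED_CLOCK_HZ = 1.6e9          # hypothetical CN clock, not from this section
BARRIER_BUDGET_CYCLES = 200       # OpenMP barrier with NCORE threads
PARALLEL_FOR_BUDGET_CYCLES = 500  # OpenMP Parallel FOR, static scheduling


def cycles_to_ns(cycles, clock_hz):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1e9 / clock_hz


def within_budget(measured_ns, budget_cycles, clock_hz):
    """True when a measured time is strictly below the cycle budget."""
    return measured_ns < cycles_to_ns(budget_cycles, clock_hz)


print(cycles_to_ns(BARRIER_BUDGET_CYCLES, ASSUMED_CLOCK_HZ))       # 125.0
print(cycles_to_ns(PARALLEL_FOR_BUDGET_CYCLES, ASSUMED_CLOCK_HZ))  # 312.5
```

Under that assumed clock, a measured barrier time of 120 ns would meet the 200-cycle target and 130 ns would not.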
3.6.10.2Baseline Language Support for POSIX Threads (TR-1)
All the baseline languages may support programming node parallelism through POSIX threads Version 2.0 or the then-current standard (http://www.opengroup.org/onlinepubs/007908799/xsh/threads.html). The baseline language compilers and/or interpreters may produce symbol tables and any other information required by the debugger to enable debugging of POSIX thread parallelized ASC applications.
3.6.10.3Baseline Language Support for SE/TM (TR-2)
Offeror may propose baseline language support to efficiently and automatically (with the aid of language constructs or compiler directives) exploit any innovative node hardware support for parallel thread execution defined in Sections 2.4.6 and 2.4.7.
3.6.11Baseline Language and GNU Interoperability (TR-1)
The baseline language compilers may produce binaries that are compatible with the GNU compilers and loaders. In particular, the delivered baseline compiler OpenMP runtime libraries may be compatible with the GNU OpenMP libraries. That is, a single OpenMP based application can be built, run and debugged using modules generated from both Offeror supplied baseline language compilers and GNU compilers.
3.6.12Runtime GNU Libc Backtrace (TR-2)
The baseline language compilers’ runtime support may provide the same backtrace functionality as GNU libc. Refer to: http://www.gnu.org/software/libc/manual/html_node/Backtraces.html
3.6.13Debugging Optimized Applications (TR-2)
The baseline languages will produce symbol tables and any other information required by the debugger to enable the debugging, in the presence of “-O -g” code optimization, of ASC applications. In particular, the baseline languages will provide a set of command line options that generate sufficient OpenMP optimized code and symbol table information so that the debugger can debug OpenMP threaded applications without loss of information about variables or source code context. Refer to Section 3.7.2.8.
3.6.14Floating Point Exception Handling (TR-2)
The baseline languages will provide compiler flags that allow an application to detect Floating Point Exception (FPE) conditions occurring at runtime within a module compiled with those flags. This support will provide the compiled modules with an option to select any combination of the floating point exceptions defined for IEEE-754 that include, but are not limited to, overflow, underflow, divide-by-zero, inexact, imprecise, quiet NaN and signaling NaN. With this support enabled, the application will receive a SIGFPE signal whenever a selected floating point exception condition occurs in any of the floating point hardware units (i.e., the main floating point unit and the SIMD unit).
The baseline languages will provide compiler flags that allow for imprecise exception flag setting so that exceptions may be raised by software checking exception flags on subroutine boundaries, block boundaries, loop boundaries and after each floating-point instruction is completed depending on how the compiler flag is set. Each decrease in software exception flag checking resolution will allow the resulting binary to run faster.
Further, the baseline languages will provide a compiler flag to inject 64b signaling NaNs into the heap and stack memory such that an application can easily detect the use of uninitialized memory.
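To illustrate the idea behind signaling NaN injection, the sketch below fills a buffer with a 64b signaling NaN bit pattern and then reports which slots were never overwritten. It assumes the common IEEE 754-2008 encoding in which the most significant fraction bit is the quiet bit; the fill pattern, the function names and the use of a Python bytearray in place of real heap or stack memory are all illustrative choices, and no SIGFPE is raised here.

```python
import struct

EXP_MASK = 0x7FF0000000000000   # all eleven exponent bits set
FRAC_MASK = 0x000FFFFFFFFFFFFF  # the 52 fraction bits
QUIET_BIT = 0x0008000000000000  # most significant fraction bit (bit 51)
SNAN_BITS = 0x7FF4000000000000  # hypothetical fill pattern; payload is arbitrary


def is_signaling_nan_bits(bits):
    """True when a 64b pattern encodes a NaN whose quiet bit is clear."""
    is_nan = (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0
    return is_nan and (bits & QUIET_BIT) == 0


def poisoned_buffer(n_doubles):
    """Stand-in for freshly allocated memory filled with signaling NaNs."""
    return bytearray(struct.pack("<Q", SNAN_BITS) * n_doubles)


def uninitialized_slots(buf):
    """Indices of 8-byte slots that still hold a signaling NaN pattern."""
    words = struct.unpack("<%dQ" % (len(buf) // 8), bytes(buf))
    return [i for i, word in enumerate(words) if is_signaling_nan_bits(word)]


# Usage: initialize only slot 1, then list the slots that were never written.
buf = poisoned_buffer(4)
struct.pack_into("<d", buf, 8, 3.5)
print(uninitialized_slots(buf))  # [0, 2, 3]
```

On hardware with the corresponding exception enabled, an arithmetic operation on such a value would trap, which is how the application detects the use of uninitialized memory.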