Draft statement of work



9.2 Benchmark System Configuration (TR-1)


The reference benchmark system may be a scaled-down version of the proposed Dawn system. It should contain at least 1,024 cores, with a desired core count of 8,192, and the memory per node must be at least 2 GiB. The reference benchmark system used should be fully described as part of the benchmark proposal response. The reference system requires only a modest amount of I/O to run the benchmarks; NFS or an equivalent file system is sufficient. The marquee benchmarks use a single node to read input, followed by a broadcast to all other nodes, as sketched below. Alternate benchmarking configurations (such as mixtures of current products, future products, and simulators) may be utilized after discussion with LLNS on the benchmarking strategy and the relevance of the results to the proposed systems.
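For illustration only, the following minimal MPI sketch (not part of the benchmark sources) shows the input-handling pattern described above: a single rank reads the input file from an NFS-class file system and broadcasts it to all other ranks. The file name "input.dat" and the raw-byte buffer are hypothetical; each marquee benchmark implements its own variant of this pattern.

    /* Illustrative sketch only: rank 0 reads the input file, then
     * broadcasts it, so only modest (NFS-class) I/O is required. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        long nbytes = 0;
        char *buf = NULL;

        if (rank == 0) {                     /* only one node touches the file system */
            FILE *fp = fopen("input.dat", "rb");   /* hypothetical input file */
            if (fp == NULL) MPI_Abort(MPI_COMM_WORLD, 1);
            fseek(fp, 0, SEEK_END);
            nbytes = ftell(fp);
            rewind(fp);
            buf = malloc(nbytes);
            fread(buf, 1, nbytes, fp);
            fclose(fp);
        }

        /* distribute the input to every other rank */
        MPI_Bcast(&nbytes, 1, MPI_LONG, 0, MPI_COMM_WORLD);
        if (rank != 0) buf = malloc(nbytes);
        MPI_Bcast(buf, (int)nbytes, MPI_CHAR, 0, MPI_COMM_WORLD);

        /* ... benchmark proceeds using the broadcast input ... */
        free(buf);
        MPI_Finalize();
        return 0;
    }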

The benchmark system should contain the same processors, cache, memory, nodes, interconnects, I/O interfaces, etc., that are proposed for the Dawn system. If this is not possible, benchmark results from an alternative system that meets the conditions specified in the previous paragraph may be reported. The Offeror may also provide estimated scaled performance for the Dawn configuration consistent with the benchmark system configurations identified in the previous paragraph. All scaling arguments should be fully described by the Offeror in its proposal response and will be reviewed and evaluated by LLNS; supporting documentation may be provided. LLNS will be the sole judge of the validity of any scaled results.
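Purely as an illustration of the kind of first-order projection a scaling argument might document (the symbols and the linear-scaling assumption below are illustrative only, not a prescribed methodology), such an estimate could take the form

    FOM_Dawn ≈ FOM_bench × (N_Dawn / N_bench) × ε,    0 < ε ≤ 1

where FOM_bench is the figure of merit measured on the benchmark system, N_bench and N_Dawn are the core counts of the benchmark and proposed Dawn configurations, and ε is a parallel-efficiency factor supported by measured scaling data. As stated above, LLNS remains the sole judge of whether any such argument is valid.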


9.3 Sequoia Marquee Benchmark Test Procedures (TR-1)


The following procedure has been chosen to directly demonstrate on Sequoia, as part of the Synthetic Workload (SWL) Sequoia system acceptance testing, the successful execution of the ASC Program’s highest-level objective for the Sequoia acquisition, a key element of ASC’s multi-year platform acquisition plan. The 24x Purple sustained, aggregate, weighted FOM demonstration consists of running six “identical problems” for each of the four IDC workload marquee benchmarks on the Sequoia system simultaneously. The 20x BG/L sustained, weighted FOM demonstration consists of running a single LAMMPS problem with twenty times as many atoms as the LLNL BG/L LAMMPS benchmark. This combined sustained workload run of 25 simultaneous problems (24 from the IDC workload and 1 from the science workload) will last for four hours.



Figure 9-11: Sequoia target of 25 simultaneous problems, comprising 24 IDC and 1 science benchmark problems.


Because the test problems defined for the benchmarks run for different lengths of time, Offerors will run the above under the control of a batch scheduling system that submits a new problem to replace each finishing problem. The official FOM of each completed run is saved to become part of the final report as described in Section 9.4. When the test has run for four hours, the 25 running problems are terminated without recording a figure of merit. Each of the 25 problem “streams” is therefore expected to generate many FOM results during the total time of the test.

Because the “peak plus sustained” measurement (defined in Section 9.4.2) will be performed while running 25 simultaneous problems, the marquee benchmarks do not test the performance of individual marquee benchmarks running at full machine scale. The size of each problem has been chosen based on the largest problems that can currently be run on the ASC Purple system. In this scenario, the Sequoia system can be thought of as providing “capacity at the Purple capability level”. Larger problems will also be run on Sequoia, and scaling IDC performance beyond the ASC Purple level is a key goal of the multi-year partnership envisioned between ASC/LLNL and the selected Offeror.

The benchmark runs will be made according to the following test procedures. The ASC systems will be used primarily in a high-level-language environment, and it is the intent of these benchmarks to measure the performance of the reference system from that standpoint. Recoding of the benchmarks, or portions of the benchmarks, in assembly language is prohibited. Library routines that currently exist in an Offeror’s supported set of general or scientific libraries, or that will be in such a set when the Dawn and Sequoia systems are delivered, may be used at the Offeror’s discretion, provided they do not specialize or limit the applicability of the benchmark and do not violate the measurement goals of the particular benchmark. Source preprocessors, execution-profile feedback optimizers, etc., are allowed as long as they are, or will be, available and supported as part of the compilation system for the Dawn and Sequoia systems.

All benchmarks will be run in double precision (64b) floating point arithmetic and as 64b executables (64b virtual memory addressing). All benchmarks that use the message-passing programming paradigm will use a supported, thread-safe, 64b virtual memory pointer communication library that implements the MPI standard. All benchmarks that use the threads programming paradigm will use a supported library that implements the OpenMP standard. MPI and OpenMP functionality must be simultaneously usable by a single application code. The required run configurations for each benchmark are described in the individual benchmark readme files. OpenMP-based parallelism should be utilized to the extent possible on each node, where a node is a set of cores sharing random access memory within the same memory address space.
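As a minimal illustration of the combined MPI and OpenMP usage required above (not a benchmark code, and assuming one MPI task per node with OpenMP threads across that node’s cores), the following sketch initializes a thread-safe MPI library and performs an OpenMP-parallel loop inside a single 64b executable; the requested thread level and the dummy computation are hypothetical.

    /* Illustrative hybrid MPI + OpenMP sketch: MPI between nodes,
     * OpenMP threads within a node, in one application code. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank;
        /* request a thread-safe MPI library, per the test procedure */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = 0.0;
        /* OpenMP parallelism across the cores of one node */
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < 1000000; i++)
            local += (double)i * 1.0e-6;   /* 64b floating point throughout */

        double global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum = %f (threads per task: %d)\n",
                   global, omp_get_max_threads());

        MPI_Finalize();
        return 0;
    }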

Changes to accommodate unique hardware and software characteristics of a system that are consistent with the preceding paragraph will be allowed except where specifically prohibited in the constraints for each benchmark. Code modifications will be documented in the form of initial and final source files, with mandatory accompanying text describing the changes. An audit trail will be supplied to LLNS for any changes made to the benchmark codes. The audit trail will be sufficient for LLNS to determine that changes made violate neither the spirit of the benchmark nor the specific restrictions on the various benchmark codes. LLNS requires that all benchmark codes first be run as provided, without any code modifications, in each required configuration and that these baseline results be included along with any results obtained from modified code. Further discussion of the value to ASC/LLNS of specific types of modification can be found below in Section 9.4.1.

The specific problems to be run during the sustained performance test are defined as follows:

AMG – The mesh refinement factors rx, ry, and rz should all be set to 6. For the six simultaneous runs, three should be made using solver 3 and three should be made using solver 4.

SPhot – Nruns should be set to 262,144.

UMT – Run C in the RFP benchmark problem set.

IRS – Standard problem with 25 zones per domain side. For runs using OpenMP, the number of domains per MPI task may be increased to equal the number of OpenMP threads per MPI task.

LAMMPS – Standard LAMMPS EAM potential benchmark with at least 83,886,080,000 atoms and a target of 32,000 atoms per MPI task. The Offeror may not decrease the total number of atoms in the benchmark, but may adjust the number of atoms per MPI task in order to reduce the number of MPI tasks in the run so that all 24 of the IDC codes can also run (see the illustrative calculation below).
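The following small sketch (illustrative arithmetic only, not part of the benchmark) shows the relationship between the fixed total atom count, the atoms per MPI task, and the resulting MPI task count; the doubled value in the second case is simply one example of an adjustment an Offeror might make.

    /* Illustrative only: MPI task count implied by the LAMMPS atom counts. */
    #include <stdio.h>

    int main(void)
    {
        const long long total_atoms = 83886080000LL;  /* fixed lower bound from the problem definition */
        long long atoms_per_task    = 32000;           /* target value; may be adjusted upward */

        long long mpi_tasks = (total_atoms + atoms_per_task - 1) / atoms_per_task;
        printf("atoms per task = %lld  ->  MPI tasks = %lld\n", atoms_per_task, mpi_tasks);

        /* doubling the atoms per task halves the number of MPI tasks,
         * leaving more of the machine free for the 24 IDC problems */
        atoms_per_task *= 2;
        mpi_tasks = (total_atoms + atoms_per_task - 1) / atoms_per_task;
        printf("atoms per task = %lld  ->  MPI tasks = %lld\n", atoms_per_task, mpi_tasks);

        return 0;
    }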

