Draft statement of work


(4.12) Dawn Hardware Options



Date28.01.2017

4.3 (4.12) Dawn Hardware Options


This section supersedes Section 2.12.

Offeror may propose each of the following TOs as separately priced options. In the following sections of its technical proposal(s), Offeror may describe technically how the options will be effected, if exercised by LLNS.


4.3.1 (4.12.1) Dawn Enhanced IO Subsystem (TO-1)


Offeror may propose an enhanced IO subsystem for Dawn that provides double the baseline IO performance for jobs spanning 50% and 25% of the compute nodes (CN). That is, the proposed enhanced IO subsystem may deliver at least 100% of the full system IO delivered bandwidth to jobs using 100% of the CN, may achieve 100% of the full system IO delivered bandwidth for jobs using 50% of the CN, and may achieve 50% of the full system IO delivered bandwidth for jobs using 25% of the CN.
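The bandwidth targets above can be illustrated with a short calculation. This is a minimal sketch, not part of the requirement: it assumes the baseline delivers IO bandwidth in direct proportion to the fraction of CN a job uses (an inference from the 50% and 25% figures), and the function names are hypothetical.

```python
def baseline_io_fraction(cn_fraction):
    """Assumed baseline: delivered IO bandwidth scales linearly with CN fraction."""
    if not 0.0 < cn_fraction <= 1.0:
        raise ValueError("cn_fraction must be in (0, 1]")
    return cn_fraction


def enhanced_io_fraction(cn_fraction):
    """Enhanced option: double the assumed baseline, capped at full system bandwidth."""
    return min(1.0, 2.0 * baseline_io_fraction(cn_fraction))


if __name__ == "__main__":
    # Job sizes named in the option text: 100%, 50% and 25% of the CN.
    for cn in (1.0, 0.5, 0.25):
        print(f"{cn:.0%} of CN: baseline {baseline_io_fraction(cn):.0%}, "
              f"enhanced {enhanced_io_fraction(cn):.0%}")
```

Under that assumption, the doubled figures reproduce the three targets stated in the option: full bandwidth at 100% and 50% of the CN, and half bandwidth at 25% of the CN.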

4.3.2 (4.12.2) Dawn Double Memory (TO-1)


Offeror may propose Dawn CN with double the memory of the baseline Dawn system. In this option, the ION/LN memory may remain consistent with Section 4.3. That is, the memory size component scaling B:F ratio for this CN (only) memory option may meet or exceed:

Memory Size (Byte:FLOP/s)  0.6
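A proposed configuration can be checked against this ratio as sketched below. The sketch assumes the B:F ratio is total memory in bytes divided by peak FLOP/s; the function names and the example system sizes are hypothetical and do not describe any actual Dawn configuration.

```python
def bytes_per_flop(memory_bytes, peak_flops):
    """Memory size B:F ratio, assumed here to be total bytes over peak FLOP/s."""
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    return memory_bytes / peak_flops


def meets_memory_ratio(memory_bytes, peak_flops, required=0.6):
    """True when the configuration meets or exceeds the required B:F ratio."""
    return bytes_per_flop(memory_bytes, peak_flops) >= required


if __name__ == "__main__":
    peak = 10**15                    # hypothetical peak: 1 PFLOP/s
    for mem_tb in (300, 600):        # hypothetical memory sizes in TB
        mem = mem_tb * 10**12
        print(f"{mem_tb} TB: B:F = {bytes_per_flop(mem, peak):.2f}, "
              f"meets 0.6: {meets_memory_ratio(mem, peak)}")
```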


4.3.3 (4.12.3) Dawn Double ION/LN Memory (TO-2)


Offeror may propose Dawn ION/LN with double the memory of the baseline Dawn system. That is, the memory size component scaling B:F ratio for this ION/LN (only) memory option may meet or exceed:

Memory Size (Byte:FLOP/s)  0.6

End of Section 4

5.0 Dawn High Level Software Requirements


All of the Sequoia Software requirements (Section 3) apply to the Dawn system(s). The following requirements supersede the corresponding requirements in Section 3.

5.6.10.1 Baseline Language Support for OpenMP Parallelism (TR-1)

The compilers or interpreters for all the baseline languages (i.e., Fortran03, C, C++ and Python) may support node parallelism through OpenMP Version 2.5 directives or language constructs (http://www.openmp.org/drupal/mp-documents/spec25.pdf). As an optimization feature, all the baseline language compilers may perform automatic parallelization. The baseline language compilers may produce symbol tables and any other information required by the debugger to enable debugging of OpenMP parallelized ASC applications.

End of Section 5


6.0 Integrated System Features (TR-1)


The following requirements deal with the functional aspects of the integrated Dawn and Sequoia systems. Both Dawn and Sequoia are intended for classified production usage at LLNL in the Secure Computing Facility (SCF) by the ASC and Stockpile Stewardship Tri-Laboratory Communities. LLNS therefore requires that the Dawn and Sequoia systems have highly effective, scalable RAS features and prompt hardware and software maintenance.

For hardware maintenance, the strategy is that LLNS personnel will provide on-site, on-call 24x7 hardware failure response. LLNS envisions that these hardware technicians and system administrators will be trained by the selected Offeror to perform on-site service on the delivered hardware. For easily diagnosable node problems, LLNS personnel will perform repair actions in situ by replacing Field Replaceable Units (FRUs). For harder-to-diagnose problems, LLNS personnel will swap out the failing node(s) with on-site hot spare node(s) and perform diagnosis and repair actions in the separate Hot-Spare Cluster (HSC). Failing FRUs or nodes (except for writable nonvolatile media) will be returned to the Offeror for replacement. Hard disk FRUs and writable nonvolatile media (e.g., EEPROM) from other FRUs will be destroyed by LLNS according to DOE/NNSA computer security orders. Thus, LLNS requires an on-site parts cache of all FRUs and a small system of fully functional hot-spare nodes of each node type. The Offeror will work with LLNS to diagnose hardware problems (either remotely or on-site, as appropriate). On occasion, when systematic problems with the cluster are found, the selected Offeror’s personnel will augment LLNS personnel in diagnosing the problem and performing repair actions.
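The repair and disposal rules in the preceding paragraph can be summarized as a small decision sketch. It is illustrative only: the function names, parameters and step wording are hypothetical, and it covers only the cases the paragraph describes.

```python
from typing import List


def repair_action(easily_diagnosable: bool) -> str:
    """Choose the repair path for a failing node."""
    if easily_diagnosable:
        return "replace FRU in situ"
    return "swap node with hot spare, then diagnose and repair in HSC"


def disposition(is_hard_disk: bool, has_writable_nonvolatile_media: bool) -> List[str]:
    """List what happens to a failed FRU once it has been pulled."""
    if is_hard_disk:
        # Hard disk FRUs are destroyed by LLNS and are never returned.
        return ["destroy (LLNS)"]
    steps = []
    if has_writable_nonvolatile_media:
        # Media such as EEPROM is removed and destroyed before the FRU leaves the site.
        steps.append("remove and destroy writable nonvolatile media (LLNS)")
    steps.append("return to Offeror for replacement")
    return steps


if __name__ == "__main__":
    print(repair_action(easily_diagnosable=False))
    print(disposition(is_hard_disk=False, has_writable_nonvolatile_media=True))
```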

In order for the Dawn and Sequoia systems to fulfill the mission of providing “capability” computing resources for LLNS, they must be highly stable and reliable from both a hardware and software perspective. The number of failing components per unit time (weekly) should be kept to a minimum. System components should be fully tested and burned in before delivery (initially and as FRU or hot-spare node replacement). In addition, in order to minimize the impact of failing parts, the LLNS community must have the ability to quickly diagnose problems and perform repair actions. A comprehensive set of diagnostics that is actually capable of exposing and diagnosing problems is required. It has been LLNS’ experience that this is a difficult but achievable goal, and the selected Offeror will need to specifically apply sufficient resources to accomplish it.

For software, the strategy is similar to the hardware strategy in that LLNS personnel will perform the Level 1 (initial call, routine questions and answers, routine software documentation) and Level 2 (routine bug fix, detailed questions and answers, detailed software documentation) software support functions. Specifically, LLNS personnel will diagnose software bugs to determine the failing component. The problem will then be handed off to the appropriate organization for resolution. For LLNS-supplied system tools, LLNS personnel will fix the bugs. For Offeror-supplied system tools, the selected Offeror will need to supply problem resolution. For the Linux kernel and associated utilities, LLNS intends to separately subcontract with Red Hat for Enterprise level support. For file system related software problems, LLNS intends to separately subcontract with Sun Microsystems for Lustre support. For compilers, debuggers and application performance analysis tools, LLNS intends to separately subcontract with the appropriate vendors for support.
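The hand-off described in the preceding paragraph amounts to a lookup from the failing component class to the party responsible for resolution. The sketch below is illustrative only: the component class keys and the function name are hypothetical labels, not terms defined by this statement of work.

```python
# Hypothetical component classes mapped to the resolving party named in the text.
SUPPORT_OWNER = {
    "llns_system_tool": "LLNS",
    "offeror_system_tool": "selected Offeror",
    "linux_kernel_and_utilities": "Red Hat (separate subcontract)",
    "lustre_file_system": "Sun Microsystems (separate subcontract)",
    "compiler_debugger_or_performance_tool": "appropriate vendor (separate subcontract)",
}


def resolver_for(component_class: str) -> str:
    """Return the party that resolves a bug once LLNS has diagnosed the failing component."""
    try:
        return SUPPORT_OWNER[component_class]
    except KeyError:
        raise ValueError(f"unknown component class: {component_class}") from None


if __name__ == "__main__":
    for name in SUPPORT_OWNER:
        print(f"{name}: {resolver_for(name)}")
```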



This software support strategy depends on all software components being Open Source, with source code available to LLNS for viewing, modification, compilation and execution on the provided systems. It is absolutely necessary that the selected Offeror provide LLNS any unique development environment components required to reproduce from source code any portion of the Dawn or Sequoia software environment, except for compilers and runtime support. Any bug fixes developed by LLNS personnel will be provided back to the selected Offeror. If Offeror-proposed system components are not Open Source, then full source code software licenses that allow LLNS to perform these support functions are required.
