2.9 Input/Output Subsystem (TR-1)


The Input/Output subsystem for Sequoia has two major components: 1) an Offeror provided interface to the LLNS provided SAN; and 2) local RAID storage for the LN and SN. User login and other external network TCP/IP service access to the LN is supported over the SAN interface on the LN. External network TCP/IP service access to the CN is supported by the SAN interfaces on the ION. By the time Sequoia is delivered, InfiniBand™ SAN and attached storage and networking solutions should be widely available. This solution is highly desired because it offers the opportunity to share disk and external networking resources among multiple platforms within the Livermore Computing High Performance SAN environment (e.g., capacity computing clusters, data manipulation engines, visualization engines, archival storage). The architectural diagram in Figure 1-5 shows the preferred system layout for the I/O subsystem.



Figure 2-7: Offeror provided IO Subsystem components include SAN interface and local RAID storage for LN and SN.

2.9.1 File IO Subsystem Performance (TR-1)


Offeror may propose sufficient ION and SAN interfaces to provide 100% of the required delivered Lustre IO bandwidth (as defined in Section 2.2.1) for jobs running on 100% of the CN, to provide 50% of the required delivered Lustre IO bandwidth for jobs running on 50% of the CN using 50% of the ION, and to provide 25% of the required delivered Lustre IO bandwidth for jobs running on 25% of the CN using 25% of the ION. Note that Section 2.12.1 requires an option that doubles the delivered bandwidth to applications using only a portion of the compute nodes. These performance numbers may be measured with IOR_POSIX over a test period of at least 8 hours. IOR_POSIX may be configured for writing and then reading files from Lustre using standard POSIX IO calls under the following benchmarking conditions:

1. Launch: IOR_POSIX may run one MPI task per node. The number of threads within each MPI task may be varied to maximize delivered IO performance.

2. Create: each IOR_POSIX MPI task may create one file with zero size.

3. Write: each MPI task may write an amount of data equal to 35% of the node memory size to the file and close the file.

4. Verify: each MPI task may open the file written by another MPI task (shift), read in all of its data, verify that the data was written correctly, and close the file.

5. Read: each MPI task may open the file it originally wrote, read its data, close the file, and delete the file.

6. Terminate: the IOR_POSIX job terminates.

Each run of IOR_POSIX may execute steps 2 through 5 above four times. IOR_POSIX prints out the read and write rate for each iteration. The figure of merit (FOM) for an IOR_POSIX run is the minimum of its four read rates and four write rates. The overall file IO subsystem I/O rate (Rp) is defined as

Rp = (FOM_1 + FOM_2 + ... + FOM_N) / N

Where N is the number of IOR_POSIX runs completed in the 8 hour test period and FOM_i is the figure of merit of the i-th run.
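For illustration only, an IOR invocation of roughly the following form could drive this workload; the job launcher, node count, block and transfer sizes, and Lustre file path shown here are assumptions rather than values taken from this statement of work.

    # Illustrative sketch only; all names, counts, sizes, and paths are assumptions.
    # -a POSIX : use standard POSIX IO calls
    # -F       : one file per MPI task
    # -b       : amount of data written per task (35% of node memory)
    # -t       : transfer size per IO call
    # -i 4     : four iterations of the write/verify/read cycle
    # -w -W -r : write, verify the written data by reading it back, then read
    # -C       : shift task ordering on read-back so each task checks another task's file
    # -e       : fsync after the write phase so data reaches Lustre
    srun -N <num_CN> --ntasks-per-node=1 ./IOR \
        -a POSIX -F -b <35%_of_node_memory> -t 4m -i 4 \
        -w -W -r -C -e \
        -o <lustre_mount>/ior_testfile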


2.9.1.1 File IO Function Ship Performance (TR-1)


Offeror provided hardware and software may deliver at least 95% of the SAN interface peak link unidirectional bandwidth (0.95*3.2 GB/s = 3.04 GB/s per IBA 4x QDR interface) to a user application performing file IO from all CN associated with an ION to a Linux tmpfs file system occupying 50% of the ION memory. This may be measured with the IOR_POSIX benchmark or equivalent running on each CN, utilizing one file per IOR_POSIX instance in stonewalling mode. Note that the number of IOR_POSIX instances per CN is not specified, but all output files must be of the same size and must fit in the tmpfs. These performance numbers may be measured with IOR_POSIX over a test period of at least 30 minutes. IOR_POSIX may be configured for writing and then reading files from the ION tmpfs using standard POSIX IO calls under the following benchmarking conditions:

1. Launch: at least one IOR_POSIX instance per CN.

2. Create: each IOR_POSIX instance may create one file with zero size.

3. Write: each IOR_POSIX instance may write all of its data to the file and close the file.

4. Read: each IOR_POSIX instance may open the file it originally wrote, read its data, close the file, and delete the file.

5. Terminate: the IOR_POSIX job terminates.

Each run of IOR_POSIX may execute steps 2 through 4 above four times. IOR_POSIX prints out the read and write rate for each iteration. The figure of merit (FOM) for the IOR_POSIX function ship test is the minimum of the four read rates and four write rates for a run. The overall function ship I/O rate (Rp) is defined as

Rp = (FOM_1 + FOM_2 + ... + FOM_N) / N

Where N is the number of IOR_POSIX runs completed in the 30 minute test period and FOM_i is the figure of merit of the i-th run.
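As a sketch only, the following shows one way this configuration might be driven; the launcher, instance counts, file size, transfer size, stonewalling deadline, and tmpfs mount point are assumptions and would be tuned so that all files fit within 50% of the ION memory.

    # Illustrative sketch only; all counts, sizes, and paths are assumptions.
    # -a POSIX : standard POSIX IO calls
    # -F       : one file per IOR_POSIX instance
    # -b       : per-instance file size (all files the same size)
    # -t       : transfer size per IO call
    # -i 4     : four iterations of the write/read cycle
    # -D       : stonewalling deadline in seconds (stop each phase after a fixed time)
    srun -N <num_CN_per_ION> --ntasks-per-node=<instances_per_CN> ./IOR \
        -a POSIX -F -b <file_size> -t 1m -i 4 -w -r -D 60 \
        -o <ion_tmpfs_mount>/ior_testfile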


2.9.1.2 ION to ION Link RDMA Performance (TR-1)


Offeror provided hardware and software may deliver at least 95% of the SAN interface peak link bidirectional bandwidth (0.95*6.4 GB/s = 6.08 GB/s per IBA 4x QDR interface) as measured by the OFED perftest RDMA bandwidth test, or an equivalent test, with the message length and number of messages chosen to maximize the delivered bandwidth between two ION. For messages of 1 MB length (the transfer size used by Lustre), the delivered bandwidth may be at least 90% of the SAN interface peak link bidirectional bandwidth (0.90*6.4 GB/s = 5.76 GB/s).
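By way of example, a measurement of this kind could be made with the standard OFED perftest tools along the following lines; the HCA device name, peer host name, and iteration count are placeholders, the 1,048,576 byte message size corresponds to the 1 MB Lustre case, and the message size would be varied to find the peak delivered bandwidth.

    # Illustrative sketch using the OFED perftest RDMA write bandwidth test.
    # -b : bidirectional traffic
    # -s : message size in bytes (1 MB here)
    # -n : number of messages per measurement
    # -d : HCA device to use (placeholder)
    ib_write_bw -b -s 1048576 -n 20000 -d <hca_device>                       # on the first ION (server)
    ib_write_bw -b -s 1048576 -n 20000 -d <hca_device> <first_ion_hostname>  # on the second ION (client)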

2.9.1.3 ION to Lustre OSS Performance (TR-1)


Offeror provided hardware and software may not inhibit the Lustre LNET self-test, run by LLNS from any ION to any LLNS supplied Lustre OSS over the LLNS provided SAN, from achieving 85% of the Offeror provided SAN interface peak link bidirectional bandwidth (0.85*6.4 GB/s = 5.44 GB/s). Offeror may work with LLNS to identify any performance bottlenecks or bugs in Offeror provided hardware and software in order to enable the correct functioning of this test and to achieve its performance requirement.
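For illustration, an LNET self-test session of roughly the following form could be used for this measurement; the NIDs, group and session names, concurrency, and bulk size are assumptions, and the lnet_selftest module is assumed to be loaded on both the ION and the OSS.

    # Illustrative LNET self-test sketch between one ION and one Lustre OSS.
    modprobe lnet_selftest
    export LST_SESSION=$$
    lst new_session ion_oss_bw
    lst add_group ion_group <ion_nid>@o2ib
    lst add_group oss_group <oss_nid>@o2ib
    lst add_batch bulk_rw
    lst add_test --batch bulk_rw --concurrency 8 --from ion_group --to oss_group \
        brw write size=1M
    lst run bulk_rw
    lst stat ion_group oss_group   # prints bandwidth until interrupted
    lst end_session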

2.9.2 LN & SN High-Availability RAID Arrays (TR-1)


All disk resources for LN and SN local IO may be provided as RAID5 (or better) disk arrays behind active-active (as opposed to active-passive) fail-over RAID controller pairs. RAID parity may be calculated on reads as well as writes, and the RAID parity read in from disk may be verified against the RAID parity calculated on the data read in from disk. The RAID units and disk enclosures may have high availability characteristics. These may include: a no single point of failure architecture; dual data paths between the RAID controllers and each disk; redundant fail-over power supplies and fans; at least one hot spare disk per eight RAID chains; hot swappable disks; the capability to run in degraded mode (one disk failure per RAID string); and the capability to rebuild a replaced disk on the fly with a delivered raw I/O performance impact of less than 30% on that RAID chain. System diagnostics may be provided that are capable of monitoring the function of the RAID units, detecting disk or other component failures, and monitoring read or write soft failures.

2.9.3 LN & SN High IOPS RAID (TR-2)


The LN and SN RAID5 (or better) arrays will deliver at least 500 MB/s aggregate large block read/write bandwidth from the Linux EXT3 file system mounted on /tmp and /var/tmp on each LN and SN. The RAID5 arrays will deliver at least 640 IOPS to an IO workload randomly reading and writing 4,096 B blocks with a 50% read and 50% write balance from the Linux EXT3 file system mounted on each partition on each LN and SN. Note that the aggregate performance of the RAID controller pairs and disk arrays is therefore 500 MB/s times the number of LN and SN for large block IO, and 640 IOPS times the number of LN and SN for 4 KiB random IO.
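As an illustration only, a random IO workload of this shape could be approximated with a tool such as fio; the file name, file size, IO depth, and runtime below are assumptions, and a companion job using large sequential transfers would address the 500 MB/s large block target.

    # Illustrative sketch of a 4 KiB random, 50% read / 50% write workload on a LN RAID partition.
    # File name, file size, IO depth, and runtime are assumptions.
    fio --name=ln_raid_iops \
        --filename=/tmp/fio_testfile --size=8g \
        --rw=randrw --rwmixread=50 --bs=4096 \
        --direct=1 --ioengine=libaio --iodepth=16 \
        --runtime=600 --time_based --group_reporting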
