Draft statement of work


Integrated System Administration Tools



Date: 28.01.2017

3.5 Integrated System Administration Tools



3.5.1 Single Point for System Administration (TR-1)


Offeror may provide a set of facilities to administer the Sequoia CN, ION, SN and LN as a single entity. In particular, Offeror may provide a fully supported implementation of a single-point system administration tool to effect configuration actions on: file system mounts; node booting; node status; node self-consistency checks of system configuration parameters; software installation; resource administration; node shutdown/restart; system patch installation; login control (providing the capability to restrict login access to certain processors, and cluster-wide monitoring of failed login attempts by an individual); and system back-ups, including the ability to dump multiple volumes of tapes without operator intervention. This single point of control may provide a command-line interface that effects one or more actions from each command issued and returns an error return code, allowing the system administrator to script (automate) redundant configuration tasks for multiple or all nodes in the system. This command-line interface may be capable of performing all of the above system administration configuration actions. The Offeror may provide a fully supported implementation of mechanisms for detecting and reporting failures of critical resources, including processors, network paths, and disks. The diagnostic routines may be capable of isolating hardware problems down to the FRU level in both the system and its peripheral equipment.
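As an illustration of the scripting use case described above, the following is a minimal sketch of how a single command could fan one action out to many nodes and reduce the per-node results to one return code. It is not part of the requirement: the function names, the node names, and the `fake_status_check` action are hypothetical stand-ins for whatever tool the Offeror supplies.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_on_nodes(nodes: Iterable[str],
                 action: Callable[[str], int],
                 max_workers: int = 32) -> Dict[str, int]:
    """Apply `action` to every node in parallel and collect return codes."""
    node_list = list(nodes)
    if not node_list:
        return {}
    workers = min(max_workers, len(node_list))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so codes line up with node_list.
        codes = list(pool.map(action, node_list))
    return dict(zip(node_list, codes))


def overall_exit_code(results: Dict[str, int]) -> int:
    """Return 0 only if every node reported success, otherwise 1."""
    return 0 if all(code == 0 for code in results.values()) else 1


def fake_status_check(node: str) -> int:
    # Hypothetical per-node action: a real tool would query the node here.
    return 1 if node.endswith("-bad") else 0


if __name__ == "__main__":
    results = run_on_nodes(["cn0", "cn1", "ion0-bad"], fake_status_check)
    for node, code in results.items():
        print(f"{node}: return code {code}")
    raise SystemExit(overall_exit_code(results))
```

A calling script can then branch on the process exit status alone, which is what makes repeated configuration tasks automatable across multiple or all nodes.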

3.5.2 System Admin (TR-1)


CN and ION diskless environments may be installed and maintained on the SN. Multiple CN and associated ION environments may be selectable on a per-boot basis. Installing CN and ION diskless environments may not require patching source code or compiling from source code. Offeror-provided system administration utilities may allow the boot/reboot of individual or groups of ION and associated CN together, or of just the ION separately (without rebooting CN). Reboot of BOS and LWK may not be required for normal day-to-day operations (e.g., changing configuration files).

3.5.2.1 Fast, Reliable System Reboot (TR-1)


Rebooting the entire system will take less than fifteen (15) minutes and, once initiated, may not require human intervention. This time will include the time to reboot the nodes and switches, mount any local file systems (if applicable) and return all system daemons to operating condition. This system reboot time specifically does include the time to unmount idle (no pending IOs, no open files or file locks active) remote file systems (including the NFS and Lustre file systems), but does not include the time to mount remote file systems.

3.5.2.2 Multi Configuration Boot, Install and Patch (TR-1)


The system may have the ability to boot ten (10) alternate system software releases and/or configurations. This may be used to test new system releases in “debug shots” or provide multiple kernels for CN. Switching between any of these alternative system software releases may be accomplished with a single system reboot and take less than ninety (90) minutes including reboot time (Section 3.5.2.1). It may be possible to patch any system software release and/or configuration. It may be possible to back out any patches applied to any system software release and/or configuration. Installing, upgrading, and patching (applying or backing out) any configuration that is not active may be accomplished with the system on-line and under user workload and may take less than eight (8) hours for installs and upgrades, and less than two (2) hours for patches. This includes any system reboots.
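The time limits stated in Sections 3.5.2.1 and 3.5.2.2 can be collected into a small acceptance check. The sketch below is illustrative only: the operation names and the `within_limit` helper are hypothetical, and the only values taken from the text are the limits themselves, each read as a strict "less than" bound.

```python
# Upper bounds in minutes, as stated in Sections 3.5.2.1 and 3.5.2.2.
# The dictionary keys are hypothetical labels chosen for this sketch.
LIMITS_MINUTES = {
    "full_reboot": 15,             # entire system reboot
    "switch_release": 90,          # switch release, including reboot time
    "install_or_upgrade": 8 * 60,  # inactive configuration, system on-line
    "patch": 2 * 60,               # apply or back out a patch
}


def within_limit(operation: str, measured_minutes: float) -> bool:
    """Return True if the measured duration is strictly under the limit."""
    return measured_minutes < LIMITS_MINUTES[operation]


if __name__ == "__main__":
    # Hypothetical measurements, not data from any real system.
    measurements = {"full_reboot": 12.5, "switch_release": 90.0}
    for operation, minutes in measurements.items():
        verdict = "meets" if within_limit(operation, minutes) else "misses"
        print(f"{operation}: {minutes} min {verdict} the limit")
```

Note that a measurement exactly equal to a limit does not pass, because each requirement is worded as "less than".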

3.5.3 System Debugging and Performance Analysis (TR-2)


Offeror may provide a set of facilities with a single point of control to analyze the entire system performance and make tuning modifications. In particular, the Offeror may provide a fully supported implementation of a single-point-of-control system tuning tool to dynamically monitor and modify the following system attributes: processor status; key resources (system CPU usage, memory usage, page faults); run queues per node; scheduling priority of each process and each thread within a process; and current system configuration. The tuning parameter changes may take effect without requiring an operating system reboot. This single point of control may require root access to make modifications, but only normal user privileges to monitor the system. Due to the large number of system attributes and components, this single point of control may be constructed to be fast and efficient when monitoring and modifying the entire system. All system information and control functions may be presented in a hierarchical fashion.

3.5.4 Scalable Centralized Resource Data Base (TR-2)


Offeror may provide an Open Source, SQL-compliant, scalable centralized resource data base (CRDB) that keeps track of the state of all system resources, their current usage policies, and a system error log. The schema used by the CRDB for storing the data will also be available under an Open Source license. This facility and the system utilities/functions that depend on it may be constructed so that the CRDB does not become a single point of failure or contention (bottleneck) within the system. In particular, SQL updates to the CRDB and SQL queries from the CRDB during system changes impacting at least 50% of the nodes (e.g., rebooting, major system disruptions) may be done in parallel so as to not impede rapid system transitions. The degree of parallelism supported in the CRDB may be a system tunable parameter.
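To make the three kinds of CRDB content concrete (resource state, usage policy, error log), here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table and column names are assumptions made for illustration; the source does not define a schema. SQLite is only a convenient stand-in here and does not demonstrate the parallel update capability the requirement asks for.

```python
import sqlite3
from typing import Dict, Iterable, Tuple

# Hypothetical schema: names and columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE resource (
    name         TEXT PRIMARY KEY,
    kind         TEXT NOT NULL,      -- e.g. CN, ION, SN, LN
    state        TEXT NOT NULL,      -- e.g. up, down, rebooting
    usage_policy TEXT
);
CREATE TABLE error_log (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    resource_name TEXT NOT NULL REFERENCES resource(name),
    logged_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    message       TEXT NOT NULL
);
"""


def open_crdb() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def set_states(conn: sqlite3.Connection,
               updates: Iterable[Tuple[str, str]]) -> None:
    """Apply many (name, new_state) updates in a single transaction."""
    params = [(state, name) for name, state in updates]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "UPDATE resource SET state = ? WHERE name = ?", params)


def count_by_state(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return the number of resources in each state."""
    rows = conn.execute(
        "SELECT state, COUNT(*) FROM resource GROUP BY state")
    return dict(rows.fetchall())
```

Batching the state changes for a large reboot into one transaction, as `set_states` does, is one simple way to limit contention; a production CRDB would additionally need concurrent writers and redundancy to avoid becoming a single point of failure.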

3.5.5 User Maintenance (TR-2)


Offeror may provide a secure (root access only) tool for managing user administration, including some means of integrating the namespace manager and the authentication server in order to facilitate adding, removing, and modifying users. In addition, the Offeror may provide a tool for managing groups, including initial creation of groups, modification of groups, and user membership in groups. Offeror-provided user administration tools may allow/disallow user accounts on CN, LN, SN and ION separately. These tools may provide a scriptable interface and may not require human interaction with a GUI to perform any functions.

3.5.6 Login Load Balancing Service (TR-2)


In order to balance user logins across the LN, Offeror may propose hardware and software to route individual user logins to a different LN for each successive login attempt. The proposed solution should integrate with the LLNS 1/10 GbE infrastructure and allow site-specific policies for choosing load balancing algorithms.
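One simple policy that satisfies "a different LN for each successive login attempt" is per-user round robin. The sketch below is only an example of such a policy, not the proposed solution: the class name, the login node names, and the user names are hypothetical, and a site could substitute any other algorithm behind the same `route` method.

```python
from collections import defaultdict
from typing import Dict, List


class RoundRobinLoginRouter:
    """Route each user's successive logins to successive login nodes."""

    def __init__(self, login_nodes: List[str]) -> None:
        if not login_nodes:
            raise ValueError("at least one login node is required")
        self._nodes = list(login_nodes)
        # Index of the node that each user's next login will be sent to.
        self._next_index: Dict[str, int] = defaultdict(int)

    def route(self, user: str) -> str:
        """Return the login node for this attempt and advance the cursor."""
        index = self._next_index[user]
        self._next_index[user] = (index + 1) % len(self._nodes)
        return self._nodes[index]


if __name__ == "__main__":
    router = RoundRobinLoginRouter(["ln0", "ln1", "ln2"])
    for attempt in range(4):
        print("alice ->", router.route("alice"))
```

With more than one LN, no user is sent to the same node on two consecutive attempts. Keeping the cursor per user rather than global is a design choice made for this sketch; a global cursor would spread aggregate load more evenly but would not guarantee a different node for a given user's next login.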
