The ATAC project proposes to build a prototype computer system, including a detailed system simulator, a compiler system, a runtime system, a programming model and associated APIs. These efforts will be driven by models of the optical components that we will develop.
ATAC Architecture We will develop the ATAC chip architecture, including the ATAC network hierarchy, the mesh network, and the processor-to-network interface, with support for broadcast, external memory, and I/O interfaces. This effort will also define the coding and clocking transmission scheme for the optical network. We will validate our assumption that we can build message flow control and receiver-side filtering and buffering for the all-to-all broadcast network with reasonable complexity, area, speed, and energy. Selected portions of the processor-to-network interface will be implemented in Verilog, synthesized, and prototyped in FPGAs to verify their feasibility.
Optical Interconnect Interfaces and Component Models As research progresses on the novel integrated photonic components, we will develop and refine models of these components, covering characteristics such as switching speed, propagation delay, propagation loss, insertion loss, energy consumption, and physical size. Based on these models, we will design the interfaces between the digital electronics of the core processors and the optical components of the broadcast interconnect. This will include some analog electrical circuitry as well as additional digital logic to pre-filter, sort, and buffer data moving between the processor pipeline and the optical network. As we gain additional information about the optical components and refine the architectural design, we will update the simulator's functional and performance models to reflect these changes. Using these models, we will be able to accurately estimate the bit-error rate, latency, on- and off-chip range, and footprint of the optical network, and the performance, energy consumption, and physical size of an ATAC processor.
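The kind of component model described above can be illustrated with a toy optical link budget. The component names, loss figures, and group index below are illustrative placeholders for this sketch, not measured ATAC parameters:

```python
# Toy optical-link model: total insertion loss and propagation latency.
# All parameter values are hypothetical, chosen only to show the calculation.

C = 299_792_458.0  # speed of light in vacuum, m/s


def link_latency_ns(waveguide_length_m, group_index=4.0):
    """Propagation delay through an on-chip waveguide, in nanoseconds."""
    return waveguide_length_m * group_index / C * 1e9


def link_loss_db(components):
    """Total optical loss: sum of per-component insertion losses (dB)."""
    return sum(loss_db for _name, loss_db in components)


link = [("modulator", 3.0), ("waveguide", 1.5), ("filter", 0.5), ("detector", 0.1)]
print(round(link_loss_db(link), 2))      # 5.1 dB total loss
print(round(link_latency_ns(0.02), 3))   # 0.267 ns for a 2 cm path
```

Feeding models like this into the simulator lets latency and loss estimates track the photonics research as the component parameters are refined.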
Pin-Based ATAC Simulator We will use simulation to evaluate and refine the ATAC architecture and its broadcast network using parallel applications. In collaboration with Robert Cohn's Pin group at Intel, we are using the Intel Pin dynamic binary instrumentation infrastructure to develop a massively parallel simulator for fast simulation of generic multicore systems with thousands of cores. The Pin-based simulator can be used to develop multicore applications, compilers, and operating systems, and to rapidly prototype and evaluate multicore architectural mechanisms such as the ATAC broadcast network. The ATAC simulator will incorporate mechanisms for energy modeling at both the chip level and the system level. We have already successfully created an early version of a multicore Pin simulator.
Pin allows one to insert extra code at specific points in the program at run time; the specific points and the code to be inserted at each point can be specified in a separate executable called a “Pintool.” Additionally, Pin allows function calls in the application to be replaced by calls to functions defined within the Pintool. The inserted/replaced code can be used to simulate new features: it can modify processor state, change program behavior or use a performance model to adjust a simulated clock. The simulator uses these features of Pin to implement architectural mechanisms and to model performance.
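The replace-and-instrument idea can be sketched conceptually (in Python rather than Pin's actual C++ Pintool API): an application function is swapped for a wrapper that performs the same work while charging a modeled cost to a simulated clock. The cost model here is a made-up example:

```python
# Conceptual sketch of Pin-style function replacement driving a simulated
# clock. This is an analogy, not the real Pin API: a Pintool would do the
# replacement at binary level, but the bookkeeping is the same in spirit.

simulated_clock = {"cycles": 0}


def replace_with_model(cost_model):
    """Replace a function with an instrumented version, as a Pintool would."""
    def wrap(fn):
        def instrumented(*args, **kwargs):
            simulated_clock["cycles"] += cost_model(*args, **kwargs)
            return fn(*args, **kwargs)
        return instrumented
    return wrap


@replace_with_model(cost_model=lambda payload: 10 + len(payload))  # fixed + per-byte cost
def send_message(payload):
    return len(payload)  # stand-in for the real communication primitive


send_message(b"hello")
print(simulated_clock["cycles"])  # 15 cycles charged (10 fixed + 5 bytes)
```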
Figure 8: Mapping of simulated cores to threads and physical cores.

The Pin-based simulator has been designed to take advantage of the parallelism of host architectures such as multicores or clusters. It models each core within the simulated system as a separate kernel thread, independently schedulable by the OS (Figure 8). The OS maps the threads to the hardware, enabling the simulator to exploit the available parallelism. Cores (threads) communicate using calls to a simple API which represents the intrinsic capabilities of the simulated architecture (e.g., broadcast, point-to-point message passing). The simulator replaces API calls within the application with calls to functions defined within the Pintool that implement the corresponding functionality and update the simulation clock of the appropriate cores using a model of the communication cost. The implementation of the API functions within the simulator depends on what communication mechanisms are available on the host architecture. For example, the implementation of inter-core communication may use buffers in shared memory for threads on a single machine, or use sockets over Ethernet for threads running on different machines in a cluster.
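The thread-per-core scheme with a shared-memory communication backend can be sketched as follows. The function names (`send`, `broadcast`) are illustrative stand-ins for the simulator's communication API, not its actual interface:

```python
# Minimal sketch: each simulated core is a separate thread, and the
# communication API is backed by shared-memory queues on a single host.
import threading
import queue

N_CORES = 4
mailboxes = [queue.Queue() for _ in range(N_CORES)]  # one inbox per core


def send(dst, msg):
    """Point-to-point message, as on the electrical mesh."""
    mailboxes[dst].put(msg)


def broadcast(src, msg):
    """One-to-all message, as on the optical broadcast network."""
    for dst in range(N_CORES):
        if dst != src:
            mailboxes[dst].put(msg)


received = [None] * N_CORES


def core(core_id):
    """Body of one simulated core; in the real simulator, a Pin-hosted thread."""
    if core_id == 0:
        broadcast(0, "hello")
        received[0] = "sender"
    else:
        received[core_id] = mailboxes[core_id].get()


threads = [threading.Thread(target=core, args=(i,)) for i in range(N_CORES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received)  # ['sender', 'hello', 'hello', 'hello']
```

On a cluster, the queue operations would be swapped for socket sends and receives without changing the API seen by the simulated cores.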
We have chosen to base our multicore simulator on Pin because it offers several advantages over creating our own simulator from scratch as in the Raw project. First, the Pin infrastructure is reliable: it is mature, robust, and well-supported. Second, it is high performance: it natively executes application code on the host hardware rather than interpreting it. Third, using Pin shortens our simulator toolchain development time: it allows us to use existing tools for compiling multicore applications (gcc, binutils, etc.) instead of having to develop them ourselves. Its major drawback is that it only allows us to model the x86 processor. We believe this is not a significant issue because in future massive multicores the specifics of the core and ISA become secondary issues to global communications, and memory and I/O systems.
ATAC API/Language Constructs As the ATAC project heavily emphasizes ease of programming, a significant part of the project will involve the development of programming APIs and high-level language constructs. The goal of this development is to enable programmers to quickly implement reasonably complex parallel algorithms on ATAC using straightforward implementations and relatively minimal effort, while still achieving excellent performance. This goal will be achieved in part by having the API and language constructs perform all of the "heavy lifting": for example, they will handle message management and shared-memory coherence protocols, and will automatically choose between the optical broadcast network and the electrical mesh network depending on traffic patterns and current network congestion.
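The automatic network-selection idea can be sketched as a simple policy function. The fan-out threshold and congestion test below are illustrative placeholders; the real runtime would base this decision on profiled traffic patterns and live congestion measurements:

```python
# Hedged sketch of runtime network selection: wide or congested traffic goes
# to the optical broadcast network, narrow uncongested traffic to the mesh.
# The threshold and congestion metric are assumed values for illustration.

BROADCAST_FANOUT_THRESHOLD = 8  # hypothetical tuning parameter


def choose_network(n_destinations, mesh_congestion):
    """Return which network the runtime would use for a message.

    mesh_congestion is a normalized load estimate in [0, 1].
    """
    if n_destinations >= BROADCAST_FANOUT_THRESHOLD or mesh_congestion > 0.75:
        return "optical-broadcast"
    return "electrical-mesh"


print(choose_network(64, mesh_congestion=0.1))  # optical-broadcast (wide fan-out)
print(choose_network(2, mesh_congestion=0.1))   # electrical-mesh (narrow, idle mesh)
print(choose_network(2, mesh_congestion=0.9))   # optical-broadcast (mesh congested)
```

Because this choice lives inside the API, application code issues a single logical send and never names a network explicitly.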
The ATAC Compiler The ATAC compiler will compile programs written using the aforementioned high-level language constructs into assembly code suitable for the ATAC hardware. It will generate the low-level code needed to send and receive on the optical broadcast network and the electrical mesh network. It will also incorporate profile information gathered through an application profiling mechanism into the compilation process.
Applications The ATAC project will include a significant application development effort to help develop and test our new API. The applications to be implemented will include standard benchmark suites (e.g., SPEC, SPLASH [22]), stream-based multimedia codes (e.g., MediaBench II, video encode/decode), and scientific codes (e.g., FFT, N-body simulation). These applications will also be used to assess programmer productivity and architectural performance.
Programmer Productivity We will assess the programmer productivity benefits of ATAC by comparing it to traditional mesh-style multicores. The ATAC ease of use study will use the following three metrics to quantify programming effort:
- Lines of code, and lines of communication code.
- Programming time through a user study. Users will code up several simple benchmarks in C for a traditional mesh architecture, and in the ATAC API for the ATAC architecture. Time to first result and time to achieve a given level of performance will be measured.
- Programming gap. This metric will compare the performance difference between a quick implementation and an optimized implementation of a benchmark. Easy-to-program architectures will show a smaller gap between the two.
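The programming-gap metric above can be expressed as a simple relative slowdown. The runtimes in the example are made-up numbers, used only to show the calculation:

```python
# Illustrative computation of the "programming gap": the fractional slowdown
# of a quick implementation relative to a tuned one. Input values are
# hypothetical example data, not measured results.

def programming_gap(quick_runtime_s, optimized_runtime_s):
    """Fractional slowdown of the quick version over the optimized version."""
    return (quick_runtime_s - optimized_runtime_s) / optimized_runtime_s


# A smaller gap suggests the architecture is easier to program well quickly.
print(round(programming_gap(quick_runtime_s=12.0, optimized_runtime_s=10.0), 2))  # 0.2
```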