Chapter 1
Computer systems are fundamental to the infrastructure of our society. They are embodied in supercomputers, servers, desktops, laptops, and embedded systems, and they power scientific and engineering research and development, communications, business operations, entertainment, and a wide variety of electrical and mechanical systems ranging from aircraft to automobiles to home appliances. Clearly, the more performance and capability computers can provide, the more applications and convenience we can benefit from. On the other hand, these computing devices often require very high hardware/software complexity. System complexity generally affects cost and reliability; more recently, it particularly affects power consumption and time-to-market. Therefore, architecture innovations that enable efficient system designs, achieving higher performance at lower complexity, have always been a primary target for computer architects.
However, several decades of computer architecture history demonstrate that efficient designs are both application-specific and technology-dependent. In this chapter, I first discuss a dilemma that inhibits architecture innovation. I then outline a possible solution and the key issues that must be addressed to enable it. To better gauge its significance, I briefly position this thesis against the background of many related projects. Finally, I give an overview of the thesis research and summarize its major contributions.
1.1 The Dilemma: Legacy Code and Novel Architectures
Computer architects are confronted by two fundamental issues: (1) the ever-expanding and accumulating applications of computer systems, and (2) the ever-evolving technologies used to implement computing devices. A widely accepted task for computer architects is to find the optimal design point(s) for serving existing and future applications with the current hardware technology. Unfortunately, these two fundamental issues follow trends that are not in harmony with each other.
First, consider the trend for computer applications and software. For end users and service consumers, the most valuable feature of a computing device is its functional capability. Practically speaking, this capability manifests itself as the software a computer system can run. As applications expand and accumulate, software becomes more complex, and its development, already known to be a very expensive process, becomes even more expensive. The underlying reasons are that (1) computer applications themselves become more complex as they expand, and (2) the conventional approach to architecture defines the hardware/software interface so that hardware implements the performance-critical primitives while software provides the eventual solution with flexibility. Moreover, porting an entire body of software from one binary distribution format (i.e., ISA, Instruction Set Architecture) to a new binary format is a prohibitively daunting task. As computer applications continue to expand, a huge amount of software accumulates, and software developers naturally prefer to write code for a standard binary distribution format to reduce overall cost. This observation about binary compatibility is confirmed by the current trend in the computer industry: billions of dollars have been invested in software for the (few) surviving ISAs.
Next, turn to the other side of the architecture interface and consider the technologies that architects rely on to implement computing devices. Technology has improved and evolved rapidly throughout the entire history of electronic digital computers, and each technology generation provides its specific opportunities at the cost of new design challenges. It has long been recognized that advanced approaches for achieving efficient designs in a new technology generation often require a new supporting ISA that is aware of, or even dependent on, that technology. For example, RISCs [103] were promoted to reduce complexity and enable single-chip pipelined processor cores, and VLIW [49] was proposed as a means of further pushing the ILP envelope while reducing hardware complexity. More recently, clustered processors, for example Multi-cluster [46] and TRIPS [109], were proposed for high-performance, low-complexity designs in the presence of wire delays [59]. Technology trends continue to present opportunities and challenges: billion-transistor chips will become commonplace, power consumption has become an acute concern, design complexity has become increasingly burdensome, and perhaps even the limits of CMOS are being approached. Novel ways of achieving efficient architecture designs therefore remain of critical importance.
Clearly, the two trends just described conflict with each other. On one hand, we are accumulating software for legacy ISA(s). On the other hand, in a conventional system the ISA is the hardware/software interface, which cannot easily be changed without breaking binary compatibility. Lack of binary compatibility can be fatal for some new computer designs and can severely constrain design flexibility in others. For example, RISC schemes survive mostly as microarchitecture designs that require complex hardware decoders to match legacy instruction sets such as the x86. Additionally, there is as yet no evidence that VLIW can overcome compatibility issues and succeed in general-purpose computing.
Ironically, the widespread application of computer systems thus seems to be at odds with architecture innovation. This paradox manifests itself specifically as the legacy ISA dilemma, which has long been a practical reality and has inhibited modern processor designers from developing new ISA(s).
1.2 Answer: The Co-Designed Virtual Machine Paradigm
The legacy ISA dilemma results from the dual role of conventional ISA(s) as both the software binary distribution format and the interface between software and hardware. Decoupling these two roles therefore leads to a solution.
The binary format ISA used for commercial software distribution is called the architected ISA; examples are the x86 [6–10, 67–69] and the PowerPC ISA [66]. The real interface that the hardware pipeline implements, called the implementation ISA (or native ISA), is a separate ISA that can be designed with considerably more freedom to realize architecture innovations. Such innovations are key to realizing performance and/or power efficiency advantages. However, this decoupling also introduces the issue of mapping software from the architected ISA to the implementation ISA. This ISA mapping can be performed either by hardware or by software (Figure 1.1).
If the mapping is performed by hardware, then front-end hardware decoders translate legacy instructions one by one into implementation ISA instruction(s) that the pipeline backend can execute. For example, all recent high-performance x86 processors [37, 51, 58, 74] adopt a RISC-style microarchitecture to reduce pipeline complexity: complex CISC decoders decompose (crack) x86 instructions into RISC-style implementation ISA instructions called micro-ops or uops. Not only does this context-free mapping require relatively complex circuitry that consumes power every time an x86 instruction is fetched and decoded, but the generated code is also suboptimal, containing inherent redundancy and inefficiency [63, 114] (Figure 1.1, left box). In fact, to map effectively from an architected ISA to an implementation ISA, context-sensitive translation and optimization are needed, performing analysis over a larger translation unit, for example a basic block or a superblock [65] composed of multiple basic blocks. This kind of context-sensitive translation appears to be beyond the complexity-effective hardware design envelope.
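To make the contrast concrete, the following toy sketch (in Python, purely illustrative and not taken from any real decoder or translator) compares context-free cracking, where each architected instruction is expanded in isolation into a fixed micro-op sequence, with a context-sensitive pass over a superblock that removes flag updates no later instruction can observe. The opcodes, micro-op forms, and the dead-flag rule are invented assumptions for this example.

```python
# Toy illustration (not the thesis's actual decoder or translator): contrast
# context-free cracking of CISC-like instructions into micro-ops with a
# context-sensitive pass over a superblock. Opcodes, micro-op forms, and the
# flag-update rule are invented for this example.

def crack(insn):
    """Context-free mapping: expand one architected instruction, in
    isolation, into a fixed micro-op sequence including a flag update."""
    op, dst, src = insn
    if op == "add":
        return [("uadd", dst, dst, src), ("uflags", dst)]
    if op == "sub":
        return [("usub", dst, dst, src), ("uflags", dst)]
    if op == "mov":
        return [("umov", dst, src)]
    raise ValueError(f"unhandled opcode: {op}")

def translate_superblock(insns):
    """Context-sensitive mapping: crack the whole superblock, then drop
    flag updates that a later instruction overwrites before any use
    (this toy superblock contains no flag readers)."""
    uops = [u for insn in insns for u in crack(insn)]
    kept, live_flag_write_seen = [], False
    for u in reversed(uops):            # walk backwards from the block exit
        if u[0] == "uflags":
            if live_flag_write_seen:
                continue                # dead flag update: eliminate it
            live_flag_write_seen = True # only the last flag write survives
        kept.append(u)
    return list(reversed(kept))

superblock = [("add", "r1", "r2"), ("sub", "r1", "r3"), ("mov", "r4", "r1")]
print("context-free :", [u for i in superblock for u in crack(i)])
print("superblock   :", translate_superblock(superblock))
```

Even this trivial optimization requires looking across instruction boundaries, which is exactly the kind of analysis that is awkward to perform in front-end decode hardware but natural for a software translator.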
Figure 1.1 Co-designed virtual machine paradigm
If the mapping is performed by a concealed layer of software that is co-designed with the implementation ISA and the hardware (Figure 1.1, right box), the overall design paradigm is a co-designed virtual machine (VM). The concealed software layer, the virtual machine monitor (VMM), can conduct context-sensitive ISA translation and optimization in a complexity-effective way. This VM design paradigm is exemplified by the Transmeta x86 processors [82, 83] and the IBM DAISY [41] / BOA [3] projects, and an early variation of it was successfully applied in IBM AS/400 systems [12, 17].
However, the co-designed VM paradigm also involves design tradeoffs. The decoupled implementation ISA brings flexibility and freedom for realizing innovative, efficient microarchitectures, but it also introduces VMM runtime software overhead for emulating architected ISA software on the implementation ISA platform. This emulation involves dynamic binary translation and optimization, which is a major source of performance overhead.
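As a rough illustration of how such a VMM might amortize that overhead, the sketch below (hypothetical Python pseudocode, not the design of Transmeta, DAISY, or any other particular system) interprets cold architected-ISA blocks and translates a block into a concealed code cache only after it becomes hot; the threshold value and the interpret/translate/run_native callbacks are placeholder assumptions.

```python
# Minimal sketch of the staged emulation a co-designed VMM might use: cold
# architected-ISA code is interpreted, and a block that becomes hot is
# translated once into a concealed code cache of implementation-ISA code.
# The threshold and the callbacks are placeholder assumptions, not the
# design of any particular VMM.

HOT_THRESHOLD = 50                      # interpretations before translating

code_cache = {}                         # block start PC -> translated code
exec_counts = {}                        # block start PC -> times interpreted

def emulate_block(pc, interpret, translate, run_native):
    """Emulate one architected-ISA block starting at pc; return next pc."""
    if pc in code_cache:                # fast path: reuse earlier translation
        return run_native(code_cache[pc])
    exec_counts[pc] = exec_counts.get(pc, 0) + 1
    if exec_counts[pc] >= HOT_THRESHOLD:
        code_cache[pc] = translate(pc)  # hot block: pay translation cost once
        return run_native(code_cache[pc])
    return interpret(pc)                # cold block: slow but cheap to start

# Tiny demo with stub callbacks: a loop re-enters block 0x100 until it is hot.
interp = lambda pc: 0x100               # interpreter stub, loops back
xlate  = lambda pc: f"native@{pc:#x}"   # translator stub
run    = lambda code: 0x100             # native-execution stub, loops back
for _ in range(HOT_THRESHOLD + 2):
    emulate_block(0x100, interp, xlate, run)
print("translated blocks:", [hex(pc) for pc in code_cache])
```

The point of such staging is that translation and optimization costs are paid once per hot block, so the overhead can be amortized over all subsequent executions served from the code cache.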