Efficient Binary Translation
In Co-Designed Virtual Machines
by
Shiliang Hu
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
(Computer Sciences)
at the
UNIVERSITY OF WISCONSIN – MADISON
2006
Copyright by Shiliang Hu 2006
All Rights Reserved
To my mother, and all people who have been supporting, enlightening me.
Abstract
There is an inherent tension between two basic aspects of computer design: standardized ISAs that allow portable (and enduring) software to be used in a wide variety of systems, and innovative ISAs that can take best advantage of ever-evolving silicon technologies. This tension originates from the ultimate objective of computer architects: efficient computer system designs that (1) support expanding capabilities and higher performance, and (2) reduce costs in both hardware and software. This inherent tension often forces traditional processor designs out of the optimal complexity-effective envelope because a standard ISA defines the hardware/software interface and it cannot be changed without breaking binary compatibility.
In this dissertation, I explore a way of transcending the limitations of conventional, standard ISAs in order to provide computer systems that are more nearly optimal in both performance and complexity. The co-designed virtual machine paradigm decouples the traditional ISA hardware/software interface. A dynamic binary translation system maps standard ISA software to an innovative, implementation-specific ISA implemented in hardware. Clearly, one major enabler for such a paradigm is an efficient dynamic binary translation system.
This dissertation approaches co-designed VMs by applying the classic approach to computer architecture: employing hardware to implement simple high performance primitives and software to provide flexibility. To provide a specific context for conducting this research, I explore a co-designed virtual machine system that implements the Intel x86 instruction set on a processor that employs the architecture innovation of macro-op execution. A macro-op is formed by fusing a dependent pair of conventional, RISC-like micro-ops.
First, supported by preliminary simulation results, I use an analytical model of major VM runtime overheads to explore an overall translation strategy. Second, I discuss efficient software binary translation algorithms that translate and fuse dependent instruction pairs into macro-ops. Third, I propose primitive hardware assists that accelerate the critical parts of dynamic binary translation. Finally, I outline the design of a complete complexity-effective co-designed x86 processor by integrating the three major VM enabling technologies: a balanced translation strategy, efficient translation software algorithms, and simple, effective hardware primitives.
By using systematic analysis and experimental evaluations with a co-designed VM infrastructure, I reach the following conclusions.
Dynamic binary translation can be modeled accurately from a memory hierarchy perspective. This modeling leads to an overall balanced translation strategy for an efficient hardware/software co-designed dynamic binary translation system that combines the capability, flexibility, and simplicity of software translation systems with the low runtime overhead of hardware translation systems.
Architecture innovations are then enabled. The explored macro-op execution microarchitecture enhances superscalar processors via fused macro-ops. Macro-ops improve processor ILP as well as reduce pipeline complexity and instruction management/communication overhead.
The co-designed VM paradigm is very promising for future processors. The outcomes from this research provide further evidence that a co-designed virtual machine not only provides better steady state performance (by enabling novel, efficient architectures), but can also demonstrate startup performance competitive with conventional superscalar processor designs. Overall, the VM paradigm provides an efficient solution for future systems that feature more capability, higher performance, and lower complexity/cost.
Acknowledgements
This dissertation research would not have been possible without the incredible academic environment at the University of Wisconsin – Madison. The education during the long six and a half years will profoundly change my life, career and perhaps more.
First, I especially thank my advisor, James E. Smith, for advising me through this co-designed x86 virtual machine research, which I have enjoyed exploring during the past three or more years. Our shared appreciation of its value and promise has motivated most of the thinking, findings and infrastructure construction work. I have learned a lot from the lucky opportunity to work with Jim and learn his approach to doing quality research, writing, thinking and evaluating things, among many others.
An especially valuable experience for me was to work across two excellent departments, Computer Sciences and Electrical and Computer Engineering. Perhaps this was even vital for this hardware/software co-designed virtual machine research. Many research results might not have been possible without a quality background and environment in both areas. I especially appreciate the insights offered by Jim Smith, Charles Fischer, Jim Goodman, Mark Hill, Mikko Lipasti, Thomas Reps, Guri Sohi and David Wood. I remember Mark’s advice, challenges and insights during seminars and talks. I might have been doing something else if not for Mikko’s architecture classes and his priceless mentoring and help afterwards. I appreciate the reliable and convenient computing environment in both departments.
The excellent Wisconsin Computer Architecture environment also manifests itself in opportunities for peer learning. There are valuable discussions, peer mentoring/tutoring, reading groups, architecture lunches, architecture seminars, beers, conference travel, hanging out and so on. I especially enjoyed the company of the Strata group and thank its members: Timothy Heil, S. S. Sastry, Ashutosh Dhodapkar, Ho-Seop Kim, Tejas Karkhanis, Jason Cantin, Kyle Nesbit, Nidhi Aggarwal and Wooseok Chang. In particular, Ho-Seop Kim shared his detailed superscalar microarchitecture timing simulator source code. Ilhyun Kim helped me to develop the microarchitecture design for my thesis research, and our collaboration produced an HPCA paper. Wooseok Chang helped to set up the Windows benchmarks and trace collection tools. I learnt a lot about dissertation writing by reading other Ph.D. dissertations from the Wisconsin Architecture group, especially Milo Martin’s dissertation.
As a student in the CS area for more than ten years, I especially cherish the collaborations with the more than ten ECE students during those challenging ECE course projects for ECE554, 555, and 755. I learnt a lot and the experience profoundly affected my thesis research.
Prof. Chuan-Qi Zhu and BingYu Zang introduced me to computer system research and the top research teams around the world, dating back to the mid-1990s, at the Parallel Processing Institute, Fudan University. I cherish the intensive mathematics training before my B.S. degree. It helped to improve the way I think and solve problems.
Finally, this research has been financially supported by the following funding sources: NSF grants CCR-0133437, CCR-0311361, CCF-0429854, EIA-0071924, SRC grant 2001-HJ-902, the Intel Corporation and the IBM Corporation. Personally, I appreciate Jim’s constant and generous support. I also thank the Intel Corporation and Microsoft Research for generous multi-year scholarships and internships during my entire undergraduate and graduate career. This work might not have reached this milestone without this generous support.