University of Wisconsin-Madison

Date	13.05.2017

List of Tables
List of Figures

Abstract ii

Introduction 1

1.1 The Dilemma: Legacy Code and Novel Architectures 2

1.2 Answer: The Co-Designed Virtual Machine Paradigm 4

1.3 Enabling Technology: Efficient Dynamic Binary Translation 6

1.4 Prior Work on Co-Designed VMs 10

1.5 Overview of the Thesis Research 12

The x86vm Experimental Infrastructure 15

2.1 The x86vm Framework 16

2.2 Evaluation Methodology 23

2.3 x86 Instruction Characterization 25

2.4 Overview of the Baseline x86vm Design 30

2.4.1 Fusible Implementation ISA 31

2.4.2 Co-Designed VM Software: the VMM 33

2.4.3 Macro-Op Execution Microarchitecture 34

2.5 Related Work on x86 Simulation and Emulation 37

Modeling Dynamic Binary Translation Systems 40

3.1 Model Assumptions and Notation 41

3.2 Performance Dynamics of Translation-Based VM Systems 43

3.3 Performance Modeling and Strategy for Staged Translation 48

3.4 Evaluation of the Translation Modeling and Strategy 53

3.5 Related Work on DBT Modeling and Strategy 58

Efficient Dynamic Binary Translation Software 60

4.1 Translation Procedure 61

4.2 Superblock Formation 62

4.3 State Mapping and Register Allocation for Immediate Values 63

4.4 Macro-Op Fusing Algorithm 65

4.5 Code Scheduling: Grouping Dependent Instruction Pairs 72

4.6 Simple Emulation: Basic Block Translation 75

4.7 Evaluation of Dynamic Binary Translation 76

4.8 Related Work on Binary Translation Software 88

Hardware Accelerators for x86 Binary Translation 94

5.1 Dual-mode x86 Decoder 94

5.2 A Decoder Functional Unit 98

5.3 Hardware Assists for Hotspot Profiling 103

5.4 Evaluation of Hardware Assists for Translation 105

5.5 Related Work on Hardware Assists for DBT 113

Putting It All Together: A Co-Designed x86 VM 116

6.1 Processor Architecture 117

6.2 Microarchitecture Details 120

6.2.1 Pipeline Front-End: Macro-Op Formation 121

6.2.2 Pipeline Back-End: Macro-Op Execution 124

6.3 Evaluation of the Co-Designed x86 Processor 129

6.4 Related Work on CISC (x86) Processor Design 142

Conclusions and Future Directions 149

7.1 Research Summary and Conclusions 150

7.2 Future Research Directions 153

7.3 Reflections 157

Bibliography 162

List of Tables

Table 2.1 Benchmark Descriptions 24

Table 2.2 CISC (x86) application characterization 26

Table 3.1 Benchmark Characterization: miss events per million x86 instructions 56

Table 4.2 Comparison of Dynamic Binary Translation Systems 91

Table 5.3 Hardware Accelerator: XLTx86 99

Table 5.4 VM Startup Performance Simulation Configurations 106

Table 6.5 Microarchitecture Configurations 130

Table 6.6 Comparison of Co-Designed Virtual Machines 146

List of Figures

Figure 1.1 Co-designed virtual machine paradigm 5

Figure 1.2 Relative performance timeline for VM components 8

Figure 2.3 The x86vm Framework 17

Figure 2.4 Staged Emulation in a Co-Designed VM 21

Figure 2.5 Dynamic x86 instruction length distribution 29

Figure 2.6 Fusible ISA instruction formats 31

Figure 2.7 The macro-op execution microarchitecture 36

Figure 3.8 VM startup performance compared with a conventional x86 processor 47

Figure 3.9 Winstone2004 instruction execution frequency profile 50

Figure 3.10 BBT and SBT overhead via simulation 53

Figure 3.11 VM performance trend versus hot threshold settings 54

Figure 4.12 Two-pass fusing algorithm in pseudo code 67

Figure 4.13 Dependence Cycle Detection for Fusing Macro-ops 69

Figure 4.14 An example to illustrate the two-pass fusing algorithm 70

Figure 4.15 Code scheduling algorithm for grouping dependent instruction pairs 73

Figure 4.16 Macro-op Fusing Profile 78

Figure 4.17 Fusing Candidate Pairs Profile (Number of Source Operands) 80

Figure 4.18 Fused Macro-ops Profile 82

Figure 4.19 Macro-op Fusing Distance Profile 84

Figure 4.20 BBT Translation Overhead Breakdown 86

Figure 4.21 Hotspot (SBT) Translation Overhead Breakdown 87

Figure 5.22 Dual mode x86 decoder 96

Figure 5.23 Dual mode x86 decoders in a superscalar pipeline 97

Figure 5.24 HW accelerated basic block translator kernel loop 99

Figure 5.25 Hardware Accelerator microarchitecture design 102

Figure 5.26 Startup performance: Co-Designed x86 VMs compared w/ Superscalar 108

Figure 5.27 Breakeven points for individual benchmarks 108

Figure 5.28 BBT translation overhead and emulation cycle time 110

Figure 5.29 Activity of hardware assists over the simulation time 113

Figure 6.30 HW/SW Co-designed DBT Complexity/Overhead Trade-off 118

Figure 6.31 Macro-op execution pipeline modes: x86-mode and macro-op mode 120

Figure 6.32 The front-end of the macro-op execution pipeline 121

Figure 6.33 Datapath for Macro-op Execution (3-wide) 126

Figure 6.34 Resource requirements and execution timing 128

Figure 6.35 IPC performance comparison (SPEC2000 integer) 132

Figure 6.36 IPC performance comparison (WinStone2004) 134

Figure 6.37 Contributing factors for IPC improvement 137

Figure 6.38 Code cache footprint of the co-designed x86 processors 141
