Hyper Threading
By
Jeffrey Rodriguez
Professor Seaver
CST 123
December 6, 2004
Hyper Threading
Hyper Threading is Intel’s implementation of simultaneous multithreading on Pentium 4 processors. It allows multiple threads to execute at the same time on one processor. Hyper Threading was first announced in the fall of 2001 and made available in early 2002. Since then it has become widely popular on desktop PCs. According to Intel, Hyper Threading can increase performance by up to 30% over an identical processor without it.
Originally codenamed ‘Jackson’, Hyper Threading was first announced at the annual Intel Developer Forum in 2001. Intel was not, however, the first company to develop simultaneous multithreading. In 1999, at the Microprocessor Forum in San Jose, CA, Compaq announced it had achieved just that with its EV-8 Alpha processor. Unfortunately, the project was terminated prematurely and the processor was never made available. The technology was brought back and improved when Intel introduced it in its Xeon line of processors in 2002. In November of 2002, Hyper Threading was brought to the desktop PC market. The 3.06-gigahertz (GHz) Pentium 4 was the first desktop processor to support Hyper Threading.
To understand hyper threading you must first understand the basics of how a processor works. A diagram of a very basic CPU can be found in Appendix A. For example, let us use a program that will add 7 and 10, and store the result in the accumulator.
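MVI A, 7    ; load the immediate value 7 into the accumulator (bytes 3E, 07)
ADI 10      ; add the immediate value 10 (0A hex) to the accumulator (bytes C6, 0A)
HLT         ; halt the processor (byte 76)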
The program is stored in RAM. In hexadecimal, it looks like 3E, 07, C6, 0A, 76. First, the CPU fetches the first byte, 3E, and stores it in a data register. The instruction decoder then decodes it, recognizes it as a 'move immediate' instruction, and moves the next byte, 07, into the accumulator. Each time a byte is fetched from RAM, the program counter is incremented to point to the next byte to fetch. Next, C6 is fetched and recognized as an 'add immediate' instruction. The controller-sequencer then tells the arithmetic logic unit (ALU) to add the next byte in memory to whatever is in the accumulator. That byte, 0A, is fetched and added to the 07 already in the accumulator, resulting in 11 hexadecimal, or 17 in base 10. Finally, 76 is fetched; it is the halt instruction, which tells the processor to stop.
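The fetch, decode, and execute steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not a full 8085 emulator: it assumes just the three opcodes used in the example (3E, C6, and 76), ignores the flag register, and the function name run_8085_subset is made up for this sketch.

```python
def run_8085_subset(memory):
    """Run a tiny subset of 8085 opcodes and return the accumulator.

    Only MVI A (0x3E), ADI (0xC6) and HLT (0x76) are handled; flags
    are ignored. This is an illustrative sketch, not a real emulator.
    """
    pc = 0   # program counter: address of the next byte to fetch
    acc = 0  # accumulator
    while True:
        opcode = memory[pc]  # fetch the instruction byte
        pc += 1              # the program counter advances after every fetch
        if opcode == 0x3E:   # MVI A, data: move the next byte into A
            acc = memory[pc]
            pc += 1
        elif opcode == 0xC6:  # ADI data: add the next byte to A (8-bit wrap)
            acc = (acc + memory[pc]) & 0xFF
            pc += 1
        elif opcode == 0x76:  # HLT: stop executing
            return acc
        else:
            raise ValueError(f"opcode {opcode:02X} is not part of this sketch")


program = [0x3E, 0x07, 0xC6, 0x0A, 0x76]
result = run_8085_subset(program)
print(f"accumulator = {result:02X} hex = {result} decimal")
```

Running the sketch on the five bytes from the example leaves 11 hexadecimal (17 decimal) in the accumulator, matching the walk-through.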
Modern processors are much more complicated and have many more registers than the one used in the above example. A diagram of an Intel Pentium 4 processor can be found in Appendix B. You can see the differences between a basic CPU and the Pentium 4.
Single Threaded CPU
http://arstechnica.com/paedia/images/figure-1.html
The diagram on the right represents a single threaded processor. The colored boxes in RAM signify different threads waiting to be executed. The ‘front end’ section inside the CPU is where instructions are fetched, decoded, and re-ordered. The ‘Execution Core’ is where the instructions are executed.
With this type of processor, only one thread may be executing at once, represented by the red blocks in the CPU.
Also, notice the empty blocks. These blocks, called pipeline bubbles, are where the CPU was unable to do any useful work. There are many reasons why this happens, including mispredicted branches and instructions that are not yet ready to execute because they are waiting on data. These lost execution slots cannot be recovered; they remain empty for the rest of the process's execution.
Single threaded SMP
http://arstechnica.com/paedia/images/figure-2.html
One solution to speed up execution is to have multiple processors. For each processor we have, another thread can execute at the same time. This is called Symmetric Multiprocessing (SMP).
In the diagram above, each CPU can access RAM and is executing a different thread. The biggest problem with this solution is the number of empty boxes, or pipeline bubbles. While adding more CPUs increases performance, it does not improve the efficiency of any individual CPU.
To help alleviate the problem of pipeline bubbles, a CPU must be able to execute more than one thread at once; such a CPU is said to be multithreaded. One method of doing this is called super threading.
Super threaded CPU
http://arstechnica.com/paedia/images/figure-3.html
The diagram on the right illustrates this technique. First, notice that there are fewer pipeline bubbles. Right away this improves the efficiency of the processor. Also, notice the arrows to the left of the diagram. These arrows emphasize how the processor can mix instructions from different threads. Each pipeline stage can only hold instructions from one thread, and all of the instructions issued in a given clock cycle come from the same thread. The CPU, however, can switch to a different thread on the next clock cycle. This allows multiple threads to be in flight at once, so one thread can make progress while another is stalled.
Hyper Threading, Intel's name for Simultaneous Multithreading (SMT), takes this idea even further. It allows instructions from different threads to be issued in the same clock cycle and to share the same pipeline stage. This minimizes the number of bubbles and maximizes CPU efficiency.
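The difference between the three approaches can be made concrete with a toy model of issue slots. The sketch below is an illustration under simplifying assumptions, not a description of how the Pentium 4 actually schedules instructions: the issue width of 4, the representation of a thread as a list of "bundles" (how many instructions it could issue on its next turn, with 0 standing for a stall cycle), and the three policy names are all invented for this example.

```python
from collections import deque


def simulate(threads, policy, width=4):
    """Return (cycles, issued) for a toy model of CPU issue slots.

    threads: list of lists; each number is how many instructions that
             thread can issue on its next turn (assumed <= width),
             and 0 stands for a cycle in which the thread is stalled.
    policy:  "single" (one thread at a time), "super" (one thread per
             cycle), or "smt" (threads share the slots of a cycle).
    """
    queues = [deque(t) for t in threads]
    cycles = issued = turn = 0
    while any(queues):
        cycles += 1
        if policy == "single":
            # Only the first unfinished thread runs; a stall wastes the cycle.
            current = next(q for q in queues if q)
            issued += current.popleft()
            continue
        # Decide which threads are ready before anything changes this cycle.
        ready = [q for q in queues if q and q[0] > 0]
        for q in queues:
            if q and q[0] == 0:
                q.popleft()  # a stall elapses while other threads run
        if policy == "super":
            # All slots in this cycle go to one thread, taken in rotation.
            if ready:
                issued += ready[turn % len(ready)].popleft()
                turn += 1
        elif policy == "smt":
            # Ready threads share the slots of the same cycle.
            free = width
            for q in ready:
                n = min(free, q[0])
                free -= n
                issued += n
                if n == q[0]:
                    q.popleft()
                else:
                    q[0] -= n  # the rest of the bundle waits for next cycle
        else:
            raise ValueError(f"unknown policy: {policy}")
    return cycles, issued


thread_a = [2, 0, 1, 2]  # hypothetical workloads, 5 instructions each
thread_b = [1, 2, 0, 2]
for policy in ("single", "super", "smt"):
    cycles, issued = simulate([thread_a, thread_b], policy)
    print(f"{policy:6}: {issued} instructions in {cycles} cycles, "
          f"{issued / (cycles * 4):.0%} of slots used")
```

In this toy run the same ten instructions take 8 cycles single threaded, 6 cycles super threaded, and 4 cycles with simultaneous multithreading, so fewer slots are left as bubbles at each step. The exact numbers depend entirely on the made-up workloads.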
Hyper Threaded CPU
http://arstechnica.com/paedia/images/figure-4.html
The ability to fill otherwise idle execution slots is the biggest strength of Hyper Threading. It allows one physical CPU to take on work that would otherwise need two CPUs, and to do so with greater efficiency. To achieve this, a hyper threaded CPU is divided into two logical CPUs. Each logical CPU has its own architectural state, which includes some general purpose registers, control registers, the program counter, the advanced programmable interrupt controller (APIC), and some machine state registers. Other resources, such as cache, control logic, and buses, are shared by the two logical processors. Because the architectural state is duplicated, the operating system sees two processors.
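One way to see this from software is to ask the operating system how many processors it reports. The short Python sketch below uses the standard library call os.cpu_count(), which returns the number of logical processors (or None if it cannot be determined); the helper name logical_processor_count is made up for this example. On a single hyper threaded Pentium 4 with the feature enabled, the expected answer would be 2.

```python
import os


def logical_processor_count():
    """Return the number of logical processors the OS reports, or None."""
    return os.cpu_count()


count = logical_processor_count()
if count is None:
    print("The operating system did not report a processor count.")
else:
    print(f"Logical processors visible to the operating system: {count}")
```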
The operating system can schedule processes on both logical processors as if they were two physical processors. This can greatly increase performance, up to thirty percent, according to Intel. Many people have tested hyper threading technology on their own and come to their own conclusions. I, too, have done my own tests.
For the tests, I used my current PC. Complete specifications of the test computer can be found in Appendix C. To perform the tests, I used PCMark 2004 v1.2. First, I restarted the PC and changed the BIOS configuration to disable hyperthreading. The PC then started up. I then stopped all processes that run automatically on startup. This left a total of 23 processes running that are part of Windows XP. I then ran the testing software. The same procedure was used for testing with hyperthreading enabled. Both tests were performed twice on different days.
After the first round of testing, there was an overall improvement of 12.2% with hyper threading enabled. More specifically, there was a 16.8% improvement in the CPU category, according to PCMark. The second round of testing showed slightly better results, with an overall improvement of 13% with hyper threading and an 18.5% improvement in the CPU category. Complete results can be found in Appendix D. While 13% is good, it’s clearly not the 30% that Intel claims. Perhaps the biggest performance improvement comes when a user multitasks. According to some reports, increases of up to 47% can be seen when running two applications at once, such as a virus scan and a video encoder.
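The percentages above come from comparing the PCMark scores with and without hyper threading. As a minimal sketch, the calculation can be reproduced from the overall and CPU scores listed in Appendix D; the function name percent_gain is made up for this example. The results are rounded to one decimal place, so they may differ from the figures quoted in the text by about a tenth of a percentage point.

```python
def percent_gain(with_ht, without_ht):
    """Percentage improvement of the HT score over the non-HT score."""
    return (with_ht - without_ht) / without_ht * 100.0


# (HT score, non-HT score) pairs taken from Appendix D
scores = {
    "Round 1 overall": (4861, 4329),
    "Round 1 CPU": (4804.0, 4110.0),
    "Round 2 overall": (4833, 4274),
    "Round 2 CPU": (4704.0, 3969.0),
}

for label, (ht, no_ht) in scores.items():
    print(f"{label}: {percent_gain(ht, no_ht):.1f}% improvement")
```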
Since hyper threading became available in 2002, it has become increasingly popular among home PC users. Its efficient use of the processor increases performance, which benefits the home user the most. Since it was incorporated into the Pentium 4 processors, the product line has grown to include processors from 2.8 GHz up to 3.8 GHz.
Appendix A – Simple CPU
8085 Microprocessor Programming, Textbook.
© 2001 Heathkit Company, Inc., Benton Harbor, Michigan.
Appendix B – Pentium 4
Hinton, Glenn, Dave Sager, Mike Upton, Darrell Boggs, Doug Carmean, Alan Kyker, and Patrice Roussel. “The Microarchitecture of the Pentium® 4 Processor.” Intel. <http://developer.intel.com/technology/itj/q12001/articles/art_2.htm>
Appendix C – System Specifications
Central Processing Unit
Manufacturer | Intel
Family | Intel(R) Pentium(R) 4 CPU 3.20GHz
HyperThreadingTechnology | Available - 2 Logical Processors

Motherboard Info
Manufacturer | ASUSTeK Computer Inc.
Model | P4C800-E
Version | Rev 1.xx
BIOS Vendor | American Megatrends Inc.
BIOS Version | A M I - 9000302

Memory Info
Total Physical Memory | 5 x 512MB DDR PC3200
Manufacturer | Corsair

Display Device
Description | ASUS A9800XT
Manufacturer | ATI Technologies Inc.
Driver Version | 6.14.10.6476
Total Local Video Memory | 256 MB

Sound Device
Description | SB Audigy 2 ZS Audio [DF00]
Driver Version | 5.12.5.441
Manufacturer | Creative Technology, Ltd.

Hard Disk Drives
IDE | Western Digital 120GB
IDE | Western Digital 80GB
SATA | Western Digital 200GB

Operating System Info
Operating System | Microsoft Windows XP
Version | 5.1.2600
Service Pack | Service Pack 2
Appendix D – Benchmark Results
PCMark04 Results | HT Result 1 | non-HT Result 1 | HT Result 2 | non-HT Result 2 | Unit
PCMark | 4861 | 4329 | 4833 | 4274 | PCMarks
CPU | 4804.0 | 4110.0 | 4704.0 | 3969.0 |
Memory | 4639.0 | 4518.0 | 4556.0 | 4558.0 |
Graphics | 4430.0 | 4454.0 | 4440.0 | 4406.0 |
HDD | 3851.0 | 3182.0 | 3443.0 | 3428.0 |
File Compression | 5.5 | 4.1 | 5.4 | 4.0 | MB/s
File Encryption | 51.8 | 45.6 | 51.1 | 44.3 | MB/s
File Decompression | 38.0 | 27.1 | 37.8 | 27.5 | MB/s
Image Processing | 14.3 | 13.2 | 14.6 | 13.4 | MPixels/s
Virus Scanning | 2466.6 | 1565.8 | 2729.7 | 1599.8 | MB/s
Grammar Check | 2.0 | 2.2 | 2.1 | 2.4 | KB/s
File Decryption | 91.1 | 90.8 | 84.8 | 81.3 | MB/s
Audio Conversion | 2827.2 | 2819.9 | 2814.0 | 2814.9 | KB/s
Web Page Rendering | 5.6 | 5.5 | 5.6 | 5.4 | Pages/s
WMV Video Compression | 56.2 | 49.6 | 52.0 | 46.4 | FPS
DivX Video Compression | 63.3 | 55.2 | 62.9 | 51.7 | FPS
Physics Calculation and 3D | 180.5 | 173.2 | 176.0 | 178.6 | FPS
Graphics Memory - 64 lines | 2710.1 | 2712.6 | 2632.3 | 2628.7 | FPS
File Compression | 5.4 | 4.0 | 5.4 | 4.0 | MB/s
File Encryption | 50.6 | 44.9 | 49.9 | 37.8 | MB/s
File Decompression | 38.1 | 27.1 | 38.3 | 27.4 | MB/s
Image Processing | 14.7 | 13.2 | 14.4 | 13.2 | MPixels/s
Grammar Check | 4.3 | 4.2 | 4.3 | 4.3 | KB/s
File Decryption | 88.1 | 88.4 | 81.8 | 69.6 | MB/s
Audio Conversion | 2814.4 | 2820.8 | 2816.2 | 2828.4 | KB/s
WMV Video Compression | 56.0 | 49.6 | 56.1 | 49.3 | FPS
DivX Video Compression | 63.4 | 41.9 | 57.8 | 45.0 | FPS
Raw Block Read - 8 MB | 4759.4 | 4522.9 | 4801.0 | 4793.5 | MB/s
Raw Block Read - 4 MB | 4500.1 | 4312.4 | 4830.2 | 4771.3 | MB/s
Raw Block Read - 192 KB | 24224.0 | 24159.4 | 22637.9 | 21438.6 | MB/s
Raw Block Read - 4 KB | 45175.2 | 44512.6 | 45484.5 | 42864.6 | MB/s
Raw Block Write - 8 MB | 3986.3 | 3991.8 | 3992.6 | 3996.5 | MB/s
Raw Block Write - 4 MB | 3985.1 | 3991.7 | 4001.7 | 4002.2 | MB/s
Raw Block Write - 192 KB | 13867.3 | 13849.1 | 13894.0 | 13893.3 | MB/s
Raw Block Write - 4 KB | 13816.2 | 13797.3 | 13841.8 | 13794.1 | MB/s
Raw Block Copy - 8 MB | 1593.6 | 1476.6 | 1445.0 | 1434.2 | MB/s
Raw Block Copy - 4 MB | 1636.0 | 1489.2 | 1478.7 | 1474.5 | MB/s
Raw Block Copy - 192 KB | 11955.5 | 11578.4 | 11697.7 | 11571.0 | MB/s
Raw Block Copy - 4 KB | 13735.7 | 13667.5 | 13839.4 | 13828.9 | MB/s
Random Access - 8 MB | 2401.4 | 2449.5 | 2588.5 | 2596.4 | MB/s
Random Access - 4 MB | 2581.3 | 2480.0 | 2337.3 | 2550.5 | MB/s
Random Access - 192 KB | 8204.8 | 8098.3 | 7070.0 | 7232.5 | MB/s
Random Access - 4 KB | 12518.9 | 12502.5 | 12539.6 | 12417.6 | MB/s
Transparent Windows | 1608.6 | 1615.6 | 1609.1 | 1618.8 | Windows/s
File Copying | 25.4 | 15.2 | 19.3 | 17.7 | MB/s