Jeffrey Rodriguez Professor Seaver

Date	31.01.2017

Hyper Threading
By

Jeffrey Rodriguez

Professor Seaver

CST 123

December 6, 2004

Hyper Threading

Hyper Threading is Intel’s implementation of simultaneous multithreading on Pentium 4 processors. It allows multiple threads to execute at the same time on one processor. Hyper Threading was first announced in the fall of 2001 and made available in early 2002. Since then it has become widely popular on desktop PCs. According to Intel, Hyper Threading increases performance by up to 30% over an identical processor without it.

Originally codenamed ‘Jackson’, Hyper Threading was first announced at the annual Intel Developer Forum in 2001. Intel was not, however, the first company to develop simultaneous multithreading. In 1999, at the Microprocessor Forum in San Jose, CA, Compaq announced it had achieved just that with its EV-8 Alpha processor. Unfortunately, the project was terminated prematurely and the processor was never made available. The technology was brought back and improved when Intel introduced it in its Xeon line of processors in 2002. In November of 2002, Hyper Threading was brought to the desktop PC market. The 3.06-gigahertz (GHz) Pentium 4 was the first desktop processor to support Hyper Threading.

To understand hyper threading you must first understand the basics of how a processor works. A diagram of a very basic CPU can be found in Appendix A. For example, let us use a program that will add 7 and 10, and store the result in the accumulator.

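MVI A, 7    ; move the immediate value 7 into the accumulator
ADI 10      ; add the immediate value 10 (0A hexadecimal) to the accumulator
HLT         ; halt the processor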

The program is stored in RAM. In hexadecimal, it is the byte sequence 3E, 07, C6, 0A, 76. First, the CPU fetches the first byte, 3E, and stores it in a data register. The instruction decoder then decodes it, recognizes it as a 'move immediate' instruction, and moves the next byte, 07, into the accumulator. Each time a byte is fetched from RAM, the program counter is incremented to point to the next byte to fetch. Next, C6 is fetched and recognized as an 'add immediate' instruction. The controller sequence then tells the arithmetic logic unit (ALU) to add the byte that follows to whatever is in the accumulator. That byte, 0A, is fetched and added to the 07 already in the accumulator, giving 11 hexadecimal, or 17 in base 10. Finally, 76 is fetched. It is the halt instruction, which tells the processor to stop.
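The fetch, decode, and execute steps described above can be sketched as a short Python loop. This is only an illustration, not a full 8085 emulator: it models just the three opcodes used in the example, and the names `run`, `accumulator`, and `program_counter` are chosen here for readability rather than taken from any real tool.

```python
# Toy fetch-decode-execute loop for the three-instruction example program.
# Only the opcodes that appear in the example are supported (an assumption
# made to keep the sketch short).

MVI_A = 0x3E  # move immediate byte into the accumulator
ADI = 0xC6    # add immediate byte to the accumulator
HLT = 0x76    # halt


def run(memory):
    """Execute the program stored in `memory`; return the final accumulator."""
    accumulator = 0
    program_counter = 0
    while True:
        opcode = memory[program_counter]  # fetch the next opcode
        program_counter += 1              # point at the following byte
        if opcode == MVI_A:               # decode, then execute
            accumulator = memory[program_counter]
            program_counter += 1
        elif opcode == ADI:
            # The accumulator is 8 bits wide, so keep only the low byte.
            accumulator = (accumulator + memory[program_counter]) & 0xFF
            program_counter += 1
        elif opcode == HLT:
            return accumulator
        else:
            raise ValueError(f"unsupported opcode {opcode:#04x}")


program = [0x3E, 0x07, 0xC6, 0x0A, 0x76]
result = run(program)
print(hex(result), result)
```

Running the sketch on the five bytes from the example leaves 11 hexadecimal (17 decimal) in the accumulator, matching the walk-through.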

Modern processors are much more complicated and have many more registers than the one used in the above example. A diagram of an Intel Pentium 4 processor can be found in Appendix B. You can see the differences between a basic CPU and a modern one by comparing the two diagrams.

Single Threaded CPU

http://arstechnica.com/paedia/images/figure-1.html

The diagram on the right represents a single threaded processor. The colored boxes in RAM signify different threads waiting to be executed. The ‘front end’ section inside the CPU is where instructions are fetched, decoded, and re-ordered. The ‘Execution Core’ is where the instructions are executed.

With this type of processor, only one thread may be executing at once, represented by the red blocks in the CPU.

Also, notice the empty blocks. These blocks, called pipeline bubbles, are where the CPU was unable to do any useful work. There are many reasons why this happens, including mispredicted branches, instructions waiting on data from memory, and threads not ready to be executed. These empty slots are not recoverable: once a clock cycle has passed, the work that could have been done in it is lost.

Single threaded SMP

http://arstechnica.com/paedia/images/figure-2.html
One solution to speed up execution is to have multiple processors. For each processor we have, another thread can execute at the same time. This is called Symmetric Multiprocessing (SMP).

In the diagram above, each CPU can access RAM and is executing a different thread. The biggest problem with this solution is the number of empty boxes, or pipeline bubbles. While adding more CPUs increases performance, it does not improve efficiency.

To help alleviate the problem of pipeline bubbles, a CPU must be able to execute more than one thread at once. Such a CPU is said to be multithreaded. One method of doing this is called super threading.

Super threaded CPU

http://arstechnica.com/paedia/images/figure-3.html

The diagram on the right illustrates this technique. First, notice that there are fewer pipeline bubbles, which right away improves the efficiency of the processor. Also, notice the arrows to the left of the diagram. These arrows emphasize how the processor can mix instructions from different threads. Each pipeline stage can only hold instructions from one thread at a time. The CPU, however, can issue instructions from a different thread on each clock cycle, so the threads make progress together instead of waiting for one another to finish.

Hyper Threading, or Simultaneous Multithreading (SMT), takes this idea even further. It allows instructions from different threads to share the same pipeline stage in the same clock cycle. This minimizes the number of bubbles and maximizes CPU efficiency.
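To make the difference between the three approaches concrete, here is a deliberately simplified Python model. Everything in it is an assumption made for illustration: a thread is a list of "bundles" (the number of instructions it could issue together, with 0 standing for a stalled cycle), the CPU has a fixed number of issue slots per cycle, super threading is modelled as issuing from only one thread per cycle, and SMT as filling the slots from any ready thread. Real processors are far more intricate, and the function name `simulate` and the sample workloads are hypothetical.

```python
def simulate(threads, width, policy):
    """Return (cycles, issued) for a toy CPU with `width` issue slots.

    policy is "single" (one thread runs to completion before the next),
    "super" (one thread issues per cycle, rotating among ready threads),
    or "smt" (slots are filled from every ready thread in the same cycle).
    """
    pending = [list(t) for t in threads]  # copy so progress can be consumed
    cycles = issued = 0
    turn = 0  # rotating pointer used by the "super" policy
    while any(pending):
        cycles += 1
        if policy == "single":
            active = [next(t for t in pending if t)]
        else:
            active = [t for t in pending if t]
        ready = []
        for t in active:
            if t[0] == 0:
                t.pop(0)          # a stalled cycle elapses for this thread
            else:
                ready.append(t)
        if policy == "super" and ready:
            ready = [ready[turn % len(ready)]]
            turn += 1
        free = width
        for t in ready:
            n = min(t[0], free)   # issue as many instructions as fit
            t[0] -= n
            free -= n
            issued += n
            if t[0] == 0:
                t.pop(0)          # bundle fully issued
            if free == 0:
                break
    return cycles, issued


thread_a = [2, 0, 1, 3]  # hypothetical workloads
thread_b = [1, 2, 0, 2]
for policy in ("single", "super", "smt"):
    cycles, issued = simulate([thread_a, thread_b], width=4, policy=policy)
    print(f"{policy}: {cycles} cycles, {issued / (cycles * 4):.0%} of slots used")
```

In this toy model the same eleven instructions finish in fewer cycles as more mixing is allowed, because slots that would have been bubbles are filled by the other thread. The exact numbers depend entirely on the made-up workloads and say nothing about real hardware.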

Hyper Threaded CPU

http://arstechnica.com/paedia/images/figure-4.html

This is the biggest strength of Hyper Threading. It allows one CPU to take on work that would otherwise be spread across two CPUs, and to do so with greater efficiency. To achieve this, a hyper threaded CPU is divided into two logical CPUs. Each logical CPU has its own architectural state, which includes some general purpose registers, control registers, the program counter, the advanced programmable interrupt controller (APIC), and some machine state registers. Other resources, such as cache, control logic, and buses, are shared by the two logical processors. Because the architectural state is duplicated, the operating system sees two processors.
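One way to observe this from software is to ask the operating system how many processors it sees. The short Python sketch below uses the standard library's `os.cpu_count()`, which reports logical CPUs. The wrapper name `logical_cpu_count` is just a label chosen here. On a single Pentium 4 with Hyper Threading enabled one would expect it to report 2, though that is an expectation rather than something verified on the test machine.

```python
import os


def logical_cpu_count():
    """Return the number of logical processors the OS exposes, or None."""
    return os.cpu_count()


count = logical_cpu_count()
if count is None:
    print("The number of logical processors could not be determined.")
else:
    print(f"The operating system reports {count} logical processor(s).")
```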

The operating system can schedule processes on both logical processors as if they were two physical processors. This can greatly increase performance, up to thirty percent, according to Intel. Many people have tested hyper threading technology on their own and come to their own conclusions. I, too, have done my own tests.

For the tests, I used my current PC. Complete specifications of the test computer can be found in Appendix C. To perform the tests, I used PCMark 2004 v1.2. First, I restarted the PC and changed the BIOS configuration to disable hyperthreading. The PC then started up. I then stopped all processes that run automatically on startup. This left a total of 23 processes running that are part of Windows XP. I then ran the testing software. The same procedure was used for testing with hyperthreading enabled. Both tests were performed twice on different days.

After the first round of testing, there was an overall improvement of 12.2% with hyper threading enabled. More specifically, there was a 16.8% improvement in the CPU category, according to PCMark. The second round of testing showed slightly better results, with an overall 13% improvement with hyper threading and an 18.5% improvement in the CPU category. Complete results can be found in Appendix D. While 13% is good, it is clearly not the 30% that Intel claims. Perhaps the biggest performance improvement comes when a user multitasks. According to some, increases of up to 47% can be seen when running two applications at once, such as a virus scan and a video encoder.
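The percentages above can be reproduced from the PCMark and CPU rows of Appendix D. The following Python sketch shows the arithmetic, taking the non-HT score as the baseline. The function name `improvement` is introduced here for illustration only.

```python
def improvement(ht_score, non_ht_score):
    """Percentage gain of the HT score over the non-HT score."""
    return (ht_score - non_ht_score) / non_ht_score * 100


# Scores copied from the PCMark and CPU rows of Appendix D.
results = {
    "Round 1 overall (PCMark)": (4861, 4329),
    "Round 1 CPU": (4804.0, 4110.0),
    "Round 2 overall (PCMark)": (4833, 4274),
    "Round 2 CPU": (4704.0, 3969.0),
}

for label, (ht, non_ht) in results.items():
    print(f"{label}: {improvement(ht, non_ht):.1f}%")
```

The computed values agree with the figures quoted in the text to within about a tenth of a percentage point, the difference being a matter of rounding.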

Since hyper threading became available in 2002, it has become increasingly popular among home PC users. Its efficient use of processor resources increases performance, which benefits the home user the most. Since it was incorporated into the Pentium 4 processors, the product line has grown to include processors from 2.8 GHz up to 3.8 GHz.

Appendix A – Simple CPU

8085 Microprocessor Programming, Textbook.

© 2001 Heathkit Company, Inc., Benton Harbor, Michigan.

Appendix B – Pentium 4

Hinton, Glenn, Dave Sager, Mike Upton, Darrell Boggs, Doug Carmean, Alan Kyker, and Patrice Roussel. “The Microarchitecture of the Pentium® 4 Processor.” Intel. <http://developer.intel.com/technology/itj/q12001/articles/art_2.htm>

Appendix C – System Specifications

Central Processing Unit
Manufacturer	Intel
Family	Intel(R) Pentium(R) 4 CPU 3.20GHz
HyperThreadingTechnology	Available - 2 Logical Processors

Motherboard Info
Manufacturer	ASUSTeK Computer Inc.
Model	P4C800-E
Version	Rev 1.xx
BIOS Vendor	American Megatrends Inc.
BIOS Version	A M I - 9000302

Memory Info
Total Physical Memory	5 x 512MB DDR PC3200
Manufacturer	Corsair

Display Device
Description	ASUS A9800XT
Manufacturer	ATI Technologies Inc.
Driver Version	6.14.10.6476
Total Local Video Memory	256 MB

Sound Device
Description	SB Audigy 2 ZS Audio [DF00]
Driver Version	5.12.5.441
Manufacturer	Creative Technology, Ltd.

Hard Disk Drives
IDE	Western Digital 120GB
	Western Digital 80GB
SATA	Western Digital 200GB

Operating System Info
Operating System	Microsoft Windows XP
Version	5.1.2600
Service Pack	Service Pack 2

Appendix D – Benchmark Results

PCMark04 Results

Test	HT Result 1	non-HT Result 1	HT Result 2	non-HT Result 2	Units
PCMark	4861	4329	4833	4274	PCMarks
CPU	4804.0	4110.0	4704.0	3969.0
Memory	4639.0	4518.0	4556.0	4558.0
Graphics	4430.0	4454.0	4440.0	4406.0
HDD	3851.0	3182.0	3443.0	3428.0
File Compression	5.5	4.1	5.4	4.0	MB/s
File Encryption	51.8	45.6	51.1	44.3	MB/s
File Decompression	38.0	27.1	37.8	27.5	MB/s
Image Processing	14.3	13.2	14.6	13.4	MPixels/s
Virus Scanning	2466.6	1565.8	2729.7	1599.8	MB/s
Grammar Check	2.0	2.2	2.1	2.4	KB/s
File Decryption	91.1	90.8	84.8	81.3	MB/s
Audio Conversion	2827.2	2819.9	2814.0	2814.9	KB/s
Web Page Rendering	5.6	5.5	5.6	5.4	Pages/s
WMV Video Compression	56.2	49.6	52.0	46.4	FPS
DivX Video Compression	63.3	55.2	62.9	51.7	FPS
Physics Calculation and 3D	180.5	173.2	176.0	178.6	FPS
Graphics Memory - 64 lines	2710.1	2712.6	2632.3	2628.7	FPS
File Compression	5.4	4.0	5.4	4.0	MB/s
File Encryption	50.6	44.9	49.9	37.8	MB/s
File Decompression	38.1	27.1	38.3	27.4	MB/s
Image Processing	14.7	13.2	14.4	13.2	MPixels/s
Grammar Check	4.3	4.2	4.3	4.3	KB/s
File Decryption	88.1	88.4	81.8	69.6	MB/s
Audio Conversion	2814.4	2820.8	2816.2	2828.4	KB/s
WMV Video Compression	56.0	49.6	56.1	49.3	FPS
DivX Video Compression	63.4	41.9	57.8	45.0	FPS
Raw Block Read - 8 MB	4759.4	4522.9	4801.0	4793.5	MB/s
Raw Block Read - 4 MB	4500.1	4312.4	4830.2	4771.3	MB/s
Raw Block Read - 192 KB	24224.0	24159.4	22637.9	21438.6	MB/s
Raw Block Read - 4 KB	45175.2	44512.6	45484.5	42864.6	MB/s
Raw Block Write - 8 MB	3986.3	3991.8	3992.6	3996.5	MB/s
Raw Block Write - 4 MB	3985.1	3991.7	4001.7	4002.2	MB/s
Raw Block Write - 192 KB	13867.3	13849.1	13894.0	13893.3	MB/s
Raw Block Write - 4 KB	13816.2	13797.3	13841.8	13794.1	MB/s
Raw Block Copy - 8 MB	1593.6	1476.6	1445.0	1434.2	MB/s
Raw Block Copy - 4 MB	1636.0	1489.2	1478.7	1474.5	MB/s
Raw Block Copy - 192 KB	11955.5	11578.4	11697.7	11571.0	MB/s
Raw Block Copy - 4 KB	13735.7	13667.5	13839.4	13828.9	MB/s
Random Access - 8 MB	2401.4	2449.5	2588.5	2596.4	MB/s
Random Access - 4 MB	2581.3	2480.0	2337.3	2550.5	MB/s
Random Access - 192 KB	8204.8	8098.3	7070.0	7232.5	MB/s
Random Access - 4 KB	12518.9	12502.5	12539.6	12417.6	MB/s
Transparent Windows	1608.6	1615.6	1609.1	1618.8	Windows/s
File Copying	25.4	15.2	19.3	17.7	MB/s
