The acronym RISC (pronounced "risk"), for reduced instruction set computer, denotes a CPU design strategy built on the insight that simplified instructions that "do less" can still deliver higher performance if their simplicity lets them execute very quickly. Many proposals for a "precise" definition[1] have been attempted, and the term is slowly being replaced by the more descriptive load-store architecture. Well-known RISC families include Alpha, ARC, ARM, AVR, MIPS, PA-RISC, Power Architecture (including PowerPC), SuperH, and SPARC.
RISC is an old idea; aspects attributed to the first RISC-labeled designs (around 1975) include the observations that the memory-restricted compilers of the time were often unable to take advantage of features intended to facilitate coding, and that complex addressing inherently takes many cycles to perform. It was argued that such functions would be better performed by sequences of simpler instructions, if this could yield implementations simple enough to run at very high clock frequencies and small enough to leave room for many registers[2], factoring out slow memory accesses. Uniform, fixed-length instructions with arithmetic restricted to registers were chosen to ease instruction pipelining in these simple designs, with special load-store instructions accessing memory.
5. Write about the characteristics of RISC.
Typical characteristics of RISC
For any given level of general performance, a RISC chip typically has far fewer transistors dedicated to the core logic, which originally allowed designers to increase the size of the register set and increase internal parallelism.
Other features typically found in RISC architectures are:
- Uniform instruction format, using a single word with the opcode in the same bit positions in every instruction, demanding less decoding;
- Identical general-purpose registers, allowing any register to be used in any context and simplifying compiler design (although there are normally separate floating-point registers);
- Simple addressing modes, with complex addressing performed via sequences of arithmetic and/or load-store operations;
- Few data types in hardware: some CISCs have byte-string instructions or support complex numbers, which is so far unlikely to be found on a RISC.
Exceptions abound, of course, within both CISC and RISC.
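As an illustration of the simple-addressing point, a CISC-style addressing mode such as base + index*scale + displacement can be decomposed into a short sequence of register arithmetic followed by a plain load. The sketch below models memory as a Python list; the variable names and values are invented for illustration.

```python
# Sketch: decomposing a complex CISC addressing mode (base + index*scale
# + displacement) into the simple arithmetic and load steps a RISC uses.
# The "memory" list and all values are invented for illustration.

memory = list(range(100))
base, index, scale, disp = 40, 3, 4, 8

# CISC-style: one instruction computes the effective address internally.
cisc_value = memory[base + index * scale + disp]

# RISC-style: the same access as a sequence of simple register operations.
t1 = index * scale        # multiply (or shift) in a register
t2 = base + t1            # add the base
t3 = t2 + disp            # add the displacement
risc_value = memory[t3]   # one simple load

print(cisc_value == risc_value)   # True: both read effective address 60
```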
RISC designs are also more likely to feature a Harvard memory model, where the instruction stream and the data stream are conceptually separated; this means that modifying the memory where code is held might not have any effect on the instructions executed by the processor (because the CPU has separate instruction and data caches), at least until a special synchronization instruction is issued. On the upside, this allows both caches to be accessed simultaneously, which can often improve performance.
6. Write about the comparison of RISC and x86 systems.
RISC and x86
However, despite many successes, RISC has made few inroads into the desktop PC and commodity server markets, where Intel's x86 platform remains the dominant processor architecture (Intel is facing increased competition from AMD, but even AMD's processors implement the x86 platform, or a 64-bit superset known as x86-64). There are three main reasons for this.
- The very large base of proprietary PC applications is written for x86, whereas no RISC platform has a similar installed base; this meant PC users were locked into the x86.
- Although RISC was indeed able to scale up in performance quite quickly and cheaply, Intel took advantage of its large market by spending vast amounts of money on processor development: Intel could spend many times as much as any RISC manufacturer on improving low-level design and manufacturing. Smaller firms like Cyrix and NexGen could not match that spending, but they realized that they could apply pipelined design philosophies and practices to the x86 architecture, either directly (as in the 6x86 and MII series) or indirectly, via extra decoding stages (as in the Nx586 and AMD K5).
- Later, more powerful processors such as the Intel P6 and AMD K6 had similar RISC-like units that executed a stream of micro-operations generated from decoding stages that split most x86 instructions into several pieces. Today, these principles have been further refined and are used by modern x86 processors such as the Intel Core 2 and AMD K8. The first available chip deploying such techniques was the NexGen Nx586, released in 1994 (while the AMD K5 was severely delayed and released in 1995).
While early RISC designs were significantly different from contemporary CISC designs, by 2000 the highest-performing CPUs in the RISC line were almost indistinguishable from the highest-performing CPUs in the CISC line.[12][13][14]
PART-C
UNIT—5
ADVANCED SYSTEM ARCHITECTURE.
1. Compare the differences between RISC and CISC.
The simplest way to examine the advantages and disadvantages of RISC architecture is by contrasting it with its predecessor: CISC (Complex Instruction Set Computer) architecture.
Multiplying Two Numbers in Memory
Consider the storage scheme of a generic computer. The main memory is divided into locations numbered from (row) 1: (column) 1 to (row) 6: (column) 4. The execution unit is responsible for carrying out all computations; however, it can only operate on data that has been loaded into one of six registers (A, B, C, D, E, or F). Say we want to find the product of two numbers, one stored in location 2:3 and the other in location 5:2, and then store the product back in location 2:3.
The CISC Approach
The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible. This is achieved by building processor hardware that is capable of understanding and executing a series of operations. For this particular task, a CISC processor would come prepared with a specific instruction (we'll call it "MULT"). When executed, this instruction loads the two values into separate registers, multiplies the operands in the execution unit, and then stores the product in the appropriate register. Thus, the entire task of multiplying two numbers can be completed with one instruction:
MULT 2:3, 5:2
MULT is what is known as a "complex instruction." It operates directly on the computer's memory banks and does not require the programmer to explicitly call any loading or storing functions. It closely resembles a command in a higher level language. For instance, if we let "a" represent the value of 2:3 and "b" represent the value of 5:2, then this command is identical to the C statement "a = a * b."
One of the primary advantages of this system is that the compiler has to do very little work to translate a high-level language statement into assembly. Because the length of the code is relatively short, very little RAM is required to store instructions. The emphasis is put on building complex instructions directly into the hardware.
The RISC Approach
RISC processors only use simple instructions that can be executed within one clock cycle. Thus, the "MULT" command described above could be divided into three separate commands: "LOAD," which moves data from the memory bank to a register, "PROD," which finds the product of two operands located within the registers, and "STORE," which moves data from a register to the memory banks. In order to perform the exact series of steps described in the CISC approach, a programmer would need to code four lines of assembly:
LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A
At first, this may seem like a much less efficient way of completing the operation. Because there are more lines of code, more RAM is needed to store the assembly level instructions. The compiler must also perform more work to convert a high-level language statement into code of this form.
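The behavior of the four-line sequence can be mimicked with a toy interpreter. This is only an illustrative sketch: the instruction names come from the text, while the sample operand values and the modeling of memory locations as (row, column) tuples are assumptions.

```python
# Illustrative sketch (not a real ISA): a tiny interpreter for the
# LOAD/PROD/STORE sequence above. Memory locations are modeled as
# (row, column) tuples; register names match the text.

memory = {(2, 3): 6, (5, 2): 7}   # sample operand values (assumed)
registers = {}

def LOAD(reg, loc):
    registers[reg] = memory[loc]       # move data from memory to a register

def PROD(dst, src):
    registers[dst] = registers[dst] * registers[src]   # multiply in registers

def STORE(loc, reg):
    memory[loc] = registers[reg]       # move data from a register to memory

# The four-line RISC sequence from the text:
LOAD('A', (2, 3))
LOAD('B', (5, 2))
PROD('A', 'B')
STORE((2, 3), 'A')

print(memory[(2, 3)])   # 42: the product overwrites location 2:3
```

Note that after STORE, the operand is still sitting in register A, which is exactly the register-reuse advantage discussed below.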
CISC | RISC
Emphasis on hardware | Emphasis on software
Includes multi-clock complex instructions | Single-clock, reduced instructions only
Memory-to-memory: "LOAD" and "STORE" incorporated in instructions | Register-to-register: "LOAD" and "STORE" are independent instructions
Small code sizes, high cycles per second | Low cycles per second, large code sizes
Transistors used for storing complex instructions | Spends more transistors on memory registers
However, the RISC strategy also brings some very important advantages. Because each instruction requires only one clock cycle to execute, the entire program will execute in approximately the same amount of time as the multi-cycle "MULT" command. These RISC "reduced instructions" require fewer transistors of hardware space than the complex instructions, leaving more room for general-purpose registers. And because all of the instructions execute in a uniform amount of time (i.e., one clock), pipelining is possible.
Separating the "LOAD" and "STORE" instructions actually reduces the amount of work that the computer must perform. After a CISC-style "MULT" command is executed, the processor automatically erases the registers. If one of the operands needs to be used for another computation, the processor must re-load the data from the memory bank into a register. In RISC, the operand will remain in the register until another value is loaded in its place.
The Performance Equation
The following equation is commonly used to express a computer's performance:

time/program = (instructions/program) × (cycles/instruction) × (time/cycle)

The CISC approach attempts to minimize the number of instructions per program, sacrificing the number of cycles per instruction. RISC does the opposite, reducing the cycles per instruction at the cost of the number of instructions per program.
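The performance equation states that time per program equals (instructions/program) × (cycles/instruction) × (time/cycle). The sketch below plugs hypothetical numbers into it; these are not measurements, only values chosen to make the two trade-offs visible.

```python
# The classic performance equation:
#   time/program = instructions/program * cycles/instruction * time/cycle
# All numbers below are hypothetical, chosen only to illustrate the trade-off.

def execution_time(instructions, cycles_per_instruction, clock_hz):
    return instructions * cycles_per_instruction / clock_hz

# CISC-style program: fewer instructions, but more cycles per instruction.
cisc = execution_time(instructions=1_000_000, cycles_per_instruction=4,
                      clock_hz=100e6)
# RISC-style program: three times the instructions, one cycle each.
risc = execution_time(instructions=3_000_000, cycles_per_instruction=1,
                      clock_hz=100e6)

print(cisc, risc)   # 0.04 0.03 -- at the same clock, fewer total cycles win
```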
RISC Roadblocks
Despite the advantages of RISC based processing, RISC chips took over a decade to gain a foothold in the commercial world. This was largely due to a lack of software support.
Although Apple's Power Macintosh line featured RISC-based chips and Windows NT was RISC compatible, Windows 3.1 and Windows 95 were designed with CISC processors in mind. Many companies were unwilling to take a chance with the emerging RISC technology. Without commercial interest, processor developers were unable to manufacture RISC chips in large enough volumes to make their price competitive.
Another major setback was the presence of Intel. Although their CISC chips were becoming increasingly unwieldy and difficult to develop, Intel had the resources to plow through development and produce powerful processors. Although RISC chips might surpass Intel's efforts in specific areas, the differences were not great enough to persuade buyers to change technologies.
The Overall RISC Advantage
Today, the Intel x86 is arguably the only chip which retains CISC architecture. This is primarily due to advancements in other areas of computer technology. The price of RAM has decreased dramatically: in 1977, 1 MB of DRAM cost about $5,000; by 1994, the same amount of memory cost only $6 (when adjusted for inflation). Compiler technology has also become more sophisticated, so that the RISC use of RAM and emphasis on software has become ideal.
2. Write about the advantage of compiler complexity.
THE ADVANTAGE OF COMPILER COMPLEXITY OVER HARDWARE COMPLEXITY
While a VLIW architecture reduces hardware complexity over a superscalar implementation, a much more complex compiler is required. Extracting maximum performance from a superscalar RISC or CISC implementation does require sophisticated compiler techniques, but the level of sophistication in a VLIW compiler is significantly higher.
VLIW simply moves complexity from hardware into software. Luckily, this trade-off has a significant side benefit: the complexity is paid for only once, when the compiler is written, instead of every time a chip is fabricated. Among the possible benefits is a smaller chip, which leads to increased profits for the microprocessor vendor and/or cheaper prices for the customers that use the microprocessors. Complexity is usually easier to deal with in a software design than in a hardware design. Thus, the chip may cost less to design, be quicker to design, and may require less debugging, all of which are factors that can make the design cheaper. Also, improvements to the compiler can be made after chips have been fabricated; improvements to superscalar dispatch hardware require changes to the microprocessor, which naturally incurs all the expenses of turning a chip design.
PRACTICAL VLIW ARCHITECTURES AND IMPLEMENTATIONS
The simplest VLIW instruction format encodes an operation for every execution unit in the machine. This makes sense under the assumption that every instruction will always have something useful for every execution unit to do. Unfortunately, despite the best efforts of the best compiler algorithms, it is typically not possible to pack every instruction with work for all execution units. Also, in a VLIW machine that has both integer and floating-point execution units, the best compiler would not be able to keep the floating-point units busy during the execution of an integer-only application.
FIGURE 4
Philips Semiconductors
Introduction to VLIW Computer Architecture
The problem with instructions that do not make full use of all execution units is that they waste precious
processor resources: instruction memory space, instruction cache space, and bus bandwidth.
There are at least two solutions to reducing the waste of resources due to sparse instructions. First,
instructions can be compressed with a more highly-encoded representation. Any number of techniques,
such as Huffman encoding to allocate the fewest bits to the most frequently used operations, can be used.
Second, it is possible to define an instruction word that encodes fewer operations than the number of
available execution units. Imagine a VLIW machine with ten execution units but an instruction word that can
describe only five operations. In this scheme, a unit number is encoded along with the operation; the unit
number specifies to which execution unit the operation should be sent. The benefit is better utilization of
resources. A potential problem is that the shorter instruction prohibits the machine from issuing the
maximum possible number of operations at any one time. To prevent this problem from limiting
performance, the size of the instruction word can be tuned based on analysis of simulations of program
behavior.
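The first approach can be sketched with a standard Huffman coder. This is an illustrative sketch, not any vendor's actual encoding scheme; the opcode mix below (dominated by NOPs, as in sparse VLIW instructions) is invented.

```python
import heapq
from collections import Counter

# Sketch: Huffman-code a stream of operations so that frequent opcodes
# (e.g. the many NOPs in sparse VLIW instructions) get the fewest bits.
# The opcode stream below is invented for illustration.

def huffman_codes(freqs):
    # Each heap entry: (frequency, unique tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ''}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # two least-frequent subtrees...
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in c1.items()}
        merged.update({s: '1' + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))   # ...merge them
        count += 1
    return heap[0][2]

ops = ['NOP'] * 60 + ['ADD'] * 20 + ['LOAD'] * 15 + ['FMUL'] * 5
codes = huffman_codes(Counter(ops))
# The most frequent operation gets the shortest code:
print(len(codes['NOP']) < len(codes['FMUL']))   # True
```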
Of course, it is completely reasonable to combine these two techniques: use compression on shorter-than-maximum-length instructions.
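The second approach, a five-slot instruction word for a ten-unit machine in which each encoded operation carries its target unit number, might be packed as below. The field widths (4-bit unit number, 12-bit operation) are assumptions for illustration, not taken from any real VLIW ISA.

```python
# Sketch: an instruction word that encodes fewer operations than there
# are execution units, tagging each operation with its target unit.
# Field widths are assumed for illustration (4-bit unit, 12-bit op).

UNIT_BITS, OP_BITS = 4, 12
SLOT_BITS = UNIT_BITS + OP_BITS

def pack(slots):
    """Pack up to five (unit, op) pairs into one instruction word."""
    word = 0
    for i, (unit, op) in enumerate(slots):
        assert unit < (1 << UNIT_BITS) and op < (1 << OP_BITS)
        word |= ((unit << OP_BITS) | op) << (i * SLOT_BITS)
    return word

def unpack(word, n):
    """Recover n (unit, op) pairs from an instruction word."""
    slots = []
    for i in range(n):
        field = (word >> (i * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
        slots.append((field >> OP_BITS, field & ((1 << OP_BITS) - 1)))
    return slots

# Five operations routed to 5 of the machine's 10 execution units:
slots = [(0, 0x123), (3, 0x456), (7, 0x789), (9, 0x0AB), (2, 0x0CD)]
word = pack(slots)
print(unpack(word, len(slots)) == slots)   # True: the word round-trips
```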
3. Write about the historical perspective of VLIW.
HISTORICAL PERSPECTIVE
VLIW is not a new computer architecture. Horizontal microcode, a processor implementation technique in
use for decades, defines a specialized, low-level VLIW architecture. This low-level architecture runs a
microprogram that interprets (emulates) a higher-level (user-visible) instruction set. The VLIW nature of the
horizontal microinstructions is used to attain a high-performance interpretation of the high-level instruction
set by executing several low-level steps concurrently. Each horizontal microcode instruction encodes many
irregular, specialized operations that are directed at primitive logic blocks inside a processor. From the
outside, the horizontally microcoded processor appears to be directly running the emulated instruction set.
In the 1980s, a few small companies attempted to commercialize VLIW architectures in the general-purpose
market. Unfortunately, they were ultimately unsuccessful. Multiflow is the most well known. Multiflow’s
founders were academicians who did pioneering, fundamental research into VLIW compilation techniques.
Multiflow’s computers worked, but the company was probably about a decade ahead of its time. The
Multiflow machines, built from discrete parts, could not keep pace with the rapid advances in single-chip
microprocessors. Using today’s technology, they would have a better chance at being competitive.
In the early 1990s, Intel introduced the i860 RISC microprocessor. This simple chip had two modes of
operation: a scalar mode and a VLIW mode. In the VLIW mode, the processor always fetched two
instructions and assumed that one was an integer instruction and the other floating-point. A single program
could switch (somewhat painfully) between the scalar and VLIW modes, thus implementing a crude form of
code compression. Ultimately, the i860 failed in the market. The chip was positioned to compete with other
general-purpose microprocessors for desktop computers, but it had compilers of insufficient quality to
satisfy the needs of this market.