ARM Processor September 2005
Introduction
The ARM processor core originated at a British computer company called Acorn. In the mid-1980s Acorn was looking for a replacement for the 6502 processor used in its BBC computer range, which was widely used in UK schools. None of the 16-bit architectures becoming available at that time met its requirements, so the company designed its own 32-bit processor.
Other companies became interested in this processor, including Apple who were looking for a processor for their PDA project (which became the Newton). After much discussion this led to Acorn’s processor design team splitting off from Acorn at the end of 1990 to become Advanced RISC Machines Ltd, now just ARM Ltd.
Thus ARM Ltd now designs the ARM family of RISC processor cores, together with a range of other supporting technologies. One important point about ARM is that it does not fabricate silicon itself, but instead just produces the design.
The ARM processor is a powerful, low-cost, efficient RISC processor with low power consumption. It was originally designed for the Archimedes desktop computer but, somewhat ironically, numerous factors about its design make it unsuitable for use in a desktop machine (for example, the MMU and cache are the wrong way around). However, many factors about its design make it an exceptional choice for embedded applications.
The ARM architecture enjoys the widest choice of embedded operating systems (OS) for system development. OS choice is critical in producing a winning system design that meets the needs of the developer's chosen market. ARM enables this choice by partnering with many leading suppliers of embedded OS and development environments.
ARM offers a broad range of processor cores to address a wide variety of applications while delivering optimum performance, power consumption and system cost. These cores are designed to meet the needs of three system categories:
Embedded real-time systems: storage, automotive body and power-train, industrial and networking applications
Application platforms: devices running open operating systems including Linux, Palm OS, Symbian OS and Windows CE in wireless, consumer entertainment and digital imaging applications
Secure applications: smart cards, SIM cards and payment terminals
ARM CPU cores cover a wide range of performance and features enabling system designers to create solutions that meet their precise requirements. ARM offers both synthesizable and hard macro products, together with a range of coprocessors and debug facilities.
“ATAP” stands for ARM Technology Access Program. The program creates a network of independent design service companies and equips them to deliver ARM-powered designs. Members get access to ARM technology, expertise and support, and are sometimes referred to as “Approved Design Centers”.
Why ARM
The main features that make the ARM processor stand out are:
Built-in architecture extensions - more efficient processing of algorithms to save CPU overhead, memory and power.
The technologies it uses are:
Thumb®-2 - greatly improved code density
DSP - signal processing directly in the RISC core
Jazelle® - Java acceleration
TrustZone™ - hardware/software environment for maximum security
Core performance - a wide range of functionality and power, with parts running from 1 MHz to 1 GHz and architectural performance enhancements for media and Java.
Tools of choice - ARM has the widest range of hardware and software tools support of any 32-bit architecture.
Extensive ecosystem of networking ASICs and standard products/ASSPs - more than 125 standard networking devices for quick time-to-market design cycles.
Wide support - ARM is the best supported microprocessor architecture available. A wide range of OS, middleware and tools, along with an extensive choice of multimedia codec solutions optimized for ARM processors, is available from the ARM Connected Community.
Physical IP - leading edge for high performance systems
Design notes
The ARM instruction set follows the 6502 in concept, but includes a number of features designed to let the CPU pipeline instructions better for execution. In keeping with traditional RISC concepts, this included tuning the instructions to execute in well-defined times, typically one cycle. A more interesting addition to the ARM design is the use of a 4-bit condition code at the front of every instruction, meaning that every instruction can be made conditional.
This significantly cuts down the encoding space available for, say, displacements in memory access instructions, but on the other hand it makes it possible to avoid branch instructions when generating code for small if statements. The standard example of this is Euclid’s GCD algorithm:
(This example is in the C programming language)
int gcd(int i, int j)
{
    while (i != j)
        if (i > j)
            i -= j;
        else
            j -= i;
    return i;
}
Expressed in ARM assembly, the loop, with a little rotation, might look something like
        b     test
loop    subgt Ri,Ri,Rj
        suble Rj,Rj,Ri
test    cmp   Ri,Rj
        bne   loop
which avoids the branches around the then and else clauses that one would typically have to emit.
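To make the role of the condition codes concrete, here is a minimal sketch in C that models the assembly loop above. The Flags type and the helper names (cmp, cond_gt, cond_le, cond_ne, gcd_predicated) are hypothetical and exist only for this illustration. The model assumes positive inputs and relies on the fact that sub without the S suffix leaves the flags untouched, so both conditional instructions see the result of the same cmp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four ARM status flags (N, Z, C, V). */
typedef struct { bool n, z, c, v; } Flags;

/* Model of "cmp a,b": compute a - b and keep only the flags. */
static Flags cmp(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua - ub;
    Flags f;
    f.n = (r >> 31) != 0;                       /* result is negative   */
    f.z = (r == 0);                             /* result is zero       */
    f.c = (ua >= ub);                           /* no borrow occurred   */
    f.v = (((ua ^ ub) & (ua ^ r)) >> 31) != 0;  /* signed overflow      */
    return f;
}

/* Condition tests as encoded in the 4-bit condition field. */
static bool cond_gt(Flags f) { return !f.z && f.n == f.v; }
static bool cond_le(Flags f) { return f.z || f.n != f.v; }
static bool cond_ne(Flags f) { return !f.z; }

/* Mirrors the assembly loop: every step is "executed", but only takes
   effect when its condition holds. Assumes ri > 0 and rj > 0. */
int32_t gcd_predicated(int32_t ri, int32_t rj)
{
    assert(ri > 0 && rj > 0);
    Flags f = cmp(ri, rj);            /* test: cmp   Ri,Rj    */
    while (cond_ne(f)) {              /*       bne   loop     */
        if (cond_gt(f)) ri -= rj;     /* loop: subgt Ri,Ri,Rj */
        if (cond_le(f)) rj -= ri;     /*       suble Rj,Rj,Ri */
        f = cmp(ri, rj);              /* test: cmp   Ri,Rj    */
    }
    return ri;
}
```

For example, gcd_predicated(48, 18) steps through (30, 18), (12, 18), (12, 6) and (6, 6) before returning 6. The if statements here are only how C expresses predication; on the ARM the two subtractions sit in straight-line code with no branch between them.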
Another unique feature of the instruction set is the ability to fold shifts and rotates into the "data processing" (arithmetic, logical, and register-register move) instructions, so that, for example, the C statement "a += (j << 2);" could be rendered as a single instruction on the ARM, register allocation permitting.
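As a small illustration of shift folding, the following C functions are each candidates for a single ARM data-processing instruction. The function names are hypothetical, and the instructions shown in the comments are what a compiler might plausibly emit, not a guaranteed result.

```c
#include <assert.h>
#include <stdint.h>

/* a += (j << 2): may map to a single  add Ra,Ra,Rj,lsl #2 */
uint32_t add_shifted(uint32_t a, uint32_t j)
{
    return a + (j << 2);
}

/* x * 5 written as x + (x << 2): may map to  add Rd,Rx,Rx,lsl #2 */
uint32_t times_five(uint32_t x)
{
    return x + (x << 2);
}

/* Small self-check showing the intended use. */
void shift_fold_demo(void)
{
    assert(add_shifted(10, 3) == 22);   /* 10 + (3 << 2) */
    assert(times_five(7) == 35);        /* 7 + (7 << 2)  */
}
```

The second function shows why the feature matters beyond explicit shifts: multiplications by small constants of the form 2^n + 1 can be done without a multiply instruction.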
As a result, a typical ARM program is denser than would normally be expected of a RISC processor. Fewer instructions have to be fetched from memory to do the same work, and the pipeline is used more efficiently. Even though the ARM runs at what many would consider to be low clock speeds, it nevertheless competes quite well with much more complex CPU designs.
The ARM processor also has some features rarely seen on other architectures that are considered RISC, such as PC-relative addressing (indeed, on the ARM the PC is one of its 16 registers) and pre- and post-increment addressing modes.
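The pre- and post-increment addressing modes correspond closely to the C pointer idioms *++p and *p++. The sketch below uses hypothetical function names, and the instructions in the comments are illustrative forms a compiler might choose rather than guaranteed output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Post-increment: use the pointer, then advance it.
   A possible ARM form is the post-indexed load  ldr r2,[r1],#4  */
int32_t sum_post(const int32_t *p, size_t n)
{
    assert(p != NULL || n == 0);
    int32_t total = 0;
    while (n--)
        total += *p++;
    return total;
}

/* Pre-increment: advance the pointer, then use it.
   A possible ARM form is the pre-indexed load  ldr r2,[r1,#4]!  */
int32_t sum_pre(const int32_t *first, size_t n)
{
    assert(first != NULL && n > 0);
    const int32_t *p = first;
    int32_t total = *first;
    while (--n)
        total += *++p;
    return total;
}
```

In both loops the load and the pointer update can be a single instruction, which is part of why array-walking code stays compact on the ARM.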
Another item of note is that the ARM has been around for a while, with the instruction set increasing somewhat over time. Some early ARM processors (prior to ARM7TDMI), for example, have no instruction to load a two-byte quantity, so that, strictly speaking, for them it's not possible to generate code that would behave the way one would expect for C objects of type "volatile short".
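To see why this matters for "volatile short", consider what code for such a core could do instead of a halfword load. The sketch below assumes a little-endian layout and a hypothetical helper name; it is one possible strategy, and real compilers for those cores may have used a word load with shifts and masks instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read a 16-bit little-endian value with two
   byte-sized loads, as code for a core without a halfword load might. */
uint16_t load_u16_bytewise(const volatile uint8_t *p)
{
    assert(p != 0);
    uint16_t lo = p[0];   /* first memory access  */
    uint16_t hi = p[1];   /* second memory access */
    return (uint16_t)(lo | (hi << 8));
}
```

Either way the object is not read with exactly one two-byte access. For ordinary memory that is harmless, but for a memory-mapped device register the value can change between the two accesses, or the extra access can have side effects.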
Many things determine the power consumption of a processor. At the transistor level the most influential are the supply voltage, the clock speed, the number of switching transistors and, to a lesser extent, transistor leakage. Lowering the supply voltage reduces the power requirement dramatically. The maximum operating frequency drops as well, which lowers power further. Powering only the parts of the chip that are actually doing work saves even more. A simple implementation with a shallow pipeline and few transistors puts you in an even better position.
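The relationship between these factors can be sketched with the standard first-order approximation for CMOS dynamic power, P = a * C * V^2 * f, where a is the fraction of the circuit switching, C the switched capacitance, V the supply voltage and f the clock frequency. This formula is not taken from the text above and it ignores leakage; the function and parameter names are hypothetical.

```c
#include <assert.h>

/* First-order CMOS dynamic power: P = activity * C * V^2 * f.
   Leakage is deliberately not modelled. */
double dynamic_power(double activity, double capacitance_f,
                     double voltage_v, double freq_hz)
{
    assert(activity >= 0.0 && capacitance_f >= 0.0 && freq_hz >= 0.0);
    return activity * capacitance_f * voltage_v * voltage_v * freq_hz;
}
```

Because the voltage enters squared, halving it alone cuts dynamic power to a quarter; if the clock frequency has to be halved as well, the result is one eighth of the original. Fewer switching transistors reduce the activity and capacitance terms.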
The low power consumption is because the ARM has approximately 1/25th of the number of gates of a Pentium. The high performance is because it is better designed than the Pentium: it doesn't have all the excess baggage the Pentium carries around to stay backwards-compatible with the 486, 386, 286, 186 and 8086.
The 8086 was a CISC design anyway, as are all its successors, whilst the ARM is a RISC design. RISC design is about implementing those instructions that are used frequently; anything else can be synthesized from them. CISC design is about throwing in a kitchen-sink instruction and anything else you can think of, just in case somebody might want to use it. With a RISC design you can make certain simplifications that speed things up: the instruction decoder can be built from hardwired gates, whereas a CISC instruction set is so complicated that it has to use microcode, which is inherently slower.
As far as RISC goes, the ARM has some wrinkles of its own that add to its performance. The ability to make any instruction conditional, and to choose whether or not an instruction affects the processor flags, means that you can often avoid branches, which cause pipeline stalls or other slowdowns. Processors without this ability have to add loads of power-consuming extra logic to try to compensate for branch stalls.