The Essentials of Computer Organization and Architecture—Linda Null and Julia Lobur
Contents:
Chapter 1—Introduction
1. Overview
Complexity emanates from concepts that are fundamentally very simple. These simple ideas are the ones that have brought us to where we are today and are the foundation for the computers of the future.
Some things are infeasible today (computationally infeasible) could be feasible tomorrow.
Computer scientists are usually more concerned with writing complex program algorithms than with designing computer hardware.
Program optimization and system tuning are perhaps the most important motivations for learning how computers work.
All computer professionals should be familiar with the concepts of benchmarking and be able to interpret and present the results of benchmarking systems.
Technical managers in charge of buying hardware also use benchmarks to help them buy the best system for a given amount of money, keeping in mind the ways in which performance benchmarks can be manipulated to imply results favorable to particular systems.
Computer Organization addresses issues such as control signals, signaling methods and memory types. It encompasses all physical aspects of computer systems.
Computer Architecture focuses on the structure and behavior of the computer system and refers to the logical aspects of system implementation as seen by the programmer. Computer architecture includes many elements such as instruction sets and formats, operation codes, data types, the number of types of registers, addressing modes, main memory access methods and various I/O mechanisms.
The computer architecture for a given machine is the combination of its hardware components plus its instruction set architecture (ISA). ISA is the agreedupon interface between all the software that runs on the machine and the hardware that executes it.
Computer architecture and Computer organization are interrelated and interdependent. Each individual cannot stand alone.
2. The Main Components of a Computer
Principle of Equivalent of Hardware and Software: Anything that can be done with software can also be done with hardware, and anything that can be done with hardware can also be done with software.
High Level LanguagesAssembly Languages (Instruction set)Machine Languages (CMOS)
At the most basic level, a computer is a device consisting of three pieces:
a) A processor to interpret and execute programs
b) A memory to store both data and programs
c) A mechanism for transferring data to and from the outside world
3. An Example System: Wading Through the Jargon
PICTURES: A typical computer advertisement & Common prefixes associated with computer organization and architecture
1KB. Base of 2=1Kibi Bytes=1024=2^{10} Bytes. Base of 10=1Kilo Bytes=1000=10^{3} Bytes.
The number of instructions per second that a microprocessor can actually execute is proportionate to its clock speed.
“400MHz 256MB DDR SDRAM” = speed of system bus + memory capacity + double data rate + synchronous dynamic random access memory
PICTURES: A Look Inside a Computer
“32 KB L1 cache”=a type of memory to provide even faster access to data from memory to the processor
serial ATA (advanced technology attachment) disk interface (7200 RPM—fast for HDD, usually millisecond is good)IDE=integrated drive electronic; EIDE=enhanced integrated drive electronics=costeffective hardware interface alternative for mass storage devices.
Ports (serial/parallel) allow movement of data to and from devices external to the computer. The system bus is responsible for all data movement internal to the computer. (eg. USB supports PlugandPlay & hot plugging)
PCI is I/O bus that supports the connection of multiple peripheral devices. PCI modem (data/fax modem); Sound card; Video card
“19’’ monitor, .24mm AG, 1280x1024 at 75Hz”=refresh rate of 75Hz + Resolution: .24mm dot pitch supported by an Aperture Grill display. (dot pitch: closest dot of the same color) (AG direct the electron beam that paints the monitor picture on the phosphor coating inside the glass of the monitor)
The more dots to refresh, the longer it takes for each refresh cycle. Experts recommend a refresh rate of at least 75Hz.
“48x CDRW”= x translates to 153000bytes of data per second, 48x is the speed for read and w.
PCIe video card is an interface specification designed to improve system graphic performance by increasing the overall bandwidth.
By speeding up the communication flow between the CPU and the graphic controller, PCIe allows graphics applications to run faster than AGP
Ethernet: network interface card (NIC—connect to the motherboard via a PCI slot) & integrated Ethernet. (Laptop: wireless card)
4. Standards Organizations
Institute of Electrical and Electronic Engineers (IEEE) is an organization dedicated to the advancement of the professions of electronic and computer engineering.
International Telecommunications Union (ITU) concerns itself with the interoperability of telecommunications systems, including telephone, telegraph, and data communication systems.
American National Standards Institute (ANSI). British Standards Institution (BSI). Comite European de Normalization (CEN).
International Organization for Standardization (ISO) is the entity that coordinates worldwide standards development, including the activities of NSI with BSI among others.
5. Historical Development
a) Generation Zero: Mechanical Calculating Machines (16421945)
b) The First Generation: Vacuum Tube Computers (19451953)
c) The Second Generation: Transistorized Computers (19541965)
d) The Third Generation: Integrated Circuit Computers (19651980)
f) The Fourth Generations: VLSI Computers (1980???)
Fifth generation??? Parallel processing and the use of networks and singleuser workstations? Neutral network, DNA, or optical computing systems?
g) Moore’s Law—‘the density of silicon chips doubles every 18 months’
Rock’s Law—‘the cost of capital equipment to build semiconductors will double every four years.
6. The Computer Level Hierarchy
PICTURE: Abstract levels of modern computing systems
Control unit makes sure that instructions are decoded and executed properly and that data is moved where and when it should be. Control units can be designed in one of two ways: hardwired or microprogrammed.
In hardwired control units, control signals emanate from blocks of digital logic components. These signals direct all of the data and instruction traffic to appropriate parts of the system. It is fast but difficult to be modified because they are actually physical components.
A microprogram is a program written in a lowlevel language that is implemented directly by the hardware. It interprets the instructions by activating hardware suited to execute the original instruction. It is slower but flexible to be modified.
7. The von Neumann Model
All storedprogram computers is known as von Neumann systems using the von Neumann architecture.
Today’s storedprogram machine architecture at least:
a) Consists of three hardware systems:
i. A central processing unit (CPU) with a control unit, an arithmetic logic unit (ALU), registers (small storage areas), and a program counter;
ii. A mainmemory system, which holds programs that control the computer’s operation;
iii. An I/O system
b)Capacity to carry out sequential instruction processing
c)Contains a single path, either physically or logically, between the main memory system and the control unit of the CPU, forcing alternation of instruction and execution cycles. This single path is often referred to as the von Neumann bottleneck.
PICTURE: von Neumann Architecture
This architecture runs programs in what is known as the von Neumann execution cycle (or fetchdecodeexecution cycle), which describes how the machine works.
One iteration of the cycle:
a) The control unit fetches the next program instruction form the memory, using the program counter to determine where the instruction is located.
b) The instruction is decoded into a language the ALU can understand
c) Any data operands required to execute the instruction are fetched from memory and placed into registers within the CPU.
d) The ALU executes the instruction and places the results in registers or memory.
NOTE: the idea of von Neumann architecture have been extended so that programs and data stored in a slowtoaccess storage medium, such as a hard disk, can be copied to a fastaccess, volatile storage medium such as RAM prior to execution. This architecture has also been streamlined into what is currently called the system bus model. The data bus moves data form main memory to the CPU registers. The address bus holds the address of the data that the data bus is currently accessing. The control bus carries the necessary control signals that specify how the information transfer is to take place. Other enhancements including using index registers for addressing, adding floating point data, using interrupts and asynchronous I/O, adding virtual memory and adding general registers.
PICTURE: Modified von Neumann Architecture, Adding a System Bus
8. Nonvon Neumann Models
The von Neumann model executes instructions sequentially, therefore, extremely wellsuited to sequential processing.
Today, most of the models has different processing units running in parallel. They are nonvon Neumann models, including neural networks, genetic algorithms, quantum computation, dataflow computation, and parallel computers.
Amdahl’s Law: ‘make sure the slowest parts of the system are the ones that are used the least’ states how the tasks are distributed in parallel processing.
Chapter 2—Data Representation in Computer Systems
1. Introduction
bit, byte, words, nibbles
2. Positional Numbering Systems
radix=base, weighted numbering system by a power of the radix (eg. 101_{2}=1x2^{2}+1x2^{0}=5_{10})
3. Decimal to Binary Conversions
a) Converting Unsigned Whole Numbers
A binary number with N bits can represent unsigned integers from 0 to 2^{N}1
Overflow: 30 cannot represent by only 4 bits.
b) Converting Fractions
c) Converting between PowerofTwo Radices
4. Signed Integer Representation
a) Signed Magnitude
N bits can represent –(2^{(N1)}1) to 2^{(N1)}1
Overflow: in signed numbers occurs when the sign of the result is incorrect.
Doubledabble: fastest way to convert a binary number to decimal. PICTURE
b) Complement Systems
Complement & One’s Complement
Two’s Complement
A Simple Rule for Detecting an Overflow Condition in Signed Numbers: If the carry into the sign bit equals the carry out of the bit, no overflow has occurred. If the carry into the sign bit is different from the carry out of the sign bit, overflow (thus error) has occurred. PICTURE
INTEGER MULTIPLICATION AND DIVISION
DIVISION can use either iteratively subtract the denominator from the divisor or use trial and error of long division (NOTE: most efficient methods used for binary division out of scope)
Divide underflow: the divisor is much smaller than the dividend.
c) Unsigned Versus Signed Numbers
A computer programmer must able to manage both signed and unsigned numbers.
d) Computers, Arithmetic and Booth’s Algorithm
The basic focus is on the implementation of arithmetic functions, which can be realized in software, firmware, or hardware.
Researches are looking for schemes that use nontraditional approaches such as the fast carry lookahead principle, residue arithmetic and Booth’s algorithm.
“Regular” multiplication clearly yields the incorrect result. There are a number of solutions such as converting both values to positive numbers, performing conventional multiplication, then remembering if one or both values were negative to determine if the result should be positive or negative.
Booth’s algorithm not only solves this dilemma, but also speeds up multiplication in the process. It looks for consecutive zeros and ones in the multiplier.
PICTURES: standard multiplication vs. Booth’s algorithm
We are interested in pairs of bits in the multiplier and proceed according to:
i. If the current multiplier bit is 1 and preceding bit was 0, we are at the beginning of a string of ones, so subtract the multiplicand for the product
ii. if the current multiplier bit is 0 and the preceding bit was 1, we are at the end of a string of ones, so we add the multiplicand to the product.
iii. If it is a 00 pair, or a 11 pair, do no arithmetic operation. Simply shift. The power of the algorithm is in this step: we can now treat a string of ones as a string of zeros and do nothing more than shift. PICTURES.
Booth’s algorithm does not like alternating string of zeros and ones. Registers are used and a special type of shift called an arithmetic shift is necessary to preserve the sign bit.
e) Carry Versus Overflow
The rule of thumb used to determine when carry indicates an error depends on whether we are using signed or unsigned numbers. For unsigned numbers, a carry (out of the leftmost bit) indicates the total number of bits was not large enough to hold the resulting value, and overflow has occurred. For signed numbers, if the carry in to the sign bit and the carry (out of the sign bit) differ, then overflow has occurred. The overflow flag is set only when overflow occurs with signed numbers.
5. FloatingPoint Representation
a) A Simple Model
Mantissa=fractional part; Significand=fractional part+implied binary point+implied 1
PICTURE: simple floatingpoint representation
Excess16 representation: we have to subtract 16 to get the true value of the exponent.
Normalization: the leftmost bit of the significand must always be 1
b) FloatingPoint Arithmetic
c) FloatingPoint Errors
Error tripled after six iterations. The challenge to computer scientists is to find efficient algorithms for controlling such errors within the bounds of performance and economics.
d) The IEEE745 FloatingPoint Standard
Single precision; Double precision; Hidden bit
e) Range, Precision, and Accuracy
Range: the interval from the smallest value to largest value in a given format.
Precision: how much information we have about a value and the amount of information used to represent the value. (eg. apply to significant figures)
Accuracy: how close a number to its true value.
f) Additional Problems with FloatingPoint Numbers
Overflow & Underflow; (a+b)+c~= a+(b+c) etc. due to rounding error. See ece4077 course
6. Character Codes
a) BinaryCoded Decimal
BCD is common in display numerical data. It encodes each digit of a decimal number into a 4bit binary form.
packed BCD: stores two digits per byte and allows numbers to be signed, the sign is stored at the end. The standard values for this ‘sign digit’ are 1100 for +, 1101 for – and 1111 to indicate the value is unsigned.
zoned decimal format, stores a decimal digit in the low order nibble of each byte. However, instead of padding the highorder nibbles with zeros, a specific pattern is used. Two choices for the highorder nibble, called the numeric zone.
EBCDIC zoned decimal format requires the zone to be all ones (hex: F)
ASCII zoned decimal format requires the zone to be 0011 (hex: 3)
b) EBCDIC
Extended Binary Coded Decimal Interchange Code
c) ASCII
American Standard Code for Information Interchange
d) Unicode
7. Error Detection and Correction
Parity is the most basic of all error detection schemes
a) Cyclic Redundancy Check (CRC)
it is a type of checksum used primarily in data communications that determines whether an error has occurred within a large block or stream of information bytes. Checksums and CRCs are a type of systematic error detection scheme, meaning that the errorchecking bits are appended to the original information byte. The group of errorchecking bits is called a syndrome. See ece2041 or ece 4024 courses notes
Arithmetic Modulo 2
Calculating and Using CRCs
Message M
Remainder
Info. Bytes I
A new remainder other than zero indicates an error has occurred.
b) Hamming Codes
Hamming Codes are an adaptation of the concept of parity, whereby error detection and correction capabilities are increased in proportion to the number of parity bits (check bits or redundant bits) added to an information word.
Code word: n=m+r (data bits + check bits)
Hamming distance: the number of bit positions in which two code words differ
minimum Hamming distance D(min) determines its error detecting and correcting capability.
To detect k singlebit errors, the code must have a Hamming distance of D(min)=k+1. Hamming codes can always detect D(min)1 errors and correct (D(min)1)/2 errors.
This means for m=4 implies r>=3. For data words of 4 bits to correct singlebit errors, we must add 3 check bits.
To construct error correcting codes for any size memory word:
i) Determine the number of check bits, r, necessary for the code and then number the n bits (n=m+r), right to left, starting with 1 (not 0).
ii) Each bit whose bit number is a power of 2 is a parrity bit—the others are data bits.
iii) Assign parity bits to check bit positions as follows: Bit n is checked by those parity bits b1,b2,…,bj such that b1+b2+…+bj=b (‘+’ indicates the modulo 2 sum)
If we receive 0101 1101 0110 (bit 9 has error)
To implement a Hamming code using simple binary circuits.
c) ReedSoloman (RS)
Hamming codes are useless in situations where there is a likelihood that multiple adjacent bits will be damaged. These kinds of errors are called burst errors.
A ReedSoloman code can be thought of as a CRC that operates over entire characters instead of only a few bits. RS codes are systematic.
The generator polynomial for a ReedSoloman code is given by a polynomial defined over an abstract mathematical structure called a Galois field.
Focus on Codes for Data Recording and Transmission
Magnetic storage devices record data using changes in magnetic polarity called flux reversals.
Data encoding, to mean the process of converting a simple character code such as ASCII to some other code that better lends itself to storage or transmission.
A1. NonReturntoZero Code
Its ‘highs’ and ‘lows’ represent ones and zeros. (3,5 volts)
The slices and specks are called bit cells
‘O’ ‘K’ ‘A’ ‘Y’
A2. NonReturntoZeroInvert Code
NRZI provides a transition—either hightolow or lowtohigh for each binary one and no transition for binary zero.
A3. Phase Modulation (Manchester Code)
It deals with the synchronization problem headon. PM provides a transition for each bit, whether a one or zero. ‘1’ is up transition; ‘0’ is down transition.
A4. Frequency Modulation
Synchronizing transitions occur at the beginning of each bit cell. To encode a binary 1, an additional transition is provided in the center of the bit cell.
*A5. RunLengthLimited Code
RLL is a coding method in which block character code words such as ASCII or EBCDIC are translated into code words specially designed to limit the number of consecutive zeros appearing in the code
An RLL(d,k) code allows a minimum of d and a maximum of k consecutive zeros to appear between any pair of consecutive ones.
RLL contain more bits than the original character code, however, RLL using NRZI on the disk, its data occupies less space on magnetic media because few flux transitions are involved.
Theoretically, RLL is a form of data compression called Huffman coding, where the most likely information bit patterns are encoded using the shortest code word bit patterns.
RLL is 50% more efficient than MFM in terms of flux reversals.
*A6. Partial Response Maximum Likelihood Coding
RLL by itself is insufficient for reliable encoding on today’s ultra high capacity magnetic disk and tape media. As data density increases, encoded bits are necessarily written closer together.
This causing decreased magnetic signal strength, interference occurs. This phenomenon, known as superpositioning.
Partial response maximum likelihood or PRML, a encoding scheme use Viterbi detector circuit matches the partial response pattern to a relatively small set of possible response patterns and the closest match is passed to the digital decoder.
Chapter 3—Boolean Algebra and Digital Logic
1. Introduction
Symbolic logic or Boolean algebra
John Vincent Atanasoff applied Boolean algebra to computing
2. Boolean Algebra
a) Boolean Expressions
Boolean expressions; A Boolean function typically has one or more input values and yields a result, based on these input values, in the set {0,1}.
Three common Boolean operators are AND, OR, and NOT.
Boolean sum; Boolean product; Truth table; PICTURE
b) Boolean Identities
identities, or laws, that apply to Boolean algebra instead of regular algebra is needed.
duality principle, which has both an AND (product) form and OR (sum) form.
c) Simplification of Boolean Expressions
Consensus Theorem: F(x,y,z)=xy+x’z+yz.
d) Complements
DeMorgan’s Law
e) Representing Boolean Function
There are an infinite number of Boolean expressions that are logically equivalent to one another.
sumofproducts form (SOP), requires that the expression be a collection of ANDed variables (or product terms) that are ORed together.
productofsum form (POS), consists of ORed variables (sum terms) that are ANDed together.
SOP and POS are equivalent ways of expressing a Boolean function. They can be converted with one another.
3. Logic Gates
Digital circuits, is the actual physical components perform arithmetic operations or make choices in a computer, are constructed from a number of primitive elements called gates.
a) Symbols for Logic Gates
b) Universal Gates
c) Multiple Input Gates
4. Digital Components
a) Digital Circuits and Their Relationship to Boolean Algebra
b) Integrated Circuits
5. Combinational Circuits
a) Basic Concepts
Digital logic chips are combined to give us useful circuits. These logic circuits can be categorized as either combinational logic or sequential logic.
b) Examples of Typical Combinational Circuits
halfadder; fulladder; ripplecarry adder; black box; decoder; multiplexer
See ece2072 course notes
Another useful set of combinational circuits to study includes a parity generator and a parity checker
A parity generator is a circuit that creates the necessary parity bit to add to a word
A parity checker checks to make sure proper parity is present in the word.
Bit shifting, moving the bits of a word or byte one position to the left or right is a very useful operation.
ALU, arithmetic logic unit, the control lines, f0 and f1, determine which operation is to be performed by the CPU. The signal 00 is used for addition (A+B); 01 for NOT A; 10 for A OR B; and 11 for A AND B. C is the output.
6. Sequential Circuits
The major weakness of combinational circuits is that there is no concept of storage—memoryless.
a) Basic Concepts
flip flopthe storage element
b) Clocks
Some sequential circuits are asynchronous, which means they become active the moment any input value changes.
Synchronous sequential circuits use clocks to order events.
A clock is a circuit that emits a series of pulses with a precise pulse width and a precise interval between consecutive pulses. This interval is called the clock cycle time.
(can be used to generate 010101… clock frequency)
c) FlipFlops
A leveltriggered circuit is allowed to change state whenever the clock signal is either high or low.
A latch is level triggered, whereas a flipflop is edge triggered.
SR flipflop (set/reset) PICTURE
JK (Jack Kilby) flipflop
D (data) flipflip
See ece2072 for more details
e) Finite State Machines (FSM)
An equivalent graphical depiction of characteristic tables for the behavior of flipflops and sequential circuits
Moore machine: circle=state, bracket=output, arcs=transition, (output with state)
Mealy machine: circle=state, arcs=transition, (output with transition)
In the actual implementation of either a Moore or Mealy machine, two thins are required: a memory (register) to store the current state and combinational logic components that control the output and transitions from one state to another.
(Block diagram for Moore and Mealy machines)
Algorithm state machine (ASM), consists of blocks that contain a state box, a label, and optionally condition and output boxes.
Each has at least one entry and one exit. Moore type outputs are indicated inside the state block; Mealytype outputs are indicated in the oval output ‘box’. If a signal is asserted when ‘high’, its is prefixed with an H, otherwise, it is prefixed with an L. If the signal is asserted immediately, it is also prefixed with an I; otherwise, the signal asserts at the next clock cycle. The input conditions that cause changes in state are expressed by elongated, sixsided polygons called condition boxes.
ASM for microwave oven: PICTURE
Hardwarefree Machines: Moore and Mealy machines are only two of many different types of finite state machines that you will encounter in computer science literature. Finite state machines are just a different way of thinking about the computer and computation.
An understanding of FSMs is essential in the study of programming languages, compilers, the theory of computation, and automata theory. We refer to these abstractions as machines because machines are devices that respond to a set of stimuli (events) by generating predictable responses (actions) based on a history of prior events (current state). One of the most important of these is the deterministic finite automata (DFA) computational model.
Formally speaking, a DFA, M, is completely described by the quintuple M=(Q,S,Σ,σ,F) where:

Q is a finite set of states that represents every configuration the machine can assume

S is an element of Q that represents the start state, which is the initial state of the machine before it receives any inputs

Σ is the input alphabet or set of events that the machine will recognize

σ is a transition function that maps a state in Q and a letter from the input alphabet to another state in Q

F is a set of states (elements of Q) designated as the final (or accepting) states.
f) Examples of Sequential Circuits
Latches and flipflops are used to implement more complex sequential circuits. Registers, counters, memories, and shift registers all require the use of storage and are therefore implemented using sequential logic.
Logically Speaking, How’d They Do That?Logic families
i) TTL (transistortransistor logic)
ii) NMOS
iii) CMOS
iv) ECL
v) BiCMOS
g) An Application of Sequential Logic: Convolutional Coding and Viterbi Detectioin
Several coding methods employed in data storage and communication.
i) Partial response maximum likelihood (PRML) encoding method. The salient feature of the decoding process is that only certain bit patterns are valid.
ii) Viterbi decoder, reads the bits that have been output by a convolutional encoder and compares the symbol stream read with a set of ‘probable’ symbol streams. The one with the least error is selected for output.
iii) Hamming code, a type of forward error correction that uses blocks of data (or block coding) to compute the necessary redundant bits.
iv) Convolutional coding, a method that operates on an incoming serial bit stream (including redundant bits) that enables it to correct errors continuously. A convolutional code is an encoding process whereby the output is a function of the input and some number of bits previously received.
PICTURES: Convolutional Encoder for PRML; Stepping Through Four Clock Cycles of a Convolutional Encoder; Characteristic Table for the Convolutional Encoder; Mealy Machine for the Convolutional Encoder; Trellis Diagram Illustrating State Transitions for a Sequence; Trellis Diagram Illustraing Hamming Errors for the Sequence.
7. Designing Circuits
Digital logic design requires someone not only familiar with digital logic, but also well versed in digital analysis (analyzing the relationship between inputs and outputs), digital synthesis (starting with a truth table and determining the logic diagram to implement the given logic function), and the use of computeraided design (CAD) software.
It is easier to write a program than it is to design the hardware necessary to implement the algorithm. However, there are situations in which the hardware implementation is better (eg. a realtime system, the hardware implementation is faster and faster is better)
Share with your friends: 