-
Fixed point notation
-
Top bit specifies the sign (as in signed magnitude): 0 = positive, 1 = negative.
-
Some bits = integer part (normal format), some bits = fractional part
-
The fractional columns are $2^{-1}, 2^{-2}, \ldots$
-
To convert the fractional part:
Repeat
    Multiply the number by 2
    Write down & discard the integer part
Until the number reaches 0 (or enough bits have been produced)
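The repeat/until loop above can be sketched in Python as follows. This is a minimal illustration, not part of the original notes. The function name `fraction_to_binary` and the `max_bits` cut-off are hypothetical choices. `fractions.Fraction` is used so that each doubling is exact.

```python
from fractions import Fraction

def fraction_to_binary(frac, max_bits=32):
    """Binary digits of a fraction 0 <= frac < 1, by repeated doubling.

    Stops when the fraction reaches 0 or after max_bits digits,
    because some fractions never reach 0."""
    frac = Fraction(frac)
    digits = []
    while frac != 0 and len(digits) < max_bits:
        frac *= 2               # multiply the number by 2
        bit = int(frac)         # its integer part (0 or 1) is the next digit
        digits.append(str(bit))
        frac -= bit             # discard the integer part
    return "".join(digits)

print(fraction_to_binary(Fraction("0.90625")))  # 11101
```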
-
eg: -37.90625
-
Sign bit is: 1
-
Integer part (repeatedly divide by 2 and record the remainders):
-
Base | Num | Rem
2    | 37  |
     | 18  | 1
     | 9   | 0
     | 4   | 1
     | 2   | 0
     | 1   | 0
     | 0   | 1
-
Reading the remainders from bottom to top, the integer part is: 100101
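The repeated division can be sketched in Python as follows. This is a minimal illustration. The name `integer_to_binary` is hypothetical, and Python's built-in `bin` is shown only for comparison.

```python
def integer_to_binary(n):
    """Binary digits of a non-negative integer, by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, rem = divmod(n, 2)       # the quotient carries on, the remainder is a bit
        remainders.append(str(rem))
    # the first remainder is the least significant bit, so read bottom to top
    return "".join(reversed(remainders))

print(integer_to_binary(37))  # 100101
print(bin(37))                # 0b100101 (built-in, for comparison)
```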
-
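Fractional part (repeatedly multiply by 2 and record the integer parts; each row shows the product before its integer part is discarded):
-
Base | Num     | Int
2    | 0.90625 |
     | 1.8125  | 1
     | 1.625   | 1
     | 1.25    | 1
     | 0.5     | 0
     | 1.0     | 1
     | 0       |
-
Reading the integer parts from top to bottom, the fractional part is: 11101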
-
$-37.90625_{10} = 1100101.11101_2$ (the leading 1 is the sign bit, followed by 100101.11101)
-
The problem is how many bits to assign to the integer and fractional parts
-
More bits in the integer part allow larger magnitude numbers
-
More bits in the fractional part give more accuracy
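To see this trade-off, here is a sketch of a sign-magnitude fixed-point encoder with `int_bits` integer bits and `frac_bits` fraction bits. The function name and both design choices are assumptions made for illustration: extra fraction bits are truncated, and an integer part that does not fit raises an error. A real format might round instead.

```python
from fractions import Fraction

def to_fixed_point(value, int_bits, frac_bits):
    """Sign bit + int_bits integer bits + '.' + frac_bits fraction bits."""
    value = Fraction(value)
    sign = "1" if value < 0 else "0"
    # scale so the kept fraction bits become part of an integer; truncate the rest
    scaled = int(abs(value) * 2 ** frac_bits)
    if scaled >= 2 ** (int_bits + frac_bits):
        raise OverflowError(f"{value} needs more than {int_bits} integer bits")
    bits = format(scaled, f"0{int_bits + frac_bits}b")
    return sign + bits[:int_bits] + "." + bits[int_bits:]

print(to_fixed_point(Fraction("-37.90625"), 6, 5))  # 1100101.11101
print(to_fixed_point(Fraction("-37.90625"), 6, 2))  # 1100101.11 (i.e. -37.75)
```

With only 5 integer bits, the same value raises `OverflowError`, because 37 needs 6 bits.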
-
note: Some decimal fractions have infinite binary representations
-
Base | Num  | Int
2    | 0.4  |
     | 0.8  | 0
     | 1.6  | 1
     | 1.2  | 1
     | 0.4  | 0
     | 0.8  | 0
     | etc. |
-
The digits repeat forever: $0.4_{10} = 0.0110\,0110\,0110\ldots_2$
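One way to detect such a repeating expansion is to remember every fraction seen during repeated doubling. When a fraction comes back, the digits since its first appearance will repeat forever. The sketch below assumes exact `Fraction` input. The name `binary_expansion` is hypothetical. The last line shows that a Python `float` cannot hold 0.4 exactly.

```python
from fractions import Fraction

def binary_expansion(frac):
    """Return (non_repeating, repeating) binary digits of 0 <= frac < 1."""
    frac = Fraction(frac)
    seen = {}        # fraction -> index of the digit produced from it
    digits = []
    while frac != 0:
        if frac in seen:                 # cycle found: digits from here repeat
            start = seen[frac]
            return "".join(digits[:start]), "".join(digits[start:])
        seen[frac] = len(digits)
        frac *= 2
        bit = int(frac)
        digits.append(str(bit))
        frac -= bit
    return "".join(digits), ""           # the expansion terminated

print(binary_expansion(Fraction(2, 5)))   # ('', '0110')
print(Fraction(0.4) == Fraction(2, 5))    # False: the float 0.4 is only an approximation
```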
-
Scientific Notation
-
eg: $-37.90625_{10} = -0.3790625 \times 10^{2}$, written -0.3790625E+2
-
The format is Sign Significand E Exponent
-
The signed significand is multiplied by $\text{Base}^{\text{Exponent}}$
-
Any nonzero number can be normalized so that it starts 0. followed by a nonzero digit
-
This can also be done in base 2
-
eg: $37.90625_{10} = 100101.11101_2 = 0.10010111101_2 \times 2^{6}$
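Python's `math.frexp` returns a pair `(m, e)` with `0.5 <= abs(m) < 1` and `x == m * 2**e`. This is exactly the "starts 0.1" base 2 form. The sketch below relies on it. The name `normalize_base2` is hypothetical, and the sign is dropped because it is handled separately.

```python
import math

def normalize_base2(x):
    """Write abs(x) as 0.1bbb... (binary) * 2**exponent; x must be nonzero."""
    if x == 0:
        raise ValueError("0 has no normalized form")
    m, e = math.frexp(x)
    m = abs(m)
    bits = []
    while m != 0:            # doubling a float is exact, so this terminates
        m *= 2
        bit = int(m)
        bits.append(str(bit))
        m -= bit
    return "0." + "".join(bits), e

print(normalize_base2(37.90625))  # ('0.10010111101', 6)
```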
-
Floating Point Representation
-
The top bit specifies the sign: 0 = positive, 1 = negative.
-
Some bits = exponent (in biased notation), some bits = significand
-
note: Every normalized base 2 significand starts 0.1
-
The leading 0.1 is implicit and not stored, i.e. one free bit of precision
-
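eg: -37.90625 with a 7 bit exponent (biased-64) and an 8 bit significand
-
Sign bit is 1
-
Exponent is 6, stored in biased-64 as 6 + 64 = 70 = 1000110
-
Significand is 00101111: the first 8 bits after the implicit 0.1 of 0.10010111101. The last two bits (01) do not fit and are lost, so the value actually stored is -37.875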
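The encoding above can be sketched in Python as follows. This sketch makes three assumptions. The exponent uses excess-64, which is consistent with 6 being stored as 1000110 = 70. Extra significand bits are truncated rather than rounded. All 128 exponent patterns are ordinary exponents. The names `encode` and `decode` are hypothetical.

```python
import math

EXP_BITS, SIG_BITS = 7, 8
BIAS = 2 ** (EXP_BITS - 1)   # 64, assumed excess-64 exponent

def encode(x):
    """Encode nonzero x as (sign, exponent, significand) bit strings."""
    if x == 0:
        raise ValueError("0 cannot be stored with an implicit 0.1")
    sign = "1" if x < 0 else "0"
    m, e = math.frexp(abs(x))            # abs(x) == m * 2**e, 0.5 <= m < 1
    biased = e + BIAS
    if not 0 <= biased < 2 ** EXP_BITS:
        raise OverflowError("exponent out of range (overflow or underflow)")
    # keep SIG_BITS bits after the implicit leading 1, truncating the rest
    stored = int(m * 2 ** (SIG_BITS + 1)) - 2 ** SIG_BITS
    return sign, format(biased, f"0{EXP_BITS}b"), format(stored, f"0{SIG_BITS}b")

def decode(sign, exponent, significand):
    """Rebuild the value, restoring the implicit leading 1 after the point."""
    m = (2 ** SIG_BITS + int(significand, 2)) / 2 ** (SIG_BITS + 1)
    value = m * 2.0 ** (int(exponent, 2) - BIAS)
    return -value if sign == "1" else value

print(encode(-37.90625))          # ('1', '1000110', '00101111')
print(decode(*encode(-37.90625))) # -37.875
```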
-
More bits in the exponent allow larger and smaller magnitude numbers
-
Very large numbers cannot be represented: overflow
-
Numbers too close to 0 cannot be represented: underflow
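To make overflow and underflow concrete, the sketch below computes the largest and smallest positive magnitudes for a 7 bit exponent and an 8 bit stored significand. It assumes excess-64 for the exponent and assumes that every exponent pattern is an ordinary exponent, with no codes reserved for special values. `Fraction` keeps the results exact.

```python
from fractions import Fraction

EXP_BITS, SIG_BITS = 7, 8
BIAS = 2 ** (EXP_BITS - 1)               # 64, assumed excess-64

max_exp = (2 ** EXP_BITS - 1) - BIAS     # largest exponent: 127 - 64 = 63
min_exp = 0 - BIAS                       # smallest exponent: -64

# largest significand: 0.1 followed by SIG_BITS ones = 1 - 2**-(SIG_BITS + 1)
largest = (1 - Fraction(1, 2 ** (SIG_BITS + 1))) * Fraction(2) ** max_exp
# smallest normalized significand: 0.1000... = 1/2
smallest = Fraction(1, 2) * Fraction(2) ** min_exp

print(float(largest))    # any larger magnitude overflows
print(float(smallest))   # any smaller nonzero magnitude underflows
```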
-
0 cannot be stored, due to the implicit 0.1. How is this handled?
-
More bits in the significand give more accuracy
-
Typically: 1 sign bit, 8 exponent bits and 23 significand bits give a 32 bit floating point number.
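The widely used IEEE 754 single-precision format has this 1/8/23 split. However, it normalizes to 1.bbb (an implicit 1 before the point) and uses an exponent bias of 127, so its fields differ from the 0.1 convention used in these notes. The sketch below uses the standard `struct` module to reveal the stored bits. The helper name `float32_fields` is hypothetical.

```python
import struct

def float32_fields(x):
    """Split x, stored as an IEEE 754 single-precision float, into its fields."""
    # pack as a big-endian 32 bit float, then reread the same 4 bytes as an unsigned int
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]   # sign (1), exponent (8), significand (23)

sign, exponent, significand = float32_fields(-37.90625)
print(sign, exponent, significand)
```

Here $37.90625 = 1.0010111101_2 \times 2^{5}$, so the exponent field holds 5 + 127 = 132 = 10000100. The significand field holds 0010111101 followed by 13 zeros.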
-
An Example:
One way is an 8 bit floating point representation:
-
1 bit sign, 3 bits exponent, 4 bits significand
(note: the exponent is in biased-4 representation)
eg: the number 3.5 can be stored:
+3.5 in binary is +11.1
$+11.1_2$ is normalized to become $0.111_2 \times 2^{2}$
(normalized numbers always start with 0.1, so the first bit can be assumed)
The sign is +, represented by 0
The exponent is 2 (10), and in biased-4 that is 6 (110)
The significand bits are 111 from 0.111
(the first 1 is assumed, so only 11 is stored, padded to 4 bits as 1100)
So 3.5 is stored as 0 110 1100, i.e. 01101100
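The 8 bit example can be sketched in Python as follows. It assumes that extra significand bits are truncated and that all 8 exponent patterns are ordinary exponents. The name `encode_8bit` is hypothetical.

```python
import math

def encode_8bit(x):
    """1 sign bit, 3 exponent bits (biased-4), 4 significand bits after the implicit 0.1."""
    if x == 0:
        raise ValueError("0 cannot be stored with an implicit 0.1")
    sign = "1" if x < 0 else "0"
    m, e = math.frexp(abs(x))        # abs(x) == m * 2**e with 0.5 <= m < 1, i.e. 0.1bbb...
    biased = e + 4
    if not 0 <= biased <= 7:
        raise OverflowError("exponent does not fit in 3 bits")
    stored = int(m * 2 ** 5) - 2 ** 4   # 4 bits after the implicit leading 1, truncated
    return sign + format(biased, "03b") + format(stored, "04b")

print(encode_8bit(3.5))  # 01101100
```

An input such as 16 (that is, $0.1_2 \times 2^{5}$) needs a biased exponent of 9, which does not fit in 3 bits, so the function raises `OverflowError`.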