4-2 FLOATING-POINT NUMBERS - CONCRETE EXAMPLE
**********************************************
IEEE/REAL*4
-----------
To make things more concrete, let's look at a typical floating-point
representation for a REAL (SINGLE PRECISION): the single-precision
basic (unextended) format of the IEEE standard (ANSI/IEEE Std 754-1985),
which became a de facto standard on workstations. The '*4' is a
non-standard notation that says that 4 bytes are allocated for the
representation.
A schematic description of the representation follows. The 4 bytes
contain 32 bits that are partitioned into 3 parts (the letter 'S'
in the leftmost part is short for 'Sign'):
    +-+--------+-----------------------+
    |S|  exp   |       fraction        |
    +-+--------+-----------------------+      Direction of
     ^                                ^  <--- increasing addresses
   Bit31                             Bit0     (See discussion below)
A formula that gives the value of this float is:
Value = (-1)**S X 1.fffffffffffffffffffffff X 2**(exp - 127)
The most significant bit (MSB) is the sign bit: it is 0 for a positive
number and 1 for a negative number.
The next 8 bits hold the exponent, which is BIASED by 127 (see the
formula above): the stored value 'exp' lies in [0, 255], so the true
exponent (exp - 127) lies in [-127, 128].
The remaining 23 bits are taken as the binary digits of a binary
fraction whose "whole part" is 1 (see the formula above); this is just
the normalization condition.
An IEEE normalized mantissa always has a leading '1' bit, so that bit
is redundant and need not be stored (an old 'trick' attributed to
I. Bennett Goldberg). This 'saves' one bit, which is used to improve
the precision.
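The following program may help you examine the structure of REAL on
your machine. It passes a REAL argument to a routine that declares the
argument INTEGER, so that the same 32 bits are reinterpreted, and it is
based on the plausible assumption that integers are represented in
two's complement format. Some compilers may diagnose the deliberate
argument type mismatch.
Of course we could use the Z edit descriptor, but it is not standard
FORTRAN 77, and so may not be implemented by all compilers.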
      PROGRAM RELREP
C ------------------------------------------------------------------
      REAL
     *              X
C ------------------------------------------------------------------
      WRITE(*,*) ' Enter a REAL number: '
      READ(*,*) X
C     X is deliberately passed where BINREP expects an INTEGER, so
C     that its bits are reinterpreted and not converted
      CALL BINREP(X)
C ------------------------------------------------------------------
      END


      SUBROUTINE BINREP(INT)
C ------------------------------------------------------------------
      INTEGER
     *              I,
     *              INT
C ------------------------------------------------------------------
      CHARACTER
     *              B*32
C ------------------------------------------------------------------
C     B(1:1) is the sign bit, B(32:32) the least significant bit.
C     Note that INT (and so the caller's variable) is destroyed.
      IF (INT .GE. 0) THEN
        B(1:1) = '0'
        DO I = 32, 2, -1
          IF (MOD(INT,2) .EQ. 0) THEN
            B(I:I) = '0'
          ELSE
            B(I:I) = '1'
          ENDIF
          INT = INT / 2
        ENDDO
      ELSE
        B(1:1) = '1'
C       Two's complement: -n is stored as NOT(n - 1), so take
C       ABS(INT + 1) = n - 1 and write its bits inverted
        INT = ABS(INT + 1)
        DO I = 32, 2, -1
          IF (MOD(INT,2) .EQ. 0) THEN
            B(I:I) = '1'
          ELSE
            B(I:I) = '0'
          ENDIF
          INT = INT / 2
        ENDDO
      ENDIF
C ------------------------------------------------------------------
C     Print the bits in groups of 8, with a ruler that numbers
C     them from 32 (sign bit) down to 1
      WRITE(*,*) ' ', B(1:8),' ', B(9:16),' ', B(17:24),' ', B(25:32)
      WRITE(*,*) ' ........ ........ ........ ........ '
      WRITE(*,*) ' 21098765 43210987 65432109 87654321 '
      WRITE(*,*) '   3          2          1           '
      WRITE(*,*) ' '
C ------------------------------------------------------------------
      RETURN
      END
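
The formula given earlier can also be checked numerically. The sketch
below is not part of the original program: it assumes that REAL and
INTEGER both occupy 32 bits, that integers are two's complement, and
that EQUIVALENCE lets us view the same storage as either type (common
practice, but not guaranteed by the standard). The names FIELDS, IX,
IS, IE, IFR and V are illustrative only.

```fortran
      PROGRAM FIELDS
C     Sketch only: split a REAL into its three fields and rebuild
C     the value from the formula.  Assumes 32-bit REAL and INTEGER
C     (two's complement) sharing storage through EQUIVALENCE.
      REAL X, V
      INTEGER IX, IS, IE, IFR
      EQUIVALENCE (X, IX)
      X = -6.5
C     A set sign bit makes the integer view negative
      IS = 0
      IF (IX .LT. 0) THEN
         IS = 1
C        Clear the sign bit: add 2**31 as 2**30 + 2**30, which
C        cannot overflow
         IX = (IX + 1073741824) + 1073741824
      ENDIF
C     Low 23 bits = fraction, next 8 bits = biased exponent
      IFR = MOD(IX, 8388608)
      IE  = IX / 8388608
C     Value = (-1)**S X 1.fraction X 2**(exp - 127)
C     (valid for normalized numbers only, i.e. IE in 1..254)
      V = (1.0 + REAL(IFR) / 8388608.0) * 2.0**(IE - 127)
      IF (IS .EQ. 1) V = -V
      WRITE(*,'(A,I1,A,I3,A,I7)')
     *   ' SIGN=', IS, ' EXP=', IE, ' FRACTION=', IFR
      WRITE(*,'(A,F8.3)') ' VALUE FROM FORMULA =', V
      END
```

For X = -6.5 = -1.625 * 2**2 the fields are sign = 1, exp = 129
(2 + 127) and fraction = 5242880 (0.625 * 2**23), and the formula
gives back -6.5.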
Special numbers
---------------
Using normalized mantissas raises a little problem: how do we represent
zero when the mantissa is not allowed to have a zero value?
The IEEE solution is to represent the number zero by a zero fraction
and a zero exponent field. No condition is imposed on the SIGN BIT,
so we have two 'zeros', +0 and -0!
Remember that the exponent is biased by 127, so a zero exponent field
would otherwise mean that the binary fraction is 'multiplied' by
(2 ** (-127)). In other words, the minimal exponent is reserved, and
together with a zero fraction it represents zero. (With a nonzero
fraction it encodes the very small 'denormalized' numbers of the
standard, which are ignored in the rest of this section.)
There is also an internal representation for 'INFINITY'. It consists
of the maximal exponent field = 255 (128 after debiasing) and all
fraction bits = 0. The sign bit is free, so we also have two
'infinities', one positive and one negative.
An even stranger phenomenon is the class of bit patterns called NaNs.
A NaN has exponent field = 255 (128 after debiasing) and fraction bits
that are not all 0. NaN is short for 'Not a Number'.
The special numbers (except zero) were introduced in order to implement
NON-STOP ARITHMETIC: instead of aborting the program when an
intermediate calculation gives a bad result, the result is replaced
by the appropriate special number and the computation continues.
IEEE arithmetic implements an extension of the real number system:
the quantities +INFINITY, -INFINITY and the NaNs are added to the
real numbers, and arithmetic operations involving them are defined
in a plausible way. Many users find this extension confusing and
not very useful.
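
The bit patterns described in this section can be told apart by
looking at the exponent and fraction fields. The following sketch
builds a few patterns directly as integers and classifies them. It
assumes 32-bit REAL and INTEGER sharing storage through EQUIVALENCE
and two's complement integers; SPECNM and CLASSF are names made up
for this illustration.

```fortran
      PROGRAM SPECNM
C     Sketch only: classify REAL*4 bit patterns by their fields.
C     Assumes 32-bit REAL and INTEGER, two's complement integers.
      REAL X
      INTEGER IX
      EQUIVALENCE (X, IX)
C     0 01111111 000...0  =  1.0
      X = 1.0
      CALL CLASSF(IX)
C     0 00000000 000...0  =  +0
      IX = 0
      CALL CLASSF(IX)
C     0 11111111 000...0  =  +INFINITY  (255 * 2**23)
      IX = 2139095040
      CALL CLASSF(IX)
C     1 11111111 000...0  =  -INFINITY  (as an integer: -2**23)
      IX = -8388608
      CALL CLASSF(IX)
C     0 11111111 100...0  =  one of the NaNs
      IX = 2143289344
      CALL CLASSF(IX)
      END

      SUBROUTINE CLASSF(IBITS)
      INTEGER IBITS, IW, IE, IFR
      CHARACTER*1 SG
C     Work on a copy so the caller's pattern is left untouched
      IW = IBITS
      SG = '+'
      IF (IW .LT. 0) THEN
         SG = '-'
C        Clear the sign bit without overflowing
         IW = (IW + 1073741824) + 1073741824
      ENDIF
      IE  = IW / 8388608
      IFR = MOD(IW, 8388608)
      IF (IE .EQ. 255 .AND. IFR .EQ. 0) THEN
         WRITE(*,'(1X,A,A)') SG, 'INFINITY'
      ELSE IF (IE .EQ. 255) THEN
         WRITE(*,'(1X,A)') 'NAN'
      ELSE IF (IE .EQ. 0 .AND. IFR .EQ. 0) THEN
         WRITE(*,'(1X,A,A)') SG, 'ZERO'
      ELSE
C        Everything else (denormalized patterns also land here)
         WRITE(*,'(1X,A)') 'ORDINARY NUMBER'
      ENDIF
      END
```

Classifying through the integer view avoids doing any arithmetic on
the special values themselves, which on some systems could raise a
floating-point exception.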
The 'representation density' of IEEE/REAL*4
-------------------------------------------
What is the spacing between two consecutive floating-point numbers?
Positive normalized FPNs are the product of a 'normalized' binary
fraction with 23 binary digits after the binary point, and (2 ** e),
where e is in [-126,127].
Remember that the exponents -127 and +128 are reserved to represent
zero and infinity (and the NaNs) respectively.
The positive 'normal' FPNs can be partitioned into 254 disjoint sets,
one for each possible exponent, each set containing (2 ** 23) numbers,
one for each possible binary fraction of length 23.
The spacing between consecutive numbers belonging to the same set
is the same, and equals (2 ** (-23)) * (2 ** e) = 2 ** (e - 23).
It is clear that the spacing increases when e (and the magnitude
of the number) increases.
The minimal positive normalized FPN is
(+1.0) * (2 ** (-126)) = 2 ** (-126),
and the spacing in that region is 2 ** (-126 - 23) = 2 ** (-149).
We see that the minimal positive FPN is MUCH LARGER (by a factor of
2 ** 23) than the local spacing.
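
The spacing can be measured directly: for a positive REAL, adding 1
to the bit pattern (viewed as an integer) gives the next larger FPN.
The sketch below does this for a few powers of two. It assumes 32-bit
IEEE REAL and 32-bit INTEGER sharing storage through EQUIVALENCE, and
that very small results are not flushed to zero; SPACNG and SHOWGP
are illustrative names.

```fortran
      PROGRAM SPACNG
C     Sketch only: print the gap to the next larger REAL at a few
C     magnitudes.  Assumes 32-bit IEEE REAL and 32-bit INTEGER.
      CALL SHOWGP(2.0**(-126))
      CALL SHOWGP(1.0)
      CALL SHOWGP(1024.0)
      CALL SHOWGP(2.0**100)
      END

      SUBROUTINE SHOWGP(X)
      REAL X, Y, GAP
      INTEGER IY
      EQUIVALENCE (Y, IY)
C     Copy X, then step to its upper neighbour by incrementing
C     the integer view of the same 32 bits
      Y = X
      IY = IY + 1
      GAP = Y - X
C     For X = 2**e the gap is 2**(e - 23), so X/GAP = 2**23
      WRITE(*,'(1X,A,E13.6,A,E13.6,A,I8)')
     *   'X =', X, '  GAP =', GAP, '  X/GAP =', NINT(X / GAP)
      END
```

The gap grows with the magnitude of X, while the ratio X/GAP stays at
2 ** 23 = 8388608 for every power of two, including the minimal
normalized number 2 ** (-126).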
The number space of IEEE/REAL*4
-------------------------------
If we translate the binary data from the previous sections to
decimal, we find that the range of numbers that can be represented
by IEEE REAL*4 is approximately:
(-3.4 X 10**+38, +3.4 X 10**+38)
Because the minimal FPN is so much larger than the nearby spacing,
it is more instructive to look at that range as the union of three
discrete segments:
(-3.4 X 10**+38, -1.2 X 10**-38)
(0.0)
(+1.2 X 10**-38, +3.4 X 10**+38)
In this floating-point representation we have a finite number of numbers
filling the three ranges, two of them with variable 'density'.
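
The decimal limits quoted above follow from the binary description:
the largest number has all 23 fraction bits set and the exponent 127,
the smallest normalized number is 2 ** (-126). A minimal sketch,
assuming the default REAL is IEEE single precision (RANGE4, BIG and
SMALL are illustrative names):

```fortran
      PROGRAM RANGE4
C     Sketch only: rebuild the limits of the REAL*4 range from the
C     binary description.  Assumes default REAL is IEEE single.
      REAL BIG, SMALL
C     Largest: 1.111...1 (binary) * 2**127 = (2 - 2**(-23)) * 2**127
      BIG = (2.0 - 2.0**(-23)) * 2.0**127
C     Smallest normalized: 1.000...0 (binary) * 2**(-126)
      SMALL = 2.0**(-126)
      WRITE(*,'(1X,A,E13.6)') 'LARGEST  =', BIG
      WRITE(*,'(1X,A,E13.6)') 'SMALLEST =', SMALL
      END
```

The two values come out as about 3.4 X 10**+38 and 1.2 X 10**-38, the
end points of the segments listed above.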
+---------------------------------------------------------------------+
|  SUMMARY                                                            |
|  =======                                                            |
|  1) IEEE/REAL*4 = 1 Sign bit, 8 exponent bits, 23 mantissa bits     |
|  2) There are all kinds of 'strange numbers'                        |
|  3) The number space is discrete, made of three parts, and has      |
|     maximal 'density' near zero                                     |
+---------------------------------------------------------------------+