4-4 FLOATING POINT NUMBERS ON DIFFERENT MACHINES
*************************************************
The information in this chapter is due mainly to Arne Vajhoej; without
him it would have been impossible. The other data was gathered with the
routine MACHAR written by Cody and Malcolm.

Much more sophisticated routines are SPARA and DPARA from the PARANOIA
package, which can be found in Netlib. These routines check single
(SPARA) and double precision (DPARA) arithmetic for finer numerical details.
The tables below are useful when you need to supply routines with
a 'required accuracy parameter', or when you have to convert routines
written for some REAL type to another REAL type.
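
If a Fortran 90 (or later) compiler is available, figures of the kind
tabulated in this chapter can also be read off the numeric inquiry
intrinsics. The following is only a minimal sketch, not a replacement for
MACHAR or PARANOIA: it assumes that default REAL and DOUBLE PRECISION
correspond to REAL*4 and REAL*8, which is usual but not guaranteed, and the
program and variable names are made up for the illustration.

```fortran
! Sketch: print the floating-point parameters of the default REAL and
! DOUBLE PRECISION types using Fortran 90 inquiry intrinsics.
! Assumption: these types correspond to REAL*4 and REAL*8.
program float_params
  implicit none
  real             :: s   ! only the type matters, the value is never used
  double precision :: d

  print *, 'Default REAL'
  print *, '  representation radix  ', radix(s)
  print *, '  mantissa digits       ', digits(s)
  print *, '  precision (1+)        ', epsilon(s)
  print *, '  smallest normalized   ', tiny(s)
  print *, '  largest               ', huge(s)

  print *, 'DOUBLE PRECISION'
  print *, '  representation radix  ', radix(d)
  print *, '  mantissa digits       ', digits(d)
  print *, '  precision (1+)        ', epsilon(d)
  print *, '  smallest normalized   ', tiny(d)
  print *, '  largest               ', huge(d)
end program float_params
```

Note that DIGITS counts the hidden bit, and that EPSILON is defined from
the number model (radix ** (1 - digits)), so it describes the type rather
than measuring the actual arithmetic the way MACHAR does.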
We will first take the complex VMS case and later add a typical
UNIX machine.
VMS case
--------
Floating-point related compiler switches on VMS
===============================================
Architecture | Compiler switch   | REAL*4     | REAL*8     | REAL*16    |
=============|===================|============|============|============|
VAX          | /G_FLOATING       | F_FLOATING | G_FLOATING | H_FLOATING |
-------------|-------------------|------------|------------|------------|
VAX          | /NOG_FLOATING     | F_FLOATING | D_FLOATING | H_FLOATING |
-------------|-------------------|------------|------------|------------|
ALPHA        | /G_FLOATING       | F_FLOATING | G_FLOATING | X_FLOATING |
-------------|-------------------|------------|------------|------------|
ALPHA        | /NOG_FLOATING     | F_FLOATING | D_FLOATING | X_FLOATING |
-------------|-------------------|------------|------------|------------|
ALPHA        | /FLOAT=G_FLOAT    | F_FLOATING | G_FLOATING | X_FLOATING |
-------------|-------------------|------------|------------|------------|
ALPHA        | /FLOAT=D_FLOAT    | F_FLOATING | D_FLOATING | X_FLOATING |
-------------|-------------------|------------|------------|------------|
ALPHA        | /FLOAT=IEEE_FLOAT | S_FLOATING | T_FLOATING | X_FLOATING |
-------------|-------------------|------------|------------|------------|
Floating-point types on DEC computers
=====================================
Name       | Size | Standard | VAX status   | ALPHA status  | Comments
===========|======|==========|==============|===============|===============
F_FLOATING |  4   | DEC      | #            | #             |
-----------|------|----------|--------------|---------------|---------------
S_Floating |  4   | IEEE     | ========     | #             |
-----------|------|----------|--------------|---------------|---------------
D_FLOATING |  8   | DEC      | # Default    | #             | Less Precision
           |      |          |              |               | on ALPHA!
-----------|------|----------|--------------|---------------|---------------
G_FLOATING |  8   | DEC      | #            | # Default     |
-----------|------|----------|--------------|---------------|---------------
T_Floating |  8   | IEEE     | ========     | #             |
-----------|------|----------|--------------|---------------|---------------
H_Floating |  16  | DEC      | # Older VAXs | ==========    |
           |      |          | */# Newer    |               |
-----------|------|----------|--------------|---------------|---------------
X_Floating |  16  | IEEE     | ========     | * Only in     |
           |      |          |              |   Fortran     |
-----------|------|----------|--------------|---------------|---------------
#  Implemented in hardware     *  Implemented in software
=====  Not available           Sizes are in bytes
Remark on table above
---------------------
D_FLOAT calculations on ALPHA are done by converting to
G_FLOAT, computing, and converting back to D_FLOAT; see the
remarks on the next table.
Numerical properties of floating-points on DEC computers
========================================================
Name    | Size | Mant | Expo | Minimum    | Maximum    | Precision | Roun
        |      | issa | nent |            |            | (1-) (1+) | ding
========|======|======|======|============|============|===========|======
F_FLOAT |  32  |  23  |  8   | 0.29E-38   | 0.17E+39   | 6E-8      | DEC
--------|------|------|------|------------|------------|-----------|------
S_Float |  32  |  23  |  8   | 0.12E-37   | 0.34E+39   | 6,12E-8   | IEEE
--------|------|------|------|------------|------------|-----------|------
D_FLOAT |  64  |  55  |  8   | 0.29E-38   | 0.17E+39   | 14E-18    | DEC
(ALPHA) |  **  |  52  |  8   | 0.29E-38   | 0.17E+39   | 11E-17    | DEC
--------|------|------|------|------------|------------|-----------|------
G_FLOAT |  64  |  52  |  11  | 0.56E-308  | 0.9E+308   | 11E-17    | DEC
--------|------|------|------|------------|------------|-----------|------
T_Float |  64  |  52  |  11  | 0.22E-307  | 0.18E+309  | 11,22E-17 | IEEE
--------|------|------|------|------------|------------|-----------|------
H_Float | 128  | 112  |  15  | 0.84E-4932 | 0.59E+4932 | 0.96E-34  | DEC
--------|------|------|------|------------|------------|-----------|------
X_Float | 128  | 112  |  15  | 0.34E-4931 | 0.12E+4933 | 1,2E-34   | IEEE
--------|------|------|------|------------|------------|-----------|------
Remarks on the table above
--------------------------
1) Sizes are given in bits. The mantissa size doesn't include the
   hidden bit.
2) The 'effective precision' actually has two values:
   (1+) the smallest positive number satisfying: 1.0 + X .NE. 1.0
   (1-) the smallest positive number satisfying: 1.0 - X .NE. 1.0
   For the IEEE (non-DEC) float types the two values are different, and
   both are given, (1-) first: for example 6,12E-8 stands for 6E-8
   and 12E-8.
3) D_FLOAT on ALPHA loses 3 mantissa bits; it has the low precision
   of G_FLOAT combined with the small range of D_FLOAT.
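
The two precision values of remark 2 can be estimated with a simple halving
loop, in the spirit of what MACHAR does. This is a simplified sketch and
not MACHAR itself: it assumes radix 2, tests only powers of two, and uses
the Fortran 2003 VOLATILE attribute to discourage the compiler from keeping
the intermediate sum in a wider register. All names are made up for the
illustration.

```fortran
! Sketch: estimate the (1+) and (1-) precision of default REAL by
! repeated halving. Assumes a radix-2 machine; only powers of two are tried.
program eps_search
  implicit none
  real           :: x
  real, volatile :: t   ! forces the sum to be stored as a default REAL

  ! (1+): keep halving x while 1.0 + x/2 still differs from 1.0
  x = 1.0
  do
     t = 1.0 + x / 2.0
     if (t == 1.0) exit
     x = x / 2.0
  end do
  print *, 'precision (1+): ', x

  ! (1-): keep halving x while 1.0 - x/2 still differs from 1.0
  x = 1.0
  do
     t = 1.0 - x / 2.0
     if (t == 1.0) exit
     x = x / 2.0
  end do
  print *, 'precision (1-): ', x
end program eps_search
```

With IEEE single precision and round-to-nearest, the two loops should stop
at 2**(-23) and 2**(-24), which agrees with the 12E-8 and 6E-8 entries for
S_Float in the table.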
X_FLOAT always underflows using denormalized numbers (also called
graceful underflow); all other float types underflow by default
by assigning zero to the result.

You can change the underflow behaviour for IEEE floating-point types with
the switch /IEEE_MODE=DENORM_RESULTS. Underflow is then handled by trapping
to software instead of in the Floating Point Unit, which is slow.
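
Which underflow method is actually in effect can be checked with a small
test. The sketch below is an illustration only (the program name is made
up, and it needs a Fortran 2003 compiler for VOLATILE): it halves the
smallest normalized default REAL and looks at whether anything is left.

```fortran
! Sketch: detect graceful underflow versus assign-zero underflow
! for default REAL.
program underflow_check
  implicit none
  real, volatile :: t   ! volatile so the division is not folded away

  t = tiny(1.0)         ! smallest normalized default REAL
  t = t / 2.0           ! the result lies below the normalized range
  if (t > 0.0) then
     print *, 'graceful underflow, denormalized result: ', t
  else
     print *, 'underflow result was set to zero'
  end if
end program underflow_check
```

On a machine or with compiler settings that trap on underflow, the program
may stop with an error instead of printing either message.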
Sun IEEE floats (SPARCsystem 600MP)
-----------------------------------
Note: the mantissa sizes below (24, 53, 113) include the hidden bit,
unlike the mantissa sizes in the DEC table above.
REAL*4 Characteristics
======================
Representation radix        2
Mantissa size               24
Exponent size               8
Rounding:                   IEEE type
Underflow:                  Graceful
Numerical Precision (+)     1.19209E-07
Numerical Precision (-)     5.96046E-08
Minimal Usable number       1.17549E-38
Maximal Usable number       3.40282E+38
REAL*8 Characteristics
======================
Representation radix        2
Mantissa size               53
Exponent size               11
Rounding:                   IEEE type
Underflow:                  Graceful
Numerical Precision (+)     2.2204460492503D-16
Numerical Precision (-)     1.1102230246252D-16
Minimal Usable number       2.2250738585072D-308
Maximal Usable number       1.7976931348623D+308
REAL*16 Characteristics
=======================
Representation radix        2
Mantissa size               113
Exponent size               15
Rounding:                   IEEE type
Underflow:                  Graceful
Numerical Precision (+)     1.9259299443872358530559779425849273Q-034
Numerical Precision (-)     9.6296497219361792652798897129246366Q-035
Minimal Usable number       3.3621031431120935062626778173217526Q-4932
Maximal Usable number       1.1897314953572317650857593266280070Q+4932
+---------------------------------------------------------------------+
|                   SUMMARY OF FLOATING POINT TYPES                   |
|                   ===============================                   |
|  REAL*4 precision                  6-12 X (10 ** -8)                |
|  REAL*4 smallest usable number     3-12 X (10 ** -39)               |
|  REAL*4 largest usable number      1-3 X (10 ** +38)                |
|                                                                     |
|  REAL*8 precision                  2-22 X (10 ** -17)               |
|  REAL*8 smallest usable number     3 X (10 ** -39) -                |
|                                    6 X (10 ** -309)                 |
|  REAL*8 largest usable number      1 X (10 ** +38) -                |
|                                    1 X (10 ** +308)                 |
|                                                                     |
|  REAL*16 precision                 1-2 X (10 ** -34)                |
|  REAL*16 smallest usable number    9-34 X (10 ** -4933)             |
|  REAL*16 largest usable number     6-12 X (10 ** +4931)             |
+---------------------------------------------------------------------+
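
The spread in the summary matters when a routine is moved between REAL
types: a REAL*8 may have an exponent range of only about 10 ** 38. With a
Fortran 90 compiler, one way to guard against this is to state the
requirements explicitly and let the compiler choose the type. The sketch
below is an illustration under assumptions: the requested 12 digits and
exponent range 37, the name wp, and the factor 100 in the tolerance are
all arbitrary example choices, not recommendations from the tables.

```fortran
! Sketch: select a REAL kind by required precision and range instead of
! by byte size. The requirements (12 digits, range 37) are example values.
program pick_kind
  implicit none
  integer, parameter :: wp = selected_real_kind(p=12, r=37)
  real(wp) :: tol   ! fails to compile if no suitable kind exists (wp < 0)

  ! A hypothetical 'required accuracy parameter' derived from the type.
  tol = 100.0_wp * epsilon(1.0_wp)

  print *, 'kind number        ', wp
  print *, 'decimal precision  ', precision(tol)
  print *, 'decimal range      ', range(tol)
  print *, 'tolerance          ', tol
end program pick_kind
```

Deriving the tolerance from EPSILON rather than hard-coding a constant
means the value follows the type if the kind selection ever changes.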