2-16 DYNAMIC MEMORY ALLOCATION AND POINTERS (& Procedure Pointers)
******************************************************************
(Thanks to Sergio Gelato who contributed some parts of this chapter;
Thanks to Keith Bierman and Eric Grosse for the help with the
procedure pointers section)
In standard FORTRAN 77, the sizes of all objects must be known at
compile time (This does not apply to the sizes of formal arguments
to subprograms, only to those of the actual arguments).
This is inconvenient for many applications, as it means that the
program may have to be recompiled before it can be run with a
different problem size. Many operating environments let the user
allocate contiguous blocks of memory whose sizes are determined
at run time. Many implementations of FORTRAN 77 support an extension
("Cray pointers") that allows programs to make use of this facility.
Cray pointers also allow the construction of linked data structures
such as lists, trees, queues, ... More recently, the Fortran 90
standard introduced roughly equivalent functionality in the form
of ALLOCATABLE arrays on the one hand, and of the POINTER attribute
on the other.
Declaring Cray pointers in FORTRAN 77
-------------------------------------
Cray pointers are just variables (usually of type INTEGER, but it
is best not to declare their type explicitly) that hold the address
of another variable called the pointee.
Example of pointer declaration:
INTEGER N
PARAMETER (N = 10)
REAL POINTEE(N)
POINTER (PTR, POINTEE)
Note that we have here a type statement of only ONE entity: a pointer
called PTR, whose data type is 'pointer to a 1D array of REAL with
dimension N'. This multi-line type statement is a little awkward,
but it obeys FORTRAN syntax.
The array that PTR points to can be accessed by the POINTEE identifier,
but POINTEE is not an array by itself as it has no memory storage
associated with it (until PTR is initialized to point to some).
The above example is confusing because of a common misunderstanding
about FORTRAN type-statements (e.g. REAL X,Y), such statements ARE NOT
like the declarations of PASCAL and C, they don't reserve memory
storage, but just supply data-type information to the compiler.
Put another way, with Cray pointers the pointer and the entity it
points to are declared together and have different identifiers
associated with them, so there is no need for a separate indirection
(dereferencing) operator.
An example program using Cray pointers:
program cyrptr
integer i
real array1(10), array2(5),
& pointee1(10), pointee2(5), pointee3(*)
pointer (ptr1, pointee1),
& (ptr2, pointee2),
& (ptr3, pointee3)
data array1 /0,1,2,3,4,5,6,7,8,9/,
& array2 /5,5,5,5,5/
c ------------------------------------------------------------------
write(*,*)
write(*,'(1x,a,10f6.1)') 'array1= ', array1
write(*,'(1x,a,10f6.1)') 'array2= ', array2
c ------------------------------------------------------------------
write(*,*)
ptr1 = loc(array1)
ptr2 = loc(array1)
ptr3 = loc(array1)
write(*,'(1x,a,10f6.1)') 'pointee1= ', pointee1
write(*,'(1x,a,10f6.1)') 'pointee2= ', pointee2
write(*,'(1x,a,10f6.1)') 'pointee3= ', (pointee3(i), i = 1, 10)
c ------------------------------------------------------------------
write(*,*)
ptr1 = loc(array2)
ptr2 = loc(array2)
ptr3 = loc(array2)
write(*,'(1x,a,10f6.1)') 'pointee1= ', pointee1
write(*,'(1x,a,10f6.1)') 'pointee2= ', pointee2
write(*,'(1x,a,10f6.1)') 'pointee3= ', (pointee3(i), i = 1, 5)
c ------------------------------------------------------------------
end
The result of this program on a VMS machine was:
array1= 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
array2= 5.0 5.0 5.0 5.0 5.0
pointee1= 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
pointee2= 0.0 1.0 2.0 3.0 4.0
pointee3= 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
pointee1= 5.0 5.0 5.0 5.0 5.0 0.0 0.0 0.0 0.0 0.0
pointee2= 5.0 5.0 5.0 5.0 5.0
pointee3= 5.0 5.0 5.0 5.0 5.0
Note that declaring the pointee with assumed-array syntax, makes it
impossible to have I/O statements that reference the array name
without subscript.
Cray pointers in practice
-------------------------
The value of the pointer can be defined using some intrinsic
function usually called LOC(arg) that takes as argument the
name of a variable and returns its memory address.
The following points should be noted:
1) The pointee only exists while the value of the pointer
is defined.
2) The pointee may not be part of a COMMON block, nor may
it be a dummy argument of the current subprogram.
The pointer may be a COMMON block variable or a dummy
argument.
3) You may pass a pointee as an actual argument to a
subprogram without special precautions; i.e., the
subprogram need not know that the object is a pointee;
the subprogram will treat it as it would any ordinary
variable.
4) If you pass a pointer as an actual argument, then the
called subprogram should usually have declared the
corresponding dummy argument as a pointer.
Furthermore:
-- If the pointee is an array, and the dimensions of the array
are specified as run-time expressions rather than compile-time
constants, these will usually be evaluated (in an arbitrary
order, so beware of side-effects in their evaluation) upon
entry into the subprogram.
It follows that you can't declare such variable dimensions
in the main program. A few compilers offer, as an option,
"dynamic dimensioning" in which the array dimensions are
evaluated again on each reference. (For example, XL Fortran
lets you do this by compiling with -qddim.)
Dynamic dimensioning is not as widely supported as Cray pointers,
and can entail additional overhead in some circumstances
(especially when working with multidimensional arrays). You are
therefore encouraged to arrange for the dimensions to be known
upon entry to the subprogram that declares the pointee; otherwise
multidimensional arrays may be indexed incorrectly, and bounds
checking may fail on one-dimensional arrays.
-- Many implementations supply a subprogram to allocate memory
(it may be called MALLOC, for Memory ALLOCation, or HPALLOC,
for HeaP ALLOCation, or something else entirely) and another
subprogram to release memory to the operating system.
On VMS you may use:
STATUS = LIB$GET_VM(SIZE_IN_BYTES, PTR)
to allocate a block of memory, to free that block you may use:
STATUS = LIB$FREE_VM(SIZE_IN_BYTES, PTR)
STATUS is an integer variable used to store the return value
of these system routines, the condition:
(STATUS .NE. .TRUE.)
means the routine failed.
It is your responsibility to allocate a pointee before the
value of the pointer becomes undefined (for example by exiting
the subprogram where it is declared, if it isn't SAVEd.)
Cray pointers and automatic compiler optimizations
--------------------------------------------------
Compilers perform partial data-flow analysis as a preparation before
automatic optimization, unrestricted pointers that are free to point
to any other variable makes such an analysis almost impossible.
Cray pointers are restricted to some degree, e.g. pointer arithmetic
is not allowed, the pointer can point only to its pointee, but the
pointee can be different objects during one procedure activation.
Fortran was designed to produce highly optimized code (remember that
it had to compete with assembly language and won), the language
specification explicitly forbids anything that may lead to ALIASING
(having more than one name for the same variable) because of the
detrimental effect on automatic optimizations. In short, pointers
violate the spirit of Fortran, and defeat its purpose.
Cray pointers may even cause WRONG results when compiler optimization
is turned on, and they are used without deep understanding of the
effect on the optimizer. A probable reason for that may be that the
optimizer assumes no-aliasing.
It is recommended that you use Cray pointers only for allocating
dynamic memory, and pass the corresponding pointees to subroutines.
If you wish to avoid Cray pointers completely, you can call a C routine
for that purpose, see the chapter on variable-size arrays for source-code
and discussion.
Fortran 90 ALLOCATABLE objects
------------------------------
Fortran 90 offers a more elegant way of allocating objects dynamically,
through the ALLOCATABLE attribute and the ALLOCATE and DEALLOCATE
executable statements.
For example, one can declare a one-dimensional allocatable array
as:
REAL, ALLOCATABLE :: A(:)
then allocate it to a given size with:
INTEGER IAS
ALLOCATE (A(3:12), STAT=IAS)
The optional STAT=IAS allows recovery from allocation errors. If the
allocation is unsuccessful and no STAT= was given, the program aborts;
if STAT= was given, IAS contains either 0 for success, or a non-zero
error code.
The array bounds are stored with the array, and are computed at the
time the ALLOCATE statement is executed. This avoids all the difficulties
mentioned above with the dimensioning of Cray pointees.
Don't forget to
DEALLOCATE(A)
when you no longer need it. If you are no longer sure whether A is
allocated or not, say:
IF(ALLOCATED(A)) DEALLOCATE(A) .
Linked data structures and the Fortran 90 POINTER attribute
-----------------------------------------------------------
Pointers are also useful for constructing linked lists, trees and other
dynamic data structures. Cray pointers are suitable for this purpose,
particularly when used in conjunction with the function LOC(arg) that
returns a pointer to its argument, but Fortran 90's ALLOCATABLE arrays
are not. This is why Fortran 90 also supports a POINTER attribute in
addition to ALLOCATABLE. A binary tree node could be declared as
TYPE NODE
TYPE(NODE), POINTER :: LEFT, RIGHT
TYPE(ELEMENT) :: DATA
END TYPE NODE
Fortran 90 pointers can be in one of three states: undefined, associated
and unassociated. The initial state is unassociated. An unassociated
pointer may become associated by executing a pointer assignment statement
pointer => target
or an allocation statement
ALLOCATE(pointer);
an associated pointer may become unassociated by executing a
NULLIFY(pointer)
or, in some cases,
DEALLOCATE(pointer);
any pointer may become undefined in the same way as other Fortran
variables can (by exiting a subprogram without a SAVE for the pointer,
etc.) and once undefined may never be used again. (Be careful!)
Pointers may point to otherwise unnamed objects created by an ALLOCATE
statement, or to named objects declared with the TARGET attribute.
The TARGET attribute is designed as an optimization aid to the compiler;
if you use it on variables that don't need it, your code may run more
slowly.
Summary of differences between Cray and Fortran 90 pointers
===========================================================
Cray Fortran 90
------------------------------------- ------------------------------------
POINTER is a data type POINTER is an attribute
A pointer holds the machine address A pointer holds any information
of the pointee; arithmetic is usually needed to access the object; usually
possible on pointer values, but may an address plus a data type, array
be non-portable. bounds and strides where applicable.
You should not depend on assumptions
about the internal representation,
storage size, etc. of a Fortran 90
pointer. No arithmetic on pointers.
Assigning to a pointer variable Referencing the pointer variable
associates the pointer with a new "does the right thing", i.e. usually
pointee. Referencing the pointee manipulates the value of the pointee.
variable manipulates the current In particular, pointer = value will
pointee without affecting the change the value of the object pointed
pointer. to, but not the association of the
pointer with the pointee. If you want
to change that, use pointer => target
instead.
POINTER is used both to construct POINTER can be used for both linked
linked data structures and to support data structures and dynamic allocation,
dynamic memory allocation. but if dynamic allocation alone is
needed it is better to use ALLOCATABLE
objects. (Significant performance
differences have been reported with
some compilers.)
Procedure pointers as dynamic version of dummy procedures
---------------------------------------------------------
The concept of procedure pointers (PP) is foreign to FORTRAN 77,
Fortran 90 doesn't have them, but Fortran 2000 might.
In FORTRAN 77 a procedure name can be passed as a procedure argument.
The classical example of using such "dummy procedure", is making a
library sorting procedure more general. The idea is to have all
comparison tests performed by a logical function (taking two arguments)
that is passed as an argument, instead of doing it directly with some
binary operator.
In the following example the comparison function 'rgreater' emulates
the usual '.gt.' operator, so nothing is really gained:
program sort
integer n
parameter (n=1000)
real a(n)
external rgreater
................
call slowsort(a,n,rgreater)
...........................
end
logical function rgreater(x,y)
real x,y
if (x .gt. y) then
rgreater = .true.
else
rgreater = .false.
endif
return
end
subroutine slowsort(a,n,comp)
integer n, i, j
real a(n)
logical comp
external comp
..............
if (comp(a(i),a(j))) then
.........................
return
end
The dummy procedure passed to the sorting routine can be selected even
at run-time. The following example is supposed to decide at run-time
whether to sort in ascending or descending order, assuming of course
that we wrote a 'reversed' version of 'rgreater', called 'rlesser'.
integer flag
...............
write(*,*) ' Enter 1 for ascending order, 2 for descending '
read(*,*) flag
if (flag .eq. 1) call slowsort(a,n,rgreater)
if (flag .eq. 2) call slowsort(a,n,rlesser)
...........................................
Using an integer flag, or a similar device, we get "dynamic selection",
but we still cannot work with procedures that were unknown at the time
of compilation/linking.
To illustrate how procedure pointers are a real dynamic version of the
dummy procedure mechanism, let's think of PPs as a generalization of
ordinary variables, while the old dummy procedures may be analogous
to constants.
Ordinary variables are more useful than constants, because they have
two "complementary" properties:
1) They can store a value, and it can be kept
as long as needed
2) The stored value can be easily modified
at run-time when needed
All this may seem obvious and trivial, but it immediately suggests a
sweeping generalization. Why stop at numerical values, why not have
variables that can "store" a function? Of course it's not practical
to actually store the function, and we would rather use a pointer
to the place in memory where its code begins.
One disadvantage is that the compiler cannot perform inter-procedural
optimizations, as the procedural interface is not known at compile-time.
The advantage is gaining more coding flexibility.
The important point is whether we gain useful functionality by using
procedure pointers?
Using PPs is somewhat limited by the availability of valid values
that can be assigned to them, there are few methods to get them:
1) Using an address operator on procedures of the
same program. The value is actually retrieved
after the compiler/linker determines how the
procedures will be organized in the executable.
2) Reading the internal tables of run-time libraries
(now called DLLs) at run-time.
In principle, PPs of the first type, may be replaced by the old dummy
procedure mechanism (using integer flags), but PPs, being a valid data
structure, can be combined to form arrays and linked-lists, which makes
for cleaner code.
Of course, it may be worthwhile to organize PP-values in PP-arrays or
linked-lists, only if we collect enough of them, it may be convenient
then to create things like user menus and compiler internal tables.
For example, a stochastic simulation program that allows the user to
select a Random Number Generator at run-time, and uses the selected RNG,
is easier to write using PPs.
PPs of the second type open new possibilities for the programmer,
but may be a little difficult to use for a novice.
In short, PP are a nice language extension, that is more useful in
"systems programming" contexts, and maybe less so in "pure" scientific
computing, but since these areas are overlapping in flexible software,
they are supposed to find their place.
Return to contents page