Dynamic memory allocation and pointers

 2-16 DYNAMIC MEMORY ALLOCATION AND POINTERS (& Procedure Pointers)
 ******************************************************************
 (Thanks to Sergio Gelato who contributed some parts of this chapter;
  Thanks to Keith Bierman and Eric Grosse for the help with the 
  procedure pointers section)

 In standard FORTRAN 77, the sizes of all objects must be known at
 compile time (This does not apply to the sizes of formal arguments 
 to subprograms, only to those of the actual arguments).

 This is inconvenient for many applications, as it means that the 
 program may have to be recompiled before it can be run with a 
 different problem size.  Many operating environments let the user 
 allocate contiguous blocks of memory whose sizes are determined 
 at run time. Many implementations of FORTRAN 77 support an extension 
 ("Cray pointers") that allows programs to make use of this facility. 

 Cray pointers also allow the construction of linked data structures 
 such as lists, trees, queues, ... More recently, the Fortran 90 
 standard introduced roughly equivalent functionality in the form 
 of ALLOCATABLE arrays on the one hand, and of the POINTER attribute 
 on the other.


 Declaring Cray pointers in FORTRAN 77
 -------------------------------------
 Cray pointers are just variables (usually of type INTEGER, but it 
 is best not to declare their type explicitly) that hold the address 
 of another variable called the pointee.

 Example of pointer declaration:

      INTEGER    N
      PARAMETER  (N = 10)
      REAL       POINTEE(N)
      POINTER    (PTR, POINTEE)

 Note that we have here a type statement of only ONE entity: a pointer 
 called PTR, whose data type is 'pointer to a 1D array of REAL with
 dimension N'. This multi-line type statement is a little awkward, 
 but it obeys FORTRAN syntax.

 The array that PTR points to can be accessed by the POINTEE identifier, 
 but POINTEE is not an array by itself as it has no memory storage 
 associated with it (until PTR is initialized to point to some).

 The above example is confusing because of a common misunderstanding 
 about FORTRAN type-statements (e.g. REAL X,Y), such statements ARE NOT
 like the declarations of PASCAL and C, they don't reserve memory 
 storage, but just supply data-type information to the compiler.

 Put another way, with Cray pointers the pointer and the entity it 
 points to are declared together and have different identifiers 
 associated with them, so there is no need for a separate indirection 
 (dereferencing) operator.


 An example program using Cray pointers:

      program cyrptr 
      integer		i
      real		array1(10),   array2(5), 
     &			pointee1(10), pointee2(5), pointee3(*)
      pointer		(ptr1, pointee1),
     &			(ptr2, pointee2),
     &			(ptr3, pointee3)
      data		array1   /0,1,2,3,4,5,6,7,8,9/,
     &			array2   /5,5,5,5,5/
c     ------------------------------------------------------------------
      write(*,*) 
      write(*,'(1x,a,10f6.1)') 'array1=   ', array1
      write(*,'(1x,a,10f6.1)') 'array2=   ', array2
c     ------------------------------------------------------------------
      write(*,*) 
      ptr1 = loc(array1)
      ptr2 = loc(array1)
      ptr3 = loc(array1)
      write(*,'(1x,a,10f6.1)') 'pointee1= ', pointee1
      write(*,'(1x,a,10f6.1)') 'pointee2= ', pointee2
      write(*,'(1x,a,10f6.1)') 'pointee3= ', (pointee3(i), i = 1, 10)
c     ------------------------------------------------------------------
      write(*,*) 
      ptr1 = loc(array2)
      ptr2 = loc(array2)
      ptr3 = loc(array2)
      write(*,'(1x,a,10f6.1)') 'pointee1= ', pointee1
      write(*,'(1x,a,10f6.1)') 'pointee2= ', pointee2
      write(*,'(1x,a,10f6.1)') 'pointee3= ', (pointee3(i), i = 1, 5)
c     ------------------------------------------------------------------
      end


 The result of this program on a VMS machine was:

 array1=      0.0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0
 array2=      5.0   5.0   5.0   5.0   5.0

 pointee1=    0.0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0
 pointee2=    0.0   1.0   2.0   3.0   4.0
 pointee3=    0.0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0

 pointee1=    5.0   5.0   5.0   5.0   5.0   0.0   0.0   0.0   0.0   0.0
 pointee2=    5.0   5.0   5.0   5.0   5.0
 pointee3=    5.0   5.0   5.0   5.0   5.0


 Note that declaring the pointee with assumed-array syntax, makes it
 impossible to have I/O statements that reference the array name 
 without subscript. 



 Cray pointers in practice
 -------------------------
 The value of the pointer can be defined using some intrinsic 
 function usually called LOC(arg) that takes as argument the 
 name of a variable and returns its memory address.

 The following points should be noted:

    1) The pointee only exists while the value of the pointer 
       is defined.
    2) The pointee may not be part of a COMMON block, nor may 
       it be a dummy argument of the current subprogram. 
       The pointer may be a COMMON block variable or a dummy 
       argument.
    3) You may pass a pointee as an actual argument to a 
       subprogram without special precautions; i.e., the 
       subprogram need not know that the object is a pointee; 
       the subprogram will treat it as it would any ordinary 
       variable.
    4) If you pass a pointer as an actual argument, then the 
       called subprogram should usually have declared the 
       corresponding dummy argument as a pointer.

 Furthermore:

    -- If the pointee is an array, and the dimensions of the array 
       are specified as run-time expressions rather than compile-time 
       constants, these will usually be evaluated (in an arbitrary 
       order, so beware of side-effects in their evaluation) upon 
       entry into the subprogram.

       It follows that you can't declare such variable dimensions 
       in the main program. A few compilers offer, as an option, 
       "dynamic dimensioning" in which the array dimensions are 
       evaluated again on each reference. (For example, XL Fortran 
       lets you do this by compiling with -qddim.)

       Dynamic dimensioning is not as widely supported as Cray pointers, 
       and can entail additional overhead in some circumstances 
       (especially when working with multidimensional arrays). You are 
       therefore encouraged to arrange for the dimensions to be known 
       upon entry to the subprogram that declares the pointee; otherwise 
       multidimensional arrays may be indexed incorrectly, and bounds 
       checking may fail on one-dimensional arrays.

    -- Many implementations supply a subprogram to allocate memory 
       (it may be called MALLOC, for Memory ALLOCation, or HPALLOC, 
       for HeaP ALLOCation, or something else entirely) and another 
       subprogram to release memory to the operating system. 

       On VMS you may use: 

         STATUS = LIB$GET_VM(SIZE_IN_BYTES, PTR)

       to allocate a block of memory, to free that block you may use:

         STATUS = LIB$FREE_VM(SIZE_IN_BYTES, PTR) 

       STATUS is an integer variable used to store the return value 
       of these system routines, the condition:

         (STATUS .NE. .TRUE.) 

       means the routine failed.

       It is your responsibility to allocate a pointee before the 
       value of the pointer becomes undefined (for example by exiting
       the subprogram where it is declared, if it isn't SAVEd.)


 Cray pointers and automatic compiler optimizations
 --------------------------------------------------
 Compilers perform partial data-flow analysis as a preparation before 
 automatic optimization, unrestricted pointers that are free to point 
 to any other variable makes such an analysis almost impossible. 

 Cray pointers are restricted to some degree, e.g. pointer arithmetic 
 is not allowed, the pointer can point only to its pointee, but the
 pointee can be different objects during one procedure activation. 

 Fortran was designed to produce highly optimized code (remember that
 it had to compete with assembly language and won), the language 
 specification explicitly forbids anything that may lead to ALIASING 
 (having more than one name for the same variable) because of the 
 detrimental effect on automatic optimizations. In short, pointers 
 violate the spirit of Fortran, and defeat its purpose.

 Cray pointers may even cause WRONG results when compiler optimization 
 is turned on, and they are used without deep understanding of the 
 effect on the optimizer. A probable reason for that may be that the 
 optimizer assumes no-aliasing.

 It is recommended that you use Cray pointers only for allocating
 dynamic memory, and pass the corresponding pointees to subroutines. 

 If you wish to avoid Cray pointers completely, you can call a C routine 
 for that purpose, see the chapter on variable-size arrays for source-code 
 and discussion.


 Fortran 90 ALLOCATABLE objects
 ------------------------------
 Fortran 90 offers a more elegant way of allocating objects dynamically,
 through the ALLOCATABLE attribute and the ALLOCATE and DEALLOCATE
 executable statements. 

 For example, one can declare a one-dimensional allocatable array 
 as:

   REAL, ALLOCATABLE :: A(:)

 then allocate it to a given size with: 

   INTEGER IAS
   ALLOCATE (A(3:12), STAT=IAS)

 The optional STAT=IAS allows recovery from allocation errors. If the
 allocation is unsuccessful and no STAT= was given, the program aborts;
 if STAT= was given, IAS contains either 0 for success, or a non-zero 
 error code.

 The array bounds are stored with the array, and are computed at the
 time the ALLOCATE statement is executed. This avoids all the difficulties
 mentioned above with the dimensioning of Cray pointees.

 Don't forget to 

   DEALLOCATE(A)

 when you no longer need it. If you are no longer sure whether A is 
 allocated or not, say:

   IF(ALLOCATED(A)) DEALLOCATE(A) .



 Linked data structures and the Fortran 90 POINTER attribute
 -----------------------------------------------------------
 Pointers are also useful for constructing linked lists, trees and other
 dynamic data structures. Cray pointers are suitable for this purpose,
 particularly when used in conjunction with the function LOC(arg) that
 returns a pointer to its argument, but Fortran 90's ALLOCATABLE arrays
 are not. This is why Fortran 90 also supports a POINTER attribute in
 addition to ALLOCATABLE. A binary tree node could be declared as

   TYPE NODE
      TYPE(NODE), POINTER :: LEFT, RIGHT
      TYPE(ELEMENT) :: DATA
   END TYPE NODE

 Fortran 90 pointers can be in one of three states: undefined, associated
 and unassociated. The initial state is unassociated. An unassociated
 pointer may become associated by executing a pointer assignment statement

   pointer => target

 or an allocation statement

   ALLOCATE(pointer);

 an associated pointer may become unassociated by executing a

   NULLIFY(pointer)

 or, in some cases,

   DEALLOCATE(pointer);

 any pointer may become undefined in the same way as other Fortran 
 variables can (by exiting a subprogram without a SAVE for the pointer, 
 etc.) and once undefined may never be used again. (Be careful!)

 Pointers may point to otherwise unnamed objects created by an ALLOCATE
 statement, or to named objects declared with the TARGET attribute.
 The TARGET attribute is designed as an optimization aid to the compiler;
 if you use it on variables that don't need it, your code may run more 
 slowly.


   Summary of differences between Cray and Fortran 90 pointers
   ===========================================================

 Cray                                   Fortran 90
 -------------------------------------  ------------------------------------
 POINTER is a data type                 POINTER is an attribute

 A pointer holds the machine address    A pointer holds any information
 of the pointee; arithmetic is usually  needed to access the object; usually
 possible on pointer values, but may    an address plus a data type, array
 be non-portable.                       bounds and strides where applicable.
                                        You should not depend on assumptions
                                        about the internal representation,
                                        storage size, etc. of a Fortran 90
                                        pointer. No arithmetic on pointers.

 Assigning to a pointer variable        Referencing the pointer variable
 associates the pointer with a new      "does the right thing", i.e. usually
 pointee. Referencing the pointee       manipulates the value of the pointee.
 variable manipulates the current       In particular, pointer = value will
 pointee without affecting the          change the value of the object pointed
 pointer.                               to, but not the association of the
                                        pointer with the pointee. If you want
                                        to change that, use pointer => target
                                        instead.

 POINTER is used both to construct      POINTER can be used for both linked
 linked data structures and to support  data structures and dynamic allocation,
 dynamic memory allocation.             but if dynamic allocation alone is
                                        needed it is better to use ALLOCATABLE
                                        objects. (Significant performance
                                        differences have been reported with
                                        some compilers.)


 Procedure pointers as dynamic version of dummy procedures
 ---------------------------------------------------------
 The concept of procedure pointers (PP) is foreign to FORTRAN 77, 
 Fortran 90 doesn't have them, but Fortran 2000 might.

 In FORTRAN 77 a procedure name can be passed as a procedure argument.
 The classical example of using such "dummy procedure", is making a 
 library sorting procedure more general.  The idea is to have all 
 comparison tests performed by a logical function (taking two arguments) 
 that is passed as an argument, instead of doing it directly with some 
 binary operator.

 In the following example the comparison function 'rgreater' emulates
 the usual '.gt.' operator, so nothing is really gained:

      program sort
      integer    n
      parameter  (n=1000)
      real       a(n)
      external   rgreater
      ................
      call slowsort(a,n,rgreater)
      ...........................
      end

      logical function rgreater(x,y)
      real   x,y
      if (x .gt. y) then
        rgreater = .true.
      else
        rgreater = .false.
      endif
      return
      end

      subroutine slowsort(a,n,comp)
      integer   n, i, j
      real      a(n)
      logical   comp
      external  comp
      ..............
      if (comp(a(i),a(j))) then
      .........................
      return
      end

 The dummy procedure passed to the sorting routine can be selected even 
 at run-time.  The following example is supposed to decide at run-time 
 whether to sort in ascending or descending order, assuming of course
 that we wrote a 'reversed' version of 'rgreater', called 'rlesser'.

      integer    flag
      ...............
      write(*,*) ' Enter 1 for ascending order, 2 for descending '
      read(*,*) flag
      if (flag .eq. 1) call slowsort(a,n,rgreater)
      if (flag .eq. 2) call slowsort(a,n,rlesser)
      ...........................................

 Using an integer flag, or a similar device, we get "dynamic selection", 
 but we still cannot work with procedures that were unknown at the time 
 of compilation/linking. 


 To illustrate how procedure pointers are a real dynamic version of the
 dummy procedure mechanism, let's think of PPs as a generalization of 
 ordinary variables, while the old dummy procedures may be analogous 
 to constants.  

 Ordinary variables are more useful than constants, because they have 
 two "complementary" properties: 

    1) They can store a value, and it can be kept 
       as long as needed

    2) The stored value can be easily modified 
       at run-time when needed

 All this may seem obvious and trivial, but it immediately suggests a 
 sweeping generalization.  Why stop at numerical values, why not have 
 variables that can "store" a function?  Of course it's not practical 
 to actually store the function, and we would rather use a pointer 
 to the place in memory where its code begins.

 One disadvantage is that the compiler cannot perform inter-procedural 
 optimizations, as the procedural interface is not known at compile-time.
 The advantage is gaining more coding flexibility.

 The important point is whether we gain useful functionality by using 
 procedure pointers?  


 Using PPs is somewhat limited by the availability of valid values 
 that can be assigned to them, there are few methods to get them:

   1) Using an address operator on procedures of the
      same program.  The value is actually retrieved 
      after the compiler/linker determines how the
      procedures will be organized in the executable.

   2) Reading the internal tables of run-time libraries 
      (now called DLLs) at run-time.

 In principle, PPs of the first type, may be replaced by the old dummy 
 procedure mechanism (using integer flags), but PPs, being a valid data 
 structure, can be combined to form arrays and linked-lists, which makes 
 for cleaner code.  

 Of course, it may be worthwhile to organize PP-values in PP-arrays or 
 linked-lists, only if we collect enough of them, it may be convenient 
 then to create things like user menus and compiler internal tables.

 For example, a stochastic simulation program that allows the user to 
 select a Random Number Generator at run-time, and uses the selected RNG, 
 is easier to write using PPs.

 PPs of the second type open new possibilities for the programmer,
 but may be a little difficult to use for a novice.

 In short, PP are a nice language extension, that is more useful in 
 "systems programming" contexts, and maybe less so in "pure" scientific 
 computing, but since these areas are overlapping in flexible software,
 they are supposed to find their place.
Return to contents page