2-4 PASSING ARRAYS TO SUBPROGRAMS - ADJUSTABLE AND ASSUMED SIZE ARRAYS
***********************************************************************
(Thanks to Craig Burley for the important corrections and contributions
to this chapter)
Arrays "internals"
------------------
An array is an allocated area in memory, together with the information
needed to interpret it correctly. The information needed for that
consists of:
1) The data type of the elements. All elements have
the same data type, so it's natural to attribute
the same data type to the array itself. In other
words there is one data type per array, and it
applies to all elements.
2) Number of dimensions
3) The start-index and end-index of each dimension
4) Specification of the element order in memory
The compiler needs all this information in order to generate references
to array elements, or implement whole array operations. Since a FORTRAN
compiler compiles separately each procedure, this info must be available
in the called procedure, when passing an array to a procedure.
Possible mechanisms for passing array configuration info are combinations
of the following:
1) Explicit declarations in the called procedure
2) Using the procedure arguments or common block variables
3) Ignoring some of the info, if possible
4) A special data structure passed with the array itself
In Fortran 90, when using assumed-shape arrays, part of this information
is passed explicitly with the help of a special data structure called
"dope vector" or "descriptor". In all other cases, the compiler uses info
inferred from the data declarations of the called procedure.
The internal structure of arrays is imposed by the compiler consistently
interpreting the storage area as a sequence of specific elements.
The interpretation is done by code the compiler inserts in the program,
the code implements a formula that translates array indices to memory
addresses. For example, a formula for a one-dimensional array may be
similar to:
Memory-address =
Base-address + Element-size * (Array-index - Start-index)
The default start address in FORTRAN is of course 1.
See the section on array storage order for a description of FORTRAN's
'column major' storage order and the implications on array processing
loops and memory management.
FORTRAN 77 array sizes are determined at the time of compilation, the
'outermost' declaration of the array allocates a chunk of memory, and
the size of that memory chunk can't be changed during runtime.
Passing arrays to subprograms
-----------------------------
Arrays declared at an 'outer' procedure (the calling procedure or
some procedure that called it) can be passed to another procedure
(the called procedure).
Note that dimensional and data-type information is usually not
passed with variables (e.g. and arrays). As FORTRAN compiles
each procedure separately, the same storage area can be interpreted
differently in each procedure without any warning.
Most FORTRAN compilers pass arrays by passing the address of the
array, in this case all subprograms that use the array will then
work on the same (and only) copy.
The FORTRAN standard allows array passing by another method called
copying in/out or copy/restore. This method is not popular on
uni-processor machines, copying the array on every procedure call,
will make large arrays get duplicated in memory again and again,
copying in/out only the array base address is done by many FORTRAN
implementations.
On distributed memory machines copying in/out is faster, it is more
efficient to move the array in one piece over the network. This method
was also used in some of the early stack machine, like some of the
A series mainframes.
If your compiler uses copying in/out, some non-standard tricks will
not work, e.g. declaring arrays to have length 1 and having larger
array indices (that necessarily go out bounds).
When the array is declared in the 'outermost' procedure, the compiler
allocates memory for it. When you pass the array with a CALL
statement, the compiler actually passes the base-address of the same
array to the called procedure. When the called procedure operates on
the array it works on the same array - uses the same memory storage
area, the array is not 'copied' to another memory area (But, it might
be in some cases on some Fortran systems).
In short, the memory used in an array is allocated in the 'topmost'
declaration and reused when you pass the array to other procedures.
Comparison of the various array (and string) types
--------------------------------------------------
For further explanation see below.
| Declaration syntax | Physical size in | Logical size in
| | called routine | called routine
------------------|--------------------|-------------------|-----------------
Constant | INTEGER ARR(10) | Constant in all | Same as the
(created/passed) | | invocations | physical size
------------------|--------------------|-------------------|-----------------
Adjustable | INTEGER ARR(N) | The size it had | Whatever value N
(passed) | N & ARR dummy args | in the caller | had upon entry
------------------|--------------------|-------------------|-----------------
Assumed-size | INTEGER ARR(*) | The size it had | The compiler
(passed) | | in the caller | doesn't know it
------------------|--------------------|-------------------|-----------------
Automatic~ | INTEGER ARR(N) | Whatever value N | Same as the
(created) | N a dummy arg | had upon entry | physical size
------------------|--------------------|-------------------|-----------------
Assumed-shape~~ | INTEGER ARR(:) | Same bounds & size| Same as the
(passed) | ARR a dummy arg | as in the caller | physical size
------------------|--------------------|-------------------|-----------------
~ Automatic arrays are supported in g77 and Fortran 90
~~ Assumed-shape arrays are supported in Fortran 90
------------------|--------------------|-------------------|-----------------
Constant string | CHARACTER ST*10 | constant in all | same as the
(created) | | invocations | physical size
------------------|--------------------|-------------------|-----------------
Passed-length | CHARACTER ST*(*) | Whatever size it | same as the
string (passed) | | had in the caller | physical size
------------------|--------------------|-------------------|-----------------
Redeclaration problem
---------------------
When you pass an array to a subprogram you really (in most compilers)
pass the memory address of the beginning of the array.
At the beginning of the called subprogram the compiler should have
also the dimensional information.
The dimensional information (with the memory element order) is
sufficient for the compiler to generate the code for the called
subprogram without knowing anything else about the calling subprogram.
A parameter statement is the 'right' way to explicitly declare an
array size:
INTEGER
* MAXDATA
C ------------------------------------------------------------------
PARAMETER(
* MAXDATA = 1000)
C ------------------------------------------------------------------
REAL
* DATA(MAXDATA)
Explicitly redeclaring the array size in the called subprogram is
not a good programming practice:
1) In order to change the array size you have to hunt
for it in every subprogram that uses it.
2) Such a subprogram can't be reused in another program
without changing the parameter value, and so can't be
used effectively in a routine's library.
3) Reusing the subprogram in the same program may lead
to absurd situations, several copies of the same
subprogram with different parameters values will
have to be kept.
We see that explicit redeclarations are incompatible with good
programming practice. We need a way to pass the dimensional information
to the subprogram, thus making it more flexible, or give up some of the
compiler dimension control.
So what we have to do is pass the array base-address and some extra
variables containing the dimensional information to the subprogram,
and tell the compiler to use all that when accessing the array.
In FORTRAN you tell the compiler to perform this trick using the
adjustable-size syntax or give up some dimension control with the
assumed-size syntax.
Adjustable/ Assumed size arrays
-------------------------------
The adjustable size method allows you not only to pass the 'correct'
dimensional information, but also to 'reconfigure' the array - tell
the compiler to treat the storage area of the original array as if
it belonged to an array with different dimensions.
In the assumed size method, you pass the original array giving up
dimension control by the compiler. You can do this only with the size
of one 'bound', namely the last upper one.
The two special methods for passing arrays are very useful when you
create general purpose routines. They can also be combined together,
you can pass the 'last' dimension with the assumed-size syntax, and
the other dimensions with the adjustable-size syntax.
No such mechanism exists in standard C (some compilers had it
implemented, e.g. gcc's parameterized arrays), and you have to resort
to strange looking tricks (see the appendix on C pointers & arrays).
Example of adjustable and assumed size arrays
---------------------------------------------
In the following example we pass BOTH an array and the dimensional
information to a subroutine using 2 methods:
1) Passing the array and the dimensional information
in the argument list. This is the usual ADJUSTABLE
SIZE ARRAY METHOD.
2) Passing the array in the argument list as an
ASSUMED SIZE ARRAY. The dimensional information
except the upper bound of the last dimension is
either implicit (lower bound = 1), passed with
the adjustable array method, or is constant.
C ==================================================================
C Main program for both examples
C ==================================================================
PROGRAM TEST1
INTEGER N
PARAMETER (N = 10)
REAL Y(N)
C ------------------------------------------------------------------
Y(1) = 0.123456
CALL A(N,Y)
CALL B(Y)
C ------------------------------------------------------------------
END
C ==================================================================
C Adjustable size array method
C ==================================================================
SUBROUTINE A(N,Y)
INTEGER N
REAL Y(N)
C ------------------------------------------------------------------
WRITE (*,*)
WRITE (*,*) ' Adjustable array: ', Y(1)
C ------------------------------------------------------------------
RETURN
END
C ==================================================================
C Assumed size method
C ==================================================================
SUBROUTINE B(Y)
REAL Y(*)
C ------------------------------------------------------------------
WRITE (*,*)
WRITE (*,*) ' Assumed size array: ', Y(1)
C ------------------------------------------------------------------
RETURN
END
Both routines will write the first element of the array Y.
By the way, you can't write a all of an assumed size array using just
the array name, but you can do it with adjustable arrays and of course
with arrays whose dimensions are declared with constants.
Internals of the assumed-size method
------------------------------------
In order to understand what really goes on in the assumed-size method,
let's look at the formula translating the indices of an array to memory
addresses.
To simplify the algebra let's take the case of a 2-dimensional array
with indices starting at 1 (The default). More dimensions and other
indices make the formula more complex, but doesn't change the result.
INTEGER i, j
REAL Array(M,N)
Given a reference to Array(i,j), the address in memory of that
element is calculated from the base address of Array as follows:
Memory-address = Base-address +
Element-size * (i - 1) +
Element-size * (j - 1) * M
Because FORTRAN uses the column-major scheme of storing arrays,
our formula doesn't involve N - the size of the last dimension!
Similar results are found in the general case -- the UPPER-BOUND
OF THE LAST DIMENSION is not needed to calculate the offset of a
particular element of an array. So, that upper bound can be left
unspecified (denoted as "*" in the code).
The upper-bound of the last dimension is needed only to determine
the maximum allowed value of the last index, and therefore the
size of the entire array.
An array with an unspecified final upper bound is an "assumed array",
regardless of whether it is adjustable (whether the upper or lower
bound of any dimension is a non-constant expression) or constant.
By the way, the maximal allowed number of dimensions in FORTRAN 77
is 7, a somewhat arbitrary limit, probably imposed to suit the
number of index registers in an old IBM mainframe.
Problems of assumed-size arrays
-------------------------------
The implications of the above discussion are that the compiler doesn't
have to know the size of the last dimension in order to compute memory
addresses from array indices, so it can do without it, provided we don't
ask it to do something that requires this information, e.g. perform an
I/O operation on the whole array at once, which requires knowing the
size of the array.
The responsibility for keeping inside the bounds of the last dimension
of an assumed-size array is on the programmer, e.g. the bounds-checking
compiler option wouldn't help you in this case and check out-of-bounds
references for you.
Assumed-size arrays also can be a pain for specialized Fortran
implementations to handle, perhaps even impossible. For example,
a compiler for a dual-machine configuration, where the boundary between
machines is normally the procedure, and arguments between procedures
are therefore copied as part of the call, must therefore know how much
information to copy for each argument, but cannot reliably determine
this for an assumed-size array.
A certain compiler for a two-processor machine that took this approach
simply skipped any such procedure, printed a diagnostic, and this meant
any procedure with an assumed-size array could not be called by code
that "lived" on the "other" processor.
Assumed size arrays and character strings
-----------------------------------------
Passing a character variable (a string) to a procedure with the '*(*)'
syntax may seem at a first glance like using the assumed-size method,
but it is not, because, in this case, all code that calls the procedure
is compiled to pass the actual length of any assumed-length string,
so the length is known to the procedure at run time.
PROGRAM TEST
CHARACTER string*80
......................................
CALL SUB1(string)
......................................
END
SUBROUTINE SUB1(string)
CHARACTER string*(*)
............................
RETURN
END
In this example 'string' is NOT passed by the assumed-size method,
and the I/O restriction on assumed-size arrays doesn't apply to it.
You _can_ declare a dummy CHARACTER variable using the CHARACTER*(*)
syntax and then use it in I/O -- because the length is passed by the
caller at run time.
You can't use an assumed _array_ in I/O, so you can't, for example,
do "WRITE (10) CHA" given "CHARACTER*1 CHA(*)", even though, to some
people, that looks the same as if "CHARACTER*(*) CHA" was specified,
in which case the WRITE would be valid.
Internal mechanisms for passing the length of strings
-----------------------------------------------------
There are 4 methods (so it is said):
1) An additional variable, hidden from the programmer
after the end of the argument list. This is the
method used by most UNIX f77 compilers.
2) Using a descriptor - a special data structure that
holds the size of the string, the address it starts,
and maybe some "identifying signature".
For example, the Fixed-Length Descriptor used by DEC FORTRAN on VMS
is one of a large family of similar descriptors, it has the following
structure (each "cell" is a byte long):
+------+------+ +------+ +------+ +------+------+------+------+
| | | | 14 | | 1 | | | | | |
+------+------+ +------+ +------+ +------+------+------+------+
Length of Data Desc. Address of first byte of
string in type class data storage (the string)
bytes code code
(unsigned)
---?> Higher addresses
When passing a string the address of the first byte (on the left) of
the descriptor is passed, and the called routine can access its fields.
Return to contents page