3-10 PERFORMANCE MEASUREMENT
****************************
(The CPU time measurement techniques on VMS are due to Arne Vajhoej)
Several test suites are used to compare the performance of computers
by measuring the CPU time needed to run a standard set of programs
and averaging the results in some way.
Remember that such tests really measure the combined performance of
the hardware (CPU, memory, I/O devices) and the software (Operating
System, compiler used, compiler switches used), so any change in the
configuration will change the results.
Hardware and software change all the time, so results should be
accompanied by a description of the hardware and software used in
the test. Even a different version of the compiler may optimize
differently and change the results.
The most popular test suite is probably SPECfp92 (or SPECfp95).
We will discuss only a special case, measuring the performance of your
own program.
Wall-clock time vs. CPU time
----------------------------
Wall-clock time is the time elapsed between the invocation and the
termination of your program; you can measure it with an ordinary
clock.
CPU time is the total time your program actually had the CPU during
the run; if more than one CPU was used, the CPU times of all of them
are added. Operating systems keep accounting records of CPU usage,
so you can ask the OS for the CPU time your program has used.
On a single-CPU machine the CPU time will normally be less than the
wall-clock time, because the CPU is shared with various system
processes and possibly other users, and your program may also spend
time waiting for I/O.
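Both quantities can also be read from inside a program. The sketch
below assumes a Fortran 95 (or later) compiler such as gfortran and
uses the standard intrinsics SYSTEM_CLOCK (wall-clock ticks) and
CPU_TIME (processor time in seconds). The WORKLOAD function and the
loop size are hypothetical, chosen only to keep the CPU busy. Note
that the standard leaves the exact meaning of CPU_TIME processor
dependent.

```fortran
! Sketch: time the same piece of work with a wall clock and a CPU clock.
! Assumes a Fortran 95 compiler; WORKLOAD is a hypothetical example.
module wall_cpu_demo
  implicit none
contains
  ! Hypothetical compute-bound workload: sum of square roots of 1..n
  function workload(n) result(s)
    integer, intent(in) :: n
    double precision :: s
    integer :: i
    s = 0.0d0
    do i = 1, n
      s = s + sqrt(dble(i))
    end do
  end function workload
end module wall_cpu_demo

program wall_vs_cpu
  use wall_cpu_demo
  implicit none
  integer :: count0, count1, count_rate, count_max, ticks
  real :: cpu0, cpu1
  double precision :: s

  call system_clock(count0, count_rate, count_max)  ! wall-clock start (ticks)
  call cpu_time(cpu0)                               ! CPU time start (seconds)
  s = workload(50000000)
  call cpu_time(cpu1)
  call system_clock(count1)

  ! Print the result so the compiler cannot drop the loop
  print *, 'Sum:                 ', s
  if (count_rate > 0) then
    ticks = count1 - count0
    if (ticks < 0) ticks = ticks + count_max        ! clock wrapped around
    print *, 'Wall-clock time (s): ', dble(ticks) / dble(count_rate)
  end if
  print *, 'CPU time (s):        ', cpu1 - cpu0
end program wall_vs_cpu
```

On a loaded single-CPU machine the CPU time printed should be
noticeably smaller than the wall-clock time; on an idle machine the
two will be close.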
Measuring performance
---------------------
The CPU time needed to run a program varies 'randomly' when other
programs run concurrently, because the 'workload' they cause is
usually not constant. Possible mechanisms responsible for this
phenomenon are:
1) Cache contamination: other processes running between the CPU
time slices your program gets evict its data from the cache,
forcing your program to use the much slower main memory.
2) When more processes share the 'physical' memory, more PAGING
is needed, and paging takes CPU time that may be charged to
your program (as on VMS).
Because of the uncontrollable variations in the workload, we must
adopt a statistical approach and compute the average of many
time measurements.
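A minimal sketch of this statistical approach, again assuming a
Fortran 95 compiler and the standard CPU_TIME intrinsic: the same
(hypothetical) loop is timed NREP times, and the mean and the sample
standard deviation of the timings are reported. The standard
deviation gives a rough idea of how much the workload of other
processes disturbs the measurement.

```fortran
! Sketch: repeat a measurement and report mean and standard deviation.
! Assumes a Fortran 95 compiler; the timed loop is only an example.
module timing_stats
  implicit none
contains
  ! Mean and sample standard deviation of the timings in t
  subroutine mean_stddev(t, mean, sdev)
    real, intent(in)  :: t(:)
    real, intent(out) :: mean, sdev
    integer :: n
    n = size(t)
    mean = sum(t) / real(n)
    if (n > 1) then
      sdev = sqrt(sum((t - mean)**2) / real(n - 1))
    else
      sdev = 0.0
    end if
  end subroutine mean_stddev
end module timing_stats

program repeat_timing
  use timing_stats
  implicit none
  integer, parameter :: nrep = 10
  real :: t(nrep), cpu0, cpu1, mean, sdev, s, total
  integer :: k, i

  total = 0.0
  do k = 1, nrep
    call cpu_time(cpu0)
    s = 0.0
    do i = 1, 10000000
      s = s + exp(-real(mod(i, 50)))   ! MOD keeps EXP away from underflow
    end do
    call cpu_time(cpu1)
    t(k) = cpu1 - cpu0
    total = total + s                  ! use the result so the loop is kept
  end do

  call mean_stddev(t, mean, sdev)
  print *, 'Checksum:               ', total
  print *, 'Mean CPU time (s):      ', mean
  print *, 'Standard deviation (s): ', sdev
end program repeat_timing
```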
Recommended tests
-----------------
1) Measure the relative performance of REAL*4 and REAL*8 on your
machine; you will probably get similar results. Also try REAL*16
if your compiler supports it (it is often much slower, as it is
frequently emulated in software).
2) Measure the performance degradation when array bounds checking
is forced with a suitable compiler switch (see the chapter on
compiler switches).
3) Measure how performance changes when you vary the compiler
'optimization level'.
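Test 1 can be sketched portably in Fortran 95, with the kinds chosen
by SELECTED_REAL_KIND instead of the REAL*4/REAL*8 notation. On most
compilers the two kinds below correspond to REAL*4 and REAL*8, but
that is an assumption. The loop body and the sizes are illustrative;
MOD keeps the argument of EXP small so the single-precision loop does
not underflow.

```fortran
! Sketch: portable version of recommended test 1 (single vs double).
! Assumes a Fortran 95 compiler; kinds come from SELECTED_REAL_KIND.
module precision_test
  implicit none
  integer, parameter :: sp = selected_real_kind(6)    ! usually REAL*4
  integer, parameter :: dp = selected_real_kind(15)   ! usually REAL*8
contains
  ! Sum of exp(-mod(i,50)) for i = 1..n, in single precision
  function sum_sp(n) result(s)
    integer, intent(in) :: n
    real(sp) :: s
    integer :: i
    s = 0.0_sp
    do i = 1, n
      s = s + exp(-real(mod(i, 50), sp))
    end do
  end function sum_sp

  ! The same loop, in double precision
  function sum_dp(n) result(s)
    integer, intent(in) :: n
    real(dp) :: s
    integer :: i
    s = 0.0_dp
    do i = 1, n
      s = s + exp(-real(mod(i, 50), dp))
    end do
  end function sum_dp
end module precision_test

program compare_kinds
  use precision_test
  implicit none
  integer, parameter :: n = 10000000
  real :: t0, t1, t2
  real(sp) :: s4
  real(dp) :: s8

  call cpu_time(t0)
  s4 = sum_sp(n)
  call cpu_time(t1)
  s8 = sum_dp(n)
  call cpu_time(t2)

  ! The sums are printed so the compiler cannot remove the loops
  print *, 'Single precision (s): ', t1 - t0, '  sum = ', s4
  print *, 'Double precision (s): ', t2 - t1, '  sum = ', s8
end program compare_kinds
```

Compiling the same file at different optimization levels (for
example gfortran -O0 and gfortran -O2) and comparing the timings
gives a rough version of test 3 as well.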
Measuring CPU time on VMS
=========================
Use one of the "clock" functions from inside your program; that
way you avoid the overhead associated with image startup and
image rundown.
These functions return the CPU time used by the process calling
them and its subprocesses, or the CPU time elapsed since a
"timer" was started.
"Clock functions" on VMS
------------------------
There are many "clock functions" on VMS:
1) The triad LIB$INIT_TIMER/LIB$STAT_TIMER/LIB$FREE_TIMER is
supposed to be the "official clock function".
The three routines should be called in the order listed above.
LIB$INIT_TIMER allocates some memory that you should free with
LIB$FREE_TIMER.
A good program should establish a CTRL/Y handler to
deallocate that memory if the user interrupts it.
2) LIB$GETJPI, the simplified interface to SYS$GETJPI, is the
recommended "clock function" (see the example below). For the
CPU time item code (JPI$_CPUTIM) the time unit is 10 milliseconds.
3) SYS$GETJPI, the system service that all the "clock functions" use.
The following are unsupported and undocumented:
1) PAS$CLOCK2, a Pascal "clock function" found on every VMS system.
2) PAS$CLOCK, another Pascal "clock function".
3) DECC$CLOCK, the DEC C "clock function".
4) CLOCK, the VAX C "clock function" (link against VAXCRTL).
An example program:
      PROGRAM CPUCLK
C ------------------------------------------------------------------
C     Time the same loop in REAL*4, REAL*8 and REAL*16 arithmetic.
C     LIB$GETJPI with item code JPI$_CPUTIM returns the CPU time
C     charged to the process, in units of 10 milliseconds.
C ------------------------------------------------------------------
      INCLUDE '($JPIDEF)'
C ------------------------------------------------------------------
      INTEGER
     *  TIME1, TIME2,
     *  I
C ------------------------------------------------------------------
      REAL*4
     *  TMP4
C ------------------------------------------------------------------
      REAL*8
     *  TMP8
C ------------------------------------------------------------------
      REAL*16
     *  TMP16, X16
C ------------------------------------------------------------------
      CALL LIB$GETJPI(JPI$_CPUTIM,,,TIME1)
      TMP4 = 0.0
      DO I = 1, 1000000
         TMP4 = TMP4 + EXP(-REAL(I))
      ENDDO
      CALL LIB$GETJPI(JPI$_CPUTIM,,,TIME2)
C     Write the results after reading the clock, so I/O is not timed
      WRITE(*,*) ' Sum of loop  (REAL*4):  ', TMP4
      WRITE(*,*) ' Time of loop (REAL*4):  ', TIME2 - TIME1
C ------------------------------------------------------------------
C     DBLE makes EXP, and so the whole loop, use REAL*8 arithmetic
      CALL LIB$GETJPI(JPI$_CPUTIM,,,TIME1)
      TMP8 = 0.0D0
      DO I = 1, 1000000
         TMP8 = TMP8 + EXP(-DBLE(I))
      ENDDO
      CALL LIB$GETJPI(JPI$_CPUTIM,,,TIME2)
      WRITE(*,*) ' Sum of loop  (REAL*8):  ', TMP8
      WRITE(*,*) ' Time of loop (REAL*8):  ', TIME2 - TIME1
C ------------------------------------------------------------------
C     Assigning I to the REAL*16 variable X16 converts it, so EXP
C     and the sum use REAL*16 arithmetic
      CALL LIB$GETJPI(JPI$_CPUTIM,,,TIME1)
      TMP16 = 0
      DO I = 1, 1000000
         X16 = I
         TMP16 = TMP16 + EXP(-X16)
      ENDDO
      CALL LIB$GETJPI(JPI$_CPUTIM,,,TIME2)
      WRITE(*,*) ' Sum of loop  (REAL*16): ', TMP16
      WRITE(*,*) ' Time of loop (REAL*16): ', TIME2 - TIME1
C ------------------------------------------------------------------
      END
This example program compares floating-point types of different
sizes. It is highly compute bound: the only I/O is the WRITE
statements that print each sum and time (the sums are printed so
that the compiler cannot optimize the computations away).
Other useful measurements, such as the number of page faults (an
excessive number may indicate bad memory access patterns), can be
made by passing other 'item codes' (e.g. JPI$_PAGEFLTS) instead of
JPI$_CPUTIM to LIB$GETJPI.
To get the names of other item codes, look at the documentation of
LIB$GETJPI and SYS$GETJPI, or extract the text module $JPIDEF from
the text library SYS$LIBRARY:FORSYSDEF.TLB with:
LIBRARY/TEXT/EXTRACT=$JPIDEF/OUTPUT=TMP.TMP SYS$LIBRARY:FORSYSDEF.TLB
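As an illustration of a bad memory access pattern (this sketch is
portable Fortran 95, not VMS specific): Fortran stores arrays in
column-major order, so an inner loop over the first index walks
through contiguous memory, while an inner loop over the second index
jumps a whole column at each step. The sketch times both orders with
the CPU_TIME intrinsic on a 2000 x 2000 array, an arbitrary size
chosen to be too large for most CPU caches. The function names are
hypothetical.

```fortran
! Sketch: good vs bad memory access order for a Fortran 2-D array.
! Assumes a Fortran 95 compiler; the array size is arbitrary.
module access_demo
  implicit none
contains
  ! First index innermost: consecutive elements are adjacent in memory
  function sum_by_columns(a) result(s)
    double precision, intent(in) :: a(:,:)
    double precision :: s
    integer :: i, j
    s = 0.0d0
    do j = 1, size(a, 2)
      do i = 1, size(a, 1)
        s = s + a(i, j)
      end do
    end do
  end function sum_by_columns

  ! Second index innermost: each step jumps a whole column (poor locality)
  function sum_by_rows(a) result(s)
    double precision, intent(in) :: a(:,:)
    double precision :: s
    integer :: i, j
    s = 0.0d0
    do i = 1, size(a, 1)
      do j = 1, size(a, 2)
        s = s + a(i, j)
      end do
    end do
  end function sum_by_rows
end module access_demo

program access_pattern
  use access_demo
  implicit none
  integer, parameter :: n = 2000
  double precision, allocatable :: a(:,:)
  double precision :: s1, s2
  real :: t0, t1, t2

  allocate(a(n, n))
  call random_number(a)        ! fill the array with arbitrary values

  call cpu_time(t0)
  s1 = sum_by_columns(a)
  call cpu_time(t1)
  s2 = sum_by_rows(a)
  call cpu_time(t2)

  print *, 'Column order (s): ', t1 - t0, '  sum = ', s1
  print *, 'Row order (s):    ', t2 - t1, '  sum = ', s2
  deallocate(a)
end program access_pattern
```

The two sums may differ in the last digits because the additions
are done in a different order. An optimizing compiler may also
interchange the loops by itself, in which case the difference in
time shrinks or disappears.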
Measuring performance on UNIX
=============================
The "time" command
------------------
Syntax:
time command [arguments]
Runs an executable file and displays information on system
resources used.
Try something simple like: time ls
The default fields of the tcsh shell's "time" (from left to right) are:
suffix  description
======  ========================================================
u       User CPU time (seconds): time spent executing the
        program's own code
s       System CPU time (seconds): time the kernel spent on
        behalf of the program (system calls such as I/O, etc.)
(none)  Elapsed (wall-clock) time, shown as minutes:seconds
%       (u + s) / elapsed time, as a percentage (may exceed
        100% on multi-CPU machines)
k       Average shared + unshared memory used (Kbytes)
io      Number of input operations + number of output operations
pf      Number of major page faults (pages brought in from disk)
w       Number of times the process was swapped out