1-14 GENERAL PROGRAMMING TECHNIQUES
***********************************
(Thanks to Craig Burley for the excellent comments)
Contents:
1) Framing
2) Escaping
3) Caching
4) Buffering
5) Logical layering (Physical/logical/virtual)
6) Centralized resource management
7) Object Oriented Programming (OOP)
8) Lookup tables
9) Callback routines
Framing methods
---------------
A common method of combining a sequence of bytes to form one unit,
usually several such units are stringed one after the other.
Examples of framing:
1) Variable-size records used in FORTRAN unformatted files
2) Packets used in TCP/IP and other communication protocols
3)
A possible structure of one such unit may be:
+-----------+-------+------------+-----------------+
| Signature | Count | Data bytes | Integrity check |
+-----------+-------+------------+-----------------+
SIGNATURE some byte combination that identifies the unit type
(by some convention agreed on between the users).
COUNT an integer (whose size is agreed on by convention),
the value of COUNT is the number of data bytes to follow.
With the COUNT field units can have variable length,
that flexibility is important in many applications.
DATA BYTES the 'payload'
CHECK Some error checking/correcting code associated with the
data when the data is written, the code can be rechecked
whenever the unit is accessed to validate data integrity.
Escaping methods
----------------
A common problem is that you have only one communication channel,
but you have to pass two different types of information along it
(e.g. data and commands), or your data coding method must be expanded
to include more codes.
Examples for such methods:
1) Using Shift/Ctrl/Alt/Meta key modifiers on a keyboard
2) Escaping special and control characters in C source code
and in UNIX shells
3) Switching to the command mode of a modem by the '+++'
escape sequence (subject also to timing constraints)
4) Some binary encoding schemes used to transfer binary
data through electronic mail (Quoted Printable)
5) ANSI terminal escape sequences
6) Escaping CPU instructions in the Intel PC CPU to the
mathematical co-processor
7) Controls prefixing in the Kermit file transfer protocol (?)
8) Hyper-Text Markup Language (HTML) start tag sign '<' ?
9) PC keyboards transmit special combinations as a zero byte
followed by the scan-code.
Escaping is done by sending a special sequence that can't happen in
the data (at least in practice), characters following this special
sequence are interpreted in a different way from characters outside
its "scope".
In other words, the special 'escape' sequence 'switch on' a special
interpretation mode (that has to be 'switched off' later).
How you can ensure that the escape sequence will not occur in the
"normal" data stream? Either you choose a very improbable combination,
or you restrict what goes on through the data stream.
Switching off the special mode is an easier problem.
Caching
-------
Whenever we have available two data storage methods, it usually
happens that one of them will be slower and cheaper so we can buy
more storage, and the other faster and more costly, for example:
1) Local files vs. files accessed through the web
2) Local filesystem vs. network filesystem (NFS)
3) Local files/partitions vs. main memory (RAM)
4) Dynamic memory vs. static memory
Caching promotes efficiency when two conditions are met:
1) We have "locality of reference", i.e. subsequent data
references are usually made to "close by" items.
2) Reading a "block" of consecutive data is relatively quick.
For example, caching of read operations (from a slow medium)
may be implemented by the following guidelines:
1) Whenever a read operation from the slower media is needed
we read a data block containing the required item.
2) The data block is kept on the faster media, if there is
no available place, older blocks are discarded.
3) When a data item is needed the faster storage is first
checked to see if it contains the required item.
4) If the required item is on the faster storage it is
read from it and used, otherwise it is read from the
slower media (see #1).
Write operations (to a slow medium) may also be cached, but this
is a bit more complicated.
Caching is an effective technique that is used intensively in all
computing systems. As caches may be shared (among processes etc),
the time it takes to perform a task becomes non-deterministic,
other users of the cache, doing their own work may or may not
discard your data, and so affect computation time.
A high-performance Fortran programmer must take into consideration
the size of the various caches (memory, I/O) when writing programs.
A simple advice is to keep often used data in arrays, instead of
reading it again and again from a file.
Buffering
---------
Data transfers between hardware and software, or between different
pieces of the software, are many times "blocked" and not simply
"byte-by-byte", for example:
1) commands written at the operating system prompt,
are interpreted and executed only after you press
the "return" key.
The terminal driver may collect the characters you
enter one by one, echo them to the screen (UNIX),
or not (VMS). When a "return" (or a pre-defined
"control sequence") is sent from the keyboard,
the accumulated string is made available to the
running program.
2) Disk controllers transfer data in "chunks" that
are multiples of a sector (512 bytes).
I/O is therefore "buffered", sometimes few times
on different levels to satisfy this condition.
Buffering is done utilizing an array (the buffer), large enough to
hold a chunk of data of the required size, and a few procedures that
are used to put and get data from the buffer.
A sketchy implementation of a buffer that accumulates output data
into sector size chunks, then "flushes" them to disk is:
........................
Logical Layering
----------------
Well, the best example would be Fortran itself.
People are working on a lot of different machines, of course 'real
programmers' code directly in machine language, but we poor 'virtual
programmers' need a simpler and portable interface to our computer.
The solution is to define a 'high-level language' e.g. Fortran/C/...
and have on each machine a program (compiler) that translates programs
written in Fortran/C/... to machine language. At the price of some
CPU time we get a portable and easy to use interface to every computer
(if someone wrote a compiler for it).
This reminds one of the Esperanto language, that invented to enhance
communication between people of different nations. The inventor of
Esperanto thought that people should learn one artificial language
(designed to be simple as possible) instead of learning all common
languages.
Examples for 'logical layering' are innumerable, they can be found
in the following interfaces:
1) Interactive user / Operating system / Different kinds of terminals
2) User program / Operating system / Different disks and tape drives
3) File names / File system / Data blocks on disks and tape drives
4) Graphic package / Different graphic devices
5) Word-processor / Different printers
6) In a PC: DOS functions / ROM-BIOS interrupts
7) In TCP/IP: DNS names/IP addresses/Ethernet hardware addresses
The 'logical interface' is implemented by a 'translator' that can
translate between the two languages spoken on the two sides of the
interface.
The physical/logical/virtual distinction .........................
Centralized resource management
-------------------------------
When resources such as the pool of logical unit numbers, access to
some data file, etc are shared by several routines or processes, the
natural way to eliminate access collisions is to have a centralized
management.
The resource management can be ....................................
Object-Oriented Programming
---------------------------
Like other good things (Ethernet, windowing systems), OOP was invented
in Xerox a long time ago.
The idea behind OOP is (approximately) that sometimes it is more natural
to partition a program to LOGICAL ENTITIES communicating by passing
messages instead of FUNCTIONAL UNITS as in the modular paradigm.
A good example is a modern visualization program (e.g. SGI Explorer,
or the popular AVS), in which you choose the required data processing
modules from a menu, connect them together with some mouse clicks,
and then tell the input module the name of your data file, and the
whole thing starts producing pictures.
The natural conceptual model for this kind of magic is to view the
program as a set of logical entities: the input reader module that
reads the 3-dimensional data, and passes it to a module that may
take a 2D section, then the graphic module translates the floating-
point numbers into a color picture where the colors correspond to
the data values.
The logical entities used in OOP are a sweeping generalization of
the data structure called 'record' in Pascal and 'structure' in C:
an ordered collection of simpler data structures, not necessarily
of the same type, that can be accessed individually by a name that
consists of two parts, the structure name and the name of the 'field'.
The required generalization is made by allowing a "record" to also
include procedures (usually called 'member functions') performing
functions specific to the record type, for example:
1) Allocating memory for the internal variables
and initializing the variables (constructor)
2) Maintaining the variables's values according
to a pre-defined algorithm coded into the
'member procedures'.
3) Deallocating the memory storage when the
"record" is no longer necessary (destructor)
The new data structure is suitable for implementing the abstract
concept of an 'object'. In short, objects are data structures with
associated procedures, which are used to maintain the data structure
values in a consistent way.
Object-oriented programming can be directly expressed with suitable
programming languages (e.g. Objective C, C++) that support 'objects'.
Fortran 90 supports a few OOP concepts, and the rest can be done
"by hand" if the programmer is disciplined enough. Fortran 2000
is expected to provide full OOP support.
Lookup tables
-------------
Lookup tables are a simple programming trick to speed a special kind
of calculations.
If you have to compute again and again some function or expression
and the possible values the argument(s) can take are relatively few
you can do the computing once and put the results in an array.
To retrieve results you just reference the array.
Callback routines
-----------------
Return to contents page