General programming techniques

 1-14 GENERAL PROGRAMMING TECHNIQUES
 ***********************************
 (Thanks to Craig Burley for the excellent comments)

 Contents: 

     1) Framing 
     2) Escaping 
     3) Caching 
     4) Buffering 
     5) Logical layering (Physical/logical/virtual) 
     6) Centralized resource management 
     7) Object Oriented Programming (OOP) 
     8) Lookup tables 
     9) Callback routines 


 
 Framing methods 
 ---------------
 A common method of combining a sequence of bytes to form one unit,
 usually several such units are stringed one after the other.

 Examples of framing:

   1) Variable-size records used in FORTRAN unformatted files
   2) Packets used in TCP/IP and other communication protocols 
   3) 


 A possible structure of one such unit may be: 

    +-----------+-------+------------+-----------------+
    | Signature | Count | Data bytes | Integrity check |
    +-----------+-------+------------+-----------------+

    SIGNATURE  some byte combination that identifies the unit type 
               (by some convention agreed on between the users).

    COUNT      an integer (whose size is agreed on by convention),
               the value of COUNT is the number of data bytes to follow.
               With the COUNT field units can have variable length,
               that flexibility is important in many applications.

    DATA BYTES the 'payload' 

    CHECK      Some error checking/correcting code associated with the 
               data when the data is written, the code can be rechecked 
               whenever the unit is accessed to validate data integrity. 


 
 Escaping methods 
 ----------------
 A common problem is that you have only one communication channel, 
 but you have to pass two different types of information along it
 (e.g. data and commands), or your data coding method must be expanded 
 to include more codes. 

 Examples for such methods:

    1) Using Shift/Ctrl/Alt/Meta key modifiers on a keyboard
    2) Escaping special and control characters in C source code 
       and in UNIX shells
    3) Switching to the command mode of a modem by the '+++' 
       escape sequence (subject also to timing constraints)
    4) Some binary encoding schemes used to transfer binary 
       data through electronic mail (Quoted Printable)
    5) ANSI terminal escape sequences
    6) Escaping CPU instructions in the Intel PC CPU to the 
       mathematical co-processor
    7) Controls prefixing in the Kermit file transfer protocol (?)
    8) Hyper-Text Markup Language (HTML) start tag sign '<' ?
    9) PC keyboards transmit special combinations as a zero byte
       followed by the scan-code.

 Escaping is done by sending a special sequence that can't happen in 
 the data (at least in practice), characters following this special 
 sequence are interpreted in a different way from characters outside 
 its "scope".  

 In other words, the special 'escape' sequence 'switch on' a special
 interpretation mode (that has to be 'switched off' later).

 How you can ensure that the escape sequence will not occur in the
 "normal" data stream?  Either you choose a very improbable combination,
 or you restrict what goes on through the data stream.

 Switching off the special mode is an easier problem.


 
 Caching 
 -------
 Whenever we have available two data storage methods, it usually 
 happens that one of them will be slower and cheaper so we can buy 
 more storage, and the other faster and more costly, for example: 

   1) Local files vs. files accessed through the web
   2) Local filesystem vs. network filesystem (NFS)
   3) Local files/partitions vs. main memory (RAM)
   4) Dynamic memory vs. static memory

 Caching promotes efficiency when two conditions are met:

   1) We have "locality of reference", i.e. subsequent data 
      references are usually made to "close by" items.
   2) Reading a "block" of consecutive data is relatively quick.

 For example, caching of read operations (from a slow medium) 
 may be implemented by the following guidelines: 

   1) Whenever a read operation from the slower media is needed
      we read a data block containing the required item.
   2) The data block is kept on the faster media, if there is
      no available place, older blocks are discarded.
   3) When a data item is needed the faster storage is first
      checked to see if it contains the required item.
   4) If the required item is on the faster storage it is
      read from it and used, otherwise it is read from the
      slower media (see #1).

 Write operations (to a slow medium) may also be cached, but this 
 is a bit more complicated.

 Caching is an effective technique that is used intensively in all 
 computing systems. As caches may be shared (among processes etc),
 the time it takes to perform a task becomes non-deterministic,
 other users of the cache, doing their own work may or may not 
 discard your data, and so affect computation time.

 A high-performance Fortran programmer must take into consideration
 the size of the various caches (memory, I/O) when writing programs. 

 A simple advice is to keep often used data in arrays, instead of
 reading it again and again from a file. 


 
 Buffering 
 ---------
 Data transfers between hardware and software, or between different 
 pieces of the software, are many times "blocked" and not simply 
 "byte-by-byte", for example: 

   1) commands written at the operating system prompt, 
      are interpreted and executed only after you press
      the "return" key.  

      The terminal driver may collect the characters you 
      enter one by one, echo them to the screen (UNIX), 
      or not (VMS).  When a "return" (or a pre-defined 
      "control sequence") is sent from the keyboard, 
      the accumulated string is made available to the 
      running program.

   2) Disk controllers transfer data in "chunks" that 
      are multiples of a sector (512 bytes). 
      I/O is therefore "buffered", sometimes few times 
      on different levels to satisfy this condition.

 Buffering is done utilizing an array (the buffer), large enough to 
 hold a chunk of data of the required size, and a few procedures that
 are used to put and get data from the buffer. 

 A sketchy implementation of a buffer that accumulates output data 
 into sector size chunks, then "flushes" them to disk is: 

 ........................



 
 Logical Layering 
 ----------------
 Well, the best example would be Fortran itself.

 People are working on a lot of different machines, of course 'real 
 programmers' code directly in machine language, but we poor 'virtual 
 programmers' need a simpler and portable interface to our computer.

 The solution is to define a 'high-level language' e.g. Fortran/C/...
 and have on each machine a program (compiler) that translates programs
 written in Fortran/C/... to machine language. At the price of some 
 CPU time we get a portable and easy to use interface to every computer 
 (if someone wrote a compiler for it).

 This reminds one of the Esperanto language, that invented to enhance 
 communication between people of different nations. The inventor of
 Esperanto thought that people should learn one artificial language 
 (designed to be simple as possible) instead of learning all common
 languages. 

 Examples for 'logical layering' are innumerable, they can be found
 in the following interfaces: 

   1) Interactive user / Operating system / Different kinds of terminals 
   2) User program / Operating system / Different disks and tape drives
   3) File names / File system / Data blocks on disks and tape drives
   4) Graphic package / Different graphic devices 
   5) Word-processor / Different printers 
   6) In a PC:  DOS functions / ROM-BIOS interrupts
   7) In TCP/IP: DNS names/IP addresses/Ethernet hardware addresses 

 The 'logical interface' is implemented by a 'translator' that can
 translate between the two languages spoken on the two sides of the 
 interface. 

 The physical/logical/virtual distinction .........................



 
 Centralized resource management 
 -------------------------------
 When resources such as the pool of logical unit numbers, access to 
 some data file, etc are shared by several routines or processes, the 
 natural way to eliminate access collisions is to have a centralized 
 management. 

 The resource management can be ....................................



 
 Object-Oriented Programming 
 ---------------------------
 Like other good things (Ethernet, windowing systems), OOP was invented 
 in Xerox a long time ago. 

 The idea behind OOP is (approximately) that sometimes it is more natural 
 to partition a program to LOGICAL ENTITIES communicating by passing 
 messages instead of FUNCTIONAL UNITS as in the modular paradigm.

 A good example is a modern visualization program (e.g. SGI Explorer,
 or the popular AVS), in which you choose the required data processing 
 modules from a menu, connect them together with some mouse clicks, 
 and then tell the input module the name of your data file, and the 
 whole thing starts producing pictures.

 The natural conceptual model for this kind of magic is to view the 
 program as a set of logical entities: the input reader module that 
 reads the 3-dimensional data, and passes it to a module that may 
 take a 2D section, then the graphic module translates the floating-
 point numbers into a color picture where the colors correspond to 
 the data values. 

 The logical entities used in OOP are a sweeping generalization of 
 the data structure called 'record' in Pascal and 'structure' in C: 
 an ordered collection of simpler data structures, not necessarily 
 of the same type, that can be accessed individually by a name that 
 consists of two parts, the structure name and the name of the 'field'.

 The required generalization is made by allowing a "record" to also 
 include procedures (usually called 'member functions') performing 
 functions specific to the record type, for example: 

   1) Allocating memory for the internal variables 
      and initializing the variables (constructor) 
   2) Maintaining the variables's values according 
      to a pre-defined algorithm coded into the
      'member procedures'.
   3) Deallocating the memory storage when the 
      "record" is no longer necessary (destructor)

 The new data structure is suitable for implementing the abstract 
 concept of an 'object'.  In short, objects are data structures with 
 associated procedures, which are used to maintain the data structure 
 values in a consistent way. 

 Object-oriented programming can be directly expressed with suitable
 programming languages (e.g. Objective C, C++) that support 'objects'. 
 Fortran 90 supports a few OOP concepts, and the rest can be done 
 "by hand" if the programmer is disciplined enough.  Fortran 2000
 is expected to provide full OOP support.



 
 Lookup tables 
 -------------
 Lookup tables are a simple programming trick to speed a special kind 
 of calculations. 

 If you have to compute again and again some function or expression 
 and the possible values the argument(s) can take are relatively few 
 you can do the computing once and put the results in an array.
 To retrieve results you just reference the array.



 
 Callback routines 
 -----------------
Return to contents page