5-8 DATA COMPRESSION
*********************
Data compression is a size-reducing, reversible transformation of data,
used to save disk space. Of course, the data must be decompressed
before it can be used.
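As a small illustration of "size-reducing and reversible", the sketch
below compresses a repetitive byte string and restores it. It uses
Python's standard zlib module, which implements the DEFLATE method
rather than the LZW method discussed next; the sample data is made up
for the example.

```python
import zlib

# Highly repetitive sample data (an assumption chosen for illustration).
original = b"FORTRAN " * 1000

packed = zlib.compress(original)      # size-reducing transformation
restored = zlib.decompress(packed)    # inverse transformation

print(len(original), "bytes before,", len(packed), "bytes after")
print("round trip ok:", restored == original)
```

How much is saved depends entirely on the data; already-compressed or
random data may not shrink at all.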
One of the most widely used compression methods is the
Lempel-Ziv-Welch (LZW) algorithm, which is used by much compression
software, for example the UNIX compress utility.
LZW compression is not simple to implement; nice FORTRAN
implementations can be found in the FTP archive of Arne Vajhoej:
ftp://ftp.hhs.dk/ftn/lzw.zip (Plus a little VAX assembly)
ftp://ftp.hhs.dk/ftn/splzw.zip (DOS/Salford FORTRAN 77)
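The core dictionary idea behind LZW can be sketched briefly, although
practical implementations (variable-width codes, bit packing, table
resets) are considerably more involved. The following Python sketch is
not taken from the archives above; the function names are hypothetical
and the codes are kept as a plain list of integers instead of a packed
bit stream.

```python
def lzw_compress(data: bytes) -> list:
    """Return a list of integer codes for data (no bit packing)."""
    table = {bytes([i]): i for i in range(256)}   # single-byte strings
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate                   # keep extending the match
        else:
            codes.append(table[current])          # emit longest known string
            table[candidate] = len(table)         # learn the new string
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list) -> bytes:
    """Rebuild the original bytes from the codes of lzw_compress."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the string that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out += entry
        table[len(table)] = previous + entry[:1]  # mirror the compressor
        previous = entry
    return bytes(out)


sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
print(len(sample), "bytes ->", len(codes), "codes")
print("round trip ok:", lzw_decompress(codes) == sample)
```

The decompressor rebuilds the same table as the compressor, so the
table itself never has to be transmitted.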
RFC 959 (FTP) compression method
--------------------------------
The following method is quoted from RFC 959 (Postel & Reynolds,
October 1985), the specification of the File Transfer Protocol (FTP).
3.4.3. COMPRESSED MODE
There are three kinds of information to be sent: regular data,
sent in a byte string; compressed data, consisting of
replications or filler; and control information, sent in a
two-byte escape sequence. If n>0 bytes (up to 127) of regular
data are sent, these n bytes are preceded by a byte with the
left-most bit set to 0 and the right-most 7 bits containing the
number n.
Byte string:

 1       7                8                     8
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+     +-+-+-+-+-+-+-+-+
|0|       n     | |    d(1)       | ... |      d(n)     |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+     +-+-+-+-+-+-+-+-+
                              ^             ^
                              |---n bytes---|
                                  of data

String of n data bytes d(1),..., d(n)
Count n must be positive.
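The byte-string rule above might be sketched in Python as follows. The
function name encode_regular is hypothetical; the only facts taken from
the RFC text are the 0 in the left-most bit and the 7-bit count, so
longer inputs are split into pieces of at most 127 bytes.

```python
def encode_regular(data: bytes) -> bytes:
    """Wrap data in byte-string records of at most 127 bytes each."""
    out = bytearray()
    for start in range(0, len(data), 127):
        chunk = data[start:start + 127]
        out.append(len(chunk))    # left-most bit 0, low 7 bits hold n
        out += chunk              # the n data bytes d(1)..d(n)
    return bytes(out)


print(encode_regular(b"abc"))     # header byte 3 followed by the data
```

Regular data therefore costs one extra byte per 127 bytes sent.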
To compress a string of n replications of the data byte d, the
following 2 bytes are sent:
Replicated Byte:

  2       6               8
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|1 0|     n     | |       d       |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
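A minimal Python sketch of the replicated-byte record follows. The name
encode_replicated is hypothetical, and the limit of 63 repetitions per
record is inferred here from the 6-bit count field; longer runs are
split into several records.

```python
def encode_replicated(d: int, n: int) -> bytes:
    """Encode n copies of byte value d as replicated-byte records."""
    if not 0 <= d <= 255:
        raise ValueError("d must be a byte value")
    if n < 1:
        raise ValueError("n must be positive")
    out = bytearray()
    while n > 0:
        count = min(n, 63)               # 6-bit count field (assumed limit)
        out.append(0b10000000 | count)   # bits '10' followed by the count
        out.append(d)                    # the byte to replicate
        n -= count
    return bytes(out)


print(encode_replicated(ord("x"), 5).hex())   # two bytes for five x's
```

Since a record always takes two bytes, replication only pays off for
runs of three or more identical bytes.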
A string of n filler bytes can be compressed into a single
byte, where the filler byte varies with the representation
type. If the type is ASCII or EBCDIC the filler byte is <SP>
(Space, ASCII code 32, EBCDIC code 64). If the type is Image
or Local byte the filler is a zero byte.
Filler String:

  2       6
+-+-+-+-+-+-+-+-+
|1 1|     n     |
+-+-+-+-+-+-+-+-+
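The filler record might be sketched in Python as below. The names
FILLER_BYTE and encode_filler are hypothetical; the table is keyed by
the FTP representation type letters (A, E, I, L), and the limit of 63
per record is again inferred from the 6-bit count field.

```python
# Filler byte per representation type, as listed in the text above.
FILLER_BYTE = {"A": 32, "E": 64, "I": 0, "L": 0}


def encode_filler(n: int) -> bytes:
    """Encode a run of n filler bytes as one-byte filler records."""
    if n < 1:
        raise ValueError("n must be positive")
    out = bytearray()
    while n > 0:
        count = min(n, 63)               # 6-bit count field (assumed limit)
        out.append(0b11000000 | count)   # bits '11' followed by the count
        n -= count
    return bytes(out)


print(encode_filler(10).hex())   # ten blanks shrink to a single byte
```

The filler value itself is never transmitted; the receiver picks it
from the representation type in force for the transfer.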
The escape sequence is a double byte, the first of which is the
escape byte (all zeros) and the second of which contains
descriptor codes as defined in Block mode. The descriptor
codes have the same meaning as in Block mode and apply to the
succeeding string of bytes.
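Putting the four record kinds together, a receiver could be sketched in
Python as follows. This is an illustrative decoder under stated
assumptions, not a complete FTP implementation: the name
decode_compressed is hypothetical, the filler byte is passed in by the
caller, and descriptor codes are merely collected together with the
output position at which they were seen, not interpreted.

```python
def decode_compressed(stream: bytes, filler: int = 32):
    """Return (data, descriptors) decoded from a compressed-mode stream."""
    data = bytearray()
    descriptors = []            # (output offset, descriptor code) pairs
    i = 0
    while i < len(stream):
        header = stream[i]
        i += 1
        if header == 0:                       # escape byte (all zeros)
            if i >= len(stream):
                raise ValueError("truncated escape sequence")
            descriptors.append((len(data), stream[i]))
            i += 1
        elif (header & 0x80) == 0:            # 0nnnnnnn: n regular bytes
            n = header
            chunk = stream[i:i + n]
            if len(chunk) != n:
                raise ValueError("truncated byte string")
            data += chunk
            i += n
        elif (header & 0xC0) == 0x80:         # 10nnnnnn: replicated byte
            if i >= len(stream):
                raise ValueError("truncated replicated byte")
            data += bytes([stream[i]]) * (header & 0x3F)
            i += 1
        else:                                 # 11nnnnnn: filler string
            data += bytes([filler]) * (header & 0x3F)
    return bytes(data), descriptors


# "abc", five x's, four blanks, then an escape sequence with code 64
# (the end-of-file descriptor code of Block mode in RFC 959).
stream = bytes([3]) + b"abc" + bytes([0x85, ord("x"), 0xC4, 0x00, 0x40])
data, descriptors = decode_compressed(stream)
print(data, descriptors)
```

Checking for the all-zero escape byte first matters, because a zero
header would otherwise look like a byte string with n = 0, which the
format forbids.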
Compressed mode is useful for obtaining increased bandwidth on
very large network transmissions at a little extra CPU cost.
It can be most effectively used to reduce the size of printer
files such as those generated by RJE hosts.