It is known to provide a tape drive having data compression capability (a DC drive) so that, as data arrives from a host, it is compressed before being written to tape, thus increasing the tape storage capacity. DC drives are also able to read compressed data from tape and to decompress the data before sending it to a host. It is also possible for a host to perform software compression and/or decompression of user data.
There is more than one type of data compression. For example, removing separation marks (e.g. designating records, files etc.) from the datastream and storing information regarding the positions of these marks in an index effectively compresses the user data. Another, quite different, approach is to compress user data by removing redundancy in the data (e.g. by replacing user data words with codewords or symbols from which the original data can be recovered). It is the latter type which is being referred to in this specification when the words "data compression" or the abbreviation DC are used.
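The first type of compression mentioned above can be sketched as follows. This is a minimal illustration only: the function names `strip_marks` and `restore_records` and the choice of storing each record's end offset in the index are assumptions made for the example, not a format described in this specification.

```python
def strip_marks(records):
    """Concatenate records without separators and build an index.

    The index holds the end offset of each record within the
    concatenated data (an assumed layout, for illustration).
    """
    data = b"".join(records)
    index = []
    offset = 0
    for record in records:
        offset += len(record)
        index.append(offset)
    return data, index


def restore_records(data, index):
    """Rebuild the original records from the data and its index."""
    records = []
    start = 0
    for end in index:
        records.append(data[start:end])
        start = end
    return records
```

Because the record boundaries are held in the index rather than in the datastream, the original records can be recovered exactly, including empty ones.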
Several different algorithms are known for compressing data. One approach is to convert the user data to codewords using a dictionary which is created dynamically as the data is compressed. The dictionary is recreated, again dynamically, during decompression. An algorithm which adopts this approach is the Lempel Ziv Welch (LZW) algorithm.
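The dictionary approach can be illustrated with a minimal textbook-style LZW sketch. It is not the implementation used by any particular DC drive: codewords are kept as plain integers rather than packed bits, the dictionary is allowed to grow without limit, and the names `lzw_compress` and `lzw_decompress` are chosen for the example.

```python
def lzw_compress(data):
    """Return a list of integer codewords for the given bytes."""
    # The dictionary starts with every single-byte string.
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code  # built as data is compressed
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes):
    """Recreate the dictionary dynamically and return the original bytes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The codeword refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid codeword")
        out.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)
```

The dictionary itself is never stored with the compressed data; the decompressor rebuilds the same entries in the same order from the codewords alone.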
During data compression, a DC drive operating according to the Lempel Ziv Welch (LZW) algorithm inserts a RESET codeword into the datastream indicative of when a new dictionary is started. A FLUSH codeword is inserted when data is to be flushed (i.e. the small amount of data held in a buffer awaiting compression is passed through before further incoming data is sent to the buffer).
Using the LZW algorithm, to achieve decompression of part of the compressed data on a tape, it is necessary to begin decompressing from a RESET codeword in order to be able to recreate the relevant dictionary. Normally, a FLUSH operation is performed prior to beginning a new dictionary so that the new dictionary can start at a convenient point in the data (e.g. at the beginning of a record).
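The role of the RESET and FLUSH codewords can be sketched as follows. The sketch assumes that each record is compressed with its own dictionary and that the stream for a record has the form RESET, codewords, FLUSH. The numeric values 256 and 257 and the function names are hypothetical choices for the example and are not taken from any drive format.

```python
RESET = 256       # hypothetical codeword values, chosen for this sketch only
FLUSH = 257
FIRST_FREE = 258  # first codeword available for dictionary entries


def compress_records(records):
    """Compress each record with a fresh dictionary: RESET, codes..., FLUSH."""
    stream = []
    for record in records:
        stream.append(RESET)
        dictionary = {bytes([i]): i for i in range(256)}
        next_code = FIRST_FREE
        current = b""
        for value in record:
            candidate = current + bytes([value])
            if candidate in dictionary:
                current = candidate
            else:
                stream.append(dictionary[current])
                dictionary[candidate] = next_code
                next_code += 1
                current = bytes([value])
        if current:
            stream.append(dictionary[current])  # pass through pending data
        stream.append(FLUSH)
    return stream


def decompress_from(stream, start):
    """Decompress from the RESET at stream[start] up to the next RESET."""
    if stream[start] != RESET:
        raise ValueError("decompression must begin at a RESET codeword")
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = FIRST_FREE
    previous = None
    out = []
    for code in stream[start + 1:]:
        if code == RESET:
            break
        if code == FLUSH:
            previous = None  # the compressor restarted with no pending string
            continue
        if code in dictionary:
            entry = dictionary[code]
        elif previous is not None and code == next_code:
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid codeword")
        if previous is not None:
            dictionary[next_code] = previous + entry[:1]
            next_code += 1
        out.append(entry)
        previous = entry
    return b"".join(out)
```

Starting at any RESET position recovers the record that follows it without reading earlier data, whereas starting at an arbitrary codeword is rejected because the dictionary in force at that point cannot be recreated.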
Another approach to data compression is to reference a chosen amount of the most recent uncompressed datastream (termed a "history buffer" or "sliding window" or "sliding dictionary") and to replace items in the incoming datastream which appear in the history buffer with codewords/tokens indicating where in the history buffer they are located. This approach is known as the first Lempel Ziv algorithm or LZ1. During decompression, a history buffer is also referenced and, as codewords/tokens are encountered, they are replaced with the relevant strings from the history buffer to reconstruct the original datastream.
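A minimal sketch of the history buffer approach is given below. The token layout (`("lit", byte)` and `("copy", distance, length)`), the window size and the match length limits are assumptions made for the example; real implementations pack tokens into bits and use faster match searches than the exhaustive one shown here.

```python
WINDOW = 4096    # assumed history buffer size
MIN_MATCH = 3    # assumed shortest match worth replacing with a token
MAX_MATCH = 18   # assumed longest match a token can describe


def lz1_compress(data, window=WINDOW):
    """Replace strings found in the history buffer with copy tokens."""
    tokens = []
    pos = 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        # Search the most recent `window` bytes of uncompressed data.
        for cand in range(max(0, pos - window), pos):
            length = 0
            while (length < MAX_MATCH and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - cand
        if best_len >= MIN_MATCH:
            tokens.append(("copy", best_dist, best_len))
            pos += best_len
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens


def lz1_decompress(tokens):
    """Rebuild the datastream by copying strings out of the history."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            distance, length = token[1], token[2]
            # Copy byte by byte so a match may overlap the bytes it produces.
            for _ in range(length):
                out.append(out[-distance])
    return bytes(out)
```

In this sketch the decompressor needs no separate dictionary: the data it has already produced serves as its history buffer.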
In this approach, a RESET command has the effect of clearing the history buffer and a FLUSH command has the effect of clearing the lookahead buffer.
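The effect of the two commands on the two buffers can be sketched with a small streaming compressor. The class name, the token layout and the policy of holding back `max_match` bytes in the lookahead buffer until more data arrives are all assumptions made for illustration, and matches are restricted to the history buffer for simplicity.

```python
class SlidingWindowCompressor:
    """Illustrative LZ1-style compressor with history and lookahead buffers."""

    def __init__(self, window=4096, max_match=16, min_match=3):
        self.window = window
        self.max_match = max_match
        self.min_match = min_match
        self.history = bytearray()    # most recent uncompressed data
        self.lookahead = bytearray()  # data awaiting compression
        self.tokens = []

    def write(self, data):
        self.lookahead += data
        # Hold back max_match bytes so a match is not cut short.
        self._encode(keep=self.max_match)

    def flush(self):
        """Compress everything still waiting; the lookahead buffer empties."""
        self._encode(keep=0)

    def reset(self):
        """Flush pending data, then clear the history buffer."""
        self.flush()
        self.history.clear()
        self.tokens.append(("reset",))

    def _encode(self, keep):
        while len(self.lookahead) > keep:
            distance, length = self._longest_match()
            if length >= self.min_match:
                self.tokens.append(("copy", distance, length))
            else:
                length = 1
                self.tokens.append(("lit", self.lookahead[0]))
            self.history += self.lookahead[:length]
            del self.lookahead[:length]
            del self.history[:-self.window]  # keep only the newest bytes

    def _longest_match(self):
        best_dist, best_len = 0, 0
        limit = min(self.max_match, len(self.lookahead))
        for start in range(len(self.history)):
            length = 0
            while (length < limit and start + length < len(self.history)
                   and self.history[start + length] == self.lookahead[length]):
                length += 1
            if length > best_len:
                best_dist, best_len = len(self.history) - start, length
        return best_dist, best_len


def decompress(tokens):
    """Rebuild the data; copy distances never reach back past a reset."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        elif token[0] == "copy":
            start = len(out) - token[1]
            out += out[start:start + token[2]]
    return bytes(out)
```

With these assumptions, a short write produces no tokens until `flush()` is called, and data written after `reset()` is encoded without reference to anything written before it.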
A flush operation can therefore generally be thought of as passing through a relatively small amount of raw data, or completing a compression operation on data which is awaiting compression, before recommencing data compression at a convenient point in the datastream. It is applicable whether compression is being performed by software or hardware.