The present invention relates to data compression and decompression apparatus and more particularly to an improved apparatus employing common circuitry for compression and decompression, the apparatus being capable of decompressing compressed data recorded in one direction when the compressed data is read in the opposite direction, and further including a throttle control for preventing data underruns.
Eastman et al. U.S. Pat. No. 4,464,650 discloses a data compression and decompression apparatus which parses a stream of input data symbols into adaptively growing sequences of symbols. This apparatus requires numerous memory access cycles for each input character and utilizes time-consuming multiplication and division procedures in order to accomplish the compression and decompression. In addition, the disclosed device requires separate and distinct devices for performing the compression and decompression.
Welch U.S. Pat. No. 4,558,302 discloses a data compression and decompression apparatus wherein a data compressor compresses an input stream of data characters by storing in a string table strings of data character signals encountered in the input stream. A separate decompressor is provided for decompressing the compressed data. The compressor searches the input data stream to determine the longest match to a stored string. Each stored string comprises a prefix string and an extension character where the extension character is the last character in the string and the prefix string comprises all but the extension character. Each string has a code word associated therewith and a string is stored in the string table, at least implicitly, by storing the code word for the string, the code word for the string prefix and the extension character. When the longest match between the input data character stream and the stored strings is determined, the code word for the longest match is transmitted as the compressed code signal for the encountered string of characters and an extended string is stored in the string table. The prefix of the extended string is the longest match and the extension character of the extended string is the next input data character following the longest match. Searching through the string table and entering extended strings therein is effected by a limited search hashing procedure. Decompression is effected by a decompressor that receives the compressed code signals and generates a string table similar to that constructed by the compressor to effect lookup of received code signals so as to recover the data character signals comprising a stored string. The decompressor string table is updated by storing a string having a prefix in accordance with a prior received code signal and an extension character in accordance with the first character of the currently recovered string.
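By way of illustration only, the string-table procedure summarized above may be sketched in software as follows. The sketch assumes 8-bit data characters, code words 0 to 255 reserved for single characters, and an unbounded table; a Python dictionary stands in for the limited search hashing procedure, and the names lzw_compress and lzw_decompress are hypothetical and do not appear in the Welch patent.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Emit one code word per longest match against the string table."""
    # Each stored string is held implicitly as (prefix code, extension char) -> code.
    table: dict[tuple[int, int], int] = {}
    next_code = 256          # assumption: codes 0..255 denote single characters
    codes: list[int] = []
    prefix = None            # code word of the longest match found so far
    for ch in data:
        if prefix is None:
            prefix = ch
            continue
        key = (prefix, ch)
        if key in table:
            prefix = table[key]      # the match can be extended
        else:
            codes.append(prefix)     # transmit the code for the longest match
            table[key] = next_code   # store the extended string
            next_code += 1
            prefix = ch              # start the next match at this character
    if prefix is not None:
        codes.append(prefix)
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the string table from the code stream and recover the data."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the string the compressor had only just stored.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid code word {code}")
        out.append(entry)
        # New string: prior recovered string plus first character of the current one.
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)
```

In this sketch the decompressor stores whole strings for brevity, whereas a hardware table would hold only the prefix code and extension character for each entry.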
While the apparatus disclosed in the Welch patent is admirably suited for its purpose, it lacks several desirable features. It cannot decompress data which is read in the direction reverse to that in which it was recorded. It is subject to data over-run or data under-run conditions. For example, during compression of highly compressible data the data rate may be reduced sufficiently to cause data under-run at the tape control unit or other output device. On the other hand, when highly compressed data is decompressed and transferred through low speed channels it may cause data over-run.
A further disadvantage of the Welch device is that, relatively speaking, considerable time is lost clearing the string table. The string table becomes "tired" in that the accumulation of strings stored therein, after an interval of time, may not be the strings most likely to appear in newer incoming data. Thus, it is necessary to clear or refresh the string table and start a new set of strings. However, the addressing and clearing of each location in the string table takes considerable time during which no compression may take place. In accordance with one aspect of the present invention, actual clearing of the string table does not take place. Instead, a sub-block counter is provided and its contents are written into the string table with each prefix code. The counter is incremented after a predetermined number of string codes have been written into the string table. Each time a location in the string table is accessed to see if it is empty, the sub-block count stored in the location is compared with the count in the counter. If the two are not equal then the location is treated as being empty.
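The sub-block counter scheme described above may be sketched, purely as an illustration, as follows. The class name LazyClearTable, its method names and the initial stamp value of -1 are assumptions made for the sketch; an unbounded integer is used for the counter, and the handling of wrap-around in a finite-width hardware counter is not addressed here.

```python
class LazyClearTable:
    """String table that is 'cleared' by advancing a counter, not by erasing."""

    def __init__(self, size: int, codes_per_sub_block: int) -> None:
        # Sub-block count stored alongside each prefix code.
        # Assumption: -1 never equals a counter value, so every location starts empty.
        self.stamps = [-1] * size
        self.prefixes = [0] * size
        self.sub_block = 0                       # the sub-block counter
        self.codes_per_sub_block = codes_per_sub_block
        self.written = 0                         # string codes written in this sub-block

    def is_empty(self, address: int) -> bool:
        # A location is treated as empty when its stored count differs from the counter.
        return self.stamps[address] != self.sub_block

    def write(self, address: int, prefix_code: int) -> None:
        # The counter contents are written into the table with each prefix code.
        self.prefixes[address] = prefix_code
        self.stamps[address] = self.sub_block
        self.written += 1
        if self.written == self.codes_per_sub_block:
            # Advancing the counter invalidates every earlier entry at once.
            self.sub_block += 1
            self.written = 0

    def read(self, address: int):
        return None if self.is_empty(address) else self.prefixes[address]
```

With a table created as LazyClearTable(8, 2), the second write advances the counter, after which every location, including the two just written, is reported as empty without any location having been individually addressed and cleared.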
Start-up time for a write-to-tape operation creates a special case which might result in data under-run. To achieve the high compressions possible with the algorithm disclosed in the Welch patent it is necessary that data buffering and compression proceed for some time prior to starting the tape on a write command. However, this precompression has an impact on cost, performance and the resulting compression unless the start time is accurately controlled. For example, if the tape is started too soon then it may be necessary to issue throttle characters to prevent tape under-run. This will result in lower overall compression. On the other hand, if the tape is started too late then the tape will have to be run after the compressor has completed its compression in order to write buffered compressed data. This increases the overall time required for the operation. Ideally, the tape should be started at a time such that it will be ready to write the last of the compressed data as that data is produced by the compressor.
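The timing trade-off may be illustrated with a simplified calculation. The model below is an assumption made for illustration and is not taken from the disclosure: input arrives at a constant rate, the compression ratio is constant and known in advance, the tape writes at a constant rate once running, and the function and parameter names are hypothetical.

```python
def ideal_tape_start(total_input_bytes: float,
                     input_rate: float,
                     compression_ratio: float,
                     tape_rate: float,
                     tape_startup_time: float) -> float:
    """Return the time (seconds after compression begins) at which to start the tape.

    Simplified model: the tape should finish writing at the moment the
    compressor produces the last of the compressed data.
    """
    # Time at which the compressor finishes (constant input rate assumed).
    compress_finish = total_input_bytes / input_rate
    # Amount of compressed data to be written (constant ratio assumed).
    compressed_bytes = total_input_bytes / compression_ratio
    # Time the tape needs to write that data once it is up to speed.
    write_time = compressed_bytes / tape_rate
    # Start early enough to cover start-up and writing, but never before time zero.
    return max(0.0, compress_finish - tape_startup_time - write_time)
```

Starting earlier than the returned time corresponds to the under-run case, in which throttle characters would be needed; starting later corresponds to the case in which the tape must continue to run after compression has finished.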