With increasing demand for faster, more powerful and more efficient ways to store information, optimization of storage technologies is becoming a key challenge. Logical data objects (data files, image files, data blocks, etc.) may be compressed for transmission and/or storage. Data compression techniques reduce the amount of data to be stored and/or transmitted, thereby reducing the required storage capacity and/or the transmission time, respectively. Compression may be achieved using various compression algorithms known in the art, for example, sequential data compression, which takes a stream of data as input and generates a usually shorter output stream from which the original data can be restored (e.g., Lempel-Ziv type algorithms, run-length encoding algorithms, arithmetic coding type algorithms, etc.). By way of non-limiting example, Lempel-Ziv type sequential algorithms compress strings of binary data of variable length into a fixed-length compressed binary format. Lempel-Ziv type algorithms may be implemented using a history buffer that contains the most recent bytes or words of a file in the correct sequence. In repeated executions of a basic routine, new bytes are read for as long as the sequence of incoming bytes matches a sequence in the history buffer.
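The history-buffer routine described above can be illustrated with a minimal LZ77-style (sliding-window) coder. This is a sketch under stated assumptions, not any particular patented method: the names `longest_match`, `lz77_compress` and `lz77_decompress`, the 4096-byte window and the `(distance, length, next_byte)` token format are all hypothetical choices, and the brute-force search favours clarity over speed.

```python
def longest_match(history, lookahead):
    """Return (distance back from the end of `history`, length) of the longest
    prefix of `lookahead` that also occurs in `history`; (0, 0) if none."""
    best_dist, best_len = 0, 0
    for start in range(len(history)):
        length = 0
        # Keep reading incoming bytes while they still match the buffer.
        while (length < len(lookahead)
               and start + length < len(history)
               and history[start + length] == lookahead[length]):
            length += 1
        if length > best_len:
            best_dist, best_len = len(history) - start, length
    return best_dist, best_len


def lz77_compress(data, window=4096, max_len=255):
    """Emit hypothetical (distance, length, next_byte) tokens."""
    tokens, i = [], 0
    while i < len(data):
        history = data[max(0, i - window):i]      # the history buffer
        # Leave at least one byte to send explicitly as `next_byte`.
        lookahead = data[i:min(i + max_len, len(data) - 1)]
        dist, length = longest_match(history, lookahead)
        tokens.append((dist, length, data[i + length]))
        i += length + 1
    return tokens


def lz77_decompress(tokens):
    out = bytearray()
    for dist, length, next_byte in tokens:
        start = len(out) - dist
        for k in range(length):                   # copy from the history
            out.append(out[start + k])
        out.append(next_byte)
    return bytes(out)
```

On repetitive input such as `b"abc" * 20`, the token list is much shorter than the input, and decompression restores the original bytes exactly.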
The problems of effectively implementing sequential data compression have been recognized in the prior art, and various systems have been developed to provide a solution, for example:
U.S. Pat. No. 4,558,302 (Welch) discloses a compression method (commonly called a Lempel-Ziv-Welch data compression technique) utilizing a dictionary for storing commonly used data strings and searching that dictionary using hashing techniques.
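A dictionary coder of the Lempel-Ziv-Welch kind can be sketched as follows. This is a generic textbook LZW, not the patent's exact procedure: the function names are hypothetical, the Python `dict` (itself a hash table) stands in for the hashed dictionary search, keyed on `(prefix code, next byte)` pairs, and codes are returned as plain integers rather than packed into fixed-width bit fields with a capped dictionary size, as practical coders do.

```python
def lzw_compress(data):
    """Return a list of integer codes; single bytes are their own codes 0-255."""
    if not data:
        return []
    table = {}              # (prefix code, next byte) -> code, a hash table
    next_code = 256
    out = []
    w = data[0]             # code of the string matched so far
    for b in data[1:]:
        key = (w, b)
        if key in table:
            w = table[key]  # extend the current match
        else:
            out.append(w)
            table[key] = next_code
            next_code += 1
            w = b           # restart matching from this byte
    out.append(w)
    return out


def lzw_decompress(codes):
    if not codes:
        return b""
    table = {b: bytes([b]) for b in range(256)}   # code -> string
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]      # code defined by this very step
        else:
            raise ValueError(f"invalid code {code}")
        out.append(entry)
        table[len(table)] = w + entry[:1]   # mirror the encoder's new entry
        w = entry
    return b"".join(out)
```

For example, `lzw_compress(b"aaaaaaa")` yields four codes, one of which refers to an entry the decoder must build on the fly.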
U.S. Pat. No. 4,586,027 (Tsukiyama et al.) discloses a method of data compression and restoration wherein an input data string containing runs of repetitive data longer than a specified value is transformed into a data string having a format that includes: a first region, in which non-compressed data are placed; a second region, including a datum representative of a data string section that has undergone the compression process and information indicative of the number of repetitive data (i.e., the length of that data string section); and control information, inserted at the front and back of the first region, indicating the number of data included in the first region. The transformed data string is recorded on a recording medium. For data reproduction, the first and second regions are identified on the basis of the control information read from the recording medium, so that the compressed data string section is transformed back into the original data string in the form of repetitive data.
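The region-based format above can be sketched roughly as follows. The byte layout is a hypothetical assumption, not the patent's recording format. Each record is a first region (count byte, literals, the same count byte again) followed by a second region (repeated datum, run length). `SPECIFIED_VALUE = 3`, the one-byte counts and the use of an empty run `(0, 0)` to close a full or final literal region are all illustrative choices.

```python
SPECIFIED_VALUE = 3   # hypothetical: only runs longer than this are compressed
MAX_COUNT = 255       # every count must fit in one byte


def rle_region_encode(data):
    out, lit = bytearray(), bytearray()

    def emit(datum, length):
        # First region: literals framed by their count at the front and back.
        out.append(len(lit))
        out.extend(lit)
        out.append(len(lit))
        # Second region: the repeated datum and the length of its run.
        out.extend((datum, length))
        lit.clear()

    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < MAX_COUNT:
            j += 1
        if j - i > SPECIFIED_VALUE:
            emit(data[i], j - i)
        else:
            for b in data[i:j]:
                if len(lit) == MAX_COUNT:
                    emit(0, 0)   # first region full: close it with an empty run
                lit.append(b)
        i = j
    if lit:
        emit(0, 0)
    return bytes(out)


def rle_region_decode(buf):
    out, p = bytearray(), 0
    while p < len(buf):
        n = buf[p]
        # The control information at both ends of the first region must agree.
        if buf[p + 1 + n] != n:
            raise ValueError("front and back control information disagree")
        out.extend(buf[p + 1:p + 1 + n])
        datum, length = buf[p + 2 + n], buf[p + 3 + n]
        out.extend(bytes([datum]) * length)
        p += 4 + n
    return bytes(out)
```

Because the count appears at both ends of the first region, a reader could in principle also locate region boundaries when scanning the record backwards. This sketch only checks that the two copies agree.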
U.S. Pat. No. 4,560,976 (Finn) discloses a compression method wherein a stream of source characters with varying relative frequencies is encoded into a compressed stream of codewords, each having one, two or three subwords. The method comprises ranking the source characters by their current frequency of appearance, and encoding the source characters having ranks no higher than a first number as one-subword codewords, source characters having ranks higher than the first number but no higher than a second number as two-subword codewords, and the remaining source characters as three-subword codewords. The first number is changed and the second number is recalculated as required by the changing frequencies of the source characters, so as to minimize the length of the stream of codewords.
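The threshold selection can be illustrated with a static snapshot of the frequencies. This is a sketch under assumptions, not Finn's encoding. It uses a hypothetical prefix-free layout over `S` subword values: values `0..n1-1` are one-subword codewords, each value from `n1` to `S-2` begins `S` two-subword codewords, and value `S-1` begins `S*S` three-subword codewords. Under that layout the second number follows from the first as `n2 = n1 + (S - 1 - n1) * S`. The function names and `S = 16` are illustrative.

```python
from collections import Counter


def thresholds(n1, S):
    """Return (second number n2, total codewords) for the hypothetical layout."""
    n2 = n1 + (S - 1 - n1) * S
    return n2, n2 + S * S


def encoded_length(ranked_freqs, n1, S):
    n2, capacity = thresholds(n1, S)
    if len(ranked_freqs) > capacity:
        return None                        # too many distinct symbols
    # Rank r costs 1, 2 or 3 subwords depending on which band it falls in.
    return sum(f * (1 if r < n1 else 2 if r < n2 else 3)
               for r, f in enumerate(ranked_freqs))


def choose_thresholds(text, S=16):
    """Rank symbols by frequency, try every first number n1, recompute n2,
    and keep the pair that minimises the total number of subwords."""
    ranked = sorted(Counter(text).values(), reverse=True)
    best = None
    for n1 in range(S):
        cost = encoded_length(ranked, n1, S)
        if cost is not None and (best is None or cost < best[0]):
            best = (cost, n1, thresholds(n1, S)[0])
    return best   # (subwords used, first number, second number)
```

With only three distinct symbols, every symbol fits in a one-subword codeword, so `choose_thresholds("aaabbc")` returns `(6, 3, 195)`. An adaptive coder would rerun such a selection as the frequencies drift.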
U.S. Pat. No. 4,701,745 (Waterworth) discloses a data compression system including an input store for receiving and storing a plurality of bytes of data from an outside source. Data processing means for processing successive bytes of data from the input store include circuit means operable to check whether a sequence of bytes is identical to a sequence of bytes already processed, output means operable to apply to a transfer medium each byte of data not forming part of such an identical sequence, and an encoder responsive to the identification of such a sequence to apply to the transfer medium an identification signal which identifies both the location in the input store of the previous occurrence of the sequence of bytes and the number of bytes in the sequence.
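The literal-or-reference output described above can be sketched in software. Everything here is a hypothetical model, not the patent's circuitry: the token shapes `("lit", byte)` and `("copy", location, length)`, the names `encode_store` and `decode_store`, and the minimum match of 3 bytes. As in the description, a reference names the absolute location of the earlier occurrence in the input store, not a backward distance.

```python
MIN_MATCH = 3  # hypothetical: shorter repeats are sent as raw bytes


def encode_store(store):
    tokens, i = [], 0
    while i < len(store):
        best_loc, best_len = -1, 0
        for loc in range(i):               # every position already processed
            n = 0
            while i + n < len(store) and store[loc + n] == store[i + n]:
                n += 1
            if n > best_len:
                best_loc, best_len = loc, n
        if best_len >= MIN_MATCH:
            # Identification signal: location of earlier occurrence + length.
            tokens.append(("copy", best_loc, best_len))
            i += best_len
        else:
            tokens.append(("lit", store[i]))
            i += 1
    return tokens


def decode_store(tokens):
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, loc, length = tok
            for k in range(length):        # byte-wise copy allows overlap
                out.append(out[loc + k])
    return bytes(out)
```

For instance, `encode_store(b"aaaaaaaa")` gives one literal followed by a single overlapping copy `("copy", 0, 7)`.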
U.S. Pat. No. 4,876,541 (Storer) discloses a data compression system for encoding and decoding textual data, including an encoder for encoding the data and a decoder for decoding the encoded data. Both encoder and decoder have dictionaries for storing frequently-appearing strings of characters. Each string is identified by a unique pointer. The input data stream is parsed and matched with strings in the encoder dictionary using a novel matching algorithm. The pointer associated with the matched string is then transmitted to a remote location for storage or decoding. Thereafter, the encoder dictionary is updated to include new strings of data based on the matched string of data. The strings of data may be arranged using a modified least recently used queue. The decoder matches each unique pointer in the stream of compressed input data with a corresponding pointer in the decoder dictionary. The decoder then transmits the string of character data associated with the matched pointer, thereby providing textual data in original, uncompressed form. Thereafter, using the novel update and deletion algorithms, new strings of data are added to, and old strings of data are deleted from, the decoder dictionary, so as to ensure both encoder and decoder dictionaries contain identical strings of data.
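The key property described above is that encoder and decoder dictionaries stay identical because both sides apply the same updates and deletions. The sketch below illustrates that idea under stated assumptions. The update rule (add the previous match extended by the first byte of the current match) is a hypothetical stand-in, and a plain least-recently-used eviction replaces the patent's modified LRU queue and its own update and deletion algorithms. `SharedDictionary`, `lru_dict_encode` and `lru_dict_decode` are illustrative names.

```python
from collections import OrderedDict


class SharedDictionary:
    """Hypothetical string dictionary kept identical at encoder and decoder.
    Pointers 0-255 permanently hold single bytes; pointers 256 and up hold
    learned strings and are recycled least-recently-used first."""

    def __init__(self, capacity=512):
        if capacity <= 256:
            raise ValueError("capacity must exceed 256")
        self.capacity = capacity
        self.slots = [bytes([b]) for b in range(256)]          # pointer -> string
        self.index = {s: p for p, s in enumerate(self.slots)}  # string -> pointer
        self.lru = OrderedDict()   # learned pointers, least recently used first
        self.max_len = 1           # upper bound on entry length

    def touch(self, pointer):
        if pointer in self.lru:
            self.lru.move_to_end(pointer)

    def add(self, s):
        if s in self.index:
            return
        if len(self.slots) < self.capacity:
            pointer = len(self.slots)
            self.slots.append(s)
        else:
            pointer, _ = self.lru.popitem(last=False)   # delete the LRU string
            del self.index[self.slots[pointer]]
            self.slots[pointer] = s
        self.index[s] = pointer
        self.lru[pointer] = None
        self.max_len = max(self.max_len, len(s))


def lru_dict_encode(data, capacity=512):
    d, out, prev, i = SharedDictionary(capacity), [], None, 0
    while i < len(data):
        # Greedy parse: longest dictionary string that prefixes data[i:].
        for n in range(min(d.max_len, len(data) - i), 0, -1):
            pointer = d.index.get(data[i:i + n])
            if pointer is not None:
                break
        match = d.slots[pointer]
        out.append(pointer)
        d.touch(pointer)
        if prev is not None:
            d.add(prev + match[:1])     # hypothetical update rule
        prev, i = match, i + len(match)
    return out


def lru_dict_decode(pointers, capacity=512):
    d, out, prev = SharedDictionary(capacity), [], None
    for pointer in pointers:
        match = d.slots[pointer]
        out.append(match)
        d.touch(pointer)                # replay exactly the encoder's updates
        if prev is not None:
            d.add(prev + match[:1])
        prev = match
    return b"".join(out)
```

Deferring each insertion until the current pointer is known lets the decoder rebuild every entry before it can be referenced, so no special case is needed, even when tiny capacities force constant eviction.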
U.S. Pat. No. 5,384,567 (Hassner et al.) discloses an apparatus and method for executing a sequential data compression algorithm suitable for use where data compression is required in a device (as distinguished from host) controller. A history buffer comprises an array of i identical horizontal slice units. Each slice unit stores j symbols to define j separate blocks in which the symbols in each slice unit are separated by exactly i symbols. Symbols in a string of i incoming symbols are compared by i comparators in parallel with symbols previously stored in the slice units to identify matching sequences of symbols. A control unit controls execution of the sequential algorithm to condition the comparators to scan symbols in parallel but in each of the blocks sequentially, and causes matching sequences and non-matching sequences of symbols to be stored in the array. The parameters i and j are selected to limit the number of comparators required to achieve a desired degree of efficiency in executing the algorithm, based upon a trade-off of algorithm execution speed versus hardware cost. A priority encoder calculates, from signals output by the slice units, each j,i address at which a matching sequence is identified, but it outputs only one of these addresses (such as the smallest).
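The slice addressing and the priority encoder can be modelled in a few lines. This is a simplified assumption-laden sketch: it covers only single-symbol comparisons and the address layout, not the patent's control of sequence matching. The function names are hypothetical. A full scan costs j sequential block steps using only i comparators, which is the speed-versus-hardware trade-off the parameters i and j control.

```python
def slice_layout(history, i, j):
    """Spread an i*j-symbol history over i slice units of j symbols each.
    Linear address a lives in slice a % i, block a // i, so the symbols
    held by any one slice are exactly i positions apart."""
    if len(history) != i * j:
        raise ValueError("history must hold exactly i*j symbols")
    return [[history[blk * i + s] for blk in range(j)] for s in range(i)]


def find_matches(slices, symbol):
    """Scan the j blocks one after another; within a block, the i slice
    comparators (modelled by one list comprehension) work in parallel."""
    i, j = len(slices), len(slices[0])
    matches = []
    for blk in range(j):
        hits = [slices[s][blk] == symbol for s in range(i)]   # i comparators
        matches.extend(blk * i + s for s in range(i) if hits[s])
    return matches


def priority_encode(addresses):
    """Report only one matching address: here, the smallest."""
    return min(addresses) if addresses else None
```

With `history = b"abcdabce"`, `i = 4` and `j = 2`, the symbol `c` matches at addresses 2 and 6, and the priority encoder reports 2.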
U.S. Pat. No. 5,627,534 (Craft) discloses a dual-stage lossless data compressor for optimally compressing bit-mapped image data. The first stage run-length compresses data bits representing pixel positions along a scan line of a video image into data units of fixed length. The units alternate to represent runs of alternate video image data values. The run-length compressed data units are subject to second-stage compression using a sliding-window Lempel-Ziv compressor. The output from the Lempel-Ziv compressor includes raw tokens of fixed length and compressed tokens of varying lengths. The combination of a run-length precompressor and a sliding-window Lempel-Ziv postcompressor, in which the run-length compressor output is a succession of data units of fixed length, provides an optimum match between the capabilities and idiosyncrasies of the two compressors, and of the related decompressors, when processing business form data images. Furthermore, the asymmetric simplicity of Lempel-Ziv sliding-window decompression, together with the simplicity of run-length decompression, leads to a decompression speed compatible with contemporary applications.
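The two-stage arrangement can be sketched as follows, with several assumptions. The first stage emits alternating run lengths as fixed 8-bit units, always starting with a white (0) run. A run longer than 255 is split by inserting a zero-length run of the other colour. The 8-bit unit size and all function names are hypothetical. For the second stage, Python's standard `zlib` module, which implements DEFLATE (LZ77 sliding window plus Huffman coding), stands in for a sliding-window Lempel-Ziv compressor. It is not the compressor described in the patent.

```python
import zlib

MAX_RUN = 255   # hypothetical: each run-length unit is one byte


def rle_scanline(bits):
    """Stage 1: alternating white/black run lengths, starting with white (0)."""
    units, color, run = [], 0, 0
    for bit in bits:
        if bit != color:
            units.append(run)          # close the current run, switch colour
            color, run = bit, 0
        if run == MAX_RUN:             # split an over-long run with an
            units.extend([MAX_RUN, 0]) # empty run of the other colour
            run = 0
        run += 1
    units.append(run)
    return units


def rle_decode(units):
    bits, color = [], 0
    for n in units:
        bits.extend([color] * n)
        color ^= 1                     # runs alternate colour
    return bits


def two_stage_compress(bits):
    return zlib.compress(bytes(rle_scanline(bits)))   # stage 2 stand-in


def two_stage_decompress(packed):
    return rle_decode(list(zlib.decompress(packed)))
```

A scan line that begins with a black pixel starts with a zero-length white run, so `rle_scanline([1, 1, 0])` gives `[0, 2, 1]`.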
U.S. Pat. No. 5,652,878 (Craft) discloses a data compression apparatus and method for implementing LZ algorithms in a parallel hardware architecture. The apparatus includes a circuit for receiving a data element, a storage circuit for sequentially storing previously received data elements at sequentially addressed fixed locations, a circuit for comparing the received data element to the stored data elements to determine whether the received data element matches at least one of the stored data elements, and a circuit for generating an address of the matching stored data element.
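The parallel comparison and address generation can be modelled in software. This is a hypothetical sketch, not Craft's circuit. Each received element is compared against every stored location in one step, modelled by a list comprehension over all addresses. A per-address "still matching" flag carries candidate strings forward from one element to the next. Carrying these flags forward is a common way to extend single-element matches into strings, but it is an assumption here. The lowest surviving address is reported, as a priority encoder would. `parallel_match` is an illustrative name.

```python
def parallel_match(history, incoming):
    """Return (address, length) of the longest prefix of `incoming` found
    in `history`, or (None, 0) if even the first element has no match."""
    n = len(history)
    live = [True] * n      # live[a]: string starting at address a still matches
    length = 0
    for k, x in enumerate(incoming):
        # One comparator per stored location, all evaluated in the same step.
        nxt = [live[a] and a + k < n and history[a + k] == x for a in range(n)]
        if not any(nxt):
            break
        live, length = nxt, k + 1
    # Address generation: lowest address whose string matched longest.
    address = live.index(True) if length else None
    return address, length
```

For a history of `b"abcabd"`, the incoming string `b"abd"` survives only at address 3 (length 3), while `b"abx"` matches two bytes at both addresses 0 and 3, and the lower address is reported.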