The basic Ziv-Lempel encoder has a dictionary in which each entry has an associated index number. Initially the dictionary contains only the basic alphabet of the source. During the encoding process, new dictionary entries are formed by appending single symbols to existing entries. The dictionary may be considered to be in the form of a search tree of connected symbols of the source alphabet. Nodes in the tree correspond to specific sequences of symbols which begin at the root of the tree. The data is compressed by recognising strings of symbols in the uncompressed input data which correspond to nodes in the tree, and transmitting the index of the memory location corresponding to each matched node. A corresponding search tree is provided in the decoder, which receives the indices representing the compressed data and performs the reverse process to recover the data in its original form. The search tree of the encoder gradually grows during the encoding process as further strings of symbols are identified in the input data. To enable the decoder to decode the compressed data, its search tree must be updated to correspond with the search tree of the encoder.
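By way of illustration only, the dictionary growth and index transmission described above may be sketched as follows. The sketch assumes an LZW-style variant with a 256-symbol byte alphabet; the function names `encode` and `decode`, the representation of the tree as a mapping from (parent index, symbol) to child index, and the decoder's handling of an index it has not yet created are assumptions of this example, not details taken from the cited documents.

```python
def encode(data: bytes) -> list[int]:
    """Emit the index of each longest matched tree node (LZW-style sketch)."""
    # Assumed initial dictionary: the 256 byte values, indexed 0..255.
    children = {}          # (parent index, symbol) -> child index
    next_index = 256
    out = []
    if not data:
        return out
    node = data[0]         # start at the root-level node for the first symbol
    for sym in data[1:]:
        key = (node, sym)
        if key in children:
            node = children[key]        # keep descending the search tree
        else:
            out.append(node)            # transmit index of the matched node
            children[key] = next_index  # new entry = matched string + one symbol
            next_index += 1
            node = sym                  # restart matching from this symbol
    out.append(node)
    return out


def decode(codes: list[int]) -> bytes:
    """Rebuild the same dictionary from the received indices."""
    if not codes:
        return b""
    strings = [bytes([i]) for i in range(256)]
    prev = strings[codes[0]]
    out = bytearray(prev)
    for code in codes[1:]:
        if code < len(strings):
            cur = strings[code]
        elif code == len(strings):
            # The encoder used an entry the decoder is about to create.
            cur = prev + prev[:1]
        else:
            raise ValueError("index outside the dictionary")
        out += cur
        strings.append(prev + cur[:1])  # mirror the encoder's new entry
        prev = cur
    return bytes(out)
```

In this sketch the decoder adds one entry per received index, which is how its dictionary is kept in step with that of the encoder.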
The Ziv-Lempel algorithm has been found difficult to implement in practice, since in its basic form it requires an indefinitely large memory to store the search tree. The use of data structures such as the "trie" structure disclosed by Sussenguth (ACM Sort Symposium 1962) can, however, greatly improve the storage efficiency and search time associated with text strings. EPA127,815 (Miller and Wegman) and EPA129,439 (Welch) disclose similar implementations of the Ziv-Lempel algorithm based on the use of a trie structure.
EPA127,815 (Miller and Wegman) describes improvements to the Ziv-Lempel algorithm which enhance the memory efficiency and speed up the encoding process. The dictionary is held in the form of a tree, with each node containing a single character and a pointer to the parent node which represents the prefix string. A hash table is used to determine, given a matched sub-string and the next input character, whether the extended sub-string is in the dictionary. However, the hash table requires a significant amount of memory and processing time in addition to that needed for the storage of the basic tree structure used to encode the dictionary.
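The arrangement of a parent-pointer tree supplemented by a hash table may be sketched roughly as follows. This is a minimal illustration under stated assumptions: the class name `ParentPointerDictionary` and its methods are hypothetical, and a Python `dict` stands in for the hash table. It is not the implementation of EPA127,815.

```python
class ParentPointerDictionary:
    """Tree nodes hold one symbol and a parent pointer; a separate hash
    table answers "is matched string + next symbol in the dictionary?"."""

    def __init__(self, alphabet):
        self.symbol = []   # symbol stored at each node
        self.parent = []   # index of the parent node (None at the top level)
        self.lookup = {}   # hash table: (parent index, symbol) -> node index
        for sym in alphabet:
            self.add(None, sym)

    def add(self, parent, sym):
        index = len(self.symbol)
        self.symbol.append(sym)
        self.parent.append(parent)
        self.lookup[(parent, sym)] = index   # storage beyond the tree itself
        return index

    def extend(self, node, sym):
        """Return the node for (string at node) + sym, or None if absent."""
        return self.lookup.get((node, sym))

    def string(self, node):
        """Recover a node's string by following parent pointers to the top."""
        chars = []
        while node is not None:
            chars.append(self.symbol[node])
            node = self.parent[node]
        return "".join(reversed(chars))
```

The parent pointers alone only allow a string to be read from a node back towards the root. The forward test performed by `extend` relies on the additional table, which is the source of the extra memory and processing noted above.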
EPA129,439 (Welch) discloses a high speed data compression and decompression apparatus and method in which strings of symbols in the input message are recognised and stored. Strings are entered into a string table and are searched for in the string table by means of a hashing function which utilises a hash key comprising a prior code signal and an extension character to provide a set of N hash table addresses where N is typically 1 to 4. The N RAM locations are sequentially searched and if the item is not in the N locations, it is considered not to be in the table. This procedure is stated to reduce compression efficiency but to simplify substantially the implementation.
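A bounded search of this kind may be sketched as follows. The hash function, the use of N consecutive addresses, the table size and the class name `BoundedProbeTable` are assumptions made for illustration only; the way in which EPA129,439 derives its N addresses is not reproduced here.

```python
class BoundedProbeTable:
    """String table searched at no more than N hash addresses per key."""

    def __init__(self, size=4096, n_probes=4):
        self.size = size
        self.n_probes = n_probes
        self.slots = [None] * size   # each slot: (prior_code, char, code)

    def _addresses(self, prior_code, char):
        # Hypothetical hash of the key (prior code, extension character).
        start = (prior_code * 31 + char) % self.size
        return [(start + i) % self.size for i in range(self.n_probes)]

    def find(self, prior_code, char):
        """Return the stored code, or None if it is not in the N locations."""
        for addr in self._addresses(prior_code, char):
            entry = self.slots[addr]
            if entry is not None and entry[:2] == (prior_code, char):
                return entry[2]
        return None   # treated as "not in the table"

    def insert(self, prior_code, char, code):
        """Store the string if one of its N locations is free."""
        for addr in self._addresses(prior_code, char):
            if self.slots[addr] is None:
                self.slots[addr] = (prior_code, char, code)
                return True
        return False  # all N locations occupied: the string is not entered
```

When `insert` returns `False` the string is simply never learned, which illustrates why limiting the search to N locations simplifies the implementation at some cost in compression efficiency.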
U.S. Pat. No. 4,612,532 (Bacon et al) discloses a system, not based on the Ziv-Lempel algorithm, for the dynamic encoding of a stream of characters in which each character is associated with a "follow set" table of characters which usually follow it, in order of the frequency with which they occur. These tables are of a pre-determined length and therefore the degree of branching of the tree is inevitably restricted.
U.S. Pat. No. 4,464,650 (Eastman et al) discloses a method, based on the Ziv-Lempel algorithm, of compressing input data comprising feeding successive symbols of the data into a processor provided with a memory, generating from strings of symbols in the input data a dictionary in the form of a search tree of symbols in the memory which has paths representative of said strings, matching symbol strings in the input data with previously stored paths in the search tree and generating from the stored paths compressed output data corresponding to the input data. However, the data structure utilised for the circuitry is highly complex and, furthermore, a hashing function is required.
A particular problem inherent in the implementation of an encoder of the type described occurs when the search tree grows to the limit of the memory space available. It is then necessary to reduce the size of (i.e. "prune") the search tree in order to recover memory space for the storage of new strings. A number of well known methods exist for performing this type of function, which are reviewed in Computer Architecture and Parallel Processing (Hwang and Briggs, McGraw Hill 1985). The commonly used techniques are LRU (Least Recently Used), applied to the Ziv-Lempel algorithm by Miller and Wegman (EPA127,815); LFU (Least Frequently Used), applied to a similar string encoding algorithm by Mayne and James (Information Compression by Factorising Common Strings, Computer Journal, 18,2 pp 157-160, 1975); FIFO (First In First Out); LIFO (Last In First Out); the CLOCK algorithm; and Random replacement. The last four techniques cited have not been applied to the Ziv-Lempel algorithm. In addition, it is known to reset the search tree back to its initial state, which bears a penalty in terms of compression performance, and also to cease adding new strings when the memory capacity is exhausted, which will give poor performance if the characteristics of the data change.
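As an illustration of pruning, the following sketch applies a least-recently-used policy to the leaves of a small dictionary. It is a simplified example under stated assumptions, not the method of any cited document: the class name `LruPrunedDictionary` is hypothetical, entries are keyed by the string they represent, single-symbol alphabet entries are treated as permanent and are not stored, and only leaf entries are removed so that no stored string loses its prefix.

```python
from collections import OrderedDict


class LruPrunedDictionary:
    """Fixed-capacity dictionary that discards the least recently used leaf."""

    def __init__(self, capacity):
        self.capacity = capacity       # maximum number of multi-symbol entries
        # string -> number of child entries, ordered least recently used first
        self.entries = OrderedDict()

    def use(self, s):
        """Record that entry s has just been matched."""
        self.entries.move_to_end(s)

    def add(self, s):
        """Add s (an existing string plus one symbol), pruning if full."""
        parent = s[:-1]
        if parent in self.entries:
            self.entries.move_to_end(parent)   # the prefix was just matched
        if len(self.entries) >= self.capacity and not self._prune(parent):
            return False                       # nothing could be discarded
        if parent in self.entries:
            self.entries[parent] += 1
        self.entries[s] = 0
        return True

    def _prune(self, protect):
        """Remove the oldest entry that is a leaf and is not `protect`."""
        victim = next((s for s, n in self.entries.items()
                       if n == 0 and s != protect), None)
        if victim is None:
            return False
        del self.entries[victim]
        if victim[:-1] in self.entries:
            self.entries[victim[:-1]] -= 1     # the parent has lost a child
        return True
```

For example, with a capacity of three holding "ab", "ba" and "abc", then matching "ba" and adding "bax", the sketch discards "abc" rather than the older entry "ab", because "ab" still has a child. A decoder would have to apply the same rule at the same points to remain in step with the encoder; that synchronisation is not shown.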