The adaptive Ziv-Lempel (AZL) process is presently used by many commercial computer systems to reduce the storage space required by data files. The AZL process is disclosed in a number of prior patent filings, such as U.S. Pat. No. 4,814,746 to Miller and Wegman, assigned to the same assignee as the subject application. That patent describes and claims a ZL compression dictionary using an LRU (least recently used) replacement technique for continuously adapting (modifying) the dictionary to new input data strings being compressed when the dictionary runs out of unused entries to accommodate new input character strings.
The ZL data compression process compares characters in the inputted sequence to trace through respective entries in the dictionary to locate the dictionary entries representing the longest matched input strings, and to output the dictionary indices of these entries as the compressed data stream. The indices may be buffered and then transmitted in bursts of indices to maximize the efficiency of transmitting the compressed records or messages.
The AZL process modifies a dictionary for each separately inputted sequence of data characters, which may be the inputted characters in a data file, or in a computer session being transmitted. The AZL process always continues the construction and modification of its AZL dictionary WHILE compressing data. An AZL dictionary is always changed by each new character string in the input stream of data being compressed. When the AZL dictionary becomes full, each new character string being inputted replaces one of the dictionary's valid string entries, which is deleted to make room for it.
The compressed output of the Ziv-Lempel process is represented by a sequence of dictionary indices representing the character strings detected in the input stream. Each index is expanded to the characters in the string it represents by accessing the string entry at that index and using pointers therefrom to access entries representing the characters in the string in a reverse order. Because of LRU entry replacement in an AZL dictionary, the index representation of a particular character string may change, such as if an entry is aged out of the dictionary and later is put back in the dictionary at a different index location when the string is again encountered in the input stream.
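The reverse-order expansion of a single index described above can be sketched as follows. This is a minimal illustration, not the structure used in the cited patents: the entry layout (a parent pointer plus one character) and the names `expand_index` and `entries` are assumptions made for the example.

```python
def expand_index(index, entries):
    """Expand one dictionary index into the character string it represents.

    Assumption: entries[i] is a (parent_index, char) pair, and parent_index
    is None for a single-character entry with no predecessor.
    """
    chars = []
    while index is not None:
        parent, ch = entries[index]
        chars.append(ch)   # characters are collected last-to-first
        index = parent     # follow the pointer toward the start of the string
    return "".join(reversed(chars))


# Hypothetical four-entry dictionary: "a", "b", "ab", "abb".
entries = [(None, "a"), (None, "b"), (0, "b"), (2, "b")]
print(expand_index(3, entries))
```

Because each entry stores only its final character and a pointer to its prefix, the string is recovered backwards and must be reversed before output.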
If data compressed by the AZL process is being transmitted, its AZL dictionary is not transmitted. Instead, an identical AZL dictionary (used for expansion) is constructed from the received compressed data (indices) at the receiving location in synchronism with the construction of the identical AZL compression dictionary at the sending location. Synchronized identity of the AZL dictionary structures is essential at both the sending and receiving locations, in order to expand the received compressed data into uncompressed data identical to the original uncompressed data.
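The synchronized construction of identical dictionaries at the sending and receiving locations can be illustrated with a textbook LZW-style sketch. It is offered under stated assumptions only: the dictionary here grows without bound and omits the LRU replacement described earlier, and the names `lzw_compress`, `lzw_expand`, and `alphabet` are hypothetical.

```python
def lzw_compress(data, alphabet):
    """Compress `data`, building the dictionary while compressing."""
    table = {ch: i for i, ch in enumerate(alphabet)}
    out, current = [], ""
    for ch in data:
        if current + ch in table:
            current += ch                      # keep extending the match
        else:
            out.append(table[current])         # emit longest matched string
            table[current + ch] = len(table)   # adapt: add the new string
            current = ch
    if current:
        out.append(table[current])
    return out, table


def lzw_expand(indices, alphabet):
    """Rebuild the same dictionary from the indices alone, then expand."""
    table = list(alphabet)
    if not indices:
        return "", table
    previous = table[indices[0]]
    pieces = [previous]
    for index in indices[1:]:
        if index < len(table):
            entry = table[index]
        elif index == len(table):
            # The sender used an entry it had only just created.
            entry = previous + previous[0]
        else:
            raise ValueError("index refers to an entry not yet built")
        pieces.append(entry)
        table.append(previous + entry[0])      # mirror the sender's update
        previous = entry
    return "".join(pieces), table


indices, sender_table = lzw_compress("abababab", "ab")
text, receiver_table = lzw_expand(indices, "ab")
```

Only the indices cross the link. The receiver repeats each dictionary update the sender made, so both tables end up with the same strings at the same indices.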
Hence, the AZL process "dynamically adapts" its dictionary by continuously updating it for each new string in the input sequence. This is why the AZL dictionary is known as a "dynamic" or "adaptive" dictionary.
The AZL process may be used to perform compression/expansion at the same location, or at two different locations connected by data transmission. Only a single dictionary is used when performing compression/expansion at the same location, but separate identical dictionaries are needed when compression is done at one location and expansion is done at a different location.
Accordingly, the AZL process generates its dictionary(s) WHILE the data is being inputted for compression, whether the AZL process is performing compression and expansion at the same location, or at two different locations connected by data transmission.
An AZL dictionary can only handle records sequentially accessed from a data base, since its structure is totally dependent on the sequential nature of its inputted records. An AZL process does not expand records randomly accessed from a data base in a different order than the records used in the current construction of its AZL dictionary.
Thus, the AZL process is totally dependent on the sequential relationship of the characters currently being inputted for compression. For example, if the same set of records is inputted in two different sequences for compression by the same Ziv-Lempel process, each sequence will generate a different dictionary, and the same uncompressed record will likely generate a compressed record containing a different set of indices in the two sequences. Thus, the AZL process has an "input-record-order dependency", due to the AZL dictionary(s) being generated DURING the inputting of uncompressed data for compression.
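The input-record-order dependency can be seen in a small sketch. It assumes an LZW-style adaptive dictionary with no LRU replacement and assumes that pending output is flushed at each record boundary; the function name `adaptive_compress_records` is hypothetical.

```python
def adaptive_compress_records(records, alphabet):
    """Compress records in the given order with one adaptive dictionary.

    Returns one list of indices per record. The dictionary keeps adapting
    across records, so earlier records influence later ones.
    """
    table = {ch: i for i, ch in enumerate(alphabet)}
    per_record = []
    for record in records:
        out, current = [], ""
        for ch in record:
            if current + ch in table:
                current += ch
            else:
                out.append(table[current])
                table[current + ch] = len(table)
                current = ch
        if current:
            out.append(table[current])   # flush at the record boundary
        per_record.append(out)
    return per_record


first = adaptive_compress_records(["abab", "baba"], "ab")
second = adaptive_compress_records(["baba", "abab"], "ab")
# The record "abab" is first[0] in one run and second[1] in the other.
print(first[0], second[1])
```

In this example the record "abab" compresses to three indices when it is inputted first and to two different indices when it is inputted second, so its compressed form cannot be interpreted without knowing what preceded it.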
A computer operates inefficiently with the AZL process when its input stream consists of "randomly-obtained" data, such as "randomly-accessed" small records or "randomly-determined" small messages represented in a data base. AZL processing must build a new AZL dictionary for every sequence of "randomly-obtained" records or input messages. This is because building an AZL dictionary DURING the inputting of a "randomly-obtained" sequence of records (or messages) ties the dictionary to that particular sequence, and it cannot be used efficiently with any other "randomly-obtained" sequence of records. This prevents an AZL dictionary generated for any randomly-obtained data from being used with any other randomly-obtained data.
The computer-efficiency problem with records randomly-obtained from a data base is solved by the static Ziv-Lempel (SZL) process described and claimed in patent application Ser. No. 07/968,631 entitled "Method and Means Providing Static Dictionary Structures for Compressing Character Data and Expanding Compressed Data", filed Oct. 19, 1992 and assigned to the same assignee as the subject application.
The SZL process generates its SZL dictionary(s) BEFORE (and not while) it is used to compress or expand any record in the data base. The SZL dictionary is generated from ALL uncompressed records in a data base. The resulting compressed records may be stored to provide a compressed form of that data base which occupies only a fraction of the storage of the uncompressed data base.
The SZL process allows any individual record in the data base (whether compressed or not) to be randomly accessed and expanded independent of other records, using the SZL dictionary without change (and no computer processing is used for any dictionary generation or updating). The compressed records may be expanded at the same location, or transmitted to another location where the expansion is done.
By generating an SZL dictionary from an entire uncompressed data base PRIOR TO using the data base, the same compressed record (same indices) is obtained regardless of the order in which that record is later obtained. Hence, it may later be randomly-obtained from the data base without any effect on its compressed form, unlike in the AZL process.
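The order independence of a dictionary built beforehand can be sketched as follows. This is an illustration under assumptions, not the dictionary-building method of application Ser. No. 07/968,631: the build pass simply reuses an LZW-style scan over all records, and the names `build_static_dictionary` and `compress_static` are hypothetical. Every character of a record is assumed to appear in `alphabet`.

```python
def build_static_dictionary(records, alphabet):
    """Build the dictionary once, from all records, before any compression."""
    table = {ch: i for i, ch in enumerate(alphabet)}
    for record in records:
        current = ""
        for ch in record:
            if current + ch in table:
                current += ch
            else:
                table[current + ch] = len(table)
                current = ch
    return table


def compress_static(record, table):
    """Compress one record by longest match; the table is never modified."""
    out, current = [], ""
    for ch in record:
        if current + ch in table:
            current += ch
        else:
            out.append(table[current])
            current = ch
    if current:
        out.append(table[current])
    return out


table = build_static_dictionary(["abab", "baba"], "ab")
# Access the records in either order; each result depends on the record alone.
print(compress_static("baba", table), compress_static("abab", table))
```

Since `compress_static` only reads the table, the indices produced for a record do not depend on which other records were compressed before it.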
Accordingly, the SZL process generates an "SZL dictionary" that represents all of the strings within the records in a data base. (The term "records" is used in a generic sense to include any recorded sequence of characters, such as may be found in messages accessed from a data base.)
Application Ser. No. 07/968,631 also discloses that an SZL dictionary used for compression need not be identical to an SZL dictionary used for expansion, as long as they use the same indices to represent the same character strings. That is, the content of SZL dictionary entries may be different for corresponding entries at the same index in the separate compression and expansion dictionaries. These paired SZL compression and expansion dictionaries are herein referred to as "corresponding" dictionaries, which may or may not be identical.
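A rough sketch of "corresponding" dictionaries follows. The layouts are assumptions chosen for illustration and are not the entry formats of the cited application: the compression dictionary maps (parent index, next character) to a child index, while the expansion dictionary maps an index to (parent index, last character). The names `build_corresponding`, `compress`, and `expand` are hypothetical, and the input string list is assumed to be prefix-closed.

```python
def build_corresponding(strings):
    """Derive both dictionaries from one list; list position is the index."""
    index_of = {s: i for i, s in enumerate(strings)}
    compression = {}   # (parent_index or None, char) -> index
    expansion = []     # index -> (parent_index or None, char)
    for i, s in enumerate(strings):
        parent = index_of[s[:-1]] if len(s) > 1 else None
        compression[(parent, s[-1])] = i
        expansion.append((parent, s[-1]))
    return compression, expansion


def compress(text, compression):
    out, node = [], None
    for ch in text:
        nxt = compression.get((node, ch))
        if nxt is None:
            if node is None:
                raise ValueError(f"no entry for {ch!r}")
            out.append(node)                    # longest match ends here
            nxt = compression.get((None, ch))   # restart from the root
            if nxt is None:
                raise ValueError(f"no entry for {ch!r}")
        node = nxt
    if node is not None:
        out.append(node)
    return out


def expand(indices, expansion):
    pieces = []
    for index in indices:
        chars = []
        while index is not None:
            index, ch = expansion[index]
            chars.append(ch)
        pieces.append("".join(reversed(chars)))
    return "".join(pieces)


compression, expansion = build_corresponding(["a", "b", "ab", "ba", "bab"])
print(expand(compress("baba", compression), expansion))
```

The two structures hold different content at each index, yet index 4 means "bab" in both, which is the only agreement the two sides need.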
Increased computer efficiency is obtained by using separate corresponding SZL compression and expansion dictionaries, rather than identical dictionaries. Corresponding SZL compression and SZL expansion dictionaries may be used at the same location, or at different locations connected by data transmission.
If the expansion process is done at a location different from the compression process, all SZL corresponding dictionaries needed at the different locations may be constructed at any location having the uncompressed data base, and then the SZL dictionaries may be transmitted to any location where they are needed. Or if an identical copy of the uncompressed data base exists at plural locations, the corresponding dictionary(s) needed at each such location may be constructed there.
In a data transmission environment, the SZL dictionary may be transmitted to a receiving location after the dictionary is generated from the entire uncompressed data base at the sending location and before using the SZL dictionary. An SZL dictionary can be constructed at a receiving location only if the receiving location has the same uncompressed records and inputs them to the SZL process in the exact same input sequence as is used to generate the SZL dictionary at the sending location.
A "compressed version of the data base" may be generated simultaneously with generation of the SZL dictionary(s). The compressed version of the data base may be used to randomly or sequentially obtain any record in the data base in compressed form.
No SZL dictionary is thereafter constructed during random-accessing operations of the data base, whether uncompressed or compressed records are being randomly-obtained at a location which is to transmit the record in compressed form. That is, the prior-constructed SZL dictionary(s) can then be used for the compression and/or expansion of records or messages randomly-obtained at any location.
Furthermore, the SZL process may also be used to compress and expand new or changed records or messages in the uncompressed data base AFTER the generation of the SZL dictionary. Any new character string in such new or changed record is compressed and expanded as containing one or more existing smaller character strings currently represented in the dictionary (due to being in the prior version of the data base).
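How a new record is covered by smaller strings already in a frozen dictionary can be sketched with a flat longest-match search. The dictionary contents, the name `compress_greedy`, and the flat list layout are assumptions for the example; a real dictionary would be traced entry by entry rather than searched by length.

```python
def compress_greedy(text, dictionary):
    """Emit the index of the longest dictionary string at each position.

    Assumption: `dictionary` is a list of strings whose list position is
    the index, and it contains every single character that can occur.
    """
    index_of = {s: i for i, s in enumerate(dictionary)}
    max_len = max(len(s) for s in dictionary)
    out, pos = [], 0
    while pos < len(text):
        # Try the longest candidate first, then shorten until one matches.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in index_of:
                out.append(index_of[candidate])
                pos += length
                break
        else:
            raise ValueError(f"no dictionary entry for {text[pos]!r}")
    return out


# Hypothetical dictionary built earlier from records "abab" and "baba".
frozen = ["a", "b", "ab", "ba", "bab"]
# "abba" was not in the data base when the dictionary was built.
print(compress_greedy("abba", frozen))
```

The new record contains no string longer than two characters that the dictionary knows, so it is represented as the existing strings "ab" and "ba" without any change to the dictionary.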
The SZL process has been found to operate efficiently with randomly-obtained small compressed records, or messages, that need to be compressed and expanded at the same location, or need to be compressed at one location and transmitted to another location where the record is expanded.
Previously, it had been presumed that poor data compression would result from the Ziv-Lempel process if the AZL process was not used to continuously adapt its dictionary to its input data. This has not been found to be the case with SZL processes, as long as the data base does not change by a large amount. Thus, it has been found that the SZL process effectively compresses records randomly accessed from a data base as long as an excessive number of changes have not been made in the data base.
Thus, the SZL process allows an existing SZL dictionary(s) to be used with any sequence of records or messages randomly obtained in a large data base without constructing or modifying any new dictionary. That is, no processing is spent on modifying the SZL dictionary structure while using the SZL process for compressing and expanding any sequence of records. On the other hand, the AZL process requires a large amount of computer processing for modifying its AZL dictionary to adapt it to each sequence of randomly-obtained records. The result is that the AZL process is not as efficient as the SZL process, when used with new sequences of randomly-obtained small records and messages.
While the SZL process is being used, old uncompressed records may be updated and new uncompressed records may be added in the corresponding uncompressed data base at a central location containing the data base. Yet, the existing SZL dictionary(s) and any corresponding dictionary(s) at this and any other location need not be updated, in order to use these dictionaries to compress and expand the new and updated records in the data base, as long as the SZL dictionary(s) is not changed.
Computer operating efficiency is further enhanced by a novel structure for entries in the SZL dictionary, disclosed and claimed in patent application Ser. No. 07/968,631, which obtains a further significant performance increase for a computer system (over prior art AZL processes). This novel SZL entry structure requires fewer accesses to the SZL dictionary than are required in conventional AZL dictionaries for the character-string-compare determinations within the dictionary entries. Reducing the number of storage accesses for dictionary entries in memory (in addition to not having to spend processing on modifying the dictionary) enables this SZL process to be much faster than the AZL process (using computers having the same instruction execution rate).