1. Field of the Invention
The present invention generally relates to data compression methods and apparatuses and, more particularly, to a lossless data compression method and apparatus capable of reconstructing original data used, for example, in a computer system.
2. Description of the Related Art
As computers are used in more advanced applications, the volume of programs and data processed by a computer increases rapidly. This growth has been met by more efficient recording media and higher transmission rates within computer systems. Data compression methods, which reduce data to a smaller size, have been employed to raise the effective transmission rate further and to make data easier to handle.
Two modes of data compression are known. In one mode, redundancy of information is reduced by transforming the information so that its size is reduced without losing its practical usefulness. This mode of compression is applied to information such as speech data and image data, whose interpretation depends on human perception. In the other mode, redundancy (repetition of data) is exploited to compress the original data losslessly, so that the original data can be completely reconstructed. This mode of compression is applied primarily to information such as numerical data, document data and program code, which must be reproduced as exact digital data.
One of the most widely used lossless data compression methods is the LZ77 method invented by Lempel and Ziv. The LZ77 method uses a portion of the previous data as a dictionary. The LZ78 and LZW methods are also widely used. They likewise generate a dictionary from previous data and are known as elaborations of the LZ77 method.
A description will now be given of the LZ77 data compression method.
In the LZ77 method, data to be compressed is compared with a range of previous data. A search is made in the range for the longest match with the data to be compressed. Compressed data is created from information indicating the relative position of the longest match in the range and the length of the match.
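As a rough illustration of this search, the following Python sketch scans the whole window exhaustively (no search tree) for the longest match. The function name `longest_match`, the 512-byte window and the 17-byte maximum match length are assumptions borrowed from the example that follows; the position is returned relative to the head (oldest byte) of the window.

```python
def longest_match(data: bytes, pos: int, window_size: int = 512,
                  max_len: int = 17) -> tuple[int, int]:
    """Find the longest match for data[pos:] among the window_size bytes before pos.

    Returns (relative_position, length), where relative_position counts from the
    head (oldest byte) of the window; a length of 0 means nothing matched.
    """
    head = max(0, pos - window_size)          # oldest byte still in the window
    limit = min(max_len, len(data) - pos)     # never compare past the end of the input
    best_rel, best_len = 0, 0
    for cand in range(head, pos):             # scan from the head of the window
        n = 0
        # A match may run on past pos into the input itself, as LZ77 allows.
        while n < limit and data[cand + n] == data[pos + n]:
            n += 1
        if n > best_len:                      # strict '>' keeps the earliest longest match
            best_rel, best_len = cand - head, n
    return best_rel, best_len


print(longest_match(b"abcabcabc", 3))  # (0, 6)
```

The cost of this scan grows with the window size for every input position, which is why a faster search structure is needed for large windows.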
FIGS. 1A, 1B and 1C illustrate a sample operation of the LZ77 data compression method that is generally practiced. FIG. 1A shows data to be compressed (input data) and a predetermined range of previous data (sliding window) to be compared with the input data. FIG. 1B shows uncompressed data. FIG. 1C shows compressed data. Generally, the input data and the previous data are contiguous with each other.
In the LZ77 method shown in FIGS. 1A, 1B and 1C, the unit size of the input data to be compressed (maximum match length) is set to 17 bytes and the window size is set to 512 bytes. The predetermined range of the previous data is processed as a sliding dictionary, and the input data to be compressed is first compared with the data in the sliding dictionary. The comparison is performed using a binary-tree search, described later.
In the aforementioned comparison, a search for the longest match with the 17-byte input data is started at the head of the sliding dictionary. The location of the match (relative byte position) and the length of the match (matching byte count) are encoded using 14-bit compression codes.
The first bit of each code in the compressed data is a Prefix. The Prefix "1" indicates that the next 13 bits are compressed data (a compressed Token). The Prefix "0" indicates that the next 8 bits are 1 byte of uncompressed data.
The 9 bits following the Prefix in the compressed data indicate the relative byte position (address), in the sliding dictionary, of the data that matched the input data. In the example of FIGS. 1A, 1B and 1C, the size of the sliding dictionary is set to 512 (= 2^9) bytes. Thus, 9 bits are required to indicate the relative byte position.
The 4 bits following the relative byte position indicate the length of the match with the input data. In the above example, the size of the input data is set to 17 bytes, so the maximum matching byte count is 17. When the matching byte count is 1, encoding the 1-byte (8-bit) data as a token would increase its size from 8 bits to 14 bits, so compression yields no benefit. In this case, the original byte is transmitted as 9 bits of uncompressed data: the Prefix "0" followed by the 8-bit byte.
When the matching byte count is 2, converting the 2-byte (16-bit) data into compressed data reduces its size from 16 bits to 14 bits, so data compression is beneficial. That is, LZ77 compression of a repeated 2-byte string achieves a compression ratio better than 1:1. When a match for the entire 17 bytes is found, the 136 bits (8 bits × 17) are compressed to 14 bits for transmission. Since the matching byte count ranges from 2 to 17 (16 possible values), 4 bits are used to indicate the matching byte count.
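The 14-bit token layout and the break-even arithmetic above can be sketched as follows, with bits shown as a string of '0' and '1' characters for readability. The helper names are hypothetical, and storing the match length as (length - 2) in the 4-bit field is an assumption consistent with the 2 to 17 range; the text does not spell out the exact mapping.

```python
POS_BITS = 9     # 512-byte window = 2**9 positions
LEN_BITS = 4     # 16 distinct lengths, 2..17
MIN_MATCH = 2    # shorter matches are sent as literals
TOKEN_BITS = 1 + POS_BITS + LEN_BITS   # prefix + position + length = 14


def encode_token(position: int, length: int) -> str:
    """Prefix '1', 9-bit relative position, 4-bit length field."""
    if not 0 <= position < 2 ** POS_BITS:
        raise ValueError("position does not fit in 9 bits")
    if not MIN_MATCH <= length < MIN_MATCH + 2 ** LEN_BITS:
        raise ValueError("length must be between 2 and 17")
    # Assumption: the 4-bit field stores length - 2, mapping 2..17 onto 0..15.
    return "1" + format(position, f"0{POS_BITS}b") + format(length - MIN_MATCH, f"0{LEN_BITS}b")


def encode_literal(byte: int) -> str:
    """Prefix '0' followed by the raw 8-bit byte: 9 bits in total."""
    return "0" + format(byte, "08b")


def bits_saved(length: int) -> int:
    """Raw size of `length` bytes minus the size of one token (negative means a loss)."""
    return 8 * length - TOKEN_BITS


print(encode_token(5, 2))                             # 10000001010000
print(bits_saved(1), bits_saved(2), bits_saved(17))   # -6 2 122
```

The three `bits_saved` values correspond to the 1-byte, 2-byte and 17-byte cases discussed in the text.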
As described above, when a data string producing a matching byte count of 2 or greater is not found in the sliding dictionary as a result of the comparison, the raw data is transmitted as uncompressed data. When the input data has been compared with the sliding dictionary, the input data and the sliding dictionary are shifted by 1 byte for a subsequent comparison.
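Putting these rules together gives a minimal end-to-end sketch under the same assumptions (512-byte window, 17-byte maximum match, length stored as length - 2, exhaustive search in place of a binary tree). In this sketch the input position advances by the match length after a token and by one byte after a literal; the decoder is included only to show that the token stream can be reconstructed losslessly. `compress` and `decompress` are hypothetical names.

```python
WINDOW = 512     # sliding-dictionary size (2**9 bytes)
MAX_LEN = 17     # maximum match length
MIN_MATCH = 2    # matches shorter than this are sent as literals


def compress(data: bytes) -> str:
    """Encode data as a string of '0'/'1' characters made of tokens and literals."""
    out = []
    pos = 0
    while pos < len(data):
        head = max(0, pos - WINDOW)              # oldest byte still in the window
        limit = min(MAX_LEN, len(data) - pos)
        best_rel, best_len = 0, 0
        for cand in range(head, pos):            # exhaustive scan instead of a tree
            n = 0
            while n < limit and data[cand + n] == data[pos + n]:
                n += 1
            if n > best_len:
                best_rel, best_len = cand - head, n
        if best_len >= MIN_MATCH:
            # 14-bit token: prefix '1', 9-bit position, 4-bit (length - 2)
            out.append("1" + format(best_rel, "09b") + format(best_len - MIN_MATCH, "04b"))
            pos += best_len
        else:
            # 9-bit literal: prefix '0', raw byte
            out.append("0" + format(data[pos], "08b"))
            pos += 1
    return "".join(out)


def decompress(bits: str) -> bytes:
    """Invert compress(), rebuilding the window from the bytes already decoded."""
    out = bytearray()
    i = 0
    while i < len(bits):
        if bits[i] == "1":
            rel = int(bits[i + 1:i + 10], 2)
            length = int(bits[i + 10:i + 14], 2) + MIN_MATCH
            src = max(0, len(out) - WINDOW) + rel
            for k in range(length):              # byte by byte, so overlapping copies work
                out.append(out[src + k])
            i += 14
        else:
            out.append(int(bits[i + 1:i + 9], 2))
            i += 9
    return bytes(out)


sample = b"abcabcabcabc"
encoded = compress(sample)
assert decompress(encoded) == sample
print(len(encoded), "bits versus", 8 * len(sample))   # 41 bits versus 96
```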
According to the data compression described above, a compression ratio of 1:2-1:3 is achieved. In order for this data compression to be performed effectively, means for conducting a high-speed search in the sliding dictionary for a longest match with the input data is required. Conventionally, the binary-tree search is widely used as such means. In the binary-tree search, a search tree is built up by arranging the data strings in the sliding dictionary in accordance with a predetermined algorithm (described later in detail). The input data is traced through this search tree and compared with nodes of the tree in order to find a location of the data string that produces a longest match. When the comparison is completed, the position of the window is shifted by 1 byte so that the dictionary is updated.
A description will now be given of the operation of the binary-tree search.
FIG. 2 illustrates the operation of the binary-tree search. In the example shown in FIG. 2, it is assumed that the character string "The Cruelty of Really Tea" is compared with the subsequent 17-character string (not shown). That is, the character string "The Cruelty of Really Tea" is defined as a window (sliding dictionary). The first 17 characters constitute string1. String1 is successively shifted to the left by 1 byte so as to obtain string2, string3, string4, string5, string6, string7 and string8. Each of these strings is compared with the input data.
Before the comparison, a binary tree as shown in FIG. 2 is built up for the character string "The Cruelty of Really Tea". In the binary-tree search, levels are assigned to different types of characters such that a space, characters A-Z and characters a-z have increasingly higher levels in the stated order.
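The level ordering above coincides with ASCII code order (space 0x20, "A" to "Z" 0x41 to 0x5A, "a" to "z" 0x61 to 0x7A), so ordinary lexicographic string comparison can stand in for it. The small sketch below, with hypothetical helper names, forms the shifted strings and compares them. It takes every 1-character shift of the window literally, including shifts that begin at a space.

```python
def shifted_strings(text: str, count: int, length: int = 17) -> list[str]:
    """Return string1 .. string<count>: substrings starting 0, 1, 2, ... characters into the window."""
    return [text[i:i + length] for i in range(count)]


def is_higher(a: str, b: str) -> bool:
    """True if string a has the higher level, decided at the first differing character.

    Python compares str values by code point, and in ASCII a space sorts below 'A'-'Z',
    which sort below 'a'-'z', so this matches the level ordering in the text.
    """
    return a > b


assert " " < "A" < "Z" < "a" < "z"

strings = shifted_strings("The Cruelty of Really Tea", 8)
print(strings[0])                         # The Cruelty of Re
print(is_higher(strings[1], strings[0]))  # True: 'h' is above 'T'
print(is_higher(strings[1], strings[2]))  # True: 'h' is above 'e'
```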
For example, when string1 is compared with string2, it is determined that the character "h" at the head of string2 is higher in level than the character "T" at the head of string1. Therefore, string2 is stored in a branch associated with a higher level (the right branch in the illustration). Comparing string2 with string3, it is found that the character "h" at the head of string2 is higher in level than the character "e" at the head of string3. Therefore, string3 is stored in a branch associated with a lower level (the left branch in the illustration) with respect to string2.
Comparing string4 with string2, it is found that the character "h" at the head of string2 is higher in level than the character "C" at the head of string4. Further, when string4 is compared with string1, it is found that the character "T" at the head of string1 is higher in level than the character "C" at the head of string4. Therefore, string4 is stored in a branch associated with a lower level with respect to string1. In this way, the tree is built up in accordance with the levels of the characters.
The tree shown in FIG. 2 enables an efficient search for a match with the input data.
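A minimal sketch of how such a tree can be built and searched, assuming a key of up to 17 bytes starting at each window position, equal keys placed in the left (lower-level) branch, and no deletion of old strings (a real sliding window would also remove strings that leave the window). The search keeps the longest common prefix seen along the path. This is sufficient because the strings immediately below and above the input in sorted order both lie on the search path, and one of them shares the longest prefix with the input. `Node`, `build_tree` and `tree_longest_match` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LEN = 17  # maximum match length, so each key holds at most 17 bytes


@dataclass
class Node:
    pos: int                        # where this string starts in the window
    key: bytes                      # window[pos:pos + MAX_LEN]
    left: Optional["Node"] = None   # strings with a lower (or equal) level
    right: Optional["Node"] = None  # strings with a higher level


def build_tree(window: bytes) -> Optional[Node]:
    """Insert the string starting at every window position into an unbalanced tree."""
    root: Optional[Node] = None
    for pos in range(len(window)):
        new = Node(pos, window[pos:pos + MAX_LEN])
        if root is None:
            root = new
            continue
        node = root
        while True:
            if new.key > node.key:          # higher level: go right
                if node.right is None:
                    node.right = new
                    break
                node = node.right
            else:                           # lower or equal level: go left
                if node.left is None:
                    node.left = new
                    break
                node = node.left
    return root


def common_prefix(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b share."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def tree_longest_match(root: Optional[Node], target: bytes) -> tuple[int, int]:
    """Trace target through the tree and return (pos, length) of the longest match seen."""
    target = target[:MAX_LEN]
    best_pos, best_len = 0, 0
    node = root
    while node is not None:
        n = common_prefix(target, node.key)
        if n > best_len:
            best_pos, best_len = node.pos, n
        if n == len(target):                # whole input matched; stop early
            break
        node = node.right if target > node.key else node.left
    return best_pos, best_len


root = build_tree(b"The Cruelty of Really Tea")
print(tree_longest_match(root, b"Really Teapot"))   # (15, 10)
```

In this sketch, matches are limited to the window itself; a match that would continue past the end of the window into the input is cut short.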
However, the data compression according to the related art as described above has the following drawbacks.
(1) Ideally, the search time of the binary-tree search widely used in data compression is proportional to the logarithm of the window size, so the binary-tree search can be applied to a large window. However, depending on the pattern of the input data, the search can take a relatively long time, because a binary tree quite unlike the tree of FIG. 2 may be constructed. For example, most of the child nodes may lie to the right of their respective parent nodes, or most may lie to the left. Such a tree resembles a straight line, and searching it may require comparison with substantially every string, resulting in a long search time. Such a binary-tree search is therefore unsuitable for systems that require real-time processing.
(2) When an ideal binary tree is built up and matches several bytes long are found in succession, several bytes of data are effectively processed at once in each compression step. However, the window (sliding dictionary) is updated only 1 byte at a time. Thus, the updating operation, which contributes little to the data compression itself, becomes a bottleneck that limits the processing speed of data compression.
(3) The binary-tree search determines the next operation to be performed based on the result of the immediately preceding operation, and is therefore poorly suited to parallel processing. As a result, implementing the binary-tree search in hardware for faster processing is difficult.