1. Field of the Invention
The present invention relates to a device and method for compressing and decompressing various data such as character data, image data, etc.
2. Description of the Related Art
Along with the recent rapid development of computers, various information such as character information, vector information, image information, etc. are processed by computers, turning out a great amount of data. Under this current situation, there are proposed a data compression method for compressing the amount of data by omitting redundant portions included in the data to shorten the time required for data transmission or to enable efficient use of the storage capacity of storage devices, and a decompression method therefore.
Methods based on universal coding applicable to various data such as character information, vector information, image information, etc. are adopted as the coding method used for compressing data. Universal coding includes dictionary coding methods that take advantage of similarity between character strings and statistical coding methods that utilize the frequency of appearance of characters.
Lempel-Ziv coding is a typical example of the dictionary coding method. Two algorithms are proposed for Lempel-Ziv coding method; one is a sliding dictionary type (also called universal type) and the other is a dynamic dictionary type (also called increment resolvable type). Known as the improved versions of the sliding dictionary type algorithm are LZSS coding, QIC-122 coding which is a standard compression method for ¼-inch cartridge magnetic tapes, etc.
On the other hand, LZW (Lempel-Ziv-Welch) coding, etc. are known as the improvements of the dynamic dictionary type.
The statistical coding method aims for improving the compression efficiency by allocating longer coding lengths to characters that appear with a higher frequency of appearance, based on a statistical frequency of appearance (occurrence rate) of each character. For example, an arithmetic coding method and a Huffman coding method are known as the typical methods of the statistical coding method. According to the Huffman coding method, a code (Huffman code) having a coding length which is in inverse proportion to the appearance frequency of a character is used for the character.
Meanwhile, there is also a complex data compression method that takes both the way of the above-described dictionary coding method and the way of statistical coding method. Such a method includes an LZH method, which compresses data having been compressed by the above-mentioned LZSS method by the Huffman method. Since the LZSS method compresses data in the unit of character string whereas the Huffman method compresses data character by character, the effect of mutual complementation can be expected from the complex method of these.
An earlier sliding dictionary type algorithm (LZ1) will now be explained. This algorithm requires much computing, but can achieve a high compression rate. This algorithm divides the data to be coded into longest partial character strings each corresponding in its contained characters to those of a character string included in a past data flow and starting from an arbitrary position in the past data flow, and then codes each partial character string as a copy of the past character string. FIG. 28 is a diagram for explaining this principle.
In FIG. 28, a P buffer 281 stores input data already coded, and a Q buffer 282 stores data yet to be coded. A character string 283 input in the Q buffer 282 is compared, from its top, with the character string flow stored in the P buffer 281. If a longest character string 284 that corresponds in contained characters is found in the P buffer 281, the coding starts. For example, in a case where a corresponding character string 284 is found in the P buffer 281 as shown in FIG. 28, the data is compressed by setting the start position of the longest corresponding character string 284 as p1, and setting the length of the longest corresponding character string in the Q buffer 282 as q1. This coding compresses, for example, document information made up of character codes to ½ of its original length. An invention utilizing this sliding dictionary type algorithm is disclosed in, for example, Japanese Patent No. 3241788.
Next, the statistical coding method will be explained. As shown in FIG. 29, the statistical coding method involves an input data buffer 291, a statistic modeling unit 292, an appearance frequency table 293, and an entropy coding process unit 294. The statistic modeling unit 292 scans character strings stored in the input data buffer 291 and calculates the appearance frequency of each character. The entropy coding process unit 294 assigns a code generated based on the appearance frequency calculated by the statistic modeling unit 292 to each character.
The appearance frequency calculated by the statistic modeling unit 292 is classified into a rate for a static coding method in which the appearance frequency of each character is determined in advance, a rate for a semi-adaptive coding method in which the appearance frequency of each character is obtained by scanning the whole character string at the beginning of the process, and a rate for an adaptive coding method in which the frequency is reset every time each character appears to recalculate the appearance frequency. An invention utilizing the statistical coding method is disclosed in, for example, Japanese Patent No. 3276860.
However, the above-described coding methods have the following problems.
First, according to the dictionary coding method, a longest partial character string corresponding in the contained characters to those of the data to be coded is searched out from the character string already coded, and the data is coded as a copy of the partial character string, as described above. A structure that can attain a more and more improved data compression rate has to be employed in order to utilize this kind of algorithm. Such a structure also has to be capable of making the coded data convenient for computer processing.
For example, in order to improve the data compression rate in the dictionary coding method, it is necessary to increase the number of characters to be stored in the P buffer, and the same is required in the Q buffer. However, if the number of characters to be stored in the P buffer and Q buffer is increased, the data to be coded might not be a multiple of the bit number “8” bit-wise. This calls for complicated processes such as bit relocation when the data is to be transmitted. Furthermore, the amount of comparison computing is large, resulting in very poor compression efficiency.
In a case where the dictionary coding method is realized by hardware, the circuit scale will be enormously if the compression and decompression circuits are constructed on a single integrated circuit, greatly increasing the cost of the necessary hardware.
Furthermore, at the very beginning of the use, the sliding dictionary type algorithm codes the data with no reference data stock built in the dictionary. Therefore, the compression rate is low for the earlier input data, because of scarce contents being available in the dictionary.
On the other hand, according to the statistical coding method, since the coding process requires the involvement of the statistic modeling unit 292 and the entropy coding process unit 294 as the two-pass process between them, the processing speed is low. Furthermore, because of its nature as the variable length coding, this coding method requires bit treatment, which takes a long time, and besides, makes the coding logic complicated.