1. Field of the Invention
The present invention relates to a coding apparatus and a decoding apparatus which can be optimally applied in compressing and reconstructing various data such as CAD data, document data, etc.
2. Description of the Related Art
Recently, an increasing volume of various types of data such as character codes, image data, etc. have been processed in a computer. When such large volume of data is stored and transmitted to a distant destination, it is common to compress the data with the redundant portion of the data removed to reduce the storage capacity and improve the transmission speed.
There are two common data compressing systems. They are a dictionary type coding system based on the similarity in data sequences; and a probability statistic type coding system based on the frequency of occurrences of data strings.
A typical example of the dictionary type coding system is an LZ77 system and an LZ78 system.
In the LZ77 system, a predetermined buffer is provided, the position of the previous data matching in longest length is retrieved from the previously input data in the buffer, and the matching position and the matching length are used as codes.
FIG. 1 shows the coding method in the conventional LZ77 system.
In FIG. 1, assume that ‘a b a b c d e f a b c d e f g h . . . ’ is input as data to be compressed, and each character of the data to be compressed is assigned an input number indicating an occurrence position.
First, if ‘a’ having the input number 1 is input, then the character ‘a’ is coded as is because it has no preceding characters. Then, when a character ‘b’ having the input number 2 is input, it is compared with the previously input characters. However, there are no characters matching the character ‘b’, the character ‘b’ is coded as is. Furthermore, when a character string ‘a b’ having the input numbers 3 and 4 is input, it is compared with the previously input character strings. As a result, since the character string matches a character string ‘a b’ having the input numbers 1 and 2, the character string ‘a b’ having the input numbers 3 and 4 is coded using the matching position and matching length. In this example, since the matching position is the position of the character ‘a’ having the input number 1, and the matching length is 2, ‘(1, 2)’ is coded as the code of the character string ‘a b’ having the input numbers 3 and 4.
Next, when a character ‘c’ having the input number 5 is input, it does not match any of the previously input characters. Therefore, the character ‘c’ is coded as is. When a character ‘d’ having the input number 6 is input, it does not match any of the previously input characters. Therefore, the character ‘d’ is coded as is. When a character ‘e’ having the input number 7 is input, it does not match any of the previously input characters. Therefore, the character ‘e’ is coded as is. When a character ‘f’ having the input number 8 is input, it does not match any of the previously input characters. Therefore, the character ‘f’ is coded as is.
Then, when a character string ‘a b c d e f’ having the input numbers 9 through 14 is input, it matches a character string ‘a b c d e f’ having the input numbers 3 through 8. Therefore, the character string ‘a b c d e f’ having the input numbers 9 through 14 is coded using the matching position and the matching length. In this example, since the matching position is position of the character ‘a’ having the input number 3, and the matching length is 6, ‘(3, 6)’ is coded as the code of the character string ‘a b c d e f’ having the input numbers 9 through 14.
When a character ‘g’ having the input number 15 is input, it does not match any of the previously input characters. Therefore, the character ‘g’ is coded as is. When a character ‘h’ having the input number 16 is input, it does not match any of the previously input characters. Therefore, the character ‘h’ is coded as is. On the other hand, in the LZ78 system, a previously input character string is entered in a dictionary, and an entered input number is coded.
The LZ77 system has higher compression performance than the LZ78 system for data containing a repetition of a long character string. On the other hand, the LZ78 system has higher compression performance than the LZ77 system for data containing a repetition of a comparatively short character string. The LZ77 system and the LZ78 system are described in, for example, the document “The Introduction to the Document Data Compression Algorithm” by Tomohiko Uematsu published by CQ Publishing Company.
A typical system of the probability statistic type coding system can be the arithmetic coding system and the Huffman coding system. Both arithmetic coding system and Huffman coding system obtain a compression effect by allotting a short code length to a character having a high occurrence probability according to the statistic occurrence frequency of each character
The arithmetic coding system is described in, for example, the document “Arithmetic coding revisited” by Alister Moffat et al., 1995, IEEE Data Compression Conference, p 202-211. The Huffman coding system is described in, for example, the document “The Introduction to the Document Data Compression Algorithm” by Tomohiko Uematsu published by CQ Publishing Company.
To obtain a higher compression effect, a variable length coding method has been suggested based on the conditional occurrence probability (P[Xt|xt−1]) in which not the occurrence probability (P(Xt)) of a single character but the dependence (hereinafter referred to as a context) between an input character and its previous is taken into account. This method is described in, for example, the document “Unbounded Length Contexts for PPM” by John G. Cleary et al., 1995, IEEE Data Compression Conference, p 52-61.
The probability statistic type coding system as well as the LZ78 system has higher compression performance for data containing a repetition of a comparatively short character string. Normally, the LZ78 system has a higher processing speed than the probability statistic type coding system. On the other hand, the probability statistic type coding system has a higher compression rate than the LZ78 system.
However, the LZ78 system and the probability statistic type coding system have high compression rate for data containing a repetition of a comparatively short character string, but cannot have sufficient compression rate for data containing a repetition of a long character string.
On the other hand, the LZ77 system has high compression rate for data containing a repetition of a long character string, but cannot have sufficient compression rate for data containing a repetition of a comparatively short string.
Therefore, the conventional compression systems have difficulty in obtaining high compression rate for data containing a repetition of long character strings and comparatively short character strings.
The present invention aims at providing a data coding apparatus capable of efficiently compressing both long and short character strings.