1. Field of the Invention
The present invention relates to a data compressing method and an apparatus thereof.
2. Description of the Related Art
Computers have come into widespread use, and in the communication field, data communication in association with multi-media is attracting attention. As computers have spread, however, the demands placed on the data handled by computers and on the application software that runs on them have grown. In addition, the amount of data handled by computers tends to increase.
For example, on the Internet or the like, a distribution method in which a file of application software is downloaded from a remote server to a personal computer and then installed on the personal computer is routinely used. In this case, to download the application software, the user must keep a telephone line connected to the network. Meanwhile, as application software gains functions, the file containing it is also becoming very large. Thus, a downloading operation sometimes takes a couple of hours to complete, and the communication fee increases accordingly.
Apart from such downloading operations, as the data processed by computers increases, the amount of data stored on each storage medium, such as a floppy disk or a hard disk, also tends to increase.
As the amount of data handled by computers increases, problems arise concerning the effective use of storage media, the reduction of data transmission time, and so forth. To solve such problems, data is compressed by a particular compressing method. The compressed data is then transferred or stored on a storage medium. The original data is expanded (restored) by an expanding method corresponding to the compressing method.
Typical known data compressing methods include the Huffman coding method and the Lempel-Ziv method. In both methods, a data file is treated as a set of symbols, and the symbols are re-encoded efficiently. In other words, a data file composed of binary data of "0s" and "1s" can be represented as a symbol string by correlating each byte of the data file with one character symbol.
In the Huffman coding method, all of the input data is first read, and the occurrence probability of each symbol in the input data is obtained. Next, an occurrence probability table that represents the occurrence probability of each symbol is generated. A code that identifies each symbol is then assigned to the symbol by a predetermined method, such as a Huffman tree, corresponding to the occurrence probability table. In other words, a code with a short bit length is assigned to a symbol with a large occurrence probability, whereas a code with a long bit length is assigned to a symbol with a small occurrence probability. Thereafter, the input data is read once again, and each symbol of the input data is substituted with its assigned code. The occurrence probability table is output at the beginning of the compressed data. Thus, when the compressed data is expanded, the codes are reconstructed from the occurrence probability table at the beginning of the compressed data by the same method as in compression, and each code is substituted with the relevant symbol.
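The two-pass procedure described above can be sketched roughly as follows. This is a minimal illustration rather than the method of any particular implementation: the function names (`huffman_code`, `encode`, `decode`) are hypothetical, symbol counts stand in for occurrence probabilities, codes are kept as strings of "0" and "1" for readability, and the table is passed to the decoder directly instead of being written at the beginning of the compressed data.

```python
import heapq
from collections import Counter


def huffman_code(data: bytes) -> dict:
    """First pass: count symbols and build a prefix code {symbol: bitstring}."""
    counts = Counter(data)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial_code}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)  # two least frequent subtrees
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data: bytes, code: dict) -> str:
    """Second pass: substitute each symbol with its assigned code."""
    return "".join(code[b] for b in data)


def decode(bits: str, code: dict) -> bytes:
    """Expand by matching prefix codes back to symbols."""
    inverse = {v: k for k, v in code.items()}
    out = bytearray()
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return bytes(out)
```

For the input `b"aaaaaaaabbbc"`, the most frequent symbol `a` receives a one-bit code and the rarer symbols `b` and `c` receive two-bit codes, so the twelve symbols are encoded in 16 bits.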
In uncompressed data, one symbol is represented with a predetermined number of bits. Thus, the amount of input data is (the number of bits that represents a symbol) × (the number of symbols contained in the data). According to the Huffman coding method, the larger the occurrence probability of a symbol, the smaller the number of bits of the code assigned to it. Thus, the amount of data is reduced by the bits saved.
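The size comparison above can be worked through numerically. The sketch below is illustrative only: the function names are hypothetical, 8 bits per symbol is assumed for the uncompressed form, and the code lengths (1 bit for `a`, 2 bits each for `b` and `c`) are assumed values of the kind a Huffman tree would yield for this input, not figures taken from the text.

```python
from collections import Counter


def fixed_length_bits(data: bytes, bits_per_symbol: int = 8) -> int:
    """Uncompressed size: (bits per symbol) x (number of symbols)."""
    return bits_per_symbol * len(data)


def coded_bits(data: bytes, code_lengths: dict) -> int:
    """Coded size: sum over symbols of (occurrences) x (code length)."""
    counts = Counter(data)
    return sum(counts[s] * code_lengths[s] for s in counts)


data = b"aaaaaaaabbbc"
# Assumed code lengths: the frequent symbol gets the shortest code.
lengths = {ord("a"): 1, ord("b"): 2, ord("c"): 2}

uncompressed = fixed_length_bits(data)   # 8 * 12 = 96 bits
compressed = coded_bits(data, lengths)   # 8*1 + 3*2 + 1*2 = 16 bits
```

Note that the 16-bit figure excludes the occurrence probability table that must accompany the compressed data.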
In the Lempel-Ziv method, symbol strings in input data are registered in a dictionary, and the symbol strings are substituted with indexes of the dictionary. The dictionary is generated as the input data is being read. Whenever a new symbol string appears, it is registered in the dictionary and substituted with an index. There are two dictionary generating methods, referred to as the LZ77 method and the LZ78 method.
In the LZ77 method, symbols that have appeared within the past predetermined number of kbits of input data are registered in the dictionary. Symbols of input data are substituted with indexes registered in the dictionary. When a symbol of input data has not been registered in the dictionary, a code representing that the symbol has not been registered is added to the symbol, and the symbol is then output as is.
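A rough sketch of this sliding-window scheme follows. It is a simplified illustration under stated assumptions, not a definitive LZ77 implementation: the function names, the token layout (`"match"` with an offset and length, `"literal"` as the flag for an unregistered symbol), and the default parameters `window`, `min_match`, and `max_match` are all hypothetical choices, and the brute-force search is chosen for clarity rather than speed.

```python
def lz77_compress(data: bytes, window: int = 4096,
                  min_match: int = 3, max_match: int = 255) -> list:
    """Replace repeats found in the recent window with (offset, length)."""
    tokens = []
    i = 0
    while i < len(data):
        start = max(0, i - window)  # the window acts as the dictionary
        best_len, best_off = 0, 0
        for j in range(start, i):
            length = 0
            # A match may run past position i (overlapping copy).
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append(("match", best_off, best_len))
            i += best_len
        else:
            # Not in the dictionary: flag the symbol and output it as is.
            tokens.append(("literal", data[i]))
            i += 1
    return tokens


def lz77_expand(tokens: list) -> bytes:
    """Rebuild the data by copying from already-expanded output."""
    out = bytearray()
    for token in tokens:
        if token[0] == "literal":
            out.append(token[1])
        else:
            _, offset, length = token
            for _ in range(length):
                out.append(out[-offset])  # byte-by-byte handles overlap
    return bytes(out)
```

For `b"abcabcabcabc"`, the first three symbols are emitted as literals and the remaining nine are covered by a single match with offset 3 and length 9.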
In the LZ78 method, all symbols or symbol strings that have appeared so far in the input data are registered in a dictionary. The symbols or symbol strings of the input data are substituted with indexes of the dictionary.
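The growing-dictionary approach can be sketched as below. This follows the commonly described form of LZ78, in which each output is a pair of a dictionary index and the next symbol; that pair format, the function names, and the use of `None` to mark a trailing phrase with no following symbol are assumptions made for illustration.

```python
def lz78_compress(data: bytes) -> list:
    """Emit (dictionary index, next symbol) pairs; index 0 is the empty phrase."""
    dictionary = {b"": 0}
    pairs = []
    phrase = b""
    for byte in data:
        candidate = phrase + bytes([byte])
        if candidate in dictionary:
            phrase = candidate  # keep extending the known phrase
        else:
            pairs.append((dictionary[phrase], byte))
            dictionary[candidate] = len(dictionary)  # register new phrase
            phrase = b""
    if phrase:
        # Input ended inside a known phrase: emit its index alone.
        pairs.append((dictionary[phrase], None))
    return pairs


def lz78_expand(pairs: list) -> bytes:
    """Rebuild the same dictionary while expanding."""
    phrases = [b""]
    out = bytearray()
    for index, byte in pairs:
        phrase = phrases[index]
        if byte is not None:
            phrase += bytes([byte])
        out += phrase
        phrases.append(phrase)
    return bytes(out)
```

For `b"abababab"`, the phrases registered are `a`, `b`, `ab`, and `aba`. The early pairs carry index 0 because the dictionary starts empty, which is why short inputs gain little from this method.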
In the Huffman coding method, the occurrence probability table must be placed at the beginning of the compressed data so that the compressed data can be expanded. Thus, when the amount of data is small, the size of the table outweighs the bits saved by the coding, and the effect of the data compression is lost.
In the Lempel-Ziv method, the effect of the data compression cannot be obtained unless symbol strings have been registered in a dictionary to some extent.
In other words, in each of these compressing methods, when the amount of data is small, the effect of the data compression is lost. That is, when a small amount of data is compressed, the amount of resultant data increases.
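The effect on small inputs can be observed with a general-purpose compressor. The sketch below uses Python's standard `zlib` module, whose DEFLATE format combines an LZ77-style stage with Huffman coding; it is a stand-in chosen for convenience and is not either of the individual methods described above. The sample inputs are arbitrary.

```python
import zlib

small = b"hi"
large = b"hello world " * 1000

small_out = zlib.compress(small)
large_out = zlib.compress(large)

# Fixed overhead (header, block framing, checksum) dominates tiny inputs,
# so the compressed form is longer than the original.
small_grew = len(small_out) > len(small)

# A long, repetitive input compresses to far less than its original size.
large_shrank = len(large_out) < len(large)

# Expansion restores the original data in both cases.
restored_ok = (zlib.decompress(small_out) == small
               and zlib.decompress(large_out) == large)
```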