(1) Field of the Invention
The present invention relates to a data compressing apparatus and a data decompressing apparatus, a data compressing method and a data decompressing method, a data compressing or decompressing dictionary creating apparatus, and a computer readable recording medium storing a data compressing program or data decompressing program for use when various data such as text data (character codes), image data, etc., is compressed or decompressed.
(2) Description of the Related Art
In recent years, various kinds of data such as character codes and image data are handled in computers, and the quantity of such data is increasing. For this reason, it is common in a computer to compress handled data by omitting redundant parts, so that the storage capacity necessary to manage the data is decreased, or so that the transmission rate or transmission efficiency is increased at the time of data communication with a remote place in order to decrease the communication cost.
As data compressing methods, there are, for example, dictionary-based coding, in which similarity among inputted data strings is used to code and compress the data strings, and statistical coding, in which a frequency of occurrence of inputted data strings is used to code and compress the data strings. Hereinafter, one word of data (one alphabetic character, for example) is referred to as “a character”, whereas a train of an arbitrary number of words of data is referred to as “a character string”.
Specifically, in the former dictionary-based coding, a predetermined number (code) is assigned to each character or character string occurring in a data string (a data file, for example) that is an object of compression so as to create a dictionary (code table), and an actually inputted character (character string) is coded on the basis of the dictionary. A character (character string) having a higher probability of occurrence is generally registered as a longer character string in the dictionary so that the compression ratio is improved.
LZ77 and LZ78 (refer to “Introduction to Document Data Compression Algorithm”, Tomohiko Uematsu, CQ Shuppansha, for example) are representative of dictionary-based coding.
In LZ77, characters (character strings) occurring in an inputted data string are stored in a buffer in advance, and the storing position (address) and the length of the characters (character string) in the buffer that longest-match the inputted characters (character string) to be compressed are coded as the code of the inputted characters (character string). In LZ78, characters (character strings) that occurred in the inputted data string in the past are registered in a dictionary, and the registration number of the characters (character string) in the dictionary that match the inputted characters (character string) to be compressed is coded as the code of the inputted characters (character string).
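By way of illustration only, the LZ78 behavior described above may be sketched as follows. The function names lz78_encode and lz78_decode and the (registration number, next character) output format are assumptions made for this sketch and are not taken from the cited reference.

```python
def lz78_encode(text):
    """Return a list of (registration number, next character) pairs."""
    dictionary = {}              # phrase -> registration number (1-based)
    output = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate   # keep extending the current match
        else:
            # Emit the number of the longest registered match (0 = none)
            # followed by the character that broke the match.
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    if phrase:                   # flush a match left over at end of input
        output.append((dictionary[phrase], ""))
    return output


def lz78_decode(pairs):
    """Rebuild the text by re-registering phrases in the same order."""
    phrases = [""]               # registration number 0 is the empty phrase
    out = []
    for number, ch in pairs:
        phrase = phrases[number] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)
```

The decoder needs no transmitted dictionary because it registers each phrase at the same moment the encoder did.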
On the other hand, in the latter statistical coding, a frequency of occurrence of each character (character string) occurring in an inputted data string is calculated, and a shorter code is assigned to a character (character string) having a higher probability so as to improve the compression ratio.
Arithmetic coding (refer to “Arithmetic Coding for Data Compression”/Ian H. Witten, et al., Communications of the ACM, Vol. 30, No. 6, pp. 520-540, and “An Adaptive Dependency Source Model for Data Compression Scheme”/D. M. Abrahamson, Communications of the ACM, Vol. 32, No. 1, pp. 77-83) and Huffman coding (refer to “Dynamic Huffman Coding”/Donald E. Knuth, Journal of Algorithms, Vol. 6, pp. 163-180) are representative of statistical coding.
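As a minimal sketch of how statistical coding assigns a shorter code to a more frequent character, the following computes static Huffman code lengths from character counts. It is not the dynamic Huffman algorithm of the cited reference, and the function name huffman_code_lengths is an assumption of this sketch.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Map each character to the bit length of its Huffman code."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): 1}   # a lone symbol still needs one bit
    # Heap items: (weight, unique tie-breaker, {character: depth so far}).
    heap = [(n, i, {ch: 0}) for i, (ch, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every contained character one level down.
        merged = {ch: depth + 1 for ch, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]
```

For the string "aaaabbc", the most frequent character "a" receives a 1-bit code while "b" and "c" receive 2-bit codes.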
As statistical coding, another system has been proposed in which inputted characters are coded into a variable-length code on the basis of not the probability of a single character but a conditional probability that takes into account the dependency between an inputted character and the character immediately before it (hereinafter referred to as a context), as shown in FIG. 48, for example, in order to accomplish a higher compression effect (hereinafter, such variable-length coding using a conditional probability in consideration of a context is referred to as context modeling).
Specifically, context modeling collects contexts from an inputted data string (original data), successively registers the characters that are objects of coding [refer to FIG. 49(a)] in a dictionary of a tree structure [hereinafter referred to as a context tree; refer to FIG. 49(b)], counts occurrences of each character each time a character string is inputted that traces the characters registered in the respective nodes of the context tree so as to obtain a conditional probability, and codes the original data on the basis of the obtained probability.
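The counting step of context modeling may be illustrated, under simplifying assumptions, by a model whose context is only the single immediately preceding character. A flat table stands in for the context tree of FIG. 49(b), and the names build_context_model and conditional_probability are hypothetical.

```python
from collections import Counter, defaultdict


def build_context_model(text):
    """Count, for each preceding character, which characters follow it."""
    model = defaultdict(Counter)
    for context, ch in zip(text, text[1:]):
        model[context][ch] += 1
    return model


def conditional_probability(model, context, ch):
    """Estimate P(ch | context) from the collected counts."""
    followers = model.get(context, Counter())
    total = sum(followers.values())
    return followers[ch] / total if total else 0.0
```

In "abracadabra", the character "a" is followed by "b" twice, "c" once and "d" once, so the estimated probability of "b" given the context "a" is 0.5. A coder would give that pair a shorter code than it would give "b" on its own.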
Both dictionary-based coding and statistical coding are classified into three types, shown in items (1) through (3) below, according to the way occurrences in a data string that is an object of compression (hereinafter referred to as a data string to be compressed) are considered:
(1) static coding: coding a character (character string) in a data string to be compressed according to occurrences set in advance irrespective of actual occurrences of the data string to be compressed;
(2) semi-adaptive coding: coding each character (character string) according to occurrences of each character (character string) obtained by scanning all characters (character strings) in a data string to be compressed before compression; and
(3) adaptive coding: re-counting occurrences of a character (character string) each time the same character (character string) as a character (character string) inputted in the past is inputted, and coding an inputted character (character string) according to the re-counted occurrences.
In static coding, a computer reads, from a memory or the like, a dictionary set irrespective of occurrences in an actual data string to be compressed (dictionary setting: Step A1), and uses the read dictionary in a fixed manner to code each of the inputted characters (character strings) (Steps A2 and A3, NO route at Step A4) until the inputted characters (character strings) end (until judged YES at Step A4), as shown in FIG. 50, for example.
On the other hand, in semi-adaptive coding, the computer successively registers the data string (characters/character strings) to be compressed in a dictionary (Steps B1 and B2, NO route at Step B3), and assigns a code to each of the characters (character strings) registered in the dictionary according to occurrences of the character (character string) to code the dictionary (from YES route at Step B3 to Step B4), as shown in FIG. 51, for example.
The computer then returns the pointer that points to the character (character string) to be inputted to the head of the data string to be compressed (Step B5), re-inputs the data string (characters/character strings) to be compressed (Step B6), and codes each of the characters (character strings) while referring to the above dictionary (Step B7, NO route at Step B8) until coding of all the data strings to be compressed is completed (until judged YES at Step B8).
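The two passes of semi-adaptive coding described above can be sketched as follows. The rank-number code table is a simplification assumed for this sketch, since a practical coder would map ranks to variable-length codes, and the function name semi_adaptive_encode is hypothetical.

```python
from collections import Counter


def semi_adaptive_encode(symbols):
    """Two-pass coding: scan everything, build the dictionary, then code."""
    # First pass (Steps B1 to B3): register every symbol and count occurrences.
    counts = Counter(symbols)
    # Step B4: more frequent symbols receive smaller code numbers.
    ranked = sorted(counts, key=lambda s: (-counts[s], s))
    dictionary = {symbol: rank for rank, symbol in enumerate(ranked)}
    # Second pass (Steps B5 to B8): rewind and code against the dictionary.
    codes = [dictionary[s] for s in symbols]
    return dictionary, codes
```

The dictionary must accompany the codes, because the decompressing side cannot reconstruct the counts without the original data.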
In adaptive coding, when a data string (character or character string) to be compressed is inputted (Step C1), the computer first codes the inputted character (or character string) by referring to a dictionary set in advance (Step C2), as in static coding, as shown in FIG. 52, for example. After that, the computer re-counts occurrences of the coded character (character string), registers a code according to the obtained occurrences as a new code of the character (character string) in the dictionary (Step C3), and codes each character (character string) while updating the dictionary (NO route at Step C4) until coding of all the data strings to be compressed is completed (until judged YES at Step C4).
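A minimal sketch of the adaptive loop follows, assuming that the code of a character is simply its current frequency rank within a known alphabet. The names adaptive_rank_encode and adaptive_rank_decode and the ranking rule are assumptions of this sketch, not the procedure of FIG. 52 itself.

```python
def _ranking(counts):
    # Most frequent first; ties broken by character so both sides agree.
    return sorted(counts, key=lambda c: (-counts[c], c))


def adaptive_rank_encode(text, alphabet):
    counts = {ch: 0 for ch in alphabet}      # dictionary set in advance
    codes = []
    for ch in text:                          # Step C1: input a character
        codes.append(_ranking(counts).index(ch))  # Step C2: code it
        counts[ch] += 1                      # Step C3: re-count and update
    return codes


def adaptive_rank_decode(codes, alphabet):
    counts = {ch: 0 for ch in alphabet}
    out = []
    for code in codes:
        ch = _ranking(counts)[code]          # same dictionary state as encoder
        out.append(ch)
        counts[ch] += 1                      # mirror the encoder's update
    return "".join(out)
```

Because the decoder repeats every update the encoder made, no dictionary has to be transmitted. With a short input, however, the counts have little time to become representative, which is the weakness discussed below for data of several kilobytes.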
The above static coding performs coding fixedly using a dictionary set in advance. Therefore, static coding can always achieve a constant compression ratio with respect to data strings to be compressed having similar statistics, and can perform a high-speed compressing process. On the other hand, semi-adaptive coding and adaptive coding create or update a dictionary according to occurrences in an actually inputted data string to be compressed, so as to assign codes conforming to the actual data string to be compressed. Therefore, it is possible to achieve a remarkable compression ratio even if a large quantity of data of more than several megabytes, or data having different statistics, is compressed.
The above semi-adaptive coding and adaptive coding can achieve a good compression ratio when a large quantity of data of several megabytes is compressed, as described above. However, when a small quantity of data of several kilobytes, such as text data in a text file, is compressed, semi-adaptive coding and adaptive coding cannot assign an appropriate code according to occurrences to each character (character string), since every character (character string) occurs only a few times in the data string (text file) to be compressed.
Consequently, when a small quantity of data of several kilobytes is compressed, semi-adaptive coding and adaptive coding cannot achieve a high compression ratio.
On the other hand, the above static coding can achieve a constant compression ratio irrespective of the data size of a data string to be compressed. However, the code assigned to each character (character string) occurring in the data string to be compressed is fixed, so that the quantity of compressed data may become larger than the quantity of the original data when data whose statistics differ from those assumed by the codes assigned in advance is compressed.
In the light of the above problems, an object of the present invention is to provide a data compressing technique which can stably achieve a preferable compression ratio for a small quantity of data such as text data in a text file and achieve a high compression ratio for data having different statistics, and a decompressing technique which can decompress compressed data obtained in the above compressing technique.
The present invention therefore provides a data compressing apparatus for coding data to be compressed to compress the same, comprising a compressing dictionary storing unit for storing a compressing dictionary usable when the data to be compressed is compressed, a compressing dictionary use or non-use deciding unit for deciding whether the compressing dictionary is to be used or not when the data to be compressed is compressed, and a coding unit for coding the data to be compressed on the basis of the compressing dictionary when the compressing dictionary use or non-use deciding unit decides that the compressing dictionary is to be used, and for outputting the data to be compressed without coding it when the compressing dictionary use or non-use deciding unit decides that the compressing dictionary is not to be used.
The present invention also provides a data compressing method for coding data to be compressed to compress the same comprising the steps of a deciding step of deciding whether a compressing dictionary is to be used or not when the data to be compressed is compressed, a coding step of coding the data to be compressed on the basis of the compressing dictionary when decided the compressing dictionary is to be used at the deciding step, and a data outputting step of not coding but outputting the data to be compressed when decided the compressing dictionary is not to be used at the deciding step.
The data compressing apparatus and the data compressing method according to this invention decide whether a compressing dictionary is to be used when data to be compressed is compressed, and, according to the decision, either code the data to be compressed on the basis of the compressing dictionary or output the data to be compressed without coding it. When coding with the dictionary would degrade the compression ratio, the data to be compressed is not coded (compressed), so that degradation of the compression ratio can be prevented. It is thereby possible to achieve a compression effect above a certain level at any time.
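The decide-then-code-or-output behavior may be sketched as follows. The byte layout (values 0x80 and above are dictionary codes, ASCII bytes are literals), the function names, and the idea of passing the deciding unit as a callable are all assumptions made for illustration.

```python
def encode(text, compressing_dictionary):
    """Greedy longest-match substitution; ASCII literals pass through."""
    entries = sorted(compressing_dictionary, key=len, reverse=True)
    out = bytearray()
    i = 0
    while i < len(text):
        for entry in entries:
            if text.startswith(entry, i):
                out.append(compressing_dictionary[entry])  # one byte per entry
                i += len(entry)
                break
        else:
            out.append(ord(text[i]))   # assumes ASCII input (value < 0x80)
            i += 1
    return bytes(out)


def compress(text, compressing_dictionary, use_dictionary):
    """Return (dictionary used?, payload) according to the decision."""
    if use_dictionary(text):
        return True, encode(text, compressing_dictionary)
    return False, text.encode("ascii")   # output without coding
```

For example, with the hypothetical dictionary {"the ": 0x80, "and ": 0x81}, the 19-character text "the cat and the dog" is coded into 10 bytes when the decision is to use the dictionary, and is passed through unchanged otherwise.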
The above data compressing apparatus may have a compressed data dividing unit for dividing the data to be compressed into predetermined character data groups, wherein the coding unit codes the data to be compressed by the character data group obtained by the compressed data dividing unit.
If the data to be compressed is divided into predetermined character data groups and coded (compressed) by divided character data group, plural character data can be coded together at a time, so that the coding process can be greatly sped up as compared with a case where the data to be compressed is coded one character data at a time.
If the above data to be compressed is document-form data, the above compressed data dividing unit may have a word dictionary storing unit for storing a word dictionary in which desired words are registered as the character data groups occurring in the document-form data, and a word dividing unit for dividing the data to be compressed into words on the basis of the words registered in the word dictionary in the word dictionary storing unit.
It is thereby possible to divide the data to be compressed (document-form data) into data units that are “words” having respective meanings and to code the data by those units, so as to limit the number of kinds of codes to be assigned to the data to be compressed. As a result, the quantity of codes after coding is decreased and the compression ratio is improved. If the data to be compressed is coded by word, the decoding process on the decompressing side becomes easy and can be performed at a high speed.
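One way a word dividing unit might operate is greedy longest-match segmentation against the word dictionary, sketched below. The function name divide_into_words and the rule of emitting an unregistered character as a unit of its own are assumptions of this sketch.

```python
def divide_into_words(text, word_dictionary):
    """Split text into the longest registered words, left to right."""
    max_len = max(map(len, word_dictionary), default=1)
    units = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in word_dictionary:
                units.append(candidate)
                i += length
                break
        else:
            units.append(text[i])   # unregistered character: its own unit
            i += 1
    return units
```

With the words "data", "compress" and "compression" registered, the text "datacompression!" is divided into "data", "compression" and "!", so the longer registered word is preferred over its prefix.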
If the above data to be compressed is document-form data, the above compressed data dividing unit may have a word category dictionary storing unit for storing a word category dictionary in which a desired word as each of the character data groups occurring in the document-form data and category information on the word are registered, and a word dividing unit for dividing the data to be compressed into words on the basis of the words registered in the word category dictionary in the word category dictionary storing unit, and a category information adding unit for adding the category information corresponding to each of the words obtained by the word dividing unit on the basis of the category information registered in the word category dictionary.
In the above case, it is possible to group words according to the category information so that the number of sorts of codes to be assigned to the words is decreased and a code to be assigned to each of the words is shortened. Accordingly, a quantity of codes having been coded is decreased and a compression ratio is improved. On the decompressing side, it is possible to readily specify a word to be decoded according to the above category information, which leads to speeding-up of the decoding process.
The above data compressing apparatus may further have a characteristic extracting unit for extracting character data inherent to the data to be compressed as characteristic data of the data to be compressed, and a compressing inherent dictionary creating unit for assigning a predetermined code to each of the characteristic data extracted by the characteristic extracting unit to create a compressing inherent dictionary inherent to the data to be compressed, wherein the coding unit codes the data to be compressed on the basis of the compressing inherent dictionary created by the compressing inherent dictionary creating unit and the compressing dictionary in the compressing dictionary storing unit.
In the above case, the data to be compressed is coded on the basis of both the compressing inherent dictionary and the compressing dictionary. It is thereby possible to largely decrease a probability of coding data to be compressed not registered in the dictionary, which leads to an improvement of the compression ratio.
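Creation of a compressing inherent dictionary alongside a shared compressing dictionary may be sketched as follows. Integer codes, the numbering of inherent entries after the shared ones, and the function names are assumptions made for this sketch.

```python
def create_inherent_dictionary(words, compressing_dictionary):
    """Assign codes to words that the shared dictionary does not register."""
    inherent = {}
    next_code = max(compressing_dictionary.values(), default=-1) + 1
    for word in words:
        if word not in compressing_dictionary and word not in inherent:
            inherent[word] = next_code   # characteristic data of this input
            next_code += 1
    return inherent


def code_words(words, compressing_dictionary, inherent_dictionary):
    """Code every word using both dictionaries together."""
    combined = {**compressing_dictionary, **inherent_dictionary}
    return [combined[word] for word in words]
```

Since every word of the input is registered in one dictionary or the other, no word is left uncoded. The inherent dictionary would have to be made available to the decompressing side for decoding.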
The above data compressing apparatus may further have an inherent dictionary information outputting unit for outputting information on the compressing inherent dictionary to a decompressing side for the data to be compressed.
On the decompressing side, it is thereby possible to accurately decode (decompress) compressed data having been coded according to the compressing inherent dictionary originally created on the compressing side.
The above data compressing apparatus may still further have a compressing dictionary updating unit for updating the compressing dictionary on the basis of data to be compressed having been coded in the coding unit, wherein the coding unit codes the data to be compressed on the basis of the compressing dictionary updated by the compressing dictionary updating unit.
It is thereby possible to always provide a compressing dictionary suitable for the data to be compressed that is the object of the next coding, which leads to a further improvement of the compression ratio.
The above compressing dictionary use or non-use deciding unit may decide whether the compressing dictionary is to be used or not on the basis of data contents type information representing a type of data contents of the data to be compressed.
In the above case, it is possible to simply decide use or non-use of the compressing dictionary without actually detecting the contents of the data to be compressed. If the contents (characteristic) of the data to be compressed are known on the basis of the above data contents type information, it is possible to quickly determine whether use of the compressing dictionary is effective when the data to be compressed is coded, and thus to decide whether the compressing dictionary is to be used or not. Accordingly, it is possible to achieve a compression effect above a certain level while speeding up the whole coding process.
The above compressing dictionary use or non-use deciding unit may decide whether the compressing dictionary is to be used or not according to whether specific character data occurs in the data to be compressed or not.
In the above case, it is possible to simply decide whether the compressing dictionary is to be used or not only by actually detecting contents of the data to be compressed and determining whether specific character data occurs in the data to be compressed so that a characteristic of the actual data to be compressed is quickly determined. It is therefore possible to achieve a compression effect above a certain level while improving reliability and a processing speed of the coding process.
The above compressing dictionary use or non-use deciding unit may decide whether the compressing dictionary is to be used or not according to occurrence frequency of specific character data in the data to be compressed.
If it is decided that the compressing dictionary is to be used on data to be compressed having a characteristic that specific character data frequently occurs therein, for example, a shorter code may be assigned to specific character data of a high occurrence frequency using the compressing dictionary. It is therefore possible to certainly achieve a compression effect above a certain level.
The above compressing dictionary use or non-use deciding unit may alternatively decide whether the compressing dictionary is to be used or not according to a quantity of compressed data having been coded by the coding unit.
In the above case, if it is decided that the compressing dictionary is not to be used for data to be compressed having a characteristic that a quantity of compressed data having been coded is larger than a quantity of original data, it is possible to largely decrease a probability of degrading a compression efficiency so that a compression effect above a certain level is ensured.
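A decision based on the quantity of compressed data may be sketched as follows. Here zlib.compress from the Python standard library merely stands in for the coding unit, and the function name compress_if_smaller is hypothetical.

```python
import zlib


def compress_if_smaller(data: bytes, coder=zlib.compress):
    """Code the data, but fall back to the original if coding did not help."""
    coded = coder(data)
    if len(coded) < len(data):
        return True, coded      # dictionary (coder) used
    return False, data          # non-use: output the original data


# Highly repetitive data shrinks, so the coded form is kept.
used, _ = compress_if_smaller(b"a" * 1000)
# 256 distinct byte values offer no redundancy, so coding only adds overhead.
not_used, payload = compress_if_smaller(bytes(range(256)))
```

The returned flag corresponds to the use or non-use information that a decompressing side would need in order to know whether decoding is required.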
The above data compressing apparatus may further have a dictionary use or non-use information outputting unit for outputting information on use or non-use of the compressing dictionary decided by the compressing dictionary use or non-use deciding unit to a decompressing side for the data to be compressed.
In the above case, the decompressing side can quickly determine whether inputted compressed data has been coded using the compressing dictionary, which largely contributes to speeding-up of the decoding process.
The present invention further provides a data compressing apparatus for coding data to be compressed to compress the same comprising a compressing dictionary storing unit for storing plural kinds of compressing dictionaries usable when the data to be compressed is compressed, a compressing dictionary selecting unit for selecting a compressing dictionary to be used among the plural kinds of compressing dictionaries on the basis of data contents type information representing a type of data contents of the data to be compressed, and a coding unit for coding the data to be compressed on the basis of the compressing dictionary selected by the compressing dictionary selecting unit.
The present invention also provides a data compressing method for coding data to be compressed to compress the same comprising the steps of a dictionary selecting step of selecting a compressing dictionary to be used among plural kinds of compressing dictionaries on the basis of data contents type information representing a type of data contents of the data to be compressed, and a coding step of coding the data to be compressed on the basis of the compressing dictionary selected at the dictionary selecting step.
According to the data compressing apparatus and the data compressing method of this invention, a compressing dictionary to be used is selected among plural kinds of compressing dictionaries on the basis of the data contents type information representing a type of data contents of data to be compressed, and the data to be compressed is coded on the basis of the selected compressing dictionary. Only by inputting the above data contents type information, it is possible to quickly select and use a compressing dictionary suitable for contents (characteristic) of data to be compressed to code the data to be compressed. It is therefore possible to certainly achieve a high compression effect for data to be compressed having different characteristics while improving the processing speed of the whole compressing process.
The present invention still further provides a data compressing apparatus for coding data to be compressed to compress the same comprising a compressing dictionary storing unit for storing plural kinds of compressing dictionaries usable when the data to be compressed is compressed, a compressing dictionary selecting unit for selecting a compressing dictionary including specific character data of high occurrence frequency in the data to be compressed among the plural kinds of compressing dictionaries, and a coding unit for coding the data to be compressed on the basis of the compressing dictionary selected by the compressing dictionary selecting unit.
The present invention also provides a data compressing method for coding data to be compressed to compress the same comprising the steps of a dictionary selecting step of selecting a compressing dictionary including specific character data of high occurrence frequency in the data to be compressed among plural kinds of compressing dictionaries, and a coding step of coding the data to be compressed on the basis of the compressing dictionary selected at the dictionary selecting step.
According to the data compressing apparatus and the data compressing method of this invention, a compressing dictionary including character data of high occurrence frequency in data to be compressed is used to code the data to be compressed at any time so that a compression effect can be further improved. Since a dictionary is selected depending on whether the dictionary includes character data of high occurrence frequency in data to be compressed, the dictionary selecting process can be sped up, thus the whole coding process can be sped up.
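Selection of a dictionary that includes the frequently occurring character data may be sketched as follows, assuming that the data has already been divided into words and that each candidate dictionary is represented as a set of registered words. The function name select_by_frequent_words and the parameter top_n are hypothetical.

```python
from collections import Counter


def select_by_frequent_words(words, dictionaries, top_n=3):
    """Pick the dictionary that registers most of the top_n frequent words."""
    frequent = [word for word, _ in Counter(words).most_common(top_n)]

    def coverage(name):
        return sum(word in dictionaries[name] for word in frequent)

    return max(dictionaries, key=coverage)
```

Only the few most frequent words are checked against each dictionary, so the selection does not require coding the data with every candidate.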
The present invention still further provides a data compressing apparatus for coding data to be compressed to compress the same comprising a compressing dictionary storing unit for storing plural kinds of compressing dictionaries usable when the data to be compressed is compressed, a coding unit for coding the data to be compressed using any one of the plural kinds of compressing dictionaries, and a compressing dictionary selecting unit for selecting a compressing dictionary to be used among the plural kinds of compressing dictionaries according to a quantity of compressed data having been coded by the coding unit.
The present invention also provides a data compressing method of coding data to be compressed to compress the same comprising the steps of a coding step of compressing data to be compressed, and a dictionary selecting step of selecting a compressing dictionary to be used among plural kinds of compressing dictionaries according to a quantity of compressed data coded at the coding step.
According to the data compressing apparatus and the data compressing method of this invention, a compressing dictionary to be used is selected among plural kinds of compressing dictionaries according to the quantity of compressed data obtained by coding the data to be compressed. It is therefore possible to select the most suitable compressing dictionary in consideration of the quantity of compressed data at any time, and thus to increase the compression effect more reliably.
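Selection according to the quantity of compressed data may be sketched with the preset-dictionary feature of zlib in the Python standard library, which stands in here for the plural kinds of compressing dictionaries. The dictionary contents and the function names code_with and select_by_size are assumptions of this sketch.

```python
import zlib


def code_with(data: bytes, zdict: bytes) -> bytes:
    """Code data with one preset dictionary."""
    compressor = zlib.compressobj(zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def select_by_size(data: bytes, dictionaries):
    """Code with every dictionary and keep the one giving the least output."""
    sizes = {name: len(code_with(data, zdict))
             for name, zdict in dictionaries.items()}
    best = min(sizes, key=sizes.get)
    return best, sizes


DICTIONARIES = {   # hypothetical preset dictionaries
    "english": b"the quick brown fox jumps over the lazy dog",
    "html": b"<html><head><title></title></head><body></body></html>",
}
page = b"<html><head><title>x</title></head><body></body></html>"
best, sizes = select_by_size(page, DICTIONARIES)
```

The selected name plays the role of selected dictionary information: the decompressing side must pass the same dictionary to zlib.decompressobj(zdict=...) to decode the data.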
Each of the above data compressing apparatus may further have a compressed data dividing unit for dividing the data to be compressed into predetermined character data groups, wherein the coding unit codes the data to be compressed by the character data group obtained by the compressed data dividing unit on the basis of the compressing dictionary selected by the compressing dictionary selecting unit.
In the above case, it is possible to code plural character data as a bunch at a time so that the coding (compressing) process after a dictionary is selected can be sped up as compared with a case where data to be compressed is coded by one character data.
If the above data to be compressed is document-form data, the compressed data dividing unit may have a word dictionary storing unit for storing a word dictionary in which desired words as the character data groups occurring in the document-form data are registered, and a word dividing unit for dividing the data to be compressed into words on the basis of the words registered in the word dictionary in the word dictionary storing unit.
In the above case, data to be compressed (document-form data) is divided into data units that are “words” having respective meanings and coded, whereby the number of kinds of codes to be assigned to the data to be compressed is limited. Accordingly, the quantity of codes after the coding process is performed using a selected compressing dictionary is decreased, and thus the compression ratio is improved. If the data to be compressed is coded by word, the decompressing process on the decompressing side becomes easy and can be sped up.
If the above data to be compressed is document-form data, the above compressed data dividing unit may alternatively have a word category dictionary storing unit for storing a word category dictionary in which a desired word as each of the character data groups occurring in the document-form data and category information on the word are registered, a word dividing unit for dividing the data to be compressed into words on the basis of the words registered in the word category dictionary in the word category dictionary storing unit, and a category information adding unit for adding the category information corresponding to each of the words obtained by the word dividing unit on the basis of the category information registered in the word category dictionary.
In the above case, words can be grouped according to the category information. It is therefore possible to decrease the number of kinds of codes to be assigned to the words and to shorten the code to be assigned to each of the words, so that the quantity of codes obtained by coding with a selected compressing dictionary is further decreased, and thus the compression ratio is improved. The decompressing side can thereby easily specify a word to be decoded according to the above category information, which leads to speeding-up of the decoding process.
Each of the above data compressing apparatus may further have a characteristic extracting unit for extracting character data inherent to the data to be compressed as characteristic data of the data to be compressed, and a compressing inherent dictionary creating unit for assigning a predetermined code to each of the characteristic data extracted by the characteristic extracting unit to create a compressing inherent dictionary inherent to the data to be compressed, wherein the coding unit codes the data to be compressed on the basis of the compressing inherent dictionary created by the compressing inherent dictionary creating unit and the compressing dictionary selected by the compressing dictionary selecting unit.
In the above case, data to be compressed is coded on the basis of both a compressing inherent dictionary and a compressing dictionary selected as above so that a probability of coding data to be compressed not registered in the dictionary is largely decreased, thus a compression ratio is further improved.
In this case, the above data compressing apparatus may have an inherent dictionary information outputting unit for outputting information on the compressing inherent dictionary to a decompressing side for the data to be compressed.
In the above case, the decompressing side can accurately decode (decompress) compressed data having been coded according to a compressing inherent dictionary originally created on the compressing side.
Each of the above data compressing apparatus may further have a compressing dictionary updating unit for updating the compressing dictionary on the basis of data to be compressed having been coded in the coding unit, wherein the coding unit codes the data to be compressed on the basis of the compressing dictionary updated by the compressing dictionary updating unit.
In the above case, since the compressing dictionary selected and used in the coding is updated on the basis of the compressed data obtained by coding the data to be compressed, it is possible to provide a plurality of compressing dictionaries suitable for the data to be compressed that is the object of the next coding as the coding process proceeds, so that the compression ratio is further improved.
In the above case, the above data compressing apparatus may have a selected dictionary information outputting unit for outputting selected dictionary information on the compressing dictionary selected by the compressing dictionary selecting unit to a decompressing side for the data to be compressed.
In this case, the decompressing side can quickly determine which one of the plural kinds of compressing dictionaries was used to code the inputted compressed data, which largely contributes to speeding-up of the decoding process.
The present invention still further provides a data decompressing apparatus for decompressing compressed data to decode the same comprising a decompressing dictionary storing unit for storing a decompressing dictionary usable when the compressed data is decompressed, a decompressing dictionary use or non-use deciding unit for deciding whether the decompressing dictionary is to be used or not when the compressed data is decompressed, and a decoding unit for decoding the compressed data on the basis of the decompressing dictionary when the decompressing dictionary use or non-use deciding unit decides the decompressing dictionary is to be used, whereas not decoding but outputting the compressed data when the decompressing dictionary use or non-use deciding unit decides the decompressing dictionary is not to be used.
The present invention also provides a data decompressing method for decoding compressed data to decompress the same comprising the steps of a receiving step of receiving dictionary use or non-use information on whether a decompressing dictionary is to be used or not when the compressed data is decompressed from a compressing side, a deciding step of deciding whether the decompressing dictionary is to be used according to the dictionary use or non-use information received at the receiving step, a decoding step of decoding the compressed data on the basis of the decompressing dictionary when decided the decompressing dictionary is to be used at the deciding step, and a data outputting step of not decoding but outputting the compressed data when decided the decompressing dictionary is not to be used at the deciding step.
According to the data decompressing apparatus and the data decompressing method of this invention, whether a decompressing dictionary is to be used is decided when the compressed data is decompressed, and, according to a result of the decision, the compressed data is either decoded on the basis of the decompressing dictionary or outputted without being decoded. It is therefore possible to omit an unnecessary decoding process depending on a state of compression (including a case of non-compression) of the compressed data, so that the decoding process is performed very efficiently.
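As a minimal sketch of the use or non-use decision described above, assume that the compressing side sends a flag ahead of the payload and that each byte of a coded payload is one dictionary code. The flag values, the function name `decompress`, and the sample dictionary are hypothetical and are not taken from the original.

```python
from typing import Dict

# Hypothetical flag values for the dictionary use or non-use information.
NO_DICTIONARY = 0
USE_DICTIONARY = 1


def decompress(flag: int, payload: bytes, dictionary: Dict[int, bytes]) -> bytes:
    """Decode the payload with the dictionary, or output it unchanged."""
    if flag == NO_DICTIONARY:
        # Decided "not to be used": output the data without decoding it.
        return payload
    # Decided "to be used": treat each byte as one code (an assumption).
    return b"".join(dictionary[code] for code in payload)


# Illustrative decompressing dictionary: code -> character string.
SAMPLE_DICTIONARY = {0: b"the ", 1: b"data"}
```

With this sketch, `decompress(USE_DICTIONARY, bytes([0, 1]), SAMPLE_DICTIONARY)` decodes two codes, while a payload flagged `NO_DICTIONARY` is returned as received, so no decoding work is spent on data that was never coded.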
The above decoding unit may decode the compressed data by predetermined character data group on the basis of the decompressing dictionary.
In the above case, plural character data can be decoded as a bunch at a time so that the decoding (decompressing) process is largely sped up as compared with a case where the compressed data is decoded by one character data.
If data to be compressed (that is, the compressed data before compression) is document-form data, the character data group may be a desired word in the document-form data, whereby the compressed data is decoded in units of "a word" having its own meaning. Thus, the decoding process can be performed at a high speed.
If the compressed data is decoded on the basis of the category information on the above word, a word to be decoded can be readily specified on the basis of the above category information, which leads to further speeding-up of the decoding process.
The above decompressing apparatus may further have a decompressing inherent dictionary storing unit for storing a dictionary having character data inherent to data to be compressed that is the compressed data before compressed as characteristic data of the compressed data, in which a predetermined code is assigned to each of the characteristic data as decompressing dictionary, wherein the decoding unit decodes the compressed data on the basis of the decompressing inherent dictionary in the decompressing inherent dictionary storing unit and the decompressing dictionary in the decompressing dictionary storing unit.
The above data decompressing apparatus decodes compressed data on the basis of both the decompressing inherent dictionary and the decompressing dictionary. Accordingly, it is possible to largely decrease a probability that data to be decoded corresponds to data not registered in any dictionary, thus improving a decoding efficiency.
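Decoding on the basis of both dictionaries can be sketched as follows. This is an illustration only: it assumes that the inherent dictionary and the general decompressing dictionary use disjoint code ranges and that the inherent dictionary is consulted first. The names `decode_with_both`, `inherent`, and `general` are hypothetical.

```python
from typing import Dict, Iterable


def decode_with_both(
    codes: Iterable[int],
    inherent: Dict[int, str],
    general: Dict[int, str],
) -> str:
    """Decode each code using the inherent dictionary, then the general one."""
    pieces = []
    for code in codes:
        if code in inherent:
            # Character data inherent to this particular data set.
            pieces.append(inherent[code])
        elif code in general:
            # Character data registered in the ordinary decompressing dictionary.
            pieces.append(general[code])
        else:
            raise KeyError(f"code {code} is registered in neither dictionary")
    return "".join(pieces)


# Illustrative dictionaries with disjoint code ranges (an assumption).
GENERAL = {0: "the ", 1: "apparatus "}
INHERENT = {100: "decompressing "}
```

A code absent from the general dictionary can still be decoded if the inherent dictionary registers it, which is the effect the passage above attributes to using both.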
The above decompressing inherent dictionary storing unit may receive information on a compressing inherent dictionary created by extracting character data inherent to the data to be compressed as characteristic data of the data to be compressed and assigning a predetermined code to each of the characteristic data from a compressing side having generated the compressed data to store the decompressing inherent dictionary.
In the above case, a dictionary having the same contents as an inherent dictionary having been used on the compressing side can be created as the above decompressing inherent dictionary so that compressed data having been coded on the basis of a compressing inherent dictionary originally created on the compressing side can be decoded very accurately.
The above data decompressing apparatus may further have a decompressing dictionary updating unit for updating the decompressing dictionary on the basis of a result of decoding by the decoding unit, wherein the decoding unit decodes the compressed data on the basis of the decompressing dictionary updated by the decompressing dictionary updating unit.
In the above case, it is possible to provide a decompressing dictionary suitable for compressed data that is an object of the next decoding at any time, which leads to an improvement of the decoding efficiency.
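The original does not state the update rule. As one familiar instance of a dictionary that is updated on the basis of decoding results, the well-known LZW scheme is swapped in below; it is not presented as the method of this invention. The sketch assumes input characters with code points below 256, and the function names are hypothetical.

```python
from typing import List


def lzw_encode(text: str) -> List[int]:
    """Code text while adding each newly seen string to the dictionary."""
    table = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        if current + ch in table:
            current += ch
        else:
            codes.append(table[current])
            table[current + ch] = next_code  # dictionary update after coding
            next_code += 1
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: List[int]) -> str:
    """Decode codes, updating the dictionary from each decoding result."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the entry that is about to be created.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid code {code}")
        pieces.append(entry)
        table[next_code] = previous + entry[0]  # update from decoded result
        next_code += 1
        previous = entry
    return "".join(pieces)
```

Because the decoding side applies the same update after every decoded entry, its dictionary stays identical to the one on the coding side without the dictionary itself being transmitted.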
The above decompressing dictionary use or non-use deciding unit may decide whether the decompressing dictionary is to be used or not according to information on use or non-use of a compressing dictionary received from a compressing side having generated the compressed data.
In the above case, it is possible to quickly determine whether the inputted compressed data has been coded using a compressing dictionary, which largely contributes to speeding-up of the decoding process.
The present invention still further provides a data decompressing apparatus for decoding compressed data to decompress the same comprising a decompressing dictionary storing unit for storing plural kinds of decompressing dictionaries usable when the compressed data is decompressed, a decompressing dictionary selecting unit for receiving selected dictionary information on a compressing dictionary selected on the basis of data contents type information representing a type of data contents of data to be compressed from a compressing side having generated the compressed data to select a decompressing dictionary to be used among the plural kinds of decompressing dictionaries on the basis of the received selected dictionary information, and a decoding unit for decoding the compressed data on the basis of the decompressing dictionary selected by the decompressing dictionary selecting unit.
The present invention also provides a data decompressing method for decoding compressed data to decompress the same comprising the steps of a receiving step of receiving selected dictionary information on a compressing dictionary selected on the basis of data contents type information representing a type of data contents of data to be compressed from a compressing side having generated the compressed data, a dictionary selecting step of selecting a decompressing dictionary to be used among plural kinds of decompressing dictionaries on the basis of the selected dictionary information received at the receiving step, and a decoding step of decoding the compressed data on the basis of the decompressing dictionary selected at the dictionary selecting step.
According to the data decompressing apparatus and the data decompressing method of this invention, the decompressing side receives the selected dictionary information on a compressing dictionary selected on the basis of the data contents type information representing a type of data contents of data to be compressed from a compressing side, selects a decompressing dictionary to be used among plural kinds of decompressing dictionaries on the basis of the received selected dictionary information, and decodes the compressed data on the basis of the selected decompressing dictionary. It is therefore possible to quickly select a decompressing dictionary having the same contents as a compressing dictionary selected on the basis of the above data contents type information on the compressing side at any time so as to accurately decode (decompress) the compressed data.
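The selection step above can be sketched as a lookup keyed by the received selected dictionary information. The dictionary identifiers (`"news"`, `"sports"`), their contents, and the function name are hypothetical; the original does not specify how the selected dictionary information is represented.

```python
from typing import Dict, Iterable

# Plural kinds of decompressing dictionaries, one per data contents type.
# Identifiers and entries are illustrative assumptions.
DECOMPRESSING_DICTIONARIES: Dict[str, Dict[int, str]] = {
    "news": {0: "market ", 1: "shares "},
    "sports": {0: "goal ", 1: "match "},
}


def decode_with_selected(selected_id: str, codes: Iterable[int]) -> str:
    """Select the dictionary named by the compressing side, then decode."""
    dictionary = DECOMPRESSING_DICTIONARIES[selected_id]  # selecting step
    return "".join(dictionary[code] for code in codes)    # decoding step
```

The same code sequence decodes differently depending on the received identifier, which is why both sides must agree on the selected dictionary.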
The present invention still further provides a decompressing apparatus for decoding compressed data to decompress the same comprising a decompressing dictionary storing unit for storing plural kinds of decompressing dictionaries usable when the compressed data is decompressed, a decompressing dictionary selecting unit for receiving selected dictionary information on a compressing dictionary selected as a compressing dictionary including specific character data of high occurrence frequency in data to be compressed from a compressing side having generated the compressed data to select a decompressing dictionary to be used among the plural kinds of decompressing dictionaries on the basis of the received selected dictionary information, and a decoding unit for decoding the compressed data on the basis of the decompressing dictionary selected by the decompressing dictionary selecting unit.
The present invention also provides a data decompressing method for decoding compressed data to decompress the same comprising the steps of a receiving step of receiving selected dictionary information on a compressing dictionary selected as a compressing dictionary including specific character data of high occurrence frequency in data to be compressed from a compressing side having generated the compressed data, a dictionary selecting step of selecting a decompressing dictionary to be used among plural kinds of decompressing dictionaries on the basis of the selected dictionary information received at the receiving step, and a decoding step of decoding the compressed data on the basis of the decompressing dictionary selected at the dictionary selecting step.
According to the data decompressing apparatus and the data decompressing method of this invention, the decoding side receives the selected dictionary information on a compressing dictionary selected as a compressing dictionary including specific character data of high occurrence frequency from the compressing side, selects a decompressing dictionary to be used among plural kinds of decompressing dictionaries on the basis of the received selected dictionary information, and decodes compressed data on the basis of the decompressing dictionary. It is therefore possible to quickly select a decompressing dictionary having the same contents as a compressing dictionary selected as a compressing dictionary including specific character data of high occurrence frequency on the compressing side, so as to accurately decode (decompress) the compressed data.
The present invention still further provides a data decompressing apparatus for decoding compressed data to decompress the same comprising a decompressing dictionary storing unit for storing plural kinds of decompressing dictionaries usable when the compressed data is decompressed, a decompressing dictionary selecting unit for receiving selected dictionary information on a compressing dictionary selected according to a quantity of compressed data having been coded from a compressing side having generated the compressed data to select a decompressing dictionary to be used among the plural kinds of decompressing dictionaries on the basis of the received selected dictionary information, and a decoding unit for decoding the compressed data on the basis of the decompressing dictionary selected by the decompressing dictionary selecting unit.
The present invention also provides a data decompressing method for decoding compressed data to decompress the same comprising the steps of a receiving step of receiving selected dictionary information on a compressing dictionary selected according to a quantity of compressed data having been coded from a compressing side having generated the compressed data, a dictionary selecting step of selecting a decompressing dictionary to be used among plural kinds of decompressing dictionaries on the basis of the selected dictionary information received at the receiving step, and a decoding step of decoding the compressed data on the basis of the decompressing dictionary selected at the dictionary selecting step.
According to the data decompressing apparatus and the data decompressing method of this invention, the decoding side receives the selected dictionary information on a compressing dictionary selected according to a quantity of compressed data having been coded from the compressing side, selects a decompressing dictionary to be used among plural kinds of decompressing dictionaries on the basis of the received selected dictionary information, and decodes compressed data on the basis of the decompressing dictionary. It is therefore possible to quickly select a decompressing dictionary having the same contents as a compressing dictionary selected according to a quantity of compressed data having been coded on the compressing side at any time so as to accurately decode (decompress) the compressed data.
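How the compressing side might arrive at the selected dictionary information according to a quantity of compressed data can be sketched as follows. The cost model (one unit per registered word, word length plus one for an unregistered word) and all names are assumptions made for illustration only.

```python
from typing import Dict, List, Set


def coded_size(words: List[str], dictionary: Set[str]) -> int:
    """Estimate the quantity of coded data under a hypothetical cost model."""
    # A registered word costs 1 unit; an unregistered word is sent literally.
    return sum(1 if word in dictionary else len(word) + 1 for word in words)


def select_dictionary(words: List[str], dictionaries: Dict[str, Set[str]]) -> str:
    """Return the identifier of the dictionary giving the smallest coded size."""
    return min(dictionaries, key=lambda name: coded_size(words, dictionaries[name]))


# Illustrative compressing dictionaries (word sets only, codes omitted).
CANDIDATES = {
    "news": {"market", "shares"},
    "sports": {"goal", "match"},
}
SAMPLE_WORDS = "the match ended with a late goal".split()
```

The identifier returned by `select_dictionary` plays the role of the selected dictionary information that the decompressing side receives and uses to pick the matching decompressing dictionary.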
The above decoding unit may decode the compressed data by predetermined character data group on the basis of the decompressing dictionary selected by the decompressing dictionary selecting unit.
In the above case, the compressed data is decoded by plural character data as a bunch at a time so that the decoding (decompressing) process after a dictionary is selected is performed at a higher speed, as compared with a case where the compressed data is decoded by one character data.
If data to be compressed (that is, the compressed data before compression) is document-form data, the character data group may be a desired word in the document-form data. In this case, the compressed data can be decoded in units of "a word" having its own meaning, so that the decoding process can be performed at a high speed.
The above decoding unit may decode the compressed data on the basis of category information on the word.
In the above case, it is possible to readily specify a word to be decoded on the basis of the above category information so that the decoding process is further sped up.
The above decompressing apparatus may further have a decompressing inherent dictionary storing unit for storing a dictionary having character data inherent to data to be compressed that is the compressed data before compressed as characteristic data of the compressed data, in which a predetermined code is assigned to each of the characteristic data as decompressing dictionary, wherein the decoding unit decodes the compressed data on the basis of the decompressing inherent dictionary in the decompressing inherent dictionary storing unit and the decompressing dictionary selected by the decompressing dictionary selecting unit.
Each of the above data decompressing apparatuses decodes compressed data on the basis of both an inherent dictionary, in which a predetermined code is assigned to each of character (characteristic) data inherent to data to be compressed (that is, the compressed data before compression), and a decompressing dictionary selected as above. It is therefore possible to largely decrease a probability that data to be decoded corresponds to data not registered in the selected dictionary, thus further improving the decoding efficiency.
The above decompressing inherent dictionary storing unit may receive information on a compressing inherent dictionary created by extracting character data inherent to the data to be compressed as characteristic data of the data to be compressed and assigning a predetermined code to each of the characteristic data from a compressing side having generated the compressed data to store the decompressing inherent dictionary.
In the above case, a dictionary having the same contents as an inherent dictionary used on the compressing side is generated on the decompressing side, so that data to be compressed having been coded according to the compressing inherent dictionary originally created on the compressing side can be decoded very accurately.
Each of the above data decompressing apparatus may further have a decompressing dictionary updating unit for updating the decompressing dictionary on the basis of a result of decoding by the decoding unit, wherein the decoding unit decodes the compressed data on the basis of the decompressing dictionary updated by the decompressing dictionary updating unit.
In the above case, as the decoding process proceeds, it is possible to provide plural kinds of decompressing dictionaries suitable for compressed data that is an object of the next decoding, which leads to further improvement of the decoding efficiency.
The present invention still further provides a data compressing or decompressing dictionary creating apparatus for creating a dictionary used when data to be compressed is compressed or compressed data is decompressed comprising an occurrence frequency counting unit for counting an occurrence frequency of each character data occurring in data for creating a dictionary, a high occurrence frequency character data detecting unit for detecting character data whose occurrence frequency is higher than predetermined frequency on the basis of the occurrence frequency of each of the character data counted by the occurrence frequency counting unit, a code assigning unit for assigning a predetermined code to each of the high occurrence frequency character data detected by the high occurrence frequency character data detecting unit, and a dictionary generating unit for combining each of the high occurrence frequency character data with the code and outputting a combination thereof, thereby generating the dictionary.
According to the above data compressing or decompressing dictionary creating apparatus of this invention, a predetermined code is assigned to each of character data whose occurrence frequency is higher than predetermined frequency on the basis of occurrence frequency of each character data occurring in data for creating a dictionary, and the character data of high occurrence frequency is combined with the code and outputted, whereby a data compressing or decompressing dictionary is automatically created. It is therefore possible to omit labor to create the dictionary.
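The four units above (counting, detecting, code assigning, dictionary generating) can be sketched in a few lines. This is an illustrative sketch only: the choice of consecutive integers as the "predetermined code" and the name `create_dictionary` are assumptions not found in the original.

```python
from collections import Counter
from typing import Dict


def create_dictionary(sample: str, threshold: int) -> Dict[str, int]:
    """Create a character -> code dictionary from data for creating a dictionary."""
    # Occurrence frequency counting: count each character in the sample.
    counts = Counter(sample)
    # High occurrence frequency detection: keep characters above the threshold,
    # most frequent first.
    frequent = [ch for ch, n in counts.most_common() if n > threshold]
    # Code assigning and dictionary generating: pair each character with a code.
    # Consecutive integers are an assumed stand-in for the predetermined code.
    return {ch: code for code, ch in enumerate(frequent)}
```

Inverting the returned mapping gives the corresponding decompressing dictionary, so one pass over the sample data yields dictionaries for both sides.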
The present invention still further provides a data compressing or decompressing dictionary creating apparatus for creating a dictionary used when data to be compressed is compressed or compressed data is decompressed comprising a data dividing unit for dividing data for creating a dictionary into predetermined character data groups, an occurrence frequency counting unit for counting an occurrence frequency of each of the character data groups obtained by the data dividing unit, a high occurrence frequency character data group detecting unit for detecting a character data group whose occurrence frequency is higher than predetermined frequency on the basis of the occurrence frequency of each of the character data groups counted by the occurrence frequency counting unit, a code assigning unit for assigning a predetermined code to the high occurrence frequency character data group detected by the high occurrence frequency character data group detecting unit, and a dictionary generating unit for combining the high occurrence frequency character data group with the code and outputting a combination thereof, thereby generating the dictionary.
According to the above data compressing or decompressing dictionary creating apparatus of this invention, a predetermined code is assigned to a character data group whose occurrence frequency is higher than predetermined frequency on the basis of occurrence frequency of each character data group obtained by dividing data for creating a dictionary, and the high occurrence frequency character data group is combined with the code and outputted, whereby a dictionary suitable for coding and decoding of data by the character data group can be automatically created. It is therefore possible to omit the labor of creating the dictionary corresponding to character data groups.
If the data for creating a dictionary is document-form data, the character data group may be a desired word in the document-form data. In this case, since a dictionary most suitable for a coding process and a decoding process in units of "a word" having its own meaning is created, the coding process for data to be compressed and the decoding process for compressed data are largely sped up.
At this time, the code assigning unit may add category information on the word to the word. In which case, the words can be grouped according to the category information so that the number of sorts of codes to be assigned to the words is decreased, and a code to be assigned to each word is shortened. It is therefore possible to decrease a size of the dictionary.
By using the dictionary, a quantity of codes after the coding process is decreased and a compression ratio is increased on the compressing (coding) side. On the decompressing side, a word to be decoded can be readily specified according to the category information, which leads to speeding-up of the decoding process.
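The effect of category information can be sketched as follows. The original does not say how category information is represented; here it is assumed that each code is a pair of a category name and an index within that category, so that indices restart from zero in every category and stay short. All names and sample words are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def build_category_dictionary(
    tagged_words: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Group (word, category) pairs into one word table per category."""
    tables: Dict[str, List[str]] = {}
    for word, category in tagged_words:
        table = tables.setdefault(category, [])
        if word not in table:
            table.append(word)
    return tables


def encode_word(tables: Dict[str, List[str]], word: str, category: str) -> Tuple[str, int]:
    """Code a word as (category, index within that category)."""
    return (category, tables[category].index(word))


def decode_word(tables: Dict[str, List[str]], code: Tuple[str, int]) -> str:
    """Use the category to narrow the lookup to one table, then index it."""
    category, index = code
    return tables[category][index]


# Illustrative tagged words: (word, category).
TAGGED = [("run", "verb"), ("data", "noun"), ("code", "noun"), ("run", "verb")]
```

On the decoding side the category restricts the lookup to a single table, which corresponds to the statement above that the word to be decoded can be readily specified from the category information.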
The present invention still further provides a recording medium readable by a computer in which a data compressing program for coding data to be compressed to compress the same is recorded characterized in that the data compressing program makes the computer function as a compressing dictionary storing unit for storing a compressing dictionary usable when the data to be compressed is compressed, a compressing dictionary use or non-use deciding unit for deciding whether the compressing dictionary is to be used or not when the data to be compressed is compressed, and a coding unit for coding the data to be compressed on the basis of the compressing dictionary when the compressing dictionary use or non-use deciding unit decides the compressing dictionary is to be used, whereas not coding but outputting the data to be compressed when the compressing dictionary use or non-use deciding unit decides the compressing dictionary is not to be used.
The present invention still further provides a recording medium readable by a computer in which a data compressing program for coding data to be compressed to compress the same is recorded characterized in that the data compressing program makes the computer function as a compressing dictionary storing unit for storing plural kinds of compressing dictionaries usable when the data to be compressed is compressed, a compressing dictionary selecting unit for selecting a compressing dictionary to be used among the plural kinds of compressing dictionaries on the basis of data contents type information representing a type of data contents of the data to be compressed, and a coding unit for coding the data to be compressed on the basis of the compressing dictionary selected by the compressing dictionary selecting unit.
The present invention still further provides a recording medium readable by a computer in which a data compressing program for coding data to be compressed to compress the same is recorded characterized in that the data compressing program makes the computer function as a compressing dictionary storing unit for storing plural kinds of compressing dictionaries usable when the data to be compressed is compressed, a compressing dictionary selecting unit for selecting a compressing dictionary including specific character data of high occurrence frequency in the data to be compressed among the plural kinds of compressing dictionaries, and a coding unit for coding the data to be compressed on the basis of the compressing dictionary selected by the compressing dictionary selecting unit.
The present invention still further provides a recording medium readable by a computer in which a data compressing program for coding data to be compressed to compress the same is recorded characterized in that the data compressing program makes the computer function as a compressing dictionary storing unit for storing plural kinds of compressing dictionaries usable when the data to be compressed is compressed, a coding unit for coding the data to be compressed using any one of the plural kinds of compressing dictionaries, and a compressing dictionary selecting unit for selecting a compressing dictionary to be used among the plural kinds of compressing dictionaries according to a quantity of compressed data having been coded by the coding unit.
In the above recording medium readable by a computer in which a data compressing program is recorded according to this invention, a program for realizing the above data compressing apparatus (data compressing method) is recorded. Merely by having a computer read the program recorded in the recording medium, the computer can be made to function as the above data compressing apparatus. Therefore, the above data compressing apparatus can be expected to come into wide general use.
The present invention further provides a recording medium readable by a computer in which a data decompressing program for decoding compressed data to decompress the same is recorded, characterized in that the data decompressing program makes the computer function as a decompressing dictionary storing unit for storing a decompressing dictionary usable when the compressed data is decompressed, a decompressing dictionary use or non-use deciding unit for deciding whether the decompressing dictionary is to be used or not when the compressed data is decompressed, and a decoding unit for decoding the compressed data on the basis of the decompressing dictionary when the decompressing dictionary use or non-use deciding unit decides the decompressing dictionary is to be used, whereas not decoding but outputting the compressed data when the decompressing dictionary use or non-use deciding unit decides the decompressing dictionary is not to be used.
The present invention still further provides a recording medium readable by a computer in which a data decompressing program for decoding compressed data to decompress the same is recorded, characterized in that the data decompressing program makes the computer function as a decompressing dictionary storing unit for storing plural kinds of decompressing dictionaries usable when the compressed data is decompressed, a decompressing dictionary selecting unit for receiving selected dictionary information on a compressing dictionary selected on the basis of data contents type information representing a type of data contents of data to be compressed from a compressing side having generated the compressed data to select a decompressing dictionary to be used among the plural kinds of decompressing dictionaries on the basis of the received selected dictionary information, and a decoding unit for decoding the compressed data on the basis of the decompressing dictionary selected by the decompressing dictionary selecting unit.
The present invention still further provides a recording medium readable by a computer in which a data decompressing program for decoding compressed data to decompress the same is recorded, characterized in that the data decompressing program makes the computer function as a decompressing dictionary storing unit for storing plural kinds of decompressing dictionaries usable when the compressed data is decompressed, a decompressing dictionary selecting unit for receiving selected dictionary information on a compressing dictionary selected as a compressing dictionary including specific character data frequently occurring in data to be compressed from a compressing side having generated the compressed data to select a decompressing dictionary to be used among the plural kinds of decompressing dictionaries on the basis of the received selected dictionary information, and a decoding unit for decoding the compressed data on the basis of the decompressing dictionary selected by the decompressing dictionary selecting unit.
The present invention still further provides a recording medium readable by a computer in which a data decompressing program for decoding compressed data to decompress the same is recorded, characterized in that the data decompressing program makes the computer function as a decompressing dictionary storing unit for storing plural kinds of decompressing dictionaries usable when the compressed data is decompressed, a decompressing dictionary selecting unit for receiving selected dictionary information on a compressing dictionary selected according to a quantity of compressed data having been coded from a compressing side having generated the compressed data to select a decompressing dictionary to be used among the plural kinds of decompressing dictionaries on the basis of the received selected dictionary information, and a decoding unit for decoding the compressed data on the basis of the decompressing dictionary selected by the decompressing dictionary selecting unit.
In the recording medium readable by a computer in which a data decompressing program is recorded according to this invention, a program for realizing the above data decompressing apparatus (data decompressing method) is recorded. Merely by having a computer read the program recorded in the recording medium, the computer can be made to function as the above data decompressing apparatus. Therefore, the above data decompressing apparatus can be expected to come into wide general use.