The present invention relates to data compressing apparatus, reconstructing apparatus, and its method for compressing and reconstructing document data. More particularly, the invention relates to data compressing apparatus, reconstructing apparatus, and its method for compressing and reconstructing document data formed by character codes of a language such as Japanese, Chinese, Hangul, or the like having a word structure which is not separated by spaces.
In recent years, various data such as character codes, image data, and the like are handled by computers. Further, in association with the spread of the internet and intranets, the number of mails and electronic documents is increasing. For such a large amount of data, by compressing the data by omitting redundant portions, the required storage capacity can be reduced and the compressed data can be sent to a remote place in a short time. The field of the invention is not limited to the compression of character codes but can be applied to various data. Hereinbelow, the terms used in information theory are adopted: one word unit of data is called a character, and data in which an arbitrary plurality of words are connected is called a character train.
As data compression, there are a dictionary type coding using similarity of a data series and a probability statistic type coding using only the appearance frequency of data. The dictionary type coding is a method whereby a character train is replaced by a registration number of a dictionary, and the character trains are registered in the dictionary in a manner such that the higher the appearance frequency of a character train is, the longer the character train registered in the dictionary is, thereby obtaining a high compression ratio. As typical methods of the dictionary type coding, there are LZ77 and LZ78 (for example, refer to Tomohiko Uematsu, "Document data compression algorithm handbook", CQ publisher). According to LZ77, a buffer of a predetermined amount is provided, and the position and the length of the longest character train in the buffer which coincides with the input are encoded. On the other hand, according to LZ78, a character train which appeared in the past is registered in a dictionary and its registration number is encoded. The probability statistic type coding is a method of obtaining a high compression ratio by allocating a short code length to a character having a high appearance frequency in accordance with the statistical appearance frequency of each character. As typical probability statistic type codings, there are an arithmetic coding (for example, refer to Ian H. Witten et al., "Arithmetic Coding for Data Compression", Commun. of ACM, Vol. 30, No. 6, pp. 520 to 540) and a Huffman coding (for example, refer to Donald E. Knuth, "Dynamic Huffman Coding", Journal of Algorithms, Vol. 6, pp. 163-180).
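The LZ78 idea described above (register past character trains and emit registration numbers) can be illustrated by the following minimal sketch. It is a textbook-style illustration, not the coding of the cited handbook; the function names `lz78_encode` and `lz78_decode` and the output format of (registration number, next character) pairs are assumptions made for the example.

```python
def lz78_encode(text):
    """Return a list of (registration number, next character) pairs."""
    dictionary = {"": 0}          # character train -> registration number
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch          # keep extending the longest registered train
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # register the new train
            phrase = ""
    if phrase:
        # Flush a trailing train that is already registered: emit its
        # registered prefix plus its last character.
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output


def lz78_decode(pairs):
    """Rebuild the text by replaying the same registrations."""
    phrases = [""]                # registration number -> character train
    out = []
    for number, ch in pairs:
        phrase = phrases[number] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)
```

Because the dictionary starts empty and grows only as trains repeat, a short input yields few long registered trains, which is consistent with the weakness for small data discussed in the text.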
In order to obtain a further compression effect, a coding using a context collecting unit 200 and a variable length coding unit 202 in FIG. 1 has been proposed. In this coding, the variable length coding is performed on the basis of, not the appearance probability of each character, but a conditional appearance probability which takes into account a context expressing a dependence relation between an input character and the character just before the input character. The method whereby the variable length coding is performed by using the conditional probability which takes the context into account is called a context model. When the three input characters a, b, and c in FIG. 2A are used as an example, the context and the coding target character are expressed by the tree structure of FIG. 2B. The tree structure is called a context tree. Each time a character train which passes through the character of a node appears, the number of times of appearance is counted at that node, thereby obtaining the conditional probability.
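As a rough illustration of how a context model obtains conditional probabilities by counting, the following sketch counts, for each immediately preceding character (a context of one character), how often each next character appears. The function name `conditional_probabilities` and the use of a flat table instead of the context tree of FIG. 2B are assumptions made for brevity.

```python
from collections import Counter, defaultdict


def conditional_probabilities(text):
    """Estimate P(current character | previous character) by counting."""
    counts = defaultdict(Counter)     # context -> counts of following characters
    for prev, cur in zip(text, text[1:]):
        counts[prev][cur] += 1
    return {
        context: {ch: n / sum(following.values()) for ch, n in following.items()}
        for context, following in counts.items()
    }
```

For the input "abac", the context "a" is followed once by "b" and once by "c", so each receives a conditional probability of 0.5, while the context "b" is always followed by "a". A variable length coder would then allocate shorter codes to the more probable continuations.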
The LZ78 systems and the probability statistic type codings are classified into the following three kinds.
I. a static coding for dividing in accordance with a preset appearance frequency, irrespective of the actual appearance frequency of the non-compression data train;
II. a semi-adaptive coding for dividing in accordance with an appearance frequency obtained by first scanning all of character trains; and
III. an adaptive coding for recalculating a frequency each time a character appears and dividing in accordance with the recalculated appearance frequency.
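The difference among the three kinds can be made concrete by computing the ideal code length (the sum of -log2 of the probability used for each character) under each scheme. This is only a sketch under stated assumptions: the function names are hypothetical, the adaptive model is assumed to start from a count of one per character, and the cost of transmitting the frequency table in the semi-adaptive case is ignored.

```python
import math
from collections import Counter


def static_bits(data, preset):
    """Static coding: probabilities come from a preset frequency table."""
    total = sum(preset.values())
    return sum(-math.log2(preset[s] / total) for s in data)


def semi_adaptive_bits(data):
    """Semi-adaptive coding: a first scan of all the data gives the frequencies."""
    return static_bits(data, Counter(data))


def adaptive_bits(data, alphabet):
    """Adaptive coding: the frequency is recalculated each time a character appears."""
    counts = {s: 1 for s in alphabet}          # assumed initial count of one each
    bits = 0.0
    for s in data:
        bits += -math.log2(counts[s] / sum(counts.values()))
        counts[s] += 1
    return bits
```

For the four-character input "aaab", a uniform preset table costs 4 bits, the semi-adaptive model about 3.25 bits, and the adaptive model about 4.32 bits, since the adaptive counts have had little time to converge on such short data.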
In a compression which does not restrict the kind of non-compression data train, the semi-adaptive coding or the adaptive coding is used.
According to the conventional semi-adaptive coding and adaptive coding, when large data of about a few Mbytes is compressed, a code adapted to the non-compression data train can be allocated, so that a high compression ratio can be obtained. In the case of compressing small data of about a few kbytes, however, every character train appears only a few times, so that a code adapted to the statistical appearance frequency cannot be allocated and a high compression ratio cannot be obtained by the semi-adaptive coding or the adaptive coding. On the other hand, in the static coding for dividing in accordance with the preset appearance frequency, although a constant compression ratio can be obtained irrespective of the data size, since the number of preset codes is fixed to one, there is a problem that a high compression ratio cannot be obtained for data whose statistics differ from those of the prepared code. In particular, when small data of about a few kbytes of document data of a language such as Japanese, Chinese, Hangul, or the like in which one character is expressed by word data of two bytes is compressed, a compression effect can hardly be expected from the conventional codings. Depending on the document, there is also a case where the data amount increases after compression. Further, since the conventional codings execute the process on a byte unit basis, there is a problem that the process is complicated and it is difficult to realize a high processing speed.
According to the invention, there are provided a data compressing apparatus, a data reconstructing apparatus, and methods therefor which can compress and reconstruct even small data on the order of kilobytes at a high speed while holding a high compression ratio.
(First Invention)
A target of the invention is a data compressing apparatus for compressing non-compression data formed by character codes of a language having a word structure which is not separated by spaces. As a language having the word structure which is not separated by spaces, for example, there are Japanese, Chinese, Hangul, and the like. Such a data compressing apparatus (basic apparatus) is characterized by comprising: a character train dictionary storing unit for storing a dictionary in which character trains each serving as a processing unit at the time of compression have been registered; a character train comparing unit for detecting the partial character train which coincides with the registration character train by comparing the registration character train in the character train dictionary storing unit with a partial character train in the non-compression data; and a code output unit for allocating a predetermined character train code to each partial character train in which the coincidence has been detected by the character train comparing unit and outputting the code.
When considering Japanese as an example, there is a study result of Japan Electronic Dictionary Research Institute (EDR) Co., Ltd. regarding Japanese words (Yokoi, Kimura, Koizumi, and Miyoshi, "Information structure of electronic dictionary at surface layer level", the papers of Information Processing Society of Japan, Vol. 37, No. 3, pp. 333-344, 1996). In the study result, the morphemes constructing Japanese, that is, the words, are tallied by parts of speech. When words are simply classified into parts of speech classes and the parts of speech classes are registered, the number of parts of speech classes is equal to 136,486 and they can be expressed by codes of 18 bits (maximum 262,143). Further, the number of characters constructing each word of the about 130,000 words of a Japanese word dictionary formed by Institute for New Generation Computer Technology (ICOT) was examined and a distribution of the words was obtained. Consequently, it has been found that about 70,000 words, that is, more than half of all of the registered words, are each constructed by two characters and that the average number of characters is equal to 2.8 characters (44.8 bits).
In the data compressing apparatus of the invention, a dictionary in which a character train code of a fixed length of, for example, 17 bits is allocated to each word of, for example, about 130,000 words, which is practical as a dictionary of Japanese, is formed and stored in the character train dictionary storing unit. A registration character train in the dictionary which coincides with the partial character train of the non-compression data is retrieved, and the fixed length code of 17 bits is allocated and outputted as a character train code, thereby enabling the data amount to be substantially compressed to 1/2 or less irrespective of the size of document data. The character train dictionary storing unit comprises: a head character storing unit in which a head character of the partial character train to be compressed has been stored; and a dependent character train storing unit in which a dependent character train that is dependent on the head character stored in the head character storing unit has been stored. The head character storing unit stores a head address and the number of dependent character trains in the dependent character train storing unit while using the head character as an index. The dependent character train storing unit stores the length of dependent character train, the dependent character train, and the character train code as a set at one or a plurality of storing positions (corresponding to the number of dependent character trains) which are designated by the head address in the head character storing unit.
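The figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes two-byte (16-bit) characters, as stated elsewhere in the text; the helper name `bits_needed` is hypothetical.

```python
def bits_needed(n_entries):
    """Smallest fixed code length (in bits) that can number n_entries entries."""
    return (n_entries - 1).bit_length()


code_bits = bits_needed(130_000)        # fixed length code for about 130,000 words
average_word_bits = 2.8 * 16            # 2.8 characters of 16 bits each
ratio = code_bits / average_word_bits   # compressed size relative to the original
```

With these inputs, `code_bits` is 17 and `ratio` is roughly 0.38, which agrees with the statement that the data amount can be compressed to 1/2 or less on average.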
The character train comparing unit obtains the length of dependent character train from the dependent character train storing unit by referring to the head character storing unit in the character train dictionary storing unit by the head character in the non-compression data which is being processed at present, extracts the partial character train of the length of dependent character train subsequent to the head character from the non-compression data, and detects coincidence between the partial character train and the registered dependent character train. When a detection result indicative of the coincidence with the registration character train is received from the character train comparing unit, the code output unit allocates the character train code stored in the dependent character train storing unit to the character train in which the coincidence was detected and outputs the code. By divisionally storing the character train dictionary in two layers as mentioned above, the dictionary size can be reduced and the retrieving speed can be raised. Alternatively, the character train dictionary can be constructed as follows. The head character storing unit stores the head address and the number of dependent character trains of the dependent character train storing unit while using the head character as an index. The dependent character train storing unit stores the length of dependent character train and the dependent character train as a set at one or a plurality of storing positions (corresponding to the number of dependent character trains) which are designated by the head address in the head character storing unit. Since the character train code is not stored in the dependent character train storing unit in this double-layer structure of the character train dictionary, the dictionary size can be reduced by such an amount.
In this case, the character train comparing unit obtains the length of dependent character train from the dependent character train storing unit by referring to the head character storing unit in the character train dictionary storing unit by the head character in the non-compression data which is being processed at present, extracts the partial character train of the length of dependent character train subsequent to the head character from the non-compression data, and detects a coincidence with the dependent character train which has been registered. When a detection result indicative of the coincidence with the registration character train is received from the character train comparing unit, the code output unit allocates and outputs a character train registration number indicative of the storing position in the dependent character train storing unit as a character train code.
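The double-layer dictionary in which the registration number itself serves as the character train code can be sketched as follows. This is an illustrative sketch, not the apparatus itself: the names `build_dictionary` and `compress`, the toy word list, and the choice to try longer dependent character trains first are all assumptions made for the example.

```python
def build_dictionary(words):
    """Build the head character table and the dependent character train table."""
    by_head = {}
    for word in words:
        by_head.setdefault(word[0], []).append(word[1:])
    head = {}        # head character -> (head address, number of dependent trains)
    dependent = []   # storing position (registration number) -> (length, train)
    for ch, trains in by_head.items():
        trains.sort(key=len, reverse=True)   # assumption: try longer trains first
        head[ch] = (len(dependent), len(trains))
        dependent.extend((len(t), t) for t in trains)
    return head, dependent


def compress(text, head, dependent):
    """Replace each matched character train by its registration number."""
    codes = []
    pos = 0
    while pos < len(text):
        address, count = head.get(text[pos], (0, 0))
        for reg in range(address, address + count):
            length, train = dependent[reg]
            # Compare the characters following the head character.
            if text[pos + 1:pos + 1 + length] == train:
                codes.append(reg)
                pos += 1 + length
                break
        else:
            raise ValueError(f"no dictionary entry matches at position {pos}")
    return codes


head, dependent = build_dictionary(["東京", "東京都", "都", "に", "住む", "東"])
codes = compress("東京都に住む", head, dependent)
```

Here the head character narrows the search to a short run of candidates, so only a few comparisons are needed per word, and no separate code field has to be stored because the storing position is the code.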
A data reconstructing apparatus (basic apparatus) for reconstructing such compression data is characterized by comprising: a code separating unit for separating the character train code serving as a reconstruction unit from the compression data; a character train dictionary storing unit for storing a dictionary in which a reconstruction character train corresponding to the character train code serving as a processing unit upon reconstruction has been registered; and a character train reconstructing unit for reconstructing an original character train by referring to the character train dictionary storing unit by the character train code separated by the code separating unit. The character train dictionary storing unit stores a head character, the length of dependent character train, and the dependent character train as a set every character train code as a reconstruction target. The character train reconstructing unit recognizes a storing position in the character train dictionary storing unit on the basis of the character train code which is being processed at present and reconstructs the character train. The character train dictionary storing unit is constructed by: a head character storing unit in which a head character of the partial character train to be compressed has been stored; and a dependent character train storing unit in which a dependent character train dependent on the head character stored in the head character storing unit has been stored. The head character storing unit stores a head address and the number of dependent character trains of the dependent character train storing unit while using the head character as an index. The dependent character train storing unit stores a return address to the head character storing unit, a length of dependent character train, and the dependent character train as a set at a storing position designated by the head address of the head character storing unit. 
The double-layer dictionary structure can be commonly used by both of the data compressing apparatus and the data reconstructing apparatus. The character train reconstructing unit in the data reconstructing apparatus reconstructs the dependent character train by referring to the dependent character train storing unit on the basis of the character train code which is being processed at present and also reconstructs the head character with reference to the head character storing unit by obtaining the return address.
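Reconstruction with a return address can be sketched in a similar way. In the sketch below, the table contents are a hypothetical toy dictionary, and the position in the list `head_chars` is assumed to play the role of the return address to the head character storing unit.

```python
# Head character storing unit: the list position serves as the return address.
head_chars = ["東", "都", "に", "住"]

# Dependent character train storing unit, indexed by character train code:
# (return address, length of dependent character train, dependent character train)
dependent_table = [
    (0, 2, "京都"),
    (0, 1, "京"),
    (0, 0, ""),
    (1, 0, ""),
    (2, 0, ""),
    (3, 1, "む"),
]


def reconstruct(codes):
    """Rebuild the original character trains from the character train codes."""
    out = []
    for code in codes:
        return_address, length, train = dependent_table[code]
        out.append(head_chars[return_address] + train[:length])
    return "".join(out)
```

Each code is resolved by a single table access plus one lookup through the return address, so reconstruction needs no searching.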
In another embodiment of the invention, a double coding is executed in which the compressed character train code is used as an intermediate code and is encoded again by an existing coding. That is, in a data compressing apparatus (modified apparatus) for compressing non-compression data formed by character codes of a language having a word structure which is not separated by spaces, a coding is performed by a first coding unit in a manner such that a registration character train which has been registered in a dictionary and serves as a processing unit at the time of compression is compared with a partial character train in the non-compression data, thereby detecting the partial character train which coincides with the registration character train, and a predetermined character train code is allocated and outputted as an intermediate code for each partial character train in which the coincidence was detected. Subsequently, the intermediate code train compressed by the first coding unit is inputted and is encoded again by a second coding unit. The second coding unit is a dictionary type coding unit such that the intermediate code train is replaced by a registration number of the dictionary and the intermediate code train having a higher appearance frequency is registered as a longer code train and is coded. LZ77, LZ78, and the like are included in the dictionary type coding. The second coding unit can also be a statistic type coding unit for allocating a short code to the intermediate code having a high appearance frequency on the basis of a statistical appearance frequency of the intermediate code and outputting the code. The arithmetic coding and the like are included in the statistic type coding.
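The double coding can be sketched with a toy first stage followed by a general-purpose second stage. As a stand-in for the second coding unit, the sketch uses Python's `zlib` (DEFLATE, which combines LZ77 with Huffman coding); this is a substitution made for illustration, not the coding unit of the invention. The word list, the 16-bit packing of intermediate codes, and the pre-segmented input are likewise assumptions.

```python
import struct
import zlib

WORDS = ["東京", "都", "に", "住む"]             # hypothetical toy dictionary
CODE_OF = {word: i for i, word in enumerate(WORDS)}


def first_stage(tokens):
    """First coding unit: replace each word by its intermediate code."""
    return [CODE_OF[token] for token in tokens]


def second_stage(codes):
    """Second coding unit (stand-in): pack codes as 16-bit values and deflate."""
    raw = struct.pack(f">{len(codes)}H", *codes)
    return zlib.compress(raw)


def decode(blob):
    """First decoding undoes the second stage; second decoding undoes the first."""
    raw = zlib.decompress(blob)
    codes = struct.unpack(f">{len(raw) // 2}H", raw)
    return "".join(WORDS[code] for code in codes)
```

Packing the intermediate codes on byte boundaries before the second stage reflects the fact that byte-oriented coders such as DEFLATE find repetitions most easily when the codes are byte aligned.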
Further, there is provided a character train selecting unit for discriminating whether the non-compression data is a first character train of a language which is not separated by spaces, for example, a Japanese character train, or a second character train of a language which is separated by spaces, for example, an English character train, inputting the Japanese character train to the first coding unit, and inputting the English character train to the second coding unit. This eliminates the inconvenience such that English data, which does not match the word appearance tendency, is encoded by the first coding unit intended for Japanese documents and the compression data amount becomes larger than the original data amount. The details of the first coding unit are the same as those of the data compressing apparatus as a basic apparatus.
A data reconstructing apparatus (modified apparatus) corresponding to the data compressing apparatus of another embodiment of the invention comprises: a first decoding unit for receiving compression data and reconstructing the intermediate code train; and a second decoding unit for receiving the intermediate code train decoded by the first decoding unit and reconstructing the original non-compression data train. When the dictionary type coding, in which the intermediate code train is replaced by a registration number of the dictionary and the intermediate code train of a higher appearance frequency is registered as a longer code train and is coded, is executed on the data compressing side, the first decoding unit performs a dictionary type decoding such as LZ77, LZ78, or the like for reconstructing the intermediate code by referring to the dictionary by the input code. In the case where a statistic type coding for allocating a short code to the intermediate code having a high appearance frequency on the basis of a statistical appearance frequency of the intermediate codes is performed on the data compressing side, the first decoding unit executes a statistic type decoding such as an arithmetic decoding or the like for reconstructing the intermediate code on the basis of the appearance frequency of the reconstructed intermediate codes. Further, when, on the data compressing side, the first stage coding and the second stage coding are performed on the Japanese character train which is not separated by spaces and only the second stage coding is performed on the English character train which is separated by spaces, a character train selecting unit is provided subsequently to the first decoding unit. The intermediate code train which the first decoding unit has decoded from the Japanese code train obtained by the codings at the first and second stages is inputted to the second decoding unit.
The character train which the first decoding unit has reconstructed from the English code train obtained by the coding at only the second stage is outputted as it is. The details of the second decoding unit in the data reconstructing apparatus are the same as those of the data reconstructing apparatus as a basic apparatus.
Further, the data compressing apparatus (basic apparatus) has a dynamic dictionary storing unit in which a dynamic dictionary for registering the character train code outputted from the code output unit together with the partial character train of the non-compression data in which the coincidence was detected has been stored. In this case, with respect to the second and subsequent times, the character train comparing unit compares the registration character train in the dynamic dictionary storing unit with the partial character train in the non-compression data, thereby detecting the partial character train which coincides with the registration character train. When the coincident character train cannot be detected, the character train dictionary storing unit is retrieved, thereby detecting the coincident partial character train. By forming the dynamic dictionary each time such a coding is executed, the dictionary retrieval when a character train which has once been encoded is subsequently encoded can be executed at a high speed. Similarly, the data reconstructing apparatus (basic apparatus) has a dynamic dictionary storing unit in which a dynamic dictionary for registering the character train outputted from the character train reconstructing unit together with the character train code of the compression data in which the coincidence was detected has been stored. In this case, with respect to the second and subsequent times, the character train reconstructing unit compares the registration character train code in the dynamic dictionary storing unit with the character train code in the compression data, thereby detecting the character train code which coincides with the registration character train code. In the case where the coincident character train code cannot be detected, the character train dictionary storing unit is retrieved and the coincident character train code is detected, thereby raising the dictionary retrieving speed upon reconstruction.
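The role of the dynamic dictionary can be sketched as a small cache placed in front of the full dictionary. In the sketch below, the class name `CachedEncoder`, the use of a plain mapping as a stand-in for the double-layer character train dictionary, and the lookup counter are assumptions made for illustration.

```python
class CachedEncoder:
    """Consult the dynamic dictionary first; fall back to the full dictionary."""

    def __init__(self, static_dictionary):
        self.static = static_dictionary   # character train -> character train code
        self.dynamic = {}                 # trains that have already been encoded
        self.static_lookups = 0           # how often the full dictionary was retrieved

    def encode(self, train):
        code = self.dynamic.get(train)
        if code is None:
            # First appearance: retrieve the full dictionary, then register
            # the train together with its code in the dynamic dictionary.
            self.static_lookups += 1
            code = self.static[train]
            self.dynamic[train] = code
        return code
```

After the first appearance of a character train, later appearances are resolved from the small dynamic dictionary alone, which is where the speed-up described above would come from.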
Similarly, a data compressing apparatus (modified apparatus) of another embodiment also has a dynamic dictionary storing unit in which a dynamic dictionary for registering the character train code outputted from the code output unit together with the partial character train of the non-compression data in which the coincidence was detected has been stored. In this case, with respect to the second and subsequent times, the character train comparing unit compares the registration character train in the dynamic dictionary storing unit with the partial character train in the non-compression data, thereby detecting the partial character train which coincides with the registration character train. When the coincident character train cannot be detected, the character train dictionary storing unit is retrieved and the coincident partial character train is detected, thereby enabling the dictionary retrieving speed at the time of coding to be raised. There is also provided an appearance frequency counting unit for counting an appearance frequency of the character train code outputted from the code output unit and outputting a count value to the second coding unit. In this case, the second coding unit executes an adaptive coding.
Similarly, a data reconstructing apparatus (modified apparatus) of another embodiment also has a dynamic dictionary storing unit in which a dynamic dictionary for registering the character train outputted from the character train reconstructing unit together with the character train code of the compression data in which the coincidence was detected has been stored. In this case, with respect to the second and subsequent times, the character train reconstructing unit compares the registration character train code in the dynamic dictionary storing unit with the character train code in the compression data, thereby detecting the character train code which coincides with the registration character train code. When the coincident character train code cannot be detected, the character train reconstructing unit retrieves the character train dictionary storing unit and detects the coincident character train code, thereby raising the dictionary retrieving speed upon reconstruction. Further, there is also provided an appearance frequency counting unit for counting an appearance frequency of the character train outputted from the character train reconstructing unit and outputting a count value to the first decoding unit. In this case, the first decoding unit performs an adaptive decoding.
The invention provides a data compressing method and a data reconstructing method having processing procedures for each of the data compressing apparatus and data reconstructing apparatus as basic apparatuses and the data compressing apparatus and the data reconstructing apparatus as modified apparatuses according to another embodiment.
(Second Invention)
According to the second invention, in order to compress words in a Japanese document at a high speed, character trains of the words are preliminarily classified into a plurality of attributes (parts of speech groups) and a short code is allocated to each of the classified attributes, thereby performing a data compression.
That is, according to the invention, a data compressing apparatus (basic apparatus) for compressing non-compression data formed by character codes of a language having a word structure which is not separated by spaces is characterized by comprising: a character train attribute dictionary storing unit for storing a dictionary in which character trains serving as a processing unit upon compression have been classified in accordance with attributes, divided into a plurality of attribute groups, and registered; a character train comparing unit for comparing the registration character train in the character train attribute dictionary storing unit with a partial character train in the non-compression data, thereby detecting the partial character train which coincides with the registration character train; and a code output unit for allocating a set of a predetermined character train code and an attribute code indicative of the attribute group to each partial character train in which the coincidence has been detected by the character train comparing unit and outputting the set.
By performing such a process, the Japanese document can be compressed at a high speed while keeping a high compression ratio. The reason is as follows. According to the study result of Japan Electronic Dictionary Research Institute (EDR) Co., Ltd. mentioned above, when words are classified into attribute groups by parts of speech class as attributes of words, for example, the use frequency of post positional words in a Japanese document is high and about 1/4 of the total number of words are post positional words. On the other hand, the number of kinds of post positional word classes is small and is equal to only 171. That is, by effectively expressing the post positional words, an effective compression can be realized. Further, when the lengths of post positional words in a Japanese word dictionary of about 130,000 words made by Institute for New Generation Computer Technology (ICOT) are obtained, the average length is equal to 3.6 characters (7.2 bytes). Even if all of the post positional words are used, there are only 171 kinds, so that they can be expressed by one byte (eight bits), which can express a maximum of 256 kinds. The average length of verbs is equal to 2.5 characters (five bytes) and there are 14,638 kinds of verbs, so that they can be expressed by two bytes or less (14 bits). In a manner similar to the above, as for the other parts of speech as well, when they are divided into groups by the parts of speech, one word can be expressed by a small data amount in the group. Further, since there are 13 kinds of classification groups of the parts of speech, they can be expressed by four bits.
Consequently, a code obtained by connecting an attribute code of four bits showing 13 kinds of groups of the parts of speech and a character train code for specifying the character train in the relevant group of the part of speech is allocated to the coincident character train by the dictionary retrieval and encoded, thereby enabling the Japanese document to be encoded at a high compression ratio. It is possible to construct in a manner such that with respect to a part of speech in which the number of characters of a word such as prefix or suffix is small or the other word classes which do not belong to any parts of speech, the number of kinds of parts of speech is reduced by outputting the original character train data as it is and the attribute code is reduced to, for example, three bits or less.
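The connection of a four-bit attribute code with a group-specific character train code can be sketched as follows. The code widths of 8 bits for post positional words (171 kinds) and 14 bits for verbs (14,638 kinds) follow the figures in the text; the numbering of the attribute groups, the function names, and the use of a string of "0" and "1" characters in place of a real bit stream are assumptions made for readability.

```python
ATTRIBUTE_BITS = 4
# Attribute code -> width of the character train code within that group.
# Hypothetical numbering: 0 = post positional words, 1 = verbs.
GROUP_BITS = {0: 8, 1: 14}


def pack(pairs):
    """Concatenate (attribute code, character train code) pairs into a bit string."""
    bits = ""
    for attribute, code in pairs:
        bits += format(attribute, f"0{ATTRIBUTE_BITS}b")
        bits += format(code, f"0{GROUP_BITS[attribute]}b")
    return bits


def unpack(bits):
    """Separate the attribute code first; it determines the following code width."""
    pairs = []
    pos = 0
    while pos < len(bits):
        attribute = int(bits[pos:pos + ATTRIBUTE_BITS], 2)
        pos += ATTRIBUTE_BITS
        width = GROUP_BITS[attribute]
        pairs.append((attribute, int(bits[pos:pos + width], 2)))
        pos += width
    return pairs
```

A post positional word thus costs 12 bits and a verb 18 bits, compared with the average original lengths of 7.2 bytes and five bytes quoted above.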
The character train attribute dictionary storing unit in the data compressing apparatus has a double-layer structure of a head character storing unit in which a head character of the partial character train to be compressed has been stored and a dependent character train storing unit in which a dependent character train which depends on the head character stored in the head character storing unit has been stored. The head character storing unit stores a head address and the number of dependent character trains in the dependent character train storing unit while using the head character as an index. The dependent character train storing unit stores a length of dependent character train, the dependent character train, a character train code, and the attribute code as a set at a storing position which is designated by the head address in the head character storing unit. By referring to the head character storing unit of the character train attribute dictionary storing unit by the head character in the non-compression data which is being processed at present, the character train comparing unit obtains the length of dependent character train from the dependent character train storing unit, extracts the partial character train of the length of dependent character train subsequent to the head character from the non-compression data, and retrieves a coincidence with the registered dependent character train. When a retrieval result indicative of the coincidence with the registration character train is received from the character train comparing unit, the code output unit allocates the character train code and the attribute code stored in the character train attribute dictionary storing unit to the coincidence detected character train and outputs the codes. Alternatively, the head character storing unit of the double-layer structure can be divided into a plurality of attribute storing units according to the attribute groups.
A dictionary number DN peculiar to each of the plurality of attribute storing units is set. The head address and the number of dependent character trains in the dependent character train storing unit are stored therein while using the head character as an index. The dependent character train storing unit corresponding to it stores a length of dependent character train and the dependent character train as a set at one or a plurality of storing positions (of the number corresponding to the number of dependent character trains) which are designated by the head address in the attribute storing unit and does not store the character train code and the attribute code, thereby reducing the dictionary size. In this case, when a retrieval result showing the coincidence with the registration character train is received from the character train comparing unit, the code output unit allocates the character train registration number indicative of the storing position in the dependent character train storing unit and the dictionary number DN of the attribute storing unit to the coincidence detected character train and outputs.
A data reconstructing apparatus (basic apparatus) corresponding to such a data compressing apparatus is characterized by comprising: a code separating unit for extracting a code serving as a reconstructing unit from compression data and separating the code into an attribute code and a character train code; a character train attribute dictionary storing unit which is divided into a plurality of attribute storing units according to attribute groups and stores, for each attribute storing unit, a dictionary in which a reconstruction character train corresponding to the character train code serving as a processing unit upon reconstruction has been registered; and a character train reconstructing unit for reconstructing the original character train by referring to the character train attribute dictionary storing unit by the attribute code and the character train code separated by the code separating unit. The character train attribute dictionary storing unit divides the head character as a reconstruction target, the length of dependent character train, and the dependent character train into the attribute groups and stores them in the plurality of attribute storing units. The character train reconstructing unit selects the attribute storing unit on the basis of the separated attribute code, recognizes the storing position in the selected attribute storing unit on the basis of the separated character train code, and reconstructs the character train.
In another embodiment of the invention, a double coding is performed in which compression data compressed by the data compressing apparatus as a basic apparatus is used as an intermediate code and is encoded again by an existing coding. That is, a data compressing apparatus (modified apparatus) for compressing non-compression data formed by character codes of a language having a word structure which is not separated by spaces is characterized by comprising: a first coding unit for comparing a registration character train, which has been registered in a character train attribute dictionary and serves as a processing unit upon compression, with a partial character train in the non-compression data, thereby detecting a partial character train which coincides with the registration character train, and allocating a set of a predetermined intermediate code and an attribute code to every detected partial character train and outputting the set; and a second coding unit for inputting the intermediate code train compressed by the first coding unit and compressing it again. The second coding unit is either a dictionary type coding unit such as LZ77, LZ78, or the like, in which the intermediate code train is replaced by a registration number of the dictionary and an intermediate code train having a higher appearance frequency is registered as a longer code train and is coded, or a statistic type coding unit of an arithmetic coding or the like, in which a short code is allocated to an intermediate code having a high appearance frequency on the basis of a statistic appearance frequency of the intermediate code and is outputted. Further, by providing a character train selecting unit, the non-compression data is discriminated to see whether it is a first character train, such as Japanese, which is not separated by spaces or a second character train, such as English, which is separated by spaces. The first character train of Japanese is inputted to the first coding unit and the second character train of English is inputted to the second coding unit.
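The routing performed by the character train selecting unit and the double coding can be sketched as below. Several points are assumptions for illustration only: the space-ratio heuristic in `is_space_separated`, the one-byte flag that records which path was taken, and the use of Python's `zlib` (DEFLATE, an LZ77-based coding combined with Huffman coding) as a stand-in for the second coding unit. The first coding unit is passed in as a callable because its details belong to the basic apparatus.

```python
import zlib
from typing import Callable


def is_space_separated(text: str, threshold: float = 0.1) -> bool:
    """Hypothetical selector: treat text as space-separated if enough spaces occur."""
    if not text:
        return True
    return text.count(" ") / len(text) >= threshold


def compress(text: str, first_stage: Callable[[str], bytes]) -> bytes:
    """Route the text, then apply the second-stage coding (zlib as a stand-in)."""
    if is_space_separated(text):
        # Second character train (e.g. English): second coding unit only.
        return b"\x00" + zlib.compress(text.encode("utf-8"))
    # First character train (e.g. Japanese): first coding unit, then second coding unit.
    return b"\x01" + zlib.compress(first_stage(text))


def toy_first_stage(text: str) -> bytes:
    """Placeholder for the first coding unit; a real one would emit dictionary codes."""
    return text.encode("utf-16-be")


english = compress("the cat sat on the mat", toy_first_stage)
japanese = compress("これは日本語の文章です", toy_first_stage)
```

The flag byte is one possible way for a decoder to learn which path was used; the specification text above does not state how this is signalled.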
Providing the character train selecting unit thus solves the inconvenience that, when English data, to which the word appearance tendency is not adapted, is encoded by the first coding unit, which targets Japanese documents, the compression data amount becomes larger than the original data amount. Although the details of the first coding unit are the same as those of the data compressing apparatus as a basic apparatus, the encoding in the second coding unit is processed on a byte unit basis. The data is therefore stored so that the total code length of the attribute code and the character train code stored in the double-layer dependent character train storing unit provided for the character train attribute dictionary storing unit is a multiple of eight bits, forming a byte code. In the case where the attribute code and the character train code are not stored in the double-layer dependent character train storing unit provided for the character train attribute dictionary storing unit, when a retrieval result showing coincidence with the registration character train is received from the character train comparing unit, the code output unit allocates a character train registration number indicative of the storing position in the double-layer dependent character train storing unit and a dictionary number of the attribute storing unit in the form of a byte code whose code length is a multiple of eight bits.
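The byte alignment requirement can be illustrated by the following sketch, which packs an attribute code and a character train code into a byte code whose total length is a multiple of eight bits. The default widths of 3 and 13 bits and the function names `pack_code` and `unpack_code` are assumptions for illustration.

```python
from typing import Tuple


def pack_code(attr: int, train: int, attr_bits: int = 3, train_bits: int = 13) -> bytes:
    """Pack (attribute code, character train code) into a whole number of bytes."""
    total_bits = attr_bits + train_bits
    if total_bits % 8 != 0:
        raise ValueError("total code length must be a multiple of eight bits")
    if not 0 <= attr < (1 << attr_bits):
        raise ValueError("attribute code out of range")
    if not 0 <= train < (1 << train_bits):
        raise ValueError("character train code out of range")
    value = (attr << train_bits) | train
    return value.to_bytes(total_bits // 8, "big")


def unpack_code(data: bytes, train_bits: int = 13) -> Tuple[int, int]:
    """Inverse of pack_code: recover (attribute code, character train code)."""
    value = int.from_bytes(data, "big")
    return value >> train_bits, value & ((1 << train_bits) - 1)


byte_code = pack_code(5, 300)  # two bytes, ready for a byte-oriented second stage
```

Keeping each intermediate code on a byte boundary lets a byte-oriented second-stage coder see repeated codes as repeated byte sequences.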
A data reconstructing apparatus (modified apparatus) corresponding to the data compressing apparatus (modified apparatus) of another embodiment is characterized by comprising: a first decoding unit for inputting compression data and reconstructing the intermediate code train; and a second decoding unit for inputting the intermediate code train reconstructed by the first decoding unit and reconstructing the original non-compression data train. The first decoding unit executes either an attribute dictionary type decoding for reconstructing the intermediate code by referring to the attribute dictionary by the input code or a statistic type decoding for reconstructing the intermediate code on the basis of the appearance frequency of the reconstructed intermediate code. Further, when, on the data compression side, the codings at the first and second stages are performed on a Japanese character train which is not separated by spaces and the coding only at the second stage is performed on a character train of English or the like which is separated by spaces, a character train selecting unit is provided subsequently to the first decoding unit. The intermediate code train that the first decoding unit obtains by decoding a code train produced by the codings at the first and second stages is inputted to the second decoding unit, whereas a reconstruction character train decoded from a code train produced only by the coding at the second stage is outputted as it is. The details of the second decoding unit in this case are the same as those of the data reconstructing apparatus as a basic apparatus.
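The two-stage decoding with a selecting unit placed after the first decoding unit can be sketched as follows. As assumptions for illustration, `zlib.decompress` stands in for the first decoding unit, a one-byte flag at the head of the compression data tells the selector which path was used on the compression side, and the second decoding unit is passed in as a callable.

```python
import zlib
from typing import Callable


def decompress(data: bytes, second_stage_decode: Callable[[bytes], str]) -> str:
    """First decoding unit (zlib stand-in), selector, then optional second decoding unit."""
    flag, body = data[:1], data[1:]
    intermediate = zlib.decompress(body)  # undo the second-stage coding
    if flag == b"\x01":
        # Coded at both stages: pass the intermediate code train to the second decoding unit.
        return second_stage_decode(intermediate)
    # Coded only at the second stage: output the reconstructed character train as it is.
    return intermediate.decode("utf-8")


def toy_second_stage_decode(intermediate: bytes) -> str:
    """Placeholder for the dictionary-based second decoding unit."""
    return intermediate.decode("utf-16-be")


english_data = b"\x00" + zlib.compress("hello world".encode("utf-8"))
japanese_data = b"\x01" + zlib.compress("日本語".encode("utf-16-be"))
english_text = decompress(english_data, toy_second_stage_decode)
japanese_text = decompress(japanese_data, toy_second_stage_decode)
```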
Further, the invention provides a data compressing method and a data reconstructing method having processing procedures for the data compressing apparatus and the data reconstructing apparatus as basic apparatuses and for the data compressing apparatus and the data reconstructing apparatus as modified apparatuses.
The above and other objects, features, and advantages of the present invention will become more apparent from the following detailed description with reference to the drawings.