When encoded data need to be subjected to character code conversion, the conversion is generally implemented in two passes: decoding processing followed by character code conversion processing (see, for example, Japanese Laid-open Patent Publication No. 2003-30030). Therefore, a storage area for storing the result of the decoding processing needs to be prepared.
ZIP, which uses LZ77, is a mainstream encoding and decoding algorithm. With ZIP, for a character string to be encoded, a longest matching character string is determined by using a sliding window to generate encoded data. For encoded data to be decoded, a longest matching character string is determined by using a sliding window to generate decoded data. The determination of longest matching character strings by using sliding windows is performed byte by byte.
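The byte-by-byte nature of sliding-window decoding can be sketched as follows. This is a minimal illustration only: the token format (a literal byte value, or a `(distance, length)` pair pointing back into the already decoded data) and the name `lz77_decode` are assumptions made for this sketch and do not reproduce the actual ZIP bit format.

```python
def lz77_decode(tokens):
    """Decode a simplified LZ77 token stream (hypothetical format).

    Each token is either an int (a literal byte) or a (distance, length)
    pair that copies `length` bytes starting `distance` bytes back in the
    output produced so far.
    """
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)  # literal byte
        else:
            distance, length = tok
            start = len(out) - distance
            # Copy one byte at a time so that overlapping matches work.
            for i in range(length):
                out.append(out[start + i])
    return bytes(out)
```

For example, `lz77_decode([0x61, 0x62, 0x63, (3, 6)])` yields `b"abcabcabc"`. The copy loop works purely on bytes and has no notion of where a character begins or ends.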
However, when code conversion of the encoded data is performed character by character after the decoding processing, the unit lengths handled in the decoding processing and in the character code conversion processing differ from each other, and these processes therefore need to be executed separately. Thus, for example, there is a problem that waste is generated in the storage area. From another point of view, there is a problem that the processing time becomes long.
For example, for ZIP, while the determination of the longest match in the encoding processing and decoding processing is performed byte by byte, the character code conversion processing is performed character by character. The length of a character in a character code system that includes CJK characters, such as UTF-8, is known to be one to four bytes. That is, while there are characters each expressed by one byte (for example, alphanumeric characters), there are also characters each expressed by three bytes (for example, some of Level-1 kanjis, and Level-2 kanjis and kana characters) and characters each expressed by four bytes (for example, some of Level-3 and Level-4 kanjis). Therefore, the decoded data generated byte by byte by the longest matching of the decoding processing are in units that differ from the byte units of these characters. Accordingly, the decoded data cannot be handed over directly to the character code conversion processing, which treats characters as units, and the decoding and the character code conversion cannot be executed in one pass. As a result, in the decoding processing, the result of decoding the entire encoded data needs to be stored in the storage area, and waste is generated in the storage area. Further, the processing time for the decoding processing and the character code conversion processing becomes long.
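The variable byte length of UTF-8 characters can be checked with a short sketch such as the following. The helper name `utf8_byte_length` and the sample characters are chosen only for illustration.

```python
def utf8_byte_length(ch):
    """Return the number of bytes used to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


# Sample characters (illustrative only): an alphanumeric character,
# a kana character, a kanji, and a kanji outside the Basic Multilingual Plane.
samples = ["A", "\u3042", "\u6f22", "\U0002000b"]
lengths = [utf8_byte_length(ch) for ch in samples]
```

Here `lengths` is `[1, 3, 3, 4]`, so a decoder that emits data in units of bytes can stop at a position that falls inside a three-byte or four-byte character.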
The problem that the decoding processing of the encoded data and the character code conversion processing need to be performed as separate processes will be described with reference to FIG. 1A and FIG. 1B. FIG. 1A is a diagram illustrating a decoding and conversion process using the LZ77 system. As illustrated in FIG. 1A, in the decoding processing, all of the encoded data are decoded, and all of the resulting decoded data are stored into a storage area. In the character code conversion processing, the character codes of all of the decoded data stored in the storage area are converted to generate converted data.
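The two-pass flow of FIG. 1A can be sketched roughly as follows. This is an illustration under stated assumptions: Python's `zlib` module (DEFLATE, an LZ77-based format) stands in for the decoder, UTF-16LE is an arbitrarily chosen target character code, and the name `decode_then_convert` is hypothetical.

```python
import zlib


def decode_then_convert(encoded, src="utf-8", dst="utf-16-le"):
    """Two-pass processing: decode everything, then convert everything."""
    # Pass 1: the entire decoded result is held in a storage area.
    decoded = zlib.decompress(encoded)
    # Pass 2: character code conversion over the whole decoded data.
    return decoded.decode(src).encode(dst)
```

The intermediate `decoded` buffer corresponds to the storage area that holds the full result of the decoding processing before any conversion can start.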
FIG. 1B is another diagram illustrating the decoding and conversion process using the LZ77 system. With reference to FIG. 1B, a case where encoded data in UTF-8 character codes are decoded will be described. As illustrated in FIG. 1B, each of storage areas A1, A2, B1, and B2 is secured in a memory, for example. The storage area B1 is called “read buffer”, for example. In the decoding processing, the encoded data stored in the storage area B1 are decoded by performing longest match determination with the storage areas A1 and A2, which correspond to sliding windows. The storage area A1 is called “encoding portion”, for example. The storage area A2 is called “reference portion”, for example. In the decoding processing, the decoded data are written directly into the storage areas A2 and B2. The storage area B2 is called “write buffer”, for example.
For example, in the first longest matching of the decoding processing, the encoded data stored in the storage area B1 are decoded by using the storage areas A1 and A2. Because the decoding processing is performed byte by byte, the ends of characters in the decoded data are not recognized. In the decoding processing, the decoded data are written directly into the storage areas A2 and B2. In the example of FIG. 1B, the decoded data obtained in the first longest matching are “E2BC98E386”. In this case, the data written into the storage area B2 are the decoded data as is, “E2BC98E386”. This “E2BC98E386” consists of “+” (0xE2BC98) (a Japanese character meaning “ten”) and “□” (0xE386), which does not extend to the end of a character. That is, “parting in tears”, which means that the data are cut off at a position other than a boundary between character codes, has occurred. When the data of the second longest match, “93”, are written into the storage area B2, “㆓” (0xE38693) (a Japanese character meaning “two”), which extends to the end of the character, is generated. Accordingly, while the decoding processing is performed byte by byte, the character code conversion processing is performed character by character, and thus the decoded data obtained by the decoding processing cannot be directly subjected to the character code conversion. Therefore, in the decoding processing using the LZ77 system, the character code conversion is conducted on the decoded data only after the entire encoded data have been decoded, and thus a storage area B3 for the result of the character code conversion is needed and waste is generated in the storage area A2 used in the decoding processing. Further, the processing time for the decoding processing and the character code conversion processing becomes long.
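The boundary problem in this example can be reproduced with the byte values given above. The following is a minimal sketch; the helper name `decodes_cleanly` is an assumption, and Python's built-in UTF-8 codec is used only to test whether a byte sequence ends on a character boundary.

```python
def decodes_cleanly(data):
    """Return True if `data` is a complete, valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Output of the first and second longest matches, as given in the text.
first_match = bytes.fromhex("E2BC98E386")
second_match = bytes.fromhex("93")

# The first match alone ends inside a three-byte character.
first_ok = decodes_cleanly(first_match)
# Once the second match is appended, the data end on a character boundary.
both_ok = decodes_cleanly(first_match + second_match)
```

Here `first_ok` is `False` and `both_ok` is `True`: the output of the first longest match cannot be converted character by character until the byte from the second longest match has arrived.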