Some programs cause a computer to process characters. For example, some programs that support clerical work read character data from a database and print the characters indicated by that data at predetermined positions on a formatted document.
When a computer handles characters, each character is encoded in accordance with a character encoding scheme. Various character encoding schemes have been proposed, such as ASCII (American Standard Code for Information Interchange), UTF-8 (UTF: UCS (Universal Coded Character Set) Transformation Format), UTF-16, UTF-32, and Shift_JIS (JIS: Japanese Industrial Standards). Different character encoding schemes may assign different character codes to the same character.
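As an illustration of how one character maps to different character codes under different schemes, the following minimal sketch encodes a single character with Python's built-in codecs. The helper name `codes_for` and the choice of sample characters are assumptions made for this example only.

```python
# Encode one character under several encoding schemes and collect the
# resulting character codes as hexadecimal strings.
SCHEMES = ["utf-8", "utf-16-be", "utf-32-be", "shift_jis"]


def codes_for(ch):
    """Return {scheme name: hex string of the character code} for `ch`."""
    return {name: ch.encode(name).hex() for name in SCHEMES}


kanji_codes = codes_for("\u6f22")  # U+6F22, a common Kanji character
latin_codes = codes_for("A")       # U+0041, a basic Latin letter

for name in SCHEMES:
    print(f"{name:10s} Kanji={kanji_codes[name]:8s} Latin={latin_codes[name]}")
```

The big-endian codec variants are used so that no byte order mark is prepended and each value shows only the character code itself.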
The length of a character code (for example, the number of bytes) differs depending on the character encoding scheme and, even within the same scheme, may differ depending on the character. For example, UTF-8 expresses principal Latin characters with one byte and many Kanji characters with three or four bytes. UTF-32 expresses every character with four bytes. Shift_JIS expresses principal Latin characters with one byte and Kanji characters with two bytes. In recent years, large-scale character encoding schemes that express a wide variety of the world's characters using long character codes (for example, codes with a large number of bytes) have been proposed, and the maximum length of character codes has increased accordingly.
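The byte lengths mentioned above can be checked with a short sketch. The function name `code_length` and the three sample characters are hypothetical choices for illustration. The third sample, U+20BB7, is a Kanji character outside the Basic Multilingual Plane, and it is assumed here to be unencodable by Python's `shift_jis` codec, in which case the function returns `None`.

```python
def code_length(ch, scheme):
    """Return the number of bytes `scheme` uses for `ch`, or None if the
    scheme cannot represent the character at all."""
    try:
        return len(ch.encode(scheme))
    except UnicodeEncodeError:
        return None


SAMPLES = {
    "Latin A (U+0041)": "A",
    "Kanji (U+6F22)": "\u6f22",
    "Kanji (U+20BB7)": "\U00020bb7",
}
# Explicit little-endian variants avoid counting a byte order mark.
SCHEMES = ["utf-8", "utf-16-le", "utf-32-le", "shift_jis"]

for label, ch in SAMPLES.items():
    lengths = {scheme: code_length(ch, scheme) for scheme in SCHEMES}
    print(label, lengths)
```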
A compiler has been proposed that converts character codes by generating a program that reads data including UTF-8 character codes and processes the data using an application programming interface (API) for handling UTF-16 character codes. The compiler detects a command that reads characters stored in a character variable and inserts, before that command, another command that transcodes the character variable from UTF-8 to UTF-16.
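The insertion step described above might be sketched as follows. This is not the cited compiler's implementation: the command representation as `(operation, variable)` tuples, the operation names `"read"` and `"transcode"`, and both function names are assumptions introduced purely for illustration.

```python
def transcode_utf8_to_utf16(data: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16 bytes (little-endian, no byte order mark)."""
    return data.decode("utf-8").encode("utf-16-le")


def insert_transcoding(commands, char_vars):
    """Return a new command list in which every read of a character
    variable is preceded by a transcode command for that variable."""
    rewritten = []
    for op, var in commands:
        if op == "read" and var in char_vars:
            rewritten.append(("transcode", var))
        rewritten.append((op, var))
    return rewritten


# Hypothetical program: "name" is a character variable, "count" is not.
program = [("load", "name"), ("read", "name"), ("read", "count")]
rewritten_program = insert_transcoding(program, char_vars={"name"})
print(rewritten_program)
```

Only the read of `name` receives a preceding transcode command; the read of the non-character variable `count` is left unchanged.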
With respect to the allocation of memory regions used by a program, a compiler has also been proposed that generates a program that calls a function receiving part of an array (a sub-array) as an argument. This compiler determines whether the sub-array data is stored in a contiguous region of memory. When it is, the compiler generates a program that lets the function refer to the original data directly. When the sub-array data is stored in non-contiguous regions, the compiler generates a program that copies the sub-array data in memory and passes the copy to the function.
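The choice between direct reference and copying can be sketched with Python's `memoryview`. As a simplification, the contiguity check is made at run time here, whereas the compiler described above makes the determination itself. The function name `pass_subarray` and its returned labels are hypothetical.

```python
from array import array


def pass_subarray(data, start, stop, step, func):
    """Call `func` on data[start:stop:step] and report how it was passed.

    A contiguous sub-array is passed as a view of the original memory;
    a strided (non-contiguous) one is first copied into a new array.
    """
    view = memoryview(data)[start:stop:step]
    if view.contiguous:
        return func(view), "direct"
    packed = array(data.typecode, view.tolist())  # contiguous copy
    return func(memoryview(packed)), "copied"


numbers = array("i", range(10))
contiguous_result = pass_subarray(numbers, 2, 6, 1, sum)  # elements 2, 3, 4, 5
strided_result = pass_subarray(numbers, 0, 10, 2, sum)    # elements 0, 2, 4, 6, 8
print(contiguous_result, strided_result)
```

Passing the view avoids a copy when the elements are already adjacent in memory, while the copy gives the callee a packed layout when they are not.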
See, for example, Japanese Laid-open Patent Publication Nos. 2005-293386 and 11-184710.
Meanwhile, after a program that processes characters expressed by a certain character encoding scheme has been created, one may wish to use the program to process characters expressed by another character encoding scheme. For example, when a program used for business operations in a certain country is created and the business is later extended to another country with a different language, one may wish to use the program in the other country as well.
If the maximum length of character codes in the other character encoding scheme is longer than that in the original scheme assumed at the time of programming, the program cannot properly process characters expressed by the other scheme unless the program is modified. For example, if a program includes a character variable that allocates a two-byte memory region to each character and processes a character expressed with four bytes, the character code overflows the memory region.
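The overflow described above can be illustrated with a fixed-width buffer. This sketch assumes a hypothetical layout of two bytes per character (`SLOT_BYTES`) and a hypothetical helper `store_char`. Unlike a program that writes past the region unchecked, the sketch detects the condition and raises `OverflowError` so that the failure is visible.

```python
SLOT_BYTES = 2  # assumed per-character region size fixed at programming time


def store_char(buffer, index, ch, scheme):
    """Store the character code of `ch` in the fixed-size slot `index`."""
    code = ch.encode(scheme)
    if len(code) > SLOT_BYTES:
        raise OverflowError(
            f"{len(code)}-byte character code does not fit a {SLOT_BYTES}-byte slot"
        )
    start = index * SLOT_BYTES
    # Pad shorter codes with zero bytes so the buffer length never changes.
    buffer[start:start + SLOT_BYTES] = code.ljust(SLOT_BYTES, b"\x00")


record = bytearray(2 * SLOT_BYTES)               # room for two characters
store_char(record, 0, "\u6f22", "shift_jis")     # two-byte code: fits
try:
    store_char(record, 1, "\U00020bb7", "utf-8")  # four-byte code: too long
except OverflowError as exc:
    print("overflow:", exc)
```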
On the other hand, modifying a program so that it can handle longer character codes than were assumed at the time of programming affects a wide range of the program and requires an enormous amount of work. For example, if the data size of a character variable is enlarged, the descriptions of the various processes that refer to the variable must also be modified. Likewise, if its data structure is changed, any processing procedure (algorithm) that depends on the data structure must also be modified.