The present invention generally relates to code conversion, and more particularly to code conversion in a small target encoding space.
Many computers and other electronic devices employ text as a means of interacting with a user. This text is generally displayed on a monitor or another type of display screen. Before it can be presented, the text must be held in the internal digital representation of the computer or other electronic device. This means that the characters must be encoded at some point and correlated to a character set. This concept, hereinafter referred to as “character set encoding”, associates each character in the character set with a unique digital representation. An encoded character can be a letter, a numeral or another type of text symbol. Each character is assigned a numeric code, which is then used by the computer or other electronic device. Computer systems for different languages use different character sets. For example, for Chinese, a computer may use the “BIG5” or, alternatively, the “Unicode” character set.
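By way of illustration only, the following minimal Python sketch shows how a single character is associated with different digital representations under different character set encodings. The helper name `encoded_hex` and the choice of sample character are assumptions made for this example and form no part of the described subject matter; the codec names `big5` and `utf-8` are those provided by the Python standard library.

```python
def encoded_hex(ch: str, encoding: str) -> str:
    """Return the bytes that represent `ch` in `encoding`, as a hex string."""
    return ch.encode(encoding).hex()


# Sample character chosen for illustration: the Chinese character "中".
char = "中"

# The Unicode code point of the character, written in the usual U+XXXX form.
unicode_code_point = f"U+{ord(char):04X}"

# The same character has a different byte representation in each encoding.
big5_bytes = encoded_hex(char, "big5")
utf8_bytes = encoded_hex(char, "utf-8")

print(unicode_code_point, big5_bytes, utf8_bytes)
```

The same abstract character thus corresponds to one numeric code in the Unicode character set and to a different two-byte code in BIG5, which is why a conversion step is needed whenever text moves between systems that use different character sets.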
A problem occurs, however, when EBCDIC is used in conjunction with some of these character sets. For example, for Chinese, when a Coded Character Set Identifier (CCSID) is used, the value “00835” used for EBCDIC indicates a double-byte code page for traditional Chinese. The problem often arises during code page or character conversion. One reason is that the smaller code page has only limited room for additional code points. Therefore, when characters are converted from a larger code page, a concatenation takes place. For example, a comparison between Unicode and EBCDIC (CCSID=00835) shows that the code point range of the EBCDIC code page is smaller than that of Unicode. Therefore, in such cases, when a source encoding space (such as Unicode) is larger than a target encoding space (such as EBCDIC), all available code points in the code table for the target encoding space will be exhausted.
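The exhaustion problem described above can be sketched with a toy model. In the following Python example, the class `TargetCodeTable`, its `capacity` parameter and the function `convert` are hypothetical names introduced solely for illustration; the model simply hands out target code points in order of first use and does not reproduce the actual layout of any EBCDIC code page.

```python
class TargetCodeTable:
    """Toy model of a target encoding space with a fixed number of code points."""

    def __init__(self, capacity: int):
        self.capacity = capacity   # total code points available in the target
        self.mapping = {}          # source character -> assigned target code

    def assign(self, source_char: str) -> int:
        """Return the target code for `source_char`, allocating one if needed."""
        if source_char in self.mapping:
            return self.mapping[source_char]
        if len(self.mapping) >= self.capacity:
            # Every target code point is already in use.
            raise OverflowError(
                f"no target code point left for {source_char!r}"
            )
        code = len(self.mapping)   # next unused target code point
        self.mapping[source_char] = code
        return code


def convert(text: str, table: TargetCodeTable) -> list:
    """Convert each source character to its target code."""
    return [table.assign(ch) for ch in text]


if __name__ == "__main__":
    table = TargetCodeTable(capacity=3)
    print(convert("abca", table))   # three distinct characters fit
    try:
        convert("d", table)         # a fourth distinct character does not
    except OverflowError as err:
        print("conversion failed:", err)
```

In this simplified model, a source text containing more distinct characters than the target table has code points cannot be converted completely, which mirrors the situation in which the source encoding space is larger than the target encoding space.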