1. Field of the Invention
The present invention relates to a system for converting between character codes for printed or displayed text and, more particularly, to a code converter for converting one character set to multiple character sets.
2. Description of the Related Art
Computers and other electronic devices typically use text to interact with users. The text is usually displayed on a monitor or some other type of display device. Because the text must be represented in digital form inside the computer or other electronic device, a character set encoding must be used. Generally speaking, a character set encoding operates to encode each character of the character set with a unique digital representation. The characters (which are encoded) correspond to letters, numbers and various text symbols and are assigned numeric codes for use by computers or other electronic devices. The most popular character set for use with computers and other electronic devices is the American Standard Code for Information Exchange (ASCII). ASCII uses 7-bit sequences for its encodings. In other countries, different character sets are used. In Europe, the dominant character encoding standards are the ISO 8859-X family, especially ISO 8859-1 (called "Latin-1") developed by the International Standards Organization (ISO). In Japan, the dominant character encoding standard is JIS X0208 where JIS refers to the Japanese Information Standard and was developed by Japan Standards Association USA). Examples of other existing character sets include Mac.TM. OS Standard Roman encoding (by Apple Computer, Inc.), Shift-JIS (Japan), Big5 (Taiwan), and many more.
With the ongoing globalization of business and networks, it has become important for computers or other electronic devices to be able to handle multiple character encodings. For example, the same computer or electronic device may be used by persons of different nationalities who wish to interact with the computer or other electronic device in a different language. For each such language a different character set encoding is usually needed. However, character sets for the same language can also differ.
There is also a need to be able to convert from one character set encoding to another encoding. For example, a user in France using ISO 8859-1 may want to send an electronic mail message in French to a user in Israel who is using ISO 8859-8. Because the sender and receiver are using different character set encodings, the non-ASCII characters in the message will be garbled for the user in Israel. Ideally, one of the computers or electronic devices would convert from one character set to another character set. This has been achieved to a limited extent between a few character sets, but is largely not possible with modern computers or electronic devices. Code conversion is made difficult because of the numerous different character standards and the often conflicting or inconsistent national standards.
The Unicode.TM. standard (hereafter simply Unicode or Unicode standard) was developed to provide an international character encoding standard. The designers of the Unicode standard wanted and did provide a more efficient and flexible method of character identification. The Unicode standard includes characters of all major International Standards approved and published before Dec. 31, 1990, as well as other characters not in previous standards. The characters are encoded in the Unicode standard without duplication. The codes within the Unicode standard are 16-bits (or 2 bytes) wide.
A character code standard such as the Unicode standard facilitates code conversion and enables the implementation of useful processes operating on textual data. For example, in accordance with the above example, the computer or other electronic device in France can transmit Unicode characters and the computer or other electronic device in Israel can convert the Unicode characters it receives into a Hebrew based character set that is compatible with the computer or other electronic device in Israel. For additional detail about the Unicode standard, see, e.g., The Unicode Standard, Worldwide Character Encoding, Version 2.0, Addision-Wesley 1996, which is hereby incorporated by reference in its entirety.
One problem with Unicode is that when the Unicode text originates from multiple different encodings, it is difficult to convert the Unicode text back to the original multiple different encodings. In particular, some computer systems or applications that execute on computer systems do not support Unicode encodings. Hence, when such computer systems or applications receive Unicode text, they are not able to properly utilize the text. Hence, code conversion of the Unicode text to a target encoding understood by the computer system or application is needed. The difficulty is when the Unicode originates from multiple different encodings, the computer system (e.g., operating system) would not normally understand how to convert the Unicode back to the original multiple different encodings. In some cases, font or style information might be available and associated with the Unicode text so as to provide a suggestion as to the originating encodings. However, often such font or style information is not available.
Thus, there is a need for improved approaches to converting Unicode text to multiple different encodings.