1. Field of the Invention
This invention relates generally to character encoding, and more particularly to a method for encoding e-mail or other communication data in a format that conforms to recognized data communication standards and that interoperates with most third-party recipient clients worldwide.
2. Description of the Related Art
Conventionally, in the context of computer storage and communication, character encoding consists of defining a code that maps a set of characters, called a “character repertoire,” to a “coded character set” (also called a “character encoding”). A character repertoire is the full set of abstract characters that a system or a natural language supports. A coded character set, or character encoding, specifies a representation of each character in the character repertoire using one or more integer codes.
For example, the representation of strings depends on the choice of a character repertoire and the method of character encoding. Older implementations were designed to work with the repertoire and encoding schemes defined by ASCII, or by more recent extensions such as the ISO 8859 series. The need to communicate across multiple computer systems in multiple languages led to the development of modern character encoding implementations that often use the extensive repertoire defined by Unicode along with a variety of complex encodings such as 8-bit Unicode Transformation Format (UTF-8) and 16-bit Unicode Transformation Format (UTF-16). However, even with the development of these modern character set and encoding implementations, e-mail and other forms of communication across multiple systems or in multiple languages still present several difficulties, as discussed below.
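The distinction between a repertoire, a coded character set, and an encoding can be illustrated with a short Python sketch. The sample character and variable names are chosen only for illustration and are not part of any system described here.

```python
# One abstract character from the Unicode repertoire.
text = "é"  # LATIN SMALL LETTER E WITH ACUTE

# The coded character set assigns the character an integer code (U+00E9).
code_point = ord(text)

# Different encodings serialize that same character to different bytes.
latin1_bytes = text.encode("iso-8859-1")  # one byte
utf8_bytes = text.encode("utf-8")         # two bytes
utf16_bytes = text.encode("utf-16-be")    # two bytes, big-endian, no byte order mark

print(hex(code_point), latin1_bytes, utf8_bytes, utf16_bytes)
```

Because the byte sequences differ, a recipient can only recover the character if it knows which encoding the sender used.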
In most popular Web-based e-mail client systems, no real effort is made to properly encode e-mail messages. Typically, after a user composes an e-mail, the browser of the Web-based system incorrectly converts any characters of the e-mail that do not exist in the browser's active character set into Hyper Text Markup Language (HTML) escaped Unicode sequences. The corrupted e-mail is then encoded and sent to an e-mail recipient. Once received by the recipient, the e-mail will only display correctly if the recipient also uses a Web-based client system that does not correctly escape HTML when displaying plain text e-mail, a case of two wrong behaviors canceling each other. In better-behaved client systems, however, the corrupted e-mail will be displayed to the recipient as an unintelligible sequence of numbers and symbols. The situation is slightly better if the recipient user uses characters that were present in the browser character set used by the sending application but, even so, this scenario results in e-mail that violates e-mail standards, because it contains 8-bit data in the headers and the data in the message body is unaccompanied by a character set label. The headers typically get corrupted by mail servers that attempt to correct illegal e-mail, and even when that does not happen, the messages still only display correctly if the receiving client explicitly selects the character set used by the sending application, assuming the option to make such a selection is even available to the user.
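The corruption described above can be approximated in Python. This is only a sketch: it assumes the browser's active character set is ISO 8859-1 and models the browser's substitution with Python's `xmlcharrefreplace` error handler; real browsers and webmail systems may differ in detail.

```python
import html

composed = "Hello 中文"  # sample message containing characters outside ISO 8859-1

# Assumed browser behavior: characters missing from the active character set
# are replaced with HTML numeric character references before submission.
submitted = composed.encode("iso-8859-1", errors="xmlcharrefreplace").decode("iso-8859-1")

# A well-behaved client shows plain text literally, so the references are visible.
shown_by_strict_client = submitted

# A client that wrongly treats plain text as HTML happens to undo the damage.
shown_by_lenient_client = html.unescape(submitted)

print(shown_by_strict_client)   # Hello &#20013;&#25991;
print(shown_by_lenient_client)  # Hello 中文
```

The message is readable only when the second error is present, which is the "two wrong behaviors canceling each other" case.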
In other Web-based e-mail client systems, the user is allowed to select a preferred language in a webmail application, but the user is then limited to only the characters available in the character set commonly used for that language. So, for example, a user composing an e-mail with English set as the preferred default language cannot view or compose e-mail with Chinese characters. Moreover, if the user, despite having English as the default language setting, composes an e-mail that includes Chinese characters, the message will be received by the recipient in an invalid, often unreadable format.
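The limitation can be seen by checking whether a message fits a single character set. The helper `can_compose` below is a hypothetical name, and the pairing of English with ISO 8859-1 and Chinese with GB2312 is an assumption made for illustration.

```python
def can_compose(text: str, charset: str) -> bool:
    """Return True if every character of text is representable in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Character set assumed for an English-language webmail setting.
print(can_compose("Hello", "iso-8859-1"))  # True
print(can_compose("你好", "iso-8859-1"))    # False

# The same text fits a Chinese character set or a Unicode encoding.
print(can_compose("你好", "gb2312"))        # True
print(can_compose("你好", "utf-8"))         # True
```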
In modern PC-based e-mail client systems (“fat client” systems), users are typically allowed to choose the type of encoding to be used in outgoing e-mail messages. However, this approach is not ideal because most users are not familiar with character encoding requirements and, as a result, typically make incorrect character encoding selections. If a character encoding selection is made incorrectly, the user will likely receive error messages from the client system, assuming the client system performs any checking. To overcome these errors, the user will generally select the first character encoding type that does not generate error messages, even though that selection may not be the correct choice to interoperate with the client systems used by the e-mail recipient.
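The trial-and-error selection described above can be sketched as follows. The function name, the candidate list, and the recipient's assumed character set are all hypothetical; the point is only that an encoding which produces no error on the sending side may still be misread on the receiving side.

```python
from typing import Optional, Sequence

def first_charset_without_error(text: str, candidates: Sequence[str]) -> Optional[str]:
    """Mimic a user who keeps the first selection that raises no error."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    return None

message = "café"
chosen = first_charset_without_error(message, ["ascii", "utf-8", "iso-8859-1"])
wire_bytes = message.encode(chosen)

# Hypothetical recipient client that assumes ISO 8859-1 for unlabeled text.
displayed = wire_bytes.decode("iso-8859-1")

print(chosen)     # utf-8
print(displayed)  # cafÃ©
```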
Moreover, the approach of allowing a user to select the character encoding is generally less suitable for Web-based applications than for PC-based applications. In Web-based applications there are complicated exchanges between the client system and the server system that need to take place for correct message validation, which generally makes any encoding errors made by the user more problematic.
In view of the foregoing, there is a need for an encoding approach that allows a user to compose e-mail or any other communication data in any language using any characters and allows the e-mail recipient to properly display the transmitted content.