The growing trend toward multinational organizations has given rise to a corresponding need for fast, efficient, and accurate integration of data stored in different computer character sets, which generally correspond to different human languages. Multinational organizations typically have regionally based information systems, and these systems often cannot share mission-critical business information because they store data using incompatible character sets.
Typically in computing systems, the internal representation of characters is designed for one alphabet. For example, a computing system may be designed to represent western European characters for the languages that use this alphabet (e.g., English, French, German), but would not be able to represent languages that use other characters (e.g., Cyrillic, Arabic, Japanese, Chinese).
Computer representation of characters typically assigns every character of the alphabet a unique numeric value. This means that a character set that represents each character using 8 bits can have only 256 characters. A 256-character character set is sufficient to represent the western European alphabet or the Cyrillic alphabet (though not both concurrently), but is insufficient for languages that employ more characters (e.g., Japanese, Chinese). Languages having large character sets have employed a two-byte (16-bit) representation of characters. Such character sets may employ a multi-byte encoding, with, for example, the first byte indicating the number of bytes used to represent the character. Such encodings do not provide the capability to combine character sets. So, for example, it is not possible to combine western European and Japanese character sets, or Japanese and Chinese character sets.
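The limits described above can be illustrated with a short Python sketch. It assumes Python's built-in codecs `latin-1` (ISO 8859-1), `iso8859_5` (ISO 8859-5), and `shift_jis` as stand-ins for the western European, Cyrillic, and Japanese character sets; the text names no specific encodings, so these choices, and the helper `fits`, are illustrative only.

```python
# Illustrative only: the codec names below are assumptions, not taken from the text.
WESTERN = "latin-1"      # ISO 8859-1, western European
CYRILLIC = "iso8859_5"   # ISO 8859-5, Cyrillic
JAPANESE = "shift_jis"   # a multi-byte Japanese character set

# An 8-bit value can take only 2**8 = 256 distinct values.
MAX_8BIT_CHARACTERS = 2 ** 8

# The same byte value names a different character in each 8-bit set,
# which is why the two alphabets cannot be stored concurrently.
raw = bytes([0xE4])
western_char = raw.decode(WESTERN)
cyrillic_char = raw.decode(CYRILLIC)

# A kanji occupies two bytes in the Japanese set.
kanji_bytes = "日".encode(JAPANESE)


def fits(text: str, charset: str) -> bool:
    """Hypothetical helper: True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False
```

Under these assumptions, `fits("日", WESTERN)` is false while `fits("日", JAPANESE)` is true, and the single byte `0xE4` decodes to two unrelated letters depending on which character set the reader assumes.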
Unicode was developed to cover the major languages and character sets. As originally designed, Unicode represents each character using 16 bits and can therefore uniquely identify up to 65,536 characters. This means that a Unicode character set acts as a superset of the existing character sets for the various languages or alphabets.
The majority of extant systems do not use Unicode, and there is therefore a need for conversion between various character sets. A computing system using Unicode can communicate with external computing systems employing various character sets, but there must be a conversion between Unicode and the character set of each external computing system.
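A minimal sketch of such a conversion, assuming Python's standard codecs, is shown here. Unicode serves as the intermediate form: bytes from one external system are decoded to Unicode text and then encoded in the character set of another. The function name `convert` and the codec names (`koi8_r`, `iso8859_5`, `latin-1`) are hypothetical choices for illustration and do not come from the text.

```python
def convert(data: bytes, source_charset: str, target_charset: str) -> bytes:
    """Convert bytes between two character sets, using Unicode as the pivot.

    Raises UnicodeEncodeError when the target character set has no
    representation for a character present in the source data.
    """
    text = data.decode(source_charset)   # external character set -> Unicode
    return text.encode(target_charset)   # Unicode -> external character set


# Example: the same Cyrillic word stored by two systems with different
# 8-bit Cyrillic character sets (both codec choices are assumptions).
word = "Привет"
stored_in_koi8 = word.encode("koi8_r")
stored_in_iso = convert(stored_in_koi8, "koi8_r", "iso8859_5")

# The stored bytes differ, yet both decode to the same Unicode text.
same_text = stored_in_iso.decode("iso8859_5") == word
different_bytes = stored_in_iso != stored_in_koi8

# Conversion fails when the target set cannot represent the characters.
try:
    convert(stored_in_koi8, "koi8_r", "latin-1")
    conversion_failed = False
except UnicodeEncodeError:
    conversion_failed = True
```

The failure case is the important one: conversion from Unicode to a narrower character set succeeds only for characters that the narrower set contains.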
The variety of character sets presents difficulties for multinational organizations in accomplishing a number of business processes. For example, a customer of a multinational computer manufacturer may place an order for 1000 PCs, with 200 of the PCs to be delivered to the customer's office in Japan and 800 of the PCs to be delivered to the customer's office in Germany. The customer may place the order from a computing system using a Chinese character set. The computer manufacturer may be receiving orders on a computing system using Unicode, which can receive the order, convert it from the Chinese character set to Unicode, and forward the relevant data to the manufacturer's regional facilities. The computer manufacturer will typically divide the order, placing an order for 200 PCs with its Japanese facility and another order for 800 PCs with its German facility. Each regional facility will receive the relevant data in Unicode and convert the data to its own character set. That is, each of these external systems will store the data in its particular character set (i.e., Japanese and western European, respectively). If the customer wants to check the status of the order, this may prove difficult because the order is now dispersed among external systems that may not be able to communicate with each other. In this case, the business process of order tracking is impeded because the organization relies on a network of external systems using different character sets.
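The dispersal in this example can be sketched in Python under stated assumptions. The record layout, the names `RegionalRecord`, `dispatch`, and `unified_view`, the order identifier `PO-1000`, the city names, and the codec choices (`shift_jis`, `latin-1`) are all hypothetical. The sketch covers only the split into regional orders and their storage, not receipt of the original order.

```python
from dataclasses import dataclass


@dataclass
class RegionalRecord:
    charset: str    # character set of the regional system (assumed codec name)
    payload: bytes  # order line as stored by that regional system


def dispatch(order_id: str, lines: list[tuple[str, int, str]]) -> list[RegionalRecord]:
    """Encode each Unicode order line in the character set of its region."""
    return [
        RegionalRecord(charset, f"{order_id};{quantity};{city}".encode(charset))
        for charset, quantity, city in lines
    ]


def unified_view(records: list[RegionalRecord]) -> list[str]:
    """Decode every record with its own character set back to Unicode."""
    return [record.payload.decode(record.charset) for record in records]


records = dispatch(
    "PO-1000",
    [("shift_jis", 200, "東京"), ("latin-1", 800, "München")],
)

# Reading the Japanese record with the German system's character set
# produces unreadable text rather than the original order line.
misread = records[0].payload.decode("latin-1")
```

In this sketch, a consistent view of the order is recoverable only when each record is decoded with the character set of the system that stored it. A regional system that reads another region's bytes with its own character set obtains text such as `misread`, which no longer matches the original line.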
Many other business processes depend on having a unified view of related data, which cannot be readily obtained when the data is dispersed among external systems using different character sets.