This invention relates generally to Internet browsers and particularly to browsers which support a large number of characters.
Textual data, whether printed or displayed on a computer monitor, is encoded as binary numerical codes. When a key on a keyboard is struck, a character code for that key is generated. The computer then uses that character code to select the corresponding character shape (glyph) stored under the same character code in a font file.
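The keystroke-to-glyph path described above can be sketched with two lookup tables. This is a minimal Python illustration under stated assumptions: `KEY_TO_CODE`, `FONT`, `MISSING_GLYPH` and `key_to_glyph` are hypothetical names, and glyphs are represented by placeholder strings rather than real outline or bitmap font data.

```python
# Hypothetical keyboard map: key identifier -> character code (ASCII values).
KEY_TO_CODE = {"KEY_A": 0x41, "KEY_B": 0x42, "KEY_1": 0x31, "KEY_C": 0x43}

# Hypothetical font file: character code -> glyph. Real fonts store outline or
# bitmap data; placeholder strings stand in for it here. 0x43 is deliberately absent.
FONT = {0x41: "glyph:A", 0x42: "glyph:B", 0x31: "glyph:1"}

MISSING_GLYPH = "glyph:?"  # hypothetical fallback when the font lacks a code


def key_to_glyph(key: str) -> str:
    """Return the glyph displayed when `key` is struck."""
    code = KEY_TO_CODE[key]               # the keystroke yields a character code
    return FONT.get(code, MISSING_GLYPH)  # the same code selects the glyph


print(key_to_glyph("KEY_A"))  # glyph:A
print(key_to_glyph("KEY_C"))  # glyph:?
```

The second call shows the failure mode the rest of this section is concerned with: a character code with no entry in the font cannot be displayed.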
English-language personal computer systems generally employ a seven-bit character code defined by the American Standard Code for Information Interchange (ASCII) (ANSI X3.4-1986). Seven bits allow 128 codes, which cover upper- and lower-case Latin letters, Arabic numerals, punctuation and other signs, and control characters.
The Chinese, Japanese and Korean (CJK) languages have a very large number of characters, on the order of hundreds of thousands. The number of these characters far exceeds the 128 codes available in 7-bit ASCII.
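The capacity argument above is simple arithmetic: a fixed-width code of n bits has 2^n distinct values. The minimal Python sketch below (the helper names `capacity` and `fits_in_ascii` are illustrative, not from the original) computes that capacity and uses Python's built-in `ascii` codec to show that CJK characters have no 7-bit ASCII code at all.

```python
def capacity(bits: int) -> int:
    """Number of distinct codes a fixed-width code of `bits` bits can represent."""
    return 2 ** bits


def fits_in_ascii(text: str) -> bool:
    """True if every character of `text` has a 7-bit ASCII code (0..127)."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(capacity(7))                     # 128
print(fits_in_ascii("Hello"))          # True
print(fits_in_ascii("\u4e2d\u6587"))   # False ("zhongwen", Chinese for "Chinese language")
```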
For example, personal computers in Japan presently use the Japanese Industrial Standard (JIS) X0208-1990, which accommodates only 6,879 characters. While this is adequate for many basic functions, it may be insufficient for writing people's names, place names, historical data and other such information.
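JIS X 0208 arranges its characters in a grid of 94 rows ("ku") by 94 cells ("ten"), which gives 8,836 positions, of which the 6,879 characters mentioned above are assigned. The minimal Python sketch below recovers a character's row and cell from Python's standard `euc_jp` codec, where each JIS X 0208 character is two bytes in the range 0xA1 to 0xFE. The helper names `kuten` and `from_kuten` are hypothetical.

```python
from typing import Tuple

GRID = 94  # rows, and cells per row, in JIS X 0208


def kuten(ch: str) -> Tuple[int, int]:
    """Return the (row, cell) grid position of a JIS X 0208 character via EUC-JP."""
    b = ch.encode("euc_jp")
    if len(b) != 2 or not (0xA1 <= b[0] <= 0xFE and 0xA1 <= b[1] <= 0xFE):
        raise ValueError(f"{ch!r} is not a JIS X 0208 character")
    return b[0] - 0xA0, b[1] - 0xA0  # EUC-JP offsets each 1-based index by 0xA0


def from_kuten(row: int, cell: int) -> str:
    """Inverse of kuten(): rebuild the character from its grid position."""
    return bytes([row + 0xA0, cell + 0xA0]).decode("euc_jp")


print(GRID * GRID)       # 8836 grid positions in total
print(kuten("\u6f22"))   # grid position of the kanji for "kan" (Chinese/Han)
```

Because the grid is fixed at 94 × 94, no more than 8,836 characters can ever be added to this character set without moving to a different standard.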
Existing CJK character sets are too small to represent a wide variety of important information. For example, the GB-2312 and Big-5 character sets used by the Netscape Communicator web browser together provide only about 16,000 characters.
The International Organization for Standardization has created a standard called ISO-2022 that outlines how seven-bit and eight-bit character codes may be structured, using escape sequences to switch among multiple character sets within one data stream. The Chinese-language version, ISO-2022-CN, is set forth in Request for Comments (RFC) 1922 (Network Working Group, 1996).
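The switching mechanism of ISO-2022-CN can be sketched as a small state machine. In RFC 1922, the escape sequence `ESC $ ) A` designates GB 2312 to the G1 slot and `ESC $ ) G` designates CNS 11643 plane 1; the shift-out byte (SO, 0x0E) then switches to the G1 set and shift-in (SI, 0x0F) returns to ASCII. The Python sketch below is a simplified illustration, not a conforming decoder: it handles only those two G1 designations plus SO/SI, omits SS2/SS3 and the per-line reset rules, and the function names are hypothetical. GB 2312 runs are decoded by setting the high bit of each 7-bit byte and using Python's standard `gb2312` codec (which implements the 8-bit EUC-CN form); CNS 11643 runs are left undecoded because the standard library has no codec for that set.

```python
from typing import List, Optional, Tuple, Union

SO, SI = 0x0E, 0x0F  # shift-out (to G1) and shift-in (back to ASCII)

# Subset of the RFC 1922 designation sequences "ESC $ ) F", which assign a
# double-byte character set to the G1 slot.
G1_DESIGNATIONS = {
    b"\x1b$)A": "GB2312",
    b"\x1b$)G": "CNS11643-1",
}


def segment_iso2022_cn(data: bytes) -> List[Tuple[str, bytes]]:
    """Split ISO-2022-CN bytes into (charset, raw 7-bit bytes) runs (simplified)."""
    segments: List[Tuple[str, bytes]] = []
    g1: Optional[str] = None  # charset currently designated to G1
    current = "ASCII"         # charset of the bytes being collected
    buf = bytearray()

    def flush() -> None:
        if buf:
            segments.append((current, bytes(buf)))
            buf.clear()

    i = 0
    while i < len(data):
        seq = data[i:i + 4]
        if seq in G1_DESIGNATIONS:
            g1 = G1_DESIGNATIONS[seq]
            if current != "ASCII":  # redesignation while shifted out
                flush()
                current = g1
            i += 4
            continue
        byte = data[i]
        if byte == SO:
            if g1 is None:
                raise ValueError("SO before any G1 designation")
            flush()
            current = g1
        elif byte == SI:
            flush()
            current = "ASCII"
        else:
            buf.append(byte)
        i += 1
    flush()
    return segments


def decode_segment(charset: str, raw: bytes) -> Union[str, bytes]:
    """Decode one run; CNS 11643 runs are returned as raw bytes."""
    if charset == "ASCII":
        return raw.decode("ascii")
    if charset == "GB2312":
        # ISO-2022 carries GB 2312 as 7-bit bytes; EUC-CN sets the high bit.
        return bytes(b | 0x80 for b in raw).decode("gb2312")
    return raw


data = b"Hi \x1b$)A\x0e\x56\x50\x4e\x44\x0f!"
for charset, raw in segment_iso2022_cn(data):
    print(charset, ascii(decode_segment(charset, raw)))
```

In the sample input, the four bytes between SO and SI are the 7-bit GB 2312 codes for the two characters of "zhongwen" ("Chinese language").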
Unicode was developed by a number of U.S. software firms to unify all the world's character sets into one large character set (see International Organization for Standardization, ISO/IEC 10646-1 (1993), Geneva, Switzerland). Unicode seeks to limit the character set space to sixteen bits, or a maximum of 65,536 characters, so each character must be represented by a fixed-length code of sixteen bits (two bytes). However, even with Unicode, all of the world's characters, including the hundreds of thousands of CJK characters, cannot be expressed using a character set that allows only 65,536 characters in total.
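The fixed-length sixteen-bit representation described above can be sketched in a few lines. Python strings are not themselves limited to sixteen bits, so this minimal illustration enforces the limit explicitly; the names `CODE_SPACE_16` and `to_fixed16` are hypothetical.

```python
CODE_SPACE_16 = 2 ** 16  # 65,536 distinct 16-bit code values


def to_fixed16(ch: str) -> bytes:
    """Return the two-byte (big-endian) fixed-length code for one character.

    Raises ValueError when the character's code point does not fit in 16 bits.
    """
    cp = ord(ch)
    if cp >= CODE_SPACE_16:
        raise ValueError(f"U+{cp:04X} needs more than 16 bits")
    return cp.to_bytes(2, "big")


print(CODE_SPACE_16)               # 65536
print(to_fixed16("\u4e2d").hex())  # 4e2d
```

Any character whose code would lie at or above 65,536 is rejected, which is the limit the following paragraph argues is too small for CJK text.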
Unicode, for example, does not provide enough character codes to build on-line digital libraries in CJK languages. Such libraries may need unabridged character sets that include every character that has ever been used. In addition, the ability to write the personal names and place names of many people and places in CJK countries is important. As a result, Unicode is inadequate for handling the CJK character sets in a culturally complete way. While CJK language users may be able to make do with much smaller character sets, the available character codes may severely limit the expressiveness and versatility of the CJK languages.
Thus, there is a need for a better way of handling character sets that makes available a larger number of character codes, especially for use in connection with CJK languages.