1. Field of the Invention
This invention relates to the arts of computer and information displays for multiple languages, alphabets, and scripts. The invention relates especially to the arts of human interfaces (input, display, output) concerning computer network and World Wide Web addresses in languages which require bidirectional display and presentation.
2. Description of the Related Art
The World Wide Web and the Internet have become everyday technologies in most developed economies, and are now becoming an integral part of commerce and communication in developing economies. Their ability to communicate information in written form (such as web pages), graphic form (such as photos and videos), and data form (such as extensible markup language (XML)) is becoming a key factor in every industry in every country of the world.
However, the current technologies supporting the World Wide Web are “English-centric” because the Internet began as an American and European effort. As such, many of the conventions and “standards” used in servers, routers, e-mail protocols, etc., are based on the English alphabet and English-like syntax. Initially, companies and individuals in non-English-speaking countries were able to adopt and use these technologies because they could work in both their native language and English.
However, certain information and concepts in a native language may not map into English, and an English-centric World Wide Web (WWW) therefore cannot communicate that information and those concepts effectively. Further, how successfully consumers can “find” a business on the WWW depends on their ability to input or select a web address that is logical and rational. If a business has a native-language name, there may be no logical or rational English equivalent. Consequently, businesses that deal primarily in non-English marketplaces may find their success in “going online” less than optimal, given that they must adopt an English domain name.
Unicode's ability to represent multilingual text makes it a good candidate for establishing the basis of a domain name structure. Unicode brings not only an encoding framework but also support for handling display requirements such as bidirectional scripts. Unicode's collection of character equivalences is both desirable and, at times, necessary, given Unicode's goal of encoding natural language text. These equivalences, however, may present problems in the context of domain names, because names that look identical may be encoded as different sequences of code points.
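As a minimal illustration of why equivalences complicate name matching, the following Python sketch uses the standard `unicodedata` module to compare labels that look alike but differ in their code points. The sample labels and helper names are hypothetical, chosen only for illustration, and the sketch assumes a naive system that compares raw code points. NFC and NFKC are the standard Unicode normalization forms; they are shown here to expose the equivalences, not as a method described in this document.

```python
import unicodedata

# Two spellings of the same label: one uses the precomposed letter
# U+00E9 (LATIN SMALL LETTER E WITH ACUTE), the other uses "e" followed
# by U+0301 (COMBINING ACUTE ACCENT). They are canonically equivalent.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# Fullwidth Latin letters (U+FF41..U+FF5A) are compatibility equivalents
# of the ASCII letters; this spells "example" in fullwidth form.
fullwidth = "\uff45\uff58\uff41\uff4d\uff50\uff4c\uff45"


def same_code_points(a: str, b: str) -> bool:
    """Compare raw code point sequences, as a naive name lookup would."""
    return a == b


def canonically_equal(a: str, b: str) -> bool:
    """Compare after NFC (canonical decomposition, then composition)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def compatibility_equal(a: str, b: str) -> bool:
    """Compare after NFKC (compatibility decomposition, then composition)."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


if __name__ == "__main__":
    print(same_code_points(precomposed, decomposed))   # False
    print(canonically_equal(precomposed, decomposed))  # True
    print(same_code_points(fullwidth, "example"))      # False
    print(canonically_equal(fullwidth, "example"))     # False
    print(compatibility_equal(fullwidth, "example"))   # True
```

The fullwidth case shows that even canonical normalization does not unify every look-alike pair, so a domain name system would have to decide explicitly which class of equivalence it honors.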
Unicode's BiDirectional (Bidi) algorithm may be unsuitable for determining an appropriate display ordering for domain names. Specifically, the Bidi algorithm itself embodies a set of implicit assumptions about how common characters are used, and these assumptions may not apply to domain names. Although domain names draw on the same repertoire of characters that appears in ordinary text, they use some of those characters, such as the full stop that separates labels, differently. Domain names therefore require a different algorithm for determining their display order.
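One way to see these implicit assumptions is to look at the directional class the Bidi algorithm assigns to characters that commonly appear in domain names. The minimal Python sketch below reads the Bidi_Class property through the standard `unicodedata.bidirectional` function; the sample set and the helper `bidi_classes` are hypothetical, chosen for illustration. Strong classes (L, R, AL) fix a direction, while the full stop is a common number separator (CS) and the hyphen-minus a European number separator (ES), classes that reflect their roles in text such as decimal numbers and minus signs rather than their roles as domain name delimiters.

```python
import unicodedata

# Characters that commonly appear in domain names or web addresses.
SAMPLE_CHARACTERS = {
    "a": "Latin letter",
    "5": "ASCII digit",
    "-": "hyphen-minus",
    ".": "full stop (label separator)",
    "/": "solidus (path separator)",
    ":": "colon (scheme/port separator)",
    "\u05d0": "Hebrew letter alef",
    "\u0627": "Arabic letter alef",
}


def bidi_classes(chars):
    """Map each character to its Unicode Bidi_Class property value."""
    return {ch: unicodedata.bidirectional(ch) for ch in chars}


if __name__ == "__main__":
    for ch, cls in bidi_classes(SAMPLE_CHARACTERS).items():
        print(f"U+{ord(ch):04X}  {SAMPLE_CHARACTERS[ch]:<30} {cls}")
```

For the characters above, the reported classes are L, EN, ES, CS, CS, CS, R, and AL respectively. Because the separators carry no strong direction of their own, their visual placement is decided by the surrounding text, which is exactly where domain names and ordinary prose diverge.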
The transition from the now-ubiquitous monolingual, ASCII-based domain name system to a truly multilingual, extensible system has been long awaited. Indeed, it may already have begun without waiting for standards to be developed. This move brings the goal of a multilingual World Wide Web one step closer. Nevertheless, the transition must be approached cautiously, because decisions made today may have long-lasting effects. These decisions include the set of characters from which names are constructed, the base character encoding, and the code point transmission protocol.
There are, however, certain constraints that must be observed regardless of these decisions. For example, domain names that are “legal” today must remain legal in the new domain name system; otherwise, the new system will not receive widespread acceptance. It is impractical to expect a vast overhaul or retrofit of thousands or millions of content servers, domain name servers, and routers in order to support a new domain name system that is not backward-compatible.
A likely starting point for the set of characters from which domain names may be constructed is the character repertoire of the well-known Unicode/ISO 10646 standard. The range of characters available in Unicode is vast and accommodates most modern written scripts. In contrast to ASCII, it includes the scripts used to write languages such as Arabic, Farsi, and Hebrew.
At first glance, extending the current domain name system may not seem to be much of a challenge, since all that appears necessary is to add more characters to the allowable set. However, unlike ASCII, which encodes only scripts written and displayed in left-to-right order, Unicode encodes scripts written right-to-left as well as those written left-to-right. Additionally, Unicode makes it perfectly “legal” to intermix these scripts, which allows not only a wider variety of single-language displays but also displays of mixed content. When these scripts are intermixed, however, their display may become ambiguous because of the conflicting directions.
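A minimal sketch of how such conflicts can be detected, assuming a name is split into labels at the ASCII full stop: the hypothetical helper `label_direction` classifies each label by the strong directional characters it contains (Bidi classes L, R, and AL), and `has_conflicting_directions` flags names that contain both left-to-right and right-to-left text. The sketch only detects the conflict; it does not decide how such a name should be displayed, and the function names are illustrative, not part of any standard API.

```python
import unicodedata
from typing import List, Tuple

STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def label_direction(label: str) -> str:
    """Classify one label as 'ltr', 'rtl', 'mixed', or 'neutral'."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    has_ltr = bool(classes & STRONG_LTR)
    has_rtl = bool(classes & STRONG_RTL)
    if has_ltr and has_rtl:
        return "mixed"
    if has_rtl:
        return "rtl"
    if has_ltr:
        return "ltr"
    return "neutral"  # only digits, hyphens, or other weak/neutral characters


def describe_name(name: str) -> List[Tuple[str, str]]:
    """Split a dotted name into labels and classify each label."""
    return [(label, label_direction(label)) for label in name.split(".")]


def has_conflicting_directions(name: str) -> bool:
    """True if the name mixes left-to-right and right-to-left text."""
    directions = {direction for _, direction in describe_name(name)}
    return "mixed" in directions or {"ltr", "rtl"} <= directions


if __name__ == "__main__":
    for name in ("example.com", "\u05d0\u05d1\u05d2.example.com"):
        print(name, describe_name(name), has_conflicting_directions(name))
```

For the second sample name, the first label is classified as right-to-left and the remaining labels as left-to-right, so the name is flagged; the order in which its labels appear on screen is then left to the rendering context rather than fixed by the characters themselves.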
In creating a new domain name system, such ambiguities must not exist. The display of such domain names cannot simply be left to the user or to application software; doing so would certainly lead to confusion.
To alleviate this situation, a BiDirectional domain name method and system must not allow ambiguities in the interpretation, display, or analysis of a BiDirectional domain name. Additionally, the method and system must be simple to understand, easy to implement, and inexpensive to execute, in order to facilitate widespread acceptance and use.
Therefore, there is a need in the art for a system and method that allow domain names to be handled and displayed in reading orders other than the left-to-right order of English, such as right-to-left. Further, there is a need in the art for this system and method to be readily usable within the currently deployed technologies of the World Wide Web and compatible with existing methods and systems such as Unicode's Bidi algorithm.