A domain name, such as verisign.com, is an identification string that defines a realm of administrative autonomy, authority, or control on the Internet. Domain names are formed by the rules and procedures of the Domain Name System (DNS). The DNS restricts domain names to a character set based on ASCII and does not allow domain names that include the non-ASCII characters used in various non-English languages and represented, for example, by multi-byte Unicode character sets. To remove this constraint, the Internet Corporation for Assigned Names and Numbers (ICANN) has approved a system called Internationalized Domain Names in Applications (IDNA), which maps Unicode strings onto the valid DNS character set using an encoding known as Punycode. Punycode is an ASCII representation of a Unicode string, designed to allow multi-byte characters to be represented in the ASCII-only domain naming system. For example, the Unicode domain name “københavn.eu” may be mapped to the ASCII name “xn--kbenhavn-54a.eu”.
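The mapping in the example above can be reproduced with a minimal Python sketch. It assumes Python's built-in "idna" codec, which implements the older IDNA 2003 rules and may differ from a registry's current rules for some characters. The helper names to_ascii and to_unicode are illustrative and do not come from the text above.

```python
def to_ascii(name: str) -> str:
    """Map a Unicode domain name to its ASCII-only form, label by label."""
    # The built-in "idna" codec (IDNA 2003) prefixes each converted label with "xn--".
    return name.encode("idna").decode("ascii")


def to_unicode(name: str) -> str:
    """Map an ASCII domain name back to its Unicode form."""
    return name.encode("ascii").decode("idna")


print(to_ascii("københavn.eu"))
print(to_unicode("xn--kbenhavn-54a.eu"))
```

Names that are already pure ASCII, such as verisign.com, pass through this conversion unchanged.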
Many domain name registries have adopted IDNA to enable the creation of non-ASCII internationalized domain names. An internationalized domain name (IDN) is a domain name represented in local-language characters, typically encoded as Unicode. IDNs enable Internet users to navigate the Internet in their preferred languages. An IDN may represent a top-level domain (TLD), similar to dotcom (.com) or dot-edu (.edu), or may be registered as a second-level domain (2LD), similar to verisign in verisign.com, on an existing TLD.
Under some existing systems for creating an IDN, registrants must not only enter their desired domain name, but also identify the domain name's underlying language. For example, a registrant may want to register a dotcom domain for здравейте, the Bulgarian word for “Hello.” A registrar such as GoDaddy offers the registrant a registration interface in which, for example, the registrant fills out an electronic request form. In the form, the registrant enters the requested name, здравейте, in a domain name field and selects Bulgarian in a language field. Once the registrant submits the request, the registrar may perform a search. If the registrar determines that the requested domain name has not been previously registered, it may allow the registrant to register the IDN. In particular, upon submission, the registrar converts the IDN (here здравейте) to a Punycode value (for example, XN--80AEEGAHS6CWA) and uses that value in subsequent actions.
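The registrar-side steps just described (convert the requested label to Punycode, then search for an existing registration) can be sketched as follows. This is an illustration only: the REGISTERED set, the function names, and the returned dictionary are hypothetical, and a real registrar would query the registry rather than a local set. The Punycode value quoted above decodes to the Cyrillic label здравейте, which the usage line takes as input. The sketch uses Python's built-in "punycode" codec and skips the normalization steps that full IDNA processing requires.

```python
# Hypothetical stand-in for the registry's list of names already taken.
REGISTERED = {"xn--kbenhavn-54a.com"}


def to_a_label(label: str) -> str:
    """Return the ASCII form of a single label (no dots)."""
    label = label.lower()
    if label.isascii():
        return label
    # Punycode-encode the label and add the "xn--" prefix used for IDNs.
    return "xn--" + label.encode("punycode").decode("ascii")


def handle_request(label: str, tld: str, language: str) -> dict:
    """Convert the requested name and report whether it is still available."""
    name = f"{to_a_label(label)}.{tld}"
    # The language tag is stored exactly as declared; nothing here verifies it.
    return {"name": name, "language": language, "available": name not in REGISTERED}


print(handle_request("здравейте", "com", "bg"))
```

Note that the declared language travels with the request but plays no part in the conversion or the availability search.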
Some problems, however, may arise if the registrant selects the wrong language. For instance, in the above example of the string здравейте, the registrant may mistakenly select Russian instead of Bulgarian. This selection is inaccurate, because the Russian term for “Hello” is здравствуйте and not здравейте. Many registrars and backend registry operators will allow such a transaction to go forward with the erroneous language tag and without performing any language verification. Such behavior may not be desirable for users. In the above example, for instance, the registrant may have mistyped the name and may have intended a valid Russian word that is different from здравейте. Alternatively, the registrant may have intended to register the name здравейте as a Bulgarian domain name and selected Russian in error. In either case, the registrant may prefer that the registrar prevent the registration, or at least issue a warning, before allowing the registrant to register the IDN under the incorrect language. Solutions are needed to address these and similar problems related to detecting and setting the language of IDNs.
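To make the kind of warning described above concrete, the following toy sketch flags a declared language that conflicts with a small word list. Everything in it is an assumption for illustration: the WORDS_BY_LANGUAGE table, its two entries (the Bulgarian and Russian words for “Hello”), the language codes, and the function name are hypothetical, and it is not presented as the approach of any registrar or registry.

```python
from typing import Optional

# Hypothetical, deliberately tiny lexicon keyed by language code.
WORDS_BY_LANGUAGE = {
    "bg": {"здравейте"},      # Bulgarian "Hello"
    "ru": {"здравствуйте"},   # Russian "Hello"
}


def check_language_tag(label: str, declared: str) -> Optional[str]:
    """Return a warning if the label is known only under other languages."""
    label = label.lower()
    if label in WORDS_BY_LANGUAGE.get(declared, set()):
        return None  # the declared language is consistent with the lexicon
    others = sorted(
        lang
        for lang, words in WORDS_BY_LANGUAGE.items()
        if lang != declared and label in words
    )
    if others:
        return f"'{label}' is listed under {', '.join(others)}, not {declared}"
    return None  # unknown word: no evidence either way


print(check_language_tag("здравейте", "ru"))
```

A check of this kind only supports a warning. A label missing from the lexicon yields no evidence in either direction, so the sketch lets it through.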