Generating text in non-English languages is a challenging task, especially for Asian languages such as Chinese, Japanese, Korean, Thai, Lao, Thaana, and the Indian languages. In online HTML (HyperText Markup Language) forms in particular, text generation for these languages can make the page too large to be served practically over the Internet, because several thousand characters must be defined in the HTML code. For example, Hindi, one of the official languages of India, has more than one million characters that can be generated by combining conjunct consonants, vowel signs and modifiers.
On the other hand, Chinese, which is a symbolic language, has several thousand symbols. These symbols are derivatives of the 214 Chinese base radicals. Unlike the vowel patterns of Indian languages, the derivative patterns are not the same for all the base radicals. Incorporating these myriad conjunct characters or derivative symbols in a web page makes the page so large that loading it from a web server to its client via the Internet can take from several minutes to hours, which is impractical.
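The Hindi figure quoted above can be made concrete with a little arithmetic. The following is a minimal sketch, not a definitive count: the inventory sizes passed to `upper_bound` (33 consonants, 12 vowel signs, 3 modifiers, clusters of up to 3 consonants) are illustrative assumptions rather than values taken from this text, and the function deliberately overcounts because it ignores which clusters actually occur in the language. The `compose` helper is likewise hypothetical; it only shows how one conjunct syllable is assembled from a few Unicode code points.

```python
# Devanagari virama (halant): joins consonants into a conjunct.
VIRAMA = "\u094D"

def compose(consonants, vowel_sign="", modifier=""):
    """Build one syllable: C (+ virama + C)* + optional vowel sign + optional modifier."""
    return VIRAMA.join(consonants) + vowel_sign + modifier

def upper_bound(n_consonants, n_vowel_signs, n_modifiers, max_cluster):
    """Count every combination, ignoring whether a cluster occurs in practice."""
    clusters = sum(n_consonants ** k for k in range(1, max_cluster + 1))
    # The "+ 1" terms allow for a syllable with no vowel sign or no modifier.
    return clusters * (n_vowel_signs + 1) * (n_modifiers + 1)

# KA + virama + SSA + vowel sign AA: one displayed character, four code points.
syllable = compose(["\u0915", "\u0937"], vowel_sign="\u093E")
print(syllable, len(syllable))

# Assumed inventory sizes, for illustration only.
print(upper_bound(33, 12, 3, 3))
```

Under these assumed sizes the bound comes out well above one million, which is consistent in order of magnitude with the figure stated for Hindi.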
Another interesting observation is that Japanese uses three writing systems: Hiragana, Katakana and Kanji. Hiragana is used for writing native Japanese words, while Katakana is used for writing foreign names and words. Kanji is a character set imported from the Chinese language. A typical Japanese text is written using all three.
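As a small illustration of how the three Japanese scripts mix in ordinary text, the sketch below classifies each character by its Unicode block. The function names and the sample sentence are assumptions introduced here for illustration, and the Kanji range covers only the basic CJK Unified Ideographs block, so rarer characters in the extension blocks would be reported as "other".

```python
# Unicode block ranges (inclusive) for the three scripts.
SCRIPT_RANGES = {
    "hiragana": (0x3040, 0x309F),
    "katakana": (0x30A0, 0x30FF),
    "kanji": (0x4E00, 0x9FFF),  # basic CJK Unified Ideographs block only
}

def script_of(ch):
    """Return the script name for one character, or "other" if none matches."""
    cp = ord(ch)
    for name, (lo, hi) in SCRIPT_RANGES.items():
        if lo <= cp <= hi:
            return name
    return "other"

def scripts_used(text):
    """Return the set of Japanese scripts that appear in the text."""
    return {script_of(ch) for ch in text} - {"other"}

# Hypothetical sample sentence ("I drink coffee").
sample = "私はコーヒーを飲みます"
print([(ch, script_of(ch)) for ch in sample])
print(sorted(scripts_used(sample)))
```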
Text generation in non-English languages is necessary for searching web pages written in those languages, whether on the Internet or on an intranet. There are many websites on the Internet in non-English languages, especially Chinese, Japanese, Korean, Arabic, Russian, other European languages and Indian languages.
Some websites on the Internet enable visitors to generate non-English text through transliteration (see, for example, the transliteration facility for the Telugu language at the URL http://old.quilpad.in/telugu/). Transliteration is the technique of transcribing a word or text typed in English letters into the script of another language. Transliteration for Indian languages often generates inaccurate text and requires a lot of trial-and-error adjustment because of ambiguities in the character mapping between English and the target language. For example, Tamil, a south Indian language, has three characters that map to the English letter ‘L’. Similarly, there are three characters in Tamil representing the English letter ‘N’.
Indian languages have more consonants than English, resulting in ambiguities in mapping. Also, the vowel signs in Indian languages cannot be produced exactly with English vowel combinations unless a transliteration guide on a website is thoroughly studied.
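The mapping ambiguity can be sketched in a few lines. The table below is a hypothetical, deliberately tiny mapping that covers only the two Tamil cases mentioned above (three letters each for ‘L’ and ‘N’); it is not the mapping used by any particular transliteration website. Every English letter with several candidates multiplies the number of possible outputs, which is why the user ends up adjusting by trial and error.

```python
from itertools import product

# Hypothetical mapping: English letter -> candidate Tamil consonants.
CANDIDATES = {
    "l": ["\u0BB2", "\u0BB3", "\u0BB4"],  # ல, ள, ழ
    "n": ["\u0BA8", "\u0BA9", "\u0BA3"],  # ந, ன, ண
}

def candidate_sequences(latin):
    """List every Tamil letter sequence the typed letters could stand for.

    Letters missing from the table are passed through unchanged.
    """
    options = [CANDIDATES.get(ch, [ch]) for ch in latin.lower()]
    return ["".join(combo) for combo in product(*options)]

def ambiguity(latin):
    """Number of candidate sequences, computed without enumerating them."""
    count = 1
    for ch in latin.lower():
        count *= len(CANDIDATES.get(ch, [ch]))
    return count

print(candidate_sequences("l"))
print(ambiguity("nl"))
```

A real transliteration scheme would narrow these candidates with multi-letter conventions or context, which is exactly the guide a user has to learn first.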
Another drawback of transliteration is that it is very difficult to implement, and inaccurate, for Mandarin Chinese and the Kanji part of Japanese, because these scripts are symbol based rather than alphabet based. In Chinese and Japanese Kanji, each symbol represents a word or a set of words.
Another means of generating non-English text is translation, wherein words typed in English are translated into the desired language. However, translation has its own limitations. It is of little use to those who are not proficient in English. Some words may be native to a particular language and may have no equivalent in English. Sometimes an English word may have more than one equivalent in another language.
There are also some websites that provide virtual keyboards for generating text in non-English languages. For example, www.guruji.com provides a virtual keyboard in the Telugu language at the URL http://www.guruji.com/te/index.html. However, the virtual keyboard of this prior art website requires as many as six mouse clicks to generate a triple-consonant conjunct character in the Telugu language, which is tedious for the user. Another website, at the URL http://www.search.webdunia.com/telugu.html, also provides a virtual keyboard that requires many mouse clicks to generate a triple-consonant conjunct character in the Telugu language.
The web page at the URL http://www.lookera.com/base/keyboards/chinese-keyboard.php provides a virtual keyboard in the Chinese language. However, it requires multiple mouse clicks and page scrolls to generate a typical Chinese character. The websites at the URLs http://www.gate2home.com/ and http://www.virtualkeyboard.ws/ also provide virtual keyboards in different languages, but these keyboards do not enable the user to generate characters with single mouse clicks in non-English languages such as Chinese, Japanese and Korean.