Indian languages are, in general, written the way they are pronounced. Many of the vowels and consonants are common across the Indian languages, although some languages have a few more characters and some have fewer. The languages use different character sets, and their writing schemes follow different sets of rules. In written form, each language has hundreds of character combinations, which are extremely difficult to represent on a standard keyboard. There have been some attempts to reduce the number of characters in each language to help with electronic print media. For instance, the print media has stopped using many of the former consonant-vowel combinations, each of which was a separate character, and started using independent vowel symbols along with the consonants. In the same way, many combined consonants gave way to consonants written as separate characters, reducing the total number of characters required in a language. With these recent innovations, the Indian writing scheme can be described in terms of stand-alone vowels, vowel symbols used along with consonants, the consonants, a few special characters, and the rules governing the character combinations.
Even with this reduction, it was not easy to represent all the characters required by an Indian language on a standard keyboard. For instance, Malayalam, a South Indian language, has fifteen stand-alone vowels, fourteen vowel symbols, thirty-six consonants, and a number of other special characters. Indian scripts have no letter case, i.e., the concept of lower case and upper case characters does not exist in Indian languages. Even when the upper case letter keys are distinguished from the lower case letter keys, the standard English keyboard supports only fifty-two letters in total. As such, people began to use other character keys to represent the additional letters.
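The shortfall described above can be checked with simple arithmetic. The following is a minimal sketch that uses only the Malayalam counts quoted in the text. The variable names are illustrative, and the unspecified "other special characters" are left out, so the real gap would be larger.

```python
# Character counts for Malayalam as quoted in the text.
STANDALONE_VOWELS = 15
VOWEL_SYMBOLS = 14
CONSONANTS = 36

# A standard English keyboard has 26 letter keys; counting the shifted
# (upper case) and unshifted (lower case) states separately doubles that.
LETTER_KEYS = 26
available_letters = LETTER_KEYS * 2

# Special characters are not counted because the text gives no number for them.
required_characters = STANDALONE_VOWELS + VOWEL_SYMBOLS + CONSONANTS
shortfall = required_characters - available_letters

print(f"required={required_characters}, available={available_letters}, "
      f"shortfall={shortfall}")
```

Even before special characters are considered, the letter keys alone cannot cover the required characters, which is why other character keys were pressed into service.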
Another attempt to manage the issue of number of keys available in a standard keyboard vs. the number of keys required by an Indian language is via software programs, which designate English character combinations to represent each required Indian language character. This represents an improvement to the prior art of character mapping using special key combinations. However, there is no standard for this mapping between the English character representation and the Indian language characters. Different schemes exist even within the same Indian language. Each software vendor elects his own scheme to do the mapping for the particular language or languages supported by the vendor's software.
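As an illustration of how such software can map English character combinations to Indian language characters, the following is a minimal sketch of a greedy longest-match lookup. The table `VOWEL_TABLE` and the function `transliterate` are hypothetical names chosen for this example, and the spellings assigned to the six Malayalam stand-alone vowels are assumptions. As noted above, no standard mapping exists and each vendor's table differs.

```python
# Hypothetical mapping from English letter combinations to Malayalam
# stand-alone vowels. Real vendor schemes differ from this and from each other.
VOWEL_TABLE = {
    "a": "അ",
    "aa": "ആ",
    "i": "ഇ",
    "ii": "ഈ",
    "u": "ഉ",
    "uu": "ഊ",
}


def transliterate(text: str, table: dict[str, str] = VOWEL_TABLE) -> str:
    """Convert text by always consuming the longest key that matches."""
    longest = max(len(key) for key in table)
    out = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first so "aa" wins over "a" + "a".
        for size in range(min(longest, len(text) - pos), 0, -1):
            chunk = text[pos:pos + size]
            if chunk in table:
                out.append(table[chunk])
                pos += size
                break
        else:
            # No key matches here: copy the character through unchanged.
            out.append(text[pos])
            pos += 1
    return "".join(out)


print(transliterate("aa i uu"))
```

Because the table is the only vendor-specific part, two programs built this way can accept entirely different spellings for the same character, which is the lack of standardization described above.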
The influence of English on Indian languages is indisputable. Many English words have become common among the Indian population, and speakers of different Indian languages use these words without recognizing that they came from English. When English words are written using an Indian language writing scheme, they become difficult to understand and are mostly mispronounced. As such, writing English words in English alongside the Indian language is a welcome change that maintains the accuracy of the English words.
When fonts were created for Indian languages, their creators did not follow any standard to make them compatible with each other. In English, a character has the same value irrespective of the font being used. For Indian languages, specialized programs are required to map between different fonts. At present, there are some attempts to make the fonts uniform and interchangeable.
When typing Indian languages, one is forced to use the shift keys constantly. This is true whether the scheme is keyboard mapping or specialized software that uses English characters to enter Indian language characters. With keyboard mapping, shifting between upper case and lower case and the use of specialized keys are inevitable because each symbol must be assigned to a specific key or key combination. This problem could have been avoided by using specialized software. Many software manufacturers improved on the prior art by assigning the same key combinations to vowels, whether they appear in the stand-alone format or as vowel symbols combined with consonants. However, they failed to remove the distinction between upper case and lower case letters. Character key combinations are difficult to remember if they involve case sensitivity, where one is forced to remember that a lower case character represents one Indian character while the same character in upper case represents another. In other words, to make effective use of these software programs, one has to have a good image of each character and its association with the corresponding case-sensitive English representation, rather than with the sound of the character.
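The burden of case sensitivity can be made concrete with a small sketch. The table `CASE_SENSITIVE` below is a hypothetical example of the kind of scheme described above, in which an English letter and its upper case form stand for different Malayalam consonants. The specific pairings and the helper `colliding_keys` are assumptions made for illustration and are not taken from any particular program.

```python
# Hypothetical case-sensitive table: the same English letter maps to a
# different Malayalam consonant depending on its case.
CASE_SENSITIVE = {
    "l": "ല",
    "L": "ള",
    "n": "ന",
    "N": "ണ",
}


def colliding_keys(table: dict[str, str]) -> dict[str, list[str]]:
    """Group keys that become identical once letter case is ignored."""
    groups: dict[str, list[str]] = {}
    for key in table:
        groups.setdefault(key.lower(), []).append(key)
    # Keep only the groups where case is the sole distinguishing feature.
    return {folded: keys for folded, keys in groups.items() if len(keys) > 1}


print(colliding_keys(CASE_SENSITIVE))
```

Every group reported by `colliding_keys` is a pair the typist must memorize by shape rather than by sound, and each upper case member costs a shift keystroke in the middle of a word.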
Reading becomes very difficult when upper case characters are used in the middle of a word, as in “engLisH”. In English, the choice of an upper case or a lower case character is made based on readability and visual effect. The meaning or the sound of a character does not change with the use of upper case or lower case. Shifting keys in the middle of a word or a syllable is a tedious task that takes more time and can cause typing errors. Constantly shifting between upper case and lower case letters is tiresome, and the resulting output creates a poor visual image.
In addition to the mixed use of upper case and lower case characters, many transliteration programs introduced the use of special characters, such as *, @, #, ^, and ~, as well as punctuation marks, to spell some characters in Indian languages. Such schemes are tolerated as an input mechanism because they provide a faster means of entering data into a computer than keyboard mapping for a native Indian writing scheme.
Another problem with many of the transliteration programs available today is that they permit alternate spellings for some characters. For instance, one vowel is spelled “uu”, “oo”, or “U”. Note that the first two are in lower case letters and the last one is in upper case. Another vowel is spelled “au”, “ou”, or “ow”. Similarly, one of the consonants is spelled “ph”, “f”, or “P”. This kind of alternate spelling not only creates inconsistency but also makes it difficult to store information in sorted order. For instance, someone looking up a word in a dictionary would have to look up every possible spelling combination for that word. In English, the spelling of a word is fixed, even though there are some variations between American English and British English. If the word “fish” could also be spelled “phish”, it would require two entries in a dictionary. The present invention avoids this issue by assigning a fixed spelling to each Indian character.
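The dictionary problem can be sketched as follows. The variant spellings are the ones quoted above. The choice of “uu”, “au”, and “ph” as the single stored form, the names `VARIANTS` and `canonical`, and the sample words are all assumptions made for illustration. This shows the extra normalization step that a scheme permitting alternate spellings would need and is not a description of the present invention, which avoids the step by fixing one spelling per character.

```python
import re

# Alternate spellings quoted in the text, each mapped to one arbitrarily
# chosen canonical spelling (the canonical picks are an assumption).
VARIANTS = {
    "oo": "uu", "U": "uu",
    "ou": "au", "ow": "au",
    "f": "ph", "P": "ph",
}

# Longest alternatives first so two-letter variants are matched whole.
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(VARIANTS, key=len, reverse=True))
)


def canonical(word: str) -> str:
    """Rewrite every variant spelling in one left-to-right pass."""
    return _PATTERN.sub(lambda match: VARIANTS[match.group(0)], word)


# Three made-up spellings that such a scheme would treat as the same word.
spellings = ["fool", "phuul", "PUl"]
print(sorted(spellings))                      # three separate dictionary keys
print({canonical(word) for word in spellings})  # one key after normalization
```

Without the normalization pass, the three spellings sort to three different places and need three dictionary entries. A scheme with a fixed spelling per character needs neither the table nor the pass.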
If English is used as the medium to represent a foreign language, the scheme is easier for the user if it follows English conventions, such as case insensitivity, the use of upper case to begin a sentence, and the use of upper case at the beginning of proper nouns. Those who are skilled in English typing would prefer that the spelling convention not employ special keys. It would be very annoying to constantly use special keys to spell words in a different language. Even though English characters can be pronounced differently, there is a general consensus as to the pronunciation of syllables as well as words in the English language. Based on the same principle, a native English speaker is able to approximate the pronunciation of a new English word. Acknowledging these conventions in English, the present invention proposes a writing scheme for Indian languages using the English letters A through Z.
Another use of English to represent Indian languages is to read and write an Indian language without learning the proper shapes of the Indian characters or the required special character combinations to generate the characters in the Indian language. This is accomplished by introducing a new script, which closely resembles the English script and ensuring that the new script is uniform across different Indian languages.
The state of the art of using English characters to help input Indian language characters into electronic media is evident from two recent transliteration schemes, one for Devanagari, supporting Hindi, Marathi, and Sanskrit, and the other for Malayalam. The first program is called AKSHARAMALA, which literally means alphabet. Details of this program can be found at the web site, aksharamala.com/aksharamala.html. This program requires the use of special symbols such as @ and ~ and differentiates between upper case letters and lower case letters. The second program was introduced recently by a major Indian newspaper, Deepika, for e-mail correspondence by millions of users. This program is called EKATHU, which literally means e-mail, and the mapping between English and Malayalam characters can be found at the web site, deepika.com/ekathustd/info2.htm. The spelling in EKATHU differs from that in AKSHARAMALA. EKATHU also employs case sensitivity to differentiate between characters and uses special characters. While the former program does not support Malayalam, the latter program was created exclusively for Malayalam.
One of the earliest schemes of transliteration for Indian languages is known as ITRANS. Details of this transliteration scheme can be found at aczone.com/itrans/TRANS.TXT. This program might have introduced the use of case sensitivity to differentiate between similar sounding Indian characters. In addition to case sensitivity, this program also uses other symbols. ITRANS tried to accommodate multiple Indian languages, but it excluded Malayalam. In short, no single transliteration scheme today is common to the major Indian languages. These and other transliteration programs focused on the written form of the native Indian languages, rather than providing a mechanism to represent the native Indian languages using the English writing scheme. The current invention proposes a system and method for representing Indian languages using the English writing conventions while strictly following a phonetic spelling scheme for the Indian languages.
Such a method for representing Indian languages in English is very useful for journalists, who may be able to use this common writing scheme to write articles in different languages. Since most journalists from India are quite familiar with the English keyboard, it should be much easier for them to learn this new scheme than to learn different keyboard settings for different languages. In fact, the new scheme does not require any new software. Transliteration software is required only when it is necessary to convert the English representation to the native Indian script. Further, this scheme would be very beneficial for foreign language speakers learning an Indian language. With the familiar English alphabet, learning the Indian characters would be much simpler than learning to write those characters in the Indian script. In addition, typing can be done much faster because this scheme is case insensitive and there are no special keys to select or special settings to configure.