Transliteration of text from one language into the script of another is a field of software engineering with widespread applications. Advances in transliteration algorithms are often driven by constraints in the software and hardware that users have to work with. For example, although software such as e-mail clients and word processors may accept input in various languages, keyboards are often available or affordable only in the standard American layout. For a user who needs to enter text in a script other than the one the keyboard is designed for, transliteration remains the option of choice.
Many approaches to effective transliteration have been tried, all of which are deficient in one manner or another. The most common approach is to encode the script of the language of the document (henceforth called the target language) in the language of the keyboard (henceforth called the keyboard language). This method is cumbersome, because the user must memorize various long and often unintuitive keystroke sequences that correspond to characters of the target language in order to type efficiently. A second approach is to allow the user to type a word of the target language in the keyboard language, spelling it in whatever manner seems closest to it phonetically. This has the advantage of being intuitive, and therefore requires no ‘learning’ beyond knowledge of the target language and the keyboard language.
However, the state of the art in this particular approach to transliteration is primitive. There are several reasons for this. The most prominent reason is that of input ambiguity. Since there may not be one correct way to phonetically represent a word belonging to the target language in the keyboard language, the mapping between phonetic input in the keyboard language and the symbolic output in the target language is a many-to-many mapping. Many contemporary transliteration systems require the user to learn unique combinations for each phonetic unit, or phoneme, of the target language, and use such combinations while entering the text phonetically in the keyboard language. Contemporary systems reject any other phonetic representations of the target text.
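The many-to-many nature of this mapping can be illustrated with a small sketch. The table below is hand-picked, hypothetical data that assumes Roman keyboard input and Devanagari output; it is not taken from any system discussed here. It only shows that one keyboard spelling can stand for several target characters, and that one target character can be reached from several spellings.

```python
from collections import defaultdict

# Illustrative, hand-picked data (an assumption for this sketch):
# Roman keyboard spellings and the Devanagari consonants that different
# users might intend by them.
CANDIDATES = {
    "t": ["त", "ट"],
    "th": ["थ", "ठ", "त"],
    "d": ["द", "ड"],
}


def invert(candidates):
    """Build the reverse mapping: target character -> keyboard spellings."""
    inverse = defaultdict(list)
    for roman, targets in candidates.items():
        for target in targets:
            inverse[target].append(roman)
    return dict(inverse)


SPELLINGS = invert(CANDIDATES)

# Keyboard spellings with more than one possible target character.
ambiguous_inputs = sorted(k for k, v in CANDIDATES.items() if len(v) > 1)
# Target characters reachable from more than one keyboard spelling.
ambiguous_outputs = sorted(k for k, v in SPELLINGS.items() if len(v) > 1)
```

In this sketch every keyboard spelling is ambiguous, and the character त can be typed as either "t" or "th". A system that accepts only one of those spellings rejects the other, which is the limitation described above.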
Another problem in existing transliteration schemes is that of missing phonemes. The symbols and characters of the keyboard language may not be able to represent all the phonemes of the target language completely. Thus, a user can only enter an ‘approximately phonetic’ version of the text. This leaves two unsatisfactory alternatives. Either the user has to learn a distinct letter combination that maps to the phoneme in question, or the sequence of characters entered by the user may clash with a different phoneme in the target language. The former is non-intuitive and requires training on the part of the user. The latter may cause an inaccurate transliteration of the text.
To complicate the situation further, there may already be native conventions for transliteration of one language to another. A user must be able to adhere to such a convention and expect accurate transliteration, while at the same time an untrained user must also be able to expect accurate transliteration from an intuitive transliteration method. Further, languages often borrow words from one another, and it is not uncommon to find a word in the target language that has been borrowed from the keyboard language. In such a situation, the user may confidently assume that he or she can spell the word in the manner in which it is spelt in its native language, though such a spelling may not be phonetically accurate. A transliteration system is expected to handle this situation accurately as well.
The user is often prompted to choose between various alternatives from a dictionary for a word that he or she has transliterated. This causes the typing process to be considerably slower and more cumbersome than, for instance, typing directly in the target language on a modified keyboard.
There is a need, therefore, for a fast, efficient and accurate method of automatically transliterating text that is phonetically created in one language to another language. This is the need that this invention attempts to address.
Some of the prior art related to transliteration systems is discussed below. These documents appear to be close to the present invention; however, each of them differs from it. The distinguishing features are explained after the prior art documents have been summarized.
Document D1: U.S. Pat. No. 5,432,948 “Object-Oriented Rule-Based Text Input Transliteration System”
This document discloses a computer-implemented system and method that utilizes rules instantiated in objects of an object-oriented operating system to transliterate text as it is input into a computer. A number of transliterator objects are created in the storage of the computer, and each of the transliterator objects includes transliteration rules arranged in the storage in a preferred order. Each of the transliteration rules contains a first language character string, a second language character string, and logic for comparing the first language character string to a text string that is entered into the computer, in order to determine a subset of transliteration rules that match the entered text string. The entered text is displayed on a computer display as it is input. The logic of a particular one of the transliterator objects then uses the preferred order to select one rule from the matching subset and applies it to the entered text string, so that the second language character string of the selected rule is shown on the display.
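The selection step summarized above, in which a matching subset of rules is narrowed to one rule by preferred order, might look roughly like the following sketch. It is not the implementation disclosed in D1. The names `Rule`, `RULES`, `matching_rules` and `select_rule` are hypothetical, the Roman-to-Cyrillic rules are illustrative, and treating a rule as matching when its source string ends the entered text is an assumption made for this example.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    source: str  # first-language (keyboard) character string
    result: str  # second-language (target) character string


# Hypothetical rules, stored in preferred order: earlier rules win.
RULES = [
    Rule("sh", "ш"),
    Rule("s", "с"),
    Rule("h", "х"),
]


def matching_rules(entered: str, rules: List[Rule]) -> List[Rule]:
    """Return the subset of rules whose source string ends the entered text.

    The relative (preferred) order of the rules is kept.
    """
    return [rule for rule in rules if entered.endswith(rule.source)]


def select_rule(entered: str, rules: List[Rule]) -> Optional[Rule]:
    """Pick the first rule of the matching subset, or None if nothing matches."""
    matches = matching_rules(entered, rules)
    return matches[0] if matches else None
```

With these rules, the entered text "sh" matches both the "sh" rule and the "h" rule, and the preferred order selects the "sh" rule, so "ш" would be displayed rather than "х".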
Document D2: U.S. Pat. No. 5,640,587 “Object-Oriented Rule-Based Text Transliteration System”
This document discloses a computer system that transliterates a text string from a first language to a second language using transliterator objects, each having a set of transliteration rules arranged in a preferred order. Each of the transliteration rules, in turn, has a test string and a replacement string, and the transliterator object includes a method for comparing the test string in each of the transliteration rules to each of the characters in the text string to determine a subset of transliteration rules that exhibit a match. Using the preferred order, one rule of the subset is selected and its replacement string is substituted for the test string in the text string.
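As a rough illustration of this kind of ordered test-string and replacement-string substitution, consider the sketch below. It is a simplified reading of the summary above rather than the method actually claimed in D2: the function name `transliterate`, the left-to-right scan, and the Roman-to-Cyrillic rule list are all assumptions made for this example.

```python
def transliterate(text, rules):
    """Replace test strings with replacement strings, scanning left to right.

    `rules` is a list of (test_string, replacement_string) pairs in
    preferred order. At each position, the rules whose test string matches
    form a subset, and the first rule of that subset is applied.
    """
    out = []
    i = 0
    while i < len(text):
        matches = [(test, repl) for test, repl in rules
                   if text.startswith(test, i)]
        if matches:
            test, repl = matches[0]   # preferred order decides
            out.append(repl)
            i += len(test)
        else:
            out.append(text[i])       # no rule applies: keep the character
            i += 1
    return "".join(out)


# Hypothetical rule sets; only the order of the rules differs.
LONGEST_FIRST = [("shch", "щ"), ("sh", "ш"), ("s", "с"), ("h", "х"), ("i", "и")]
SHORTEST_FIRST = [("s", "с"), ("h", "х"), ("i", "и"), ("sh", "ш"), ("shch", "щ")]

print(transliterate("shchi", LONGEST_FIRST))   # щи
print(transliterate("shi", SHORTEST_FIRST))    # схи
```

The two rule lists contain the same pairs, yet they produce different output for the same input, which shows how much such a scheme depends on the order in which its rules are stored.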
Further, the invention discloses a method operable on a computer system having a memory, an input device and a display device, the method displaying on the display device a text string including one or more characters in response to a character being entered from the input device at an insertion point in the text string by:
(a) creating a plurality of transliteration rules in the memory, each of the plurality of transliteration rules having a source string comprised of a plurality of characters and a result string comprised of at least one character;
(b) receiving a character entered on the input device;
(c) inserting the entered character into the text string at the insertion point and moving the insertion point after the inserted character;
(d) sequentially comparing source strings in the plurality of transliteration rules to text string characters preceding the insertion point to detect a match of one transliteration rule source string;
(e) redisplaying the text string on the display with result string characters in the one transliteration rule substituted for source string characters found in the text string when a match is detected in step (d); and
(f) redisplaying the display text string on the display with the entered character inserted at the insertion point when no match is detected in step (d).
Document D3: United States Patent Application 0050043941 “Method, Apparatus, and Program for Transliteration of Documents in Various Indian Languages”
This document provides a transliteration mechanism that allows a user to view a text in one Indian language, to highlight a word or phrase, and to easily transliterate the selected word or phrase into a target language or script. The mechanism may be an application, an applet, or a plug-in to another application, such as a Web browser. The target language and/or script may be stored in a user profile. Preferably, the source language may be any known Indian language in any known script. The document also discloses a method for transliteration of text in Indian languages, the method comprising: identifying a selected portion of a text in an Indian language; and transliterating the selected portion into a target script to form transliterated text, wherein the target script is identified in a user profile.
Though all of the documents D1 to D3 relate to a transliteration system and a method thereof, they differ from the present invention in the methodology used to derive the transliteration from one language to another (source language to target language).
Further, none of the documents D1 to D3 discloses transliteration using a decision tree based algorithm or mechanism. The crux of our invention lies in building the producer rule and the Special Rule and thereafter using these rules for decision making.
In addition to this, our invention is language independent, whereas Documents D1 and D2 are language specific.
Document D1 is based on a number of transliterator objects that are created in the storage of the computer, each of which includes transliteration rules arranged in the storage in a preferred order, which is not the case in our invention.
Document D2 is mainly based on a transliterator object that is used to perform transliterations. Input transliterators are composed of a set of context-sensitive rules. Hence, this technology is based on a rule-based transliteration mechanism, which is not the case in our invention.
Document D3 prima facie appears to be similar to our invention, but it neither discloses the method used for transliteration nor performs transliteration on the fly. Document D3 primarily discloses transliteration of only a selected portion of text and does not support dynamically input text. It also fails to disclose transliteration using a decision tree based algorithm or mechanism.