This specification relates to generating transliterations, for example, English transliterations of Indic words.
Transliteration is the process of mapping text in the writing system of a source language to text in the different writing system of a destination language. For example, to transliterate a word in an Indic language, such as Hindi, into English, each character in the sequence of Hindi characters that constitutes the Hindi word is mapped to a corresponding English character or character group. Thus, to transliterate the Hindi word “कविता” into the English writing system, each character of the Hindi word, i.e., “क,” “वि,” and “ता,” is mapped to a corresponding character group in English, i.e., “ka,” “vi,” and “ta,” respectively. The English characters are arranged in a sequence corresponding to the sequence of the Hindi characters to form the transliterated English word, “kavita.” Similarly, a reverse mapping can be used to generate “कविता” from “kavita.”
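The character-by-character mapping described above can be illustrated with a minimal sketch. The sketch assumes that the Hindi word in the example is “कविता” and uses a three-entry table together with a greedy longest-match segmentation; the names HINDI_TO_LATIN, LATIN_TO_HINDI, and transliterate are hypothetical and are not taken from this specification. It is only one way such a mapping could be realized, not a complete transliteration scheme.

```python
# Hypothetical mapping table covering only the example word (an assumption,
# not a full Hindi-to-English scheme).
HINDI_TO_LATIN = {
    "क": "ka",   # consonant with inherent vowel
    "वि": "vi",  # consonant plus vowel sign
    "ता": "ta",  # consonant plus vowel sign
}

# The reverse table is obtained by swapping keys and values.
LATIN_TO_HINDI = {latin: hindi for hindi, latin in HINDI_TO_LATIN.items()}


def transliterate(word, table):
    """Map each source unit of `word` to its target unit, preserving order.

    Units are found by greedy longest match against the keys of `table`.
    """
    max_len = max(len(key) for key in table)
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate unit first, then shorter ones.
        for n in range(min(max_len, len(word) - i), 0, -1):
            unit = word[i:i + n]
            if unit in table:
                pieces.append(table[unit])
                i += n
                break
        else:
            raise ValueError(f"no mapping for text starting at {word[i:]!r}")
    return "".join(pieces)


if __name__ == "__main__":
    print(transliterate("कविता", HINDI_TO_LATIN))
    print(transliterate("kavita", LATIN_TO_HINDI))
```

A table-driven approach of this kind keeps the forward and reverse mappings symmetric, but it presumes that the mapping is one-to-one, which need not hold for real language pairs.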
Phonetic transcription, on the other hand, is the process of mapping the sound produced when a word in a source language is spoken to text that is read as if written in a destination language. The source and destination languages can be the same or can be different from one another. For example, when the English word “rendezvous” is transcribed into English text, the resulting text could represent the word's sound as “rahn-dey-voo.” Note that “English text” here refers to transcription text intended to be pronounced by an English speaker, and not to text that is made up of English words. More usefully, a transcription can be represented in the International Phonetic Alphabet (IPA), a system of written symbols for the sounds of spoken language that was devised and standardized by the International Phonetic Association. The general principle of the IPA is to provide one symbol for each distinctive sound. For example, the IPA representation of “rendezvous” is “rndavu”.