Today, interactive technologies play a key role in improving customer service. Interactive technologies such as IVR (Interactive Voice Response) accept a user's spoken input and/or request and provide pre-recorded or dynamically generated output in response.
Typically, IVR applications use speech recognition systems to recognize a spoken word or a sequence of spoken words and convert it to machine-readable form for further processing and/or for answering a user query. These speech recognition systems are usually deployed for a particular language. When the same system has to be deployed for a different language, the existing system must be ported so that it understands the new language, which is equivalent to building a fresh application. Most of the existing systems are deployed in English due to:
(a) wider acceptability of the language; and
(b) the ready availability of information and other resources in English.
However, with the increasing acceptability of speech-based solutions in countries where the native language is not English, there is an urgent need to convert existing speech recognition based applications from a source language, for instance English, to a target language, for instance Hindi.
Typically, an existing speech recognition based solution requires the following components:
(a) a Speech Recognition (SR) engine with acoustic models for acoustic recognition;
(b) a pronunciation lexicon of words which have to be recognized;
(c) a speech grammar or language model; and
(d) speech prompts which are used to evoke responses from users, i.e. to prompt users to submit their query.
The first three components work in tandem to convert speech to text, while the fourth component helps the existing speech recognition based solution to communicate with users. Typically, converting the existing speech recognition based solution from a source language to a target language requires these four components to be ported to the target language.
Although acoustic models are tuned for a particular language, source-language acoustic models can be used to recognize speech in another language with reasonable accuracy if the other two recognition components, namely the pronunciation lexicon and the speech grammar, are addressed adequately in the target language.
Essentially, converting the speech recognition based solution from one language to another requires creating a new pronunciation lexicon for the target language, containing all words to be recognized by the solution, as well as a speech grammar or language model in the target language. Additionally, prompts in the source language have to be converted into prompts in the target language.
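The requirement that the new lexicon contain every word to be recognized can be illustrated with a minimal sketch. It assumes a simplified representation that is not taken from the source: the grammar is a list of plain phrases, and the lexicon is a dictionary from words to phoneme lists. The function name, the romanized Hindi words, and the phoneme symbols are hypothetical and serve only as illustration.

```python
def missing_pronunciations(grammar_phrases, lexicon):
    """Return grammar words that have no entry in the pronunciation lexicon.

    grammar_phrases: list of target-language phrases (hypothetical format).
    lexicon: dict mapping a word to its list of phoneme symbols.
    """
    # Collect every distinct word that the grammar can produce.
    vocabulary = {
        word
        for phrase in grammar_phrases
        for word in phrase.lower().split()
    }
    # Any word without a lexicon entry cannot be recognized.
    return sorted(vocabulary - set(lexicon))


# Illustrative data only: romanized words and made-up phoneme symbols.
phrases = ["mera khaata", "khaata shesh"]
lexicon = {
    "mera": ["M", "EY", "R", "AA"],
    "khaata": ["K", "AA", "T", "AA"],
}

print(missing_pronunciations(phrases, lexicon))
```

A check of this kind would flag the words that still need a pronunciation entry before the ported grammar can be used with the existing engine.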
These modifications for porting the existing speech recognition based solution from the source language to the target language require effort equivalent to building an entirely new speech recognition based solution.
There have been various attempts in the prior art to develop systems which enable applications to be ported easily from one language to another.
Particularly, U.S. Pat. No. 7,406,417 discloses a method for conditioning a database for automatic speech processing. The document discloses a neural network that can be trained for synthesizing or recognizing speech with the aid of a database produced by automatically matching graphemes and phonemes. First, graphemes and phonemes are matched for words which have the same number of graphemes and phonemes. Next, graphemes and phonemes are matched for words that have more graphemes than phonemes in a series of steps that combine graphemes with preceding phonemes. Then, graphemes and phonemes are matched for words that have fewer graphemes than phonemes. After each step, infrequent and unsuccessful matches made in the preceding step are erased. After this process is completed, the database can be used to train the neural network, and graphemes, i.e. the letters of a text, can be converted into the corresponding phonemes with the aid of the trained artificial neural network.
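The first matching step described above, together with the erasing of infrequent matches, might be sketched as follows. This is only an illustration under stated assumptions and not the method of the cited patent: the function names, the `min_count` threshold, and the toy word list are hypothetical, and the later steps for words with unequal numbers of graphemes and phonemes are not modeled.

```python
from collections import Counter


def match_equal_length(entries):
    """Count one-to-one grapheme/phoneme pairs for equal-length words.

    entries: iterable of (spelling, phoneme_list) pairs.
    Words whose grapheme and phoneme counts differ are skipped here.
    """
    counts = Counter()
    for graphemes, phonemes in entries:
        if len(graphemes) == len(phonemes):
            # Pair the i-th letter with the i-th phoneme.
            counts.update(zip(graphemes, phonemes))
    return counts


def prune_infrequent(counts, min_count=2):
    """Drop pairs seen fewer than min_count times (assumed threshold)."""
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Toy data with made-up phoneme symbols.
entries = [
    ("cat", ["k", "a", "t"]),
    ("tan", ["t", "a", "n"]),
    ("act", ["a", "k", "t"]),
    ("that", ["th", "a", "t"]),  # 4 graphemes, 3 phonemes: skipped
]

counts = match_equal_length(entries)
kept = prune_infrequent(counts)
print(kept)
```

With this toy data, the pair for the letter "n" occurs only once and is dropped, while the pairs seen in several words are kept.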
Further, United States Patent Application 2005197835 discloses a method and apparatus for generating acoustic models for speaker-independent speech recognition of foreign words uttered by non-native speakers. The document discloses acoustic models for speech recognition which are automatically generated and utilize trained acoustic models from a native language and a foreign language. A phoneme-to-phoneme mapping is utilized to enable the description of foreign language words with native language phonemes. The phoneme-to-phoneme mapping is used for training foreign language words, described by native language phonemes, on foreign language speech material. A new phonetic lexicon is created containing foreign language words and native language words transcribed by native language phonemes. Robust native language acoustic models can be derived utilizing foreign language and native language training material. The mapping may also be used for training a grapheme-to-phoneme transducer (i.e., foreign language to native language) to generate native language pronunciations for new foreign language words.
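The idea of describing foreign language words with native language phonemes and collecting them in one lexicon can be sketched as below. This is a hedged illustration, not the procedure of the cited application: the mapping table, the phoneme symbols, the sample words, and both function names are hypothetical, and no acoustic model training is shown.

```python
# Hypothetical mapping from foreign phonemes that the native inventory
# lacks to the closest native phonemes. The values are illustrative.
PHONEME_MAP = {"TH": "S", "DH": "Z", "W": "V"}


def to_native(foreign_phonemes, mapping=PHONEME_MAP):
    """Rewrite a foreign transcription using native phoneme symbols.

    Phonemes without a mapping entry are assumed to exist in both
    inventories and are kept unchanged.
    """
    return [mapping.get(p, p) for p in foreign_phonemes]


def merge_lexicons(native_lexicon, foreign_lexicon, mapping=PHONEME_MAP):
    """Build one lexicon in which every word uses native phonemes."""
    combined = dict(native_lexicon)
    for word, phonemes in foreign_lexicon.items():
        # Keep an existing native entry if the word is already present.
        combined.setdefault(word, to_native(phonemes, mapping))
    return combined


native = {"haus": ["H", "AW", "S"]}
foreign = {"thing": ["TH", "IH", "NG"], "web": ["W", "EH", "B"]}

combined = merge_lexicons(native, foreign)
print(combined)
```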
Furthermore, United States Patent Application 2009150153 discloses grapheme-to-phoneme conversion using acoustic data. The document discloses the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phoneme sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so that it is not used in retraining the model.
Additionally, World Intellectual Property Organisation document number 2009/150591 discloses a method and device for the generation of a topic-specific vocabulary, and a computer program product. The document discloses a method for the computer-aided generation of a topic-specific vocabulary from public text. The steps disclosed in this document are: automatic selection of a language-specific and topic-specific text; and automatic generation of vocabulary entries, each comprising a word together with a phonetic transcription, on the basis of the selected text. The vocabulary entries are generated by classifying them, based on their grapheme structure, into a number of predetermined types, and then applying a grapheme-to-phoneme conversion specific to each type to obtain phonetic transcriptions for the words.
However, the approaches disclosed in the aforementioned documents are not suitable for porting existing speech recognition solutions to a plurality of target languages with minimal changes to the existing deployment. Therefore, there is a need for a system which enables existing applications to be quickly ported and/or modified to work in multiple target languages by reusing the speech recognition engine of the existing application.