With advances in computer technology, handheld devices have been integrated with voice-based intelligent systems (alternatively referred to as the system) to help users perform different tasks or operations such as scheduling appointments, booking movie tickets, shopping online and the like.
Generally, the speech parameters of every individual are different. Therefore, the words uttered by each user while performing such tasks or operations may differ considerably because of the influence of the user's mother tongue and the state of the user. Ambiguity in the pronunciation of words arises from the vocal cords, regional influence and the medical profile of the user. In such situations, if the user provides an input to the system, the system may misinterpret the spoken words as other words. This results in a reduced user experience, and the action performed or the response provided by the system for the user input may not be efficient. As an example, consider a scenario where the user provides a query to the system. The query may be:
Query: “Can you book me a ticket to Kodaikanal?”
The system may misinterpret the word “Kodaikanal” as some other name and hence recommend a response or provide options that are out of context for the user. Because of this, the user may have to repeat the query until the system detects all the words correctly. This may waste time and reduce the user experience.
Existing approaches address the above-mentioned problems through methods such as a speech recognition method, a speech-to-text conversion method, regional accent neutralization and the like. In the speech recognition method, speech parameters are extracted, and the response is provided to the user based on them. However, the method fails to correctly interpret the words uttered by the user based on the speech parameters. In the speech-to-text conversion method, the system fails to convert similarly placed words correctly and therefore interprets words incorrectly. Thus, attempting to recognize words by converting speech into text predicts a larger number of words, and because of this larger number the system may misinterpret the words uttered by the user. In regional accent neutralization, the system corrects the word based on the influence of the mother tongue. However, the system may not consider other parameters that affect the pronunciation of the words uttered by the user when providing the query, which in turn affects the response provided by the system.