1. Field of Invention
The present invention relates generally to data analysis and, more particularly, to automatically improving a voice recognition system.
2. Related Art
Voice communication devices, such as telephones, traditionally have been used for mere voice communications or for accessing information using touch-tone dialing. With advancements in communications technology, various types of information can now be accessed using voice recognition systems that translate voice commands into system commands for data retrieval from an electronic system.
These voice recognition systems, however useful, are not perfect. Typically, a voice recognition system cannot recognize all voice commands or utterances spoken by the user. That is, a voice recognition system recognizes only utterances that are included within a recognition grammar. A recognition grammar defines the boundaries of utterances that can be recognized by a voice recognition system. Consequently, a user is limited to certain terms while interfacing with a voice recognition system in order to be understood.
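By way of illustration only, the limiting effect of a recognition grammar can be sketched as a simple lookup. The grammar entries, the command names, and the `recognize` function below are hypothetical and are not taken from any particular voice recognition system; a practical grammar would typically be expressed as rules rather than as a flat table of phrases.

```python
# Hypothetical recognition grammar: the only utterances the system accepts,
# each mapped to an (equally hypothetical) system command.
RECOGNITION_GRAMMAR = {
    "check balance": "CMD_CHECK_BALANCE",
    "transfer funds": "CMD_TRANSFER_FUNDS",
    "speak to an agent": "CMD_AGENT",
}


def recognize(utterance):
    """Return the system command for an in-grammar utterance, else None."""
    # Normalize case and whitespace so trivially different spellings match.
    normalized = " ".join(utterance.lower().split())
    return RECOGNITION_GRAMMAR.get(normalized)


if __name__ == "__main__":
    # An utterance inside the grammar is translated into a system command.
    print(recognize("Check  Balance"))
    # An utterance outside the grammar is not recognized, even though a
    # human listener would understand the request.
    print(recognize("what is my balance"))
```

In this sketch, the second utterance is rejected solely because it falls outside the boundaries defined by the grammar, not because it is unintelligible.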
In addition, voice recognition systems typically understand only certain common accents represented by respective acoustic models. For example, acoustic models have been created to recognize the following regional accents of the United States: Boston, New York, Southern, Texas, Midwest, and West Coast. Thus, a user who speaks in a tone or accent other than those recognized by the system may have difficulty communicating with the system. That is, the system may fail to recognize a word or phrase included in its grammar if the word or phrase is spoken in a way that does not acoustically match what the system expects. For example, a voice recognition system that has acoustic models only for various U.S. regional accents may not properly recognize utterances spoken with an Australian accent. Conversely, an utterance that is acoustically similar to another may be mistakenly recognized by the system as the latter. These failures lead to improper recognition and false rejects, thereby reducing system efficiency and contributing to user frustration.
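The two failure modes described above, false rejects and improper recognition, can be illustrated with a minimal sketch. It assumes, purely for illustration, that the system assigns each grammar entry a match score between 0 and 1 and rejects the utterance when the best score falls below a threshold. The threshold value, the scores, and the function names are all hypothetical.

```python
# Assumed rejection threshold; an actual system would tune this value.
REJECT_THRESHOLD = 0.60


def decide(scores, threshold=REJECT_THRESHOLD):
    """Return the best-scoring grammar entry, or None (a reject) if it
    scores below the threshold."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def classify(spoken, scores):
    """Label the outcome, given what the user actually said."""
    result = decide(scores)
    if result is None:
        # Rejecting an utterance that is in the grammar is a false reject.
        return "false reject" if spoken in scores else "correct reject"
    return "correct" if result == spoken else "misrecognition"


if __name__ == "__main__":
    # Accent matches an acoustic model: the correct entry scores well.
    print(classify("call home", {"call home": 0.91, "call Rome": 0.55}))
    # Unfamiliar accent: every entry scores poorly, so the system rejects.
    print(classify("call home", {"call home": 0.42, "call Rome": 0.38}))
    # Acoustically similar entry outscores the one actually spoken.
    print(classify("call home", {"call home": 0.63, "call Rome": 0.70}))
```

The score values are invented to exercise each case and do not represent measurements from any system.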
To overcome the above-mentioned problems, the recognition results of a voice recognition system may be monitored for errors. According to previously developed techniques, a human operator inspects and analyzes any detected errors in order to hypothesize a solution. The human operator is then required to manually implement the solution in the system, and thereafter, the system is tested and monitored to verify that the solution has actually improved the system. However, due to the large number of errors that may occur over multiple sessions with many users, a human operator may not be able to effectively and efficiently handle the tasks of monitoring errors, hypothesizing solutions, and implementing them. The process is very time-consuming. It also requires the human operator to have “expert knowledge” in the field in order to make effective changes to the system.