1. Field of the Invention
The invention generally relates to speech recognition technology. More particularly, the invention relates to systems and methods for tuning and testing of a speech recognition system.
2. Description of the Related Technology
Speech recognition generally pertains to technology for converting voice data to text data. Typically, in speech recognition systems a speech recognition engine analyzes speech in the form of audio data and converts it to a digital representation of the speech. One area of application of speech recognition involves receiving spoken words as audio input, decoding the audio input into a textual representation of the spoken words, and interpreting the textual representation to execute instructions or to handle the textual representation in some desired manner.
One example of a speech recognition application is an automatic call handling system for a pizza delivery service. The call handling system includes a speech recognition system that receives audio input from a customer placing an order for delivery. Typically, the speech recognition application prompts the customer for responses appropriate to the context of the application. For example, the speech recognition system may be configured to ask: “Would you like a small, medium, or large pizza?” The customer then provides an audio input such as “large,” which the speech recognition system decodes into a textual description, namely “large.” The speech recognition system may also be configured to interpret the text “large” as a command to prompt the user with a menu list corresponding to toppings options for a large pizza.
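The interpretation step in the pizza example can be sketched as a simple lookup from decoded text to the next prompt. This is a minimal illustration only: the function name `interpret`, the table `TOPPINGS_BY_SIZE`, and the topping lists are hypothetical placeholders, not part of any described system.

```python
# Hypothetical sketch: map decoded text to the next prompt in the dialog.
# All names and menu contents below are illustrative assumptions.
SIZE_PROMPT = "Would you like a small, medium, or large pizza?"

# Placeholder toppings menus keyed by pizza size.
TOPPINGS_BY_SIZE = {
    "small": ["cheese", "pepperoni"],
    "medium": ["cheese", "pepperoni", "mushroom"],
    "large": ["cheese", "pepperoni", "mushroom", "olive"],
}


def interpret(decoded_text: str) -> str:
    """Return the next prompt for the text decoded from the caller's audio."""
    size = decoded_text.strip().lower()
    if size in TOPPINGS_BY_SIZE:
        options = ", ".join(TOPPINGS_BY_SIZE[size])
        return (
            f"Which toppings would you like on your {size} pizza? "
            f"Options: {options}."
        )
    # Unrecognized response: repeat the size question.
    return SIZE_PROMPT
```

In this sketch, a decoded response of "large" leads to the toppings prompt for a large pizza, while any response outside the expected set causes the size question to be repeated.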
The performance quality of a speech recognition system depends on, among other things, the quality of its acoustic model and the appropriateness of its dictionary. Since an acoustic model is based on statistics, the larger the amount of correct data supplied to the model's training, the more accurate the model is likely to be in recognizing speech patterns. Moreover, the training of an acoustic model typically requires accurate word and noise transcriptions and actual speech data. However, in practice, it is often difficult to produce accurate transcriptions of the speech data.
A typical dictionary provides one or more pronunciations for a given word, syllable, phoneme, etc. If the pronunciations accurately reflect how a word is pronounced, then the acoustic model has a better chance of recognizing the speech input. However, if the pronunciations are poor, they can impair the acoustic model's ability to recognize words.
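As a rough illustration of such a dictionary, the sketch below stores one or more pronunciations per word as phoneme sequences. The ARPAbet-style symbols, the sample entries, and the helper names `lookup` and `add_pronunciation` are assumptions made for illustration and do not reflect any particular system's dictionary format.

```python
from typing import Dict, List

# Hypothetical pronunciation dictionary: each word maps to a list of
# alternative pronunciations, each an ARPAbet-style phoneme sequence.
# Entries are illustrative assumptions only.
PRONUNCIATIONS: Dict[str, List[List[str]]] = {
    "large": [["L", "AA", "R", "JH"]],
    "tomato": [
        ["T", "AH", "M", "EY", "T", "OW"],
        ["T", "AH", "M", "AA", "T", "OW"],
    ],
}


def lookup(word: str) -> List[List[str]]:
    """Return all known pronunciations of a word, or an empty list."""
    return PRONUNCIATIONS.get(word.lower(), [])


def add_pronunciation(word: str, phonemes: List[str]) -> None:
    """Add a pronunciation variant for a word, skipping exact duplicates."""
    variants = PRONUNCIATIONS.setdefault(word.lower(), [])
    if phonemes not in variants:
        variants.append(list(phonemes))
```

Adding to or modifying the dictionary then amounts to calls such as `add_pronunciation("medium", ["M", "IY", "D", "IY", "AH", "M"])`; a word with no entry returns an empty list from `lookup`.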
Work to improve the performance of a speech recognition application by refining the acoustic model or the dictionary is usually performed while the application is off-line, i.e., not in actual use in the field. Improvements may be attempted by adding to and/or modifying the pronunciations in the dictionary, and/or by providing transcriptions, the preparation of which is often a long and labor-intensive process. In some cases, this process can take anywhere from a week to months.
Speech recognition applications such as the one described above benefit from testing for satisfactory performance not only at the development stage but also during actual use of the application in the field. Moreover, the speech recognition system can benefit from in-field adjustments (“tuning”) to enhance its accuracy. However, known speech recognition systems do not incorporate a convenient testing facility or a tool for periodic, incremental adjustments. Thus, there is a need in the industry for systems and methods that facilitate the tuning and testing of speech recognition systems. The systems and methods described herein address this need.