1. Field of the Invention
The present invention relates to speech recognition, and specifically to methods for assigning and training grammar weights for a speech recognition system.
2. Discussion of the Related Art
Automatic speech recognition (ASR) systems translate audio information into text information. Specifically, an utterance (i.e. audio information) made by a user is input to the ASR system. The ASR system interprets the utterance based on a score describing a phonetic similarity to the natural language options in a set of active grammars. An active grammar is an available set of natural language options (options) in a particular context. The different ways an option might be spoken are defined as option variants. For example, in the context of movies, an active grammar can represent the names of presently playing movies. Each option in the movie grammar is a tag corresponding to a movie name. For each option (e.g. the tag for the movie name “Mission Impossible: 2”), the grammar might include option variants for recognizing “mission impossible”, “mission—impossible” (run-on of the two words), and “mission impossible two”. These option variants represent the different ways a user might say the name of the movie “Mission Impossible: 2”. Thus, each of these option variants corresponds to a single natural language option, the tag corresponding to the movie “Mission Impossible: 2”. As a result, when an utterance is recognized as the option variant “mission impossible”, then the ASR system returns the option for “Mission Impossible: 2” as the natural language interpretation of the utterance.
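For illustration only, the relationship between options and option variants described above can be sketched as a simple lookup. The tag names, the second movie entry, and the helper functions below are hypothetical and are not part of any particular ASR system. The spelling used for the run-on variant is likewise an assumption.

```python
# Hypothetical movie grammar: each option (a tag) maps to the option
# variants, i.e. the different ways a user might say that option.
MOVIE_GRAMMAR = {
    "MISSION_IMPOSSIBLE_2": [
        "mission impossible",
        "missionimpossible",       # run-on of the two words (assumed spelling)
        "mission impossible two",
    ],
    "EXAMPLE_MOVIE": [             # made-up second option for illustration
        "example movie",
    ],
}


def build_variant_index(grammar):
    """Invert the grammar so that each variant points to its option."""
    index = {}
    for option, variants in grammar.items():
        for variant in variants:
            index[variant] = option
    return index


def interpret(recognized_variant, index):
    """Return the option for a recognized variant, or None if unknown."""
    return index.get(recognized_variant)


VARIANT_INDEX = build_variant_index(MOVIE_GRAMMAR)
```

In this sketch, recognizing any of the three variants yields the same option, the tag for “Mission Impossible: 2”.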
The ASR system computes scores for the options of the active grammars for each utterance. The score of an option is based on two kinds of information: acoustic information and grammatical information. A probabilistic framework for the acoustic information defines the “acoustic score” as the likelihood that a particular option was spoken, given the acoustic properties of an utterance. The grammatical information biases some options in relation to others. In a probabilistic framework, the grammatical information is defined as a probability associated with each option. These probabilities are referred to herein as “grammar weights”, or simply “weights”. The score computed by the ASR system for an option, given an utterance, is a combination of the acoustic score and the grammar weight. In a probabilistic framework, the logarithm of the grammar weight and the logarithm of the acoustic score are added to form the score. While scores discussed herein relate to a probabilistic framework with all scores defined in the logarithmic domain, the concepts described herein can be applied to other ways of merging the acoustic information with the grammatical information as well.
The ASR system chooses the active grammar option having the highest score as the natural language interpretation of the utterance (i.e. recognized result). Increasing the grammar weight of an option (and thus increasing the score of the option) therefore increases the chance of that option being chosen as the natural language interpretation of a given utterance by the ASR system.
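As a minimal sketch of the scoring and selection just described, assuming the acoustic information and the grammar weights are both available as probabilities, the score of each option can be computed in the logarithmic domain and the highest-scoring option chosen. The function names and the numeric values are hypothetical and chosen only to show the effect of a grammar weight on the recognized result.

```python
import math


def option_score(acoustic_likelihood, grammar_weight):
    """Combine acoustic and grammatical information in the log domain."""
    return math.log(acoustic_likelihood) + math.log(grammar_weight)


def recognize(acoustic_likelihoods, grammar_weights):
    """Return the option with the highest score, plus all scores."""
    scores = {
        option: option_score(acoustic_likelihoods[option], weight)
        for option, weight in grammar_weights.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical acoustic likelihoods for one utterance (assumed values).
acoustic = {"OPTION_A": 0.30, "OPTION_B": 0.40}

# With equal grammar weights, the acoustically closer option wins.
uniform_result, _ = recognize(acoustic, {"OPTION_A": 0.5, "OPTION_B": 0.5})

# Raising the weight of OPTION_A raises its score enough to change the result.
biased_result, _ = recognize(acoustic, {"OPTION_A": 0.7, "OPTION_B": 0.3})
```

With these assumed numbers, the uniform weights select OPTION_B, while the biased weights select OPTION_A, because 0.30 × 0.7 exceeds 0.40 × 0.3.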
In voice applications, an application author defines the active grammars for each portion of the application. An application author is a voice application programmer, and typically has no training as a speech scientist. Grammar weights of variants are defined by application authors in the course of the application programming process and are therefore alterable by the application author. However, because acoustic scores are modeled by the manufacturer of the speech recognizer (the recognition engine of the ASR system), the acoustic scores are typically fixed in a particular version of a speech recognizer.
The grammar weights of options in active grammars may be determined (either assigned or tuned) according to a specific method to maximize the ability of the ASR system to correctly interpret utterances. It is often impractical to obtain enough utterance data to assign grammar weights directly from utterance frequency. Additionally, directly weighting from utterance frequency only indirectly minimizes the number of recognition errors. One current method for determining grammar weights of options requires a highly trained speech scientist to review error and utterance frequency data for an ASR system and to alter grammar weights of options based on this review. Ideally, grammar weights of options are derived from large amounts of data to make them as accurate as possible. Moreover, even relatively simple or small grammars having few options typically have many variants of each option. Therefore, this review process is an enormous task for one person. To further complicate this process, there is a limited number of speech scientists in the industry, which significantly increases the cost of the review. Finally, relying on a subjective, human review introduces the possibility of error and, at the very least, of inconsistent analysis based on different interpretations of the data.
Therefore, a need arises for a method of, and a system for, efficiently determining the grammar weights of options in grammars for an ASR system.