1. Field of the Invention
The present invention relates to automatic speech recognition, and more particularly relates to the tuning of speech recognition parameters for automatic speech recognition engines.
2. Description of the Related Art
Speech recognition (SR) systems translate audio information into text information. An SR system processes incoming speech and uses speech recognition parameters (e.g., grammars and weights) to determine the natural language represented by the speech. In an SR system, recognition is based on a score describing the phonetic similarity of the speech to the natural language options in a set of grammars. A grammar is the set of natural language options available in a particular context, and can represent a set of words or phrases. When speech is recognized as one of the words or phrases in a grammar, the SR system returns the natural language interpretation of the speech.
For a given utterance, the SR system computes a score for each option in the applicable grammars. The score of an option is based on two kinds of information: acoustic information and grammatical information. In a probabilistic framework, the “acoustic score” is defined as the likelihood that a particular option was spoken, given the acoustic properties of the utterance. The grammatical information biases some options in relation to others; in a probabilistic framework, it is defined as a probability associated with each option. These probabilities are referred to herein as “grammar weights”, or simply “weights”. The score computed by the SR system for an option, given an utterance, is a combination of the acoustic score and the grammar weight. The SR system chooses the grammar option having the highest score as the natural language interpretation of the speech. Increasing the grammar weight of an option (and thus its score) therefore increases the chance of that option being chosen as the natural language interpretation of a given utterance.
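The scoring described above can be illustrated with a minimal sketch. The text states only that the score is "a combination" of the acoustic score and the grammar weight; summing their logarithms (equivalent to multiplying the probabilities) is an assumption made here for illustration. The function names, option names, and numeric values are all hypothetical and are not taken from any particular SR engine.

```python
import math
from typing import Dict


def combined_score(acoustic_score: float, grammar_weight: float) -> float:
    """Combine the two kinds of information in the log domain.

    Assumption: the combination is a product of probabilities,
    computed as a sum of logarithms.
    """
    return math.log(acoustic_score) + math.log(grammar_weight)


def recognize(acoustic_scores: Dict[str, float],
              grammar_weights: Dict[str, float]) -> str:
    """Return the grammar option having the highest combined score."""
    scores = {
        option: combined_score(acoustic_scores[option], weight)
        for option, weight in grammar_weights.items()
    }
    return max(scores, key=scores.get)


# Hypothetical acoustic scores for one utterance (fixed by the engine).
acoustic = {"Austin": 0.40, "Boston": 0.45}

# With equal grammar weights, the acoustically closer option is chosen.
equal_weights = {"Austin": 0.5, "Boston": 0.5}
print(recognize(acoustic, equal_weights))

# Increasing the weight of "Austin" raises its score enough to change the result.
tuned_weights = {"Austin": 0.6, "Boston": 0.4}
print(recognize(acoustic, tuned_weights))
```

In this sketch the acoustic scores are held constant between the two calls and only the grammar weights change, which mirrors the statement that increasing an option's grammar weight increases its chance of being chosen.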
An application author, that is, a voice application programmer, defines the grammars for a speech engine. Grammar weights are defined by the application author in the course of the application programming process and are therefore alterable by the application author. The grammar weights may be determined (either assigned or tuned) according to a specific method to maximize the ability of the SR system to correctly interpret speech. Acoustic scores, by contrast, are modeled by the manufacturer of the speech recognition software and are typically fixed in a particular version of that software. Speech recognition parameters that are hard-coded into the application can produce obstacles during maintenance, re-deployment, piloting and other phases of production. For example, if an SR system is originally deployed for recognizing residential addresses and is later deployed for recognizing business addresses, the speech recognition parameters, which were originally hard-coded into the application, must be re-worked or modified to recognize business addresses. This can be time-consuming and costly. It is therefore desirable for an SR system to have easy access to speech recognition parameters so as to allow for customization to different environments independent of applications.
Therefore, a need arises for a more efficient method of giving speech recognition systems that are deployed in different environments access to speech recognition parameters.