1. Field of the Invention
The present invention relates generally to speech recognition systems, and more particularly, to methods and systems for evaluating the fitness of a grammar to be used in a speech recognition system.
2. Description of the Related Art
Implementing robust and effective techniques for system users to interface with electronic devices is a significant consideration for system designers and manufacturers. Voice-controlled operation of electronic devices may often provide a desirable interface for system users to control and interact with electronic devices. For example, voice-controlled operation of an electronic device may allow a user to perform other tasks simultaneously, or may be advantageous in certain types of operating environments. In addition, hands-free operation of electronic devices may also be desirable for users who have physical limitations or other special requirements.
Hands-free operation of electronic devices may be implemented by various speech-activated electronic devices. Speech-activated electronic devices advantageously allow users to interface with electronic devices in situations where it would be inconvenient or potentially hazardous to utilize a traditional input device. However, effectively implementing speech recognition systems creates substantial challenges for system designers.
A speech recognition system receives an audio stream as input and filters it to extract and isolate the sound segments that are speech. The speech recognition engine then analyzes the speech sound segments by comparing them to a defined pronunciation dictionary, a grammar recognition network, and an acoustic model.
Sub-lexical speech recognition systems are usually equipped with a way to compose words and sentences from more fundamental units that model the speech waveforms. For example, in a speech recognition system based on phoneme models, pronunciation dictionaries can be used as look-up tables to build words from their phonetic transcriptions. In addition, explicit rules for word combination are given to build sentences from words. The rules for sentence construction are referred to as the “recognition grammar.”
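By way of illustration only, the following minimal Python sketch shows how a pronunciation dictionary can serve as a look-up table and how a simple set of word-combination rules can serve as a recognition grammar. The dictionary entries, the phoneme symbols, the slot-based grammar layout, and the names `PRONUNCIATIONS`, `GRAMMAR`, `sentences`, and `transcribe` are hypothetical and are not taken from any particular speech recognition system.

```python
from itertools import product

# Hypothetical pronunciation dictionary: word -> phoneme sequence.
PRONUNCIATIONS = {
    "play": ["P", "L", "EY"],
    "stop": ["S", "T", "AA", "P"],
    "music": ["M", "Y", "UW", "Z", "IH", "K"],
    "video": ["V", "IH", "D", "IY", "OW"],
}

# Hypothetical recognition grammar: a sentence takes one word from each
# slot, in order. Real grammars may be far more expressive than this.
GRAMMAR = [["play", "stop"], ["music", "video"]]


def sentences(grammar):
    """Enumerate every word sequence allowed by the slot grammar."""
    return [list(words) for words in product(*grammar)]


def transcribe(sentence, pronunciations):
    """Build the phoneme sequence of a sentence by dictionary look-up."""
    phonemes = []
    for word in sentence:
        phonemes.extend(pronunciations[word])
    return phonemes


if __name__ == "__main__":
    for sentence in sentences(GRAMMAR):
        print(" ".join(sentence), "->", " ".join(transcribe(sentence, PRONUNCIATIONS)))
```

In this sketch, words are composed from phonemes by look-up and sentences are composed from words by the grammar, mirroring the two levels of composition described above.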
The complexity of the recognition grammar depends on the nature of the application for which speech is to be recognized. For instance, some simple command-like applications will require isolated-word grammars, while some dialog-like applications will require more complex sentence construction. Regardless of the complexity of the application, the application developer needs to carefully specify the grammar and refine it in order to assure completeness (i.e., that the grammar covers all the sentences required for the application) and to avoid over-generation (i.e., to ensure that the grammar does not allow for generation of unexpected sentences that are not understood by the application). This can be particularly time-consuming, even for an experienced application developer.
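The notions of completeness and over-generation can be illustrated with a short Python sketch that compares the sentences a grammar generates against the sentences an application requires. This is a simplified example under stated assumptions: the slot-based grammar layout, the sample sentences, and the names `generated` and `check_coverage` are hypothetical, and exhaustive enumeration of this kind is practical only for small grammars.

```python
from itertools import product


def generated(grammar):
    """Return the set of sentences produced by a slot grammar."""
    return {" ".join(words) for words in product(*grammar)}


def check_coverage(grammar, required):
    """Compare generated sentences with the sentences the application needs.

    Returns (missing, unexpected):
      missing    - required sentences the grammar cannot produce
                   (a completeness problem)
      unexpected - produced sentences the application does not expect
                   (an over-generation problem)
    """
    produced = generated(grammar)
    required = set(required)
    return sorted(required - produced), sorted(produced - required)


# Hypothetical grammar and application requirements.
GRAMMAR = [["play", "stop"], ["music", "video"]]
REQUIRED = ["play music", "play video", "stop music", "pause music"]

missing, unexpected = check_coverage(GRAMMAR, REQUIRED)

if __name__ == "__main__":
    print("missing:", missing)
    print("unexpected:", unexpected)
```

Here the grammar is incomplete because it cannot produce "pause music", and it over-generates because it allows "stop video", which the hypothetical application does not expect.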
Regardless of the amount of effort that the developer dedicates to building the grammar, it is likely that the grammar will include several areas in which the speech recognition system may produce errors. This is because different words, having different meanings and associated with different actions, may be acoustically similar, or because a particular combination of words may be acoustically very close to another word combination that represents a different meaning or action. This makes it difficult for the speech recognition system to differentiate between the words or word combinations, thereby triggering recognition errors.
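As a rough illustration of acoustic similarity between words, the following Python sketch flags word pairs whose phoneme transcriptions differ by at most a small number of phoneme edits. The use of phoneme edit distance as a stand-in for acoustic similarity is an assumption made for this example only; it is a crude proxy and is not presented as the evaluation method of any particular system. The dictionary entries, the threshold, and the names `edit_distance` and `confusable_pairs` are likewise hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    previous = list(range(len(b) + 1))
    for i, phoneme_a in enumerate(a, start=1):
        current = [i]
        for j, phoneme_b in enumerate(b, start=1):
            cost = 0 if phoneme_a == phoneme_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical pronunciation dictionary.
PRONUNCIATIONS = {
    "write": ["R", "AY", "T"],
    "right": ["R", "AY", "T"],
    "light": ["L", "AY", "T"],
    "delete": ["D", "IH", "L", "IY", "T"],
}


def confusable_pairs(pronunciations, threshold=1):
    """Return word pairs whose transcriptions are within `threshold` edits."""
    words = sorted(pronunciations)
    pairs = []
    for i, first in enumerate(words):
        for second in words[i + 1:]:
            distance = edit_distance(pronunciations[first], pronunciations[second])
            if distance <= threshold:
                pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    print(confusable_pairs(PRONUNCIATIONS))
```

With these sample entries, "right" and "write" share an identical transcription and "light" differs from each by a single phoneme, so all three pairings are flagged, whereas "delete" is not paired with any of them.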
Thus, the application developer is tasked with considering potential sources of confusion within the grammar and trying to eliminate them by avoiding placement of confusable words in interchangeable locations of the grammar. However, this can be particularly challenging when the set of possible word combinations within the grammar is too large for the developer to manually explore with sufficient detail and accuracy. Therefore, it is desirable to have a systematic way to automatically evaluate a grammar to identify placement of confusable words in interchangeable locations within the grammar.