1. Field of the Invention
This invention relates generally to electronic speech recognition systems, and relates more particularly to a system and method for speech verification using a confidence measure.
2. Description of the Background Art
Implementing an effective and efficient method for system users to interface with electronic devices is a significant consideration of system designers and manufacturers. Voice-controlled operation of electronic devices is a desirable interface for many system users. For example, voice-controlled operation allows a user to perform other tasks simultaneously. For instance, a person may operate a vehicle and operate an electronic organizer by voice control at the same time. Hands-free operation of electronic systems may also be desirable for users who have physical limitations or other special requirements.
Hands-free operation of electronic devices may be implemented by various speech-activated electronic systems. Speech-activated electronic systems thus advantageously allow users to interface with electronic devices in situations where it would be inconvenient or potentially hazardous to utilize a traditional input device. Electronic entertainment systems may also utilize speech recognition techniques to allow users to interact with a system by speaking to it.
Speech-activated electronic systems may be used in a variety of noisy environments such as industrial facilities, manufacturing facilities, commercial vehicles, passenger vehicles, homes, and office environments. A significant amount of noise in an environment may interfere with and degrade the performance and effectiveness of speech-activated systems. System designers and manufacturers typically seek to develop speech-activated systems that provide reliable performance in noisy environments.
In a noisy environment, sound energy detected by a speech-activated system may contain speech and a significant amount of noise. In such an environment, the speech may be masked by the noise and be undetected. This result is unacceptable for reliable performance of the speech-activated system.
Alternatively, sound energy detected by the speech-activated system may contain only noise. The noise may be of such a character that the speech-activated system identifies the noise as speech. This result reduces the effectiveness of the speech-activated system, and is also unacceptable for reliable performance. Verifying that a detected signal is actually speech increases the effectiveness and reliability of speech-activated systems.
A speech-activated system may have a limited vocabulary of words that the system is programmed to recognize. The system should respond to words or phrases that are in its vocabulary, and should not respond to words or phrases that are not in its vocabulary. Verifying that a recognized word is in the system's vocabulary increases the accuracy and reliability of speech-activated systems.
Therefore, for all the foregoing reasons, implementing an effective and efficient method for a system user to interface with electronic devices remains a significant consideration of system designers and manufacturers.
In accordance with the present invention, a system and method are disclosed for speech verification using a confidence measure. In one embodiment, the invention includes a speech verifier that compares a differential score for a recognized word to a predetermined threshold value, where the recognized word is the word whose word model produced the highest recognition score. The speech verifier preferably includes a word model for each word in a vocabulary of the system.
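The comparison performed by the speech verifier can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the differential score is assumed here to be the highest recognition score minus the next-highest one, an utterance is assumed to be accepted when that score meets or exceeds the threshold, and the names `differential_score` and `verify` are hypothetical.

```python
def differential_score(scores):
    """Assumed definition: best recognition score minus the next-best score."""
    best, runner_up = sorted(scores.values(), reverse=True)[:2]
    return best - runner_up


def verify(scores, threshold):
    """Return the recognized word and whether it passes verification.

    `scores` maps each vocabulary word to the recognition score produced
    by that word's model for a single utterance.
    """
    # The recognized word is the one whose model scored highest.
    recognized = max(scores, key=scores.get)
    # Assumption: accept when the differential score reaches the threshold.
    accepted = differential_score(scores) >= threshold
    return recognized, accepted


if __name__ == "__main__":
    utterance_scores = {"yes": 10, "no": 4, "stop": 3}
    print(verify(utterance_scores, threshold=5))
```

A small gap between the two best scores suggests that no single word model stands out, which is why the gap can serve as a confidence measure.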
In one embodiment, a single threshold is used for each word in the vocabulary. In another embodiment, each word model has an associated threshold, so that a differential score for a recognized word is compared to a unique threshold associated with that word. To determine a threshold value, a set of test utterances may be compared with each model. A differential score for each utterance and each model may then be calculated. A minimum differential score for each model is determined, and the minimum differential score is utilized as the threshold value for each word. In the foregoing single-threshold embodiment, the single threshold may preferably correspond to a minimum of the minimum differential scores. In a further embodiment, pairs of confused words in the vocabulary may be dealt with separately. Confused words are two phonetically similar words. A speech recognition system may often identify a confused word as the other word in the pair. If a confused word is the recognized word, then the speech verifier may compare the differential score to a threshold that depends on the word model that produced the next-highest recognition score.
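The threshold determination described above can be sketched as follows. The sketch assumes that the differential score is the highest recognition score minus the next-highest one, that test utterances are grouped by the vocabulary word that was spoken, and that confused pairs are stored in a lookup keyed by (recognized word, next-highest word). All function names and data layouts are hypothetical.

```python
def differential_score(scores):
    """Assumed definition: best recognition score minus the next-best score."""
    best, runner_up = sorted(scores.values(), reverse=True)[:2]
    return best - runner_up


def min_thresholds(test_scores):
    """Per-word threshold: the minimum differential score over that
    word's test utterances.

    `test_scores` maps each word to a list of per-utterance score
    dictionaries (word -> recognition score).
    """
    return {
        word: min(differential_score(s) for s in utterances)
        for word, utterances in test_scores.items()
    }


def single_threshold(per_word):
    """Single-threshold embodiment: the minimum of the per-word minimums."""
    return min(per_word.values())


def select_threshold(recognized, runner_up, per_word, confused_pairs):
    """Use a pair-specific threshold when the recognized word and the
    next-highest word form a confused pair; otherwise use the per-word one."""
    return confused_pairs.get((recognized, runner_up), per_word[recognized])


if __name__ == "__main__":
    test_scores = {
        "yes": [{"yes": 10, "no": 4}, {"yes": 9, "no": 6}],
        "no": [{"yes": 2, "no": 7}, {"yes": 1, "no": 9}],
    }
    per_word = min_thresholds(test_scores)
    print(per_word, single_threshold(per_word))
```

Under the assumed acceptance rule, the minimum differential score is the largest threshold that still accepts every in-vocabulary test utterance.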
Different values for the various thresholds may maximize rejection accuracy or recognition accuracy. A trade-off between rejection accuracy and recognition accuracy may be made by utilizing an intermediate threshold value between a minimum threshold value and a maximum threshold value. A maximum threshold value may be determined by comparing a set of out-of-vocabulary test utterances with each word model, which generates a differential score for each out-of-vocabulary test utterance. A maximum differential score may be determined for each vocabulary word, which then may be utilized as a maximum threshold value.
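The maximum threshold and the trade-off between the two extremes can be sketched as follows. Several details are assumptions made only for illustration: the differential score is taken as the highest recognition score minus the next-highest one, each out-of-vocabulary utterance is attributed to the vocabulary word whose model scored highest, and the intermediate value is obtained by linear interpolation with a hypothetical `weight` parameter.

```python
from collections import defaultdict


def differential_score(scores):
    """Assumed definition: best recognition score minus the next-best score."""
    best, runner_up = sorted(scores.values(), reverse=True)[:2]
    return best - runner_up


def max_thresholds(oov_scores):
    """Per-word maximum differential score over out-of-vocabulary utterances.

    `oov_scores` is a list of per-utterance score dictionaries. Each
    utterance is attributed (by assumption) to the word it was recognized as.
    """
    by_word = defaultdict(list)
    for scores in oov_scores:
        recognized = max(scores, key=scores.get)
        by_word[recognized].append(differential_score(scores))
    return {word: max(diffs) for word, diffs in by_word.items()}


def intermediate_threshold(t_min, t_max, weight=0.5):
    """Linear interpolation between the minimum and maximum thresholds.

    weight=0 returns t_min, weight=1 returns t_max.
    """
    return t_min + weight * (t_max - t_min)


if __name__ == "__main__":
    oov = [{"yes": 5, "no": 4}, {"yes": 8, "no": 4}, {"yes": 3, "no": 5}]
    print(max_thresholds(oov))
    print(intermediate_threshold(3, 4))
```

Moving `weight` toward 0 favors accepting in-vocabulary utterances, while moving it toward 1 favors rejecting out-of-vocabulary utterances, under the acceptance rule assumed here.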
The present invention thus efficiently and effectively implements speech verification using a confidence measure.