1. Field of the Invention
The present invention relates to the technical field of utterance verification and, more particularly, to a method and a system for utterance verification which are adapted to a noisy environment.
2. Description of Related Art
Conventionally, in the field of data processing, an utterance verification technique is employed to verify the correctness of a candidate string obtained by performing speech recognition on a speech segment. The correct action can then be taken according to the candidate string that has been verified and regarded as the correct answer. For example, in a voice dialing system, a voice input of a telephone number is requested, a digit string is recognized from the input speech and verified, and the recognized digit string is dialed if it is verified and regarded as the correct answer.
Among the many well-known utterance verification techniques, the most widely employed are the decoding based approach and the hypothesis testing based approach.
FIG. 1 is a block diagram illustrating a decoding based utterance verification technique. As shown, a word “Hi”, denoted by the two phonetic symbols “h” and “ai”, has been recognized from an input speech 51. In most decoding based systems, the word “Hi” is taken as the unit for decoding, and the parameters used for decoding are calculated for the word “Hi”. In addition, more than one parameter is usually used in this approach: the parameter set 52 includes, for example, an acoustic score 521, a language model score 522, and N-best information 523. A decoder 53 is then activated to combine these scores to obtain a verification score 54. Finally, the verification score 54 is compared with a predetermined threshold to decide whether the recognized word “Hi” should be accepted or not. This approach can be implemented by LDA (Linear Discriminant Analysis), decision tree analysis, or a neural network. However, this approach requires various and complicated parameters, and computing them is an extra and time-consuming effort for many speech applications.
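By way of illustration only, the combination of word-level parameters into a verification score and the comparison with a threshold may be sketched as follows. The sketch assumes a simple fixed linear combination standing in for the decoder 53; the names WordParameters, WEIGHTS, and THRESHOLD, as well as all numeric values, are hypothetical and are not taken from FIG. 1.

```python
from dataclasses import dataclass


@dataclass
class WordParameters:
    """Word-level parameters for one recognized word (hypothetical layout)."""
    acoustic_score: float        # corresponds to acoustic score 521
    language_model_score: float  # corresponds to language model score 522
    nbest_score: float           # a single number summarizing N-best information 523


# Assumed weights and threshold; a real system would train these,
# for example with linear discriminant analysis.
WEIGHTS = (0.5, 0.3, 0.2)
THRESHOLD = 0.6


def verification_score(p: WordParameters, weights=WEIGHTS) -> float:
    """Combine the parameters into one score by a weighted sum."""
    w_ac, w_lm, w_nb = weights
    return (w_ac * p.acoustic_score
            + w_lm * p.language_model_score
            + w_nb * p.nbest_score)


def accept(p: WordParameters, threshold: float = THRESHOLD) -> bool:
    """Accept the recognized word if its score reaches the threshold."""
    return verification_score(p) >= threshold


if __name__ == "__main__":
    hi = WordParameters(acoustic_score=0.8,
                        language_model_score=0.7,
                        nbest_score=0.9)
    print(verification_score(hi), accept(hi))
```

Even in this reduced form, every parameter must be computed for each recognized word before the decision can be made, which is the overhead noted above.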
FIG. 2 is a block diagram illustrating a hypothesis testing based utterance verification technique. As shown, a word “Hi” has been recognized from an input speech 61, and the input speech is segmented into the sub-word segments “h” and “ai”. The verification model 621 (623) and the anti-model 622 (624) of the sub-word “h” (“ai”) are used to test the sub-word segment “h” (“ai”), and the resulting log likelihood ratio is regarded as the test score 631 (632) of the sub-word “h” (“ai”). The verification score 64 of “Hi” is obtained by combining the test scores 631 and 632. Finally, the verification score 64 is compared with a predetermined threshold to decide whether the recognized word “Hi” should be accepted or not. However, the hypothesis testing based utterance verification technique requires a verification model and an anti-model for each sub-word, and two tests are required for each sub-word segment. The system load is therefore significantly increased by this approach.
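By way of illustration only, the log likelihood ratio test described above may be sketched as follows. The sketch assumes that each verification model and anti-model is a one-dimensional Gaussian given as a (mean, variance) pair, that each frame is a single number, and that test scores are combined by averaging; these choices, the function names, and all numeric values are hypothetical and are not taken from FIG. 2.

```python
import math


def gaussian_log_likelihood(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian (assumed model form)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def segment_llr(frames, model, anti_model) -> float:
    """Test score of one sub-word segment: frame-averaged log likelihood ratio.

    Each frame is evaluated twice, once against the verification model
    and once against the anti-model.
    """
    total = sum(
        gaussian_log_likelihood(x, *model)
        - gaussian_log_likelihood(x, *anti_model)
        for x in frames
    )
    return total / len(frames)


def verify_word(segments, models, anti_models, threshold: float = 0.0):
    """Combine sub-word test scores (here by averaging) and apply a threshold."""
    scores = [
        segment_llr(frames, models[name], anti_models[name])
        for name, frames in segments
    ]
    word_score = sum(scores) / len(scores)
    return word_score, word_score >= threshold


if __name__ == "__main__":
    # Hypothetical (mean, variance) pairs for the sub-words "h" and "ai".
    models = {"h": (0.0, 1.0), "ai": (2.0, 1.0)}
    anti_models = {"h": (3.0, 1.0), "ai": (-1.0, 1.0)}
    segments = [("h", [0.1, -0.2]), ("ai", [1.9, 2.2])]
    print(verify_word(segments, models, anti_models))
```

The two likelihood evaluations per frame in segment_llr correspond to the two tests per sub-word segment mentioned above, and the two dictionaries correspond to the paired model and anti-model that must be stored for every sub-word.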
Moreover, both the decoding based and the hypothesis testing based approaches are applicable only to a noiseless environment. Hence, verification performance degrades greatly in a noisy environment and, as a result, the reliability of the recognition system is poor.
Therefore, it is desirable to provide a novel method and system for utterance verification in order to mitigate and/or obviate the aforementioned problems.