1. Field of the Invention
The present invention relates to a speech recognition technique. More particularly, the present invention relates to a modifying method for a speech model.
2. Description of Related Art
With the development of speech recognition techniques, various electronic devices such as televisions and audio devices may be operated by speech. A user may operate these electronic devices by issuing speech instructions that a speech recognition system can recognize. Besides electronic devices, speech recognition is widely applied in related fields such as speech input and identity recognition.
Errors made by a speech recognition system include substitution errors, deletion errors and insertion errors. Referring to Table 1, if the user utters “A, B, C” and the recognition result of the speech recognition system is “D, B, C”, the error is referred to as a substitution error. If the user utters “A, B, C” and the recognition result is “A, C”, the error is referred to as a deletion error. If the user utters “A, B, C” and the recognition result is “A, B, C, D”, the error is referred to as an insertion error.
TABLE 1
Types of recognition errors

Substitution error    Correct answer        A  B  C
                      Recognition result    D  B  C
Deletion error        Correct answer        A  B  C
                      Recognition result    A  C
Insertion error       Correct answer        A  B  C
                      Recognition result    A  B  C  D
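The three error types in Table 1 can be counted by aligning the correct answer with the recognition result. The following is a minimal sketch, assuming a standard minimum-edit-distance alignment; the function name `count_errors` is hypothetical and the alignment method is an illustrative choice, not something specified by the cited techniques.

```python
def count_errors(reference, hypothesis):
    """Return (substitutions, deletions, insertions) from a minimum-edit alignment.

    `reference` is the correct answer and `hypothesis` is the recognition
    result; both are sequences of tokens. Illustrative helper only.
    """
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = minimum number of edits turning reference[:i] into hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete every reference token
    for j in range(1, m + 1):
        cost[0][j] = j  # insert every hypothesis token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = reference[i - 1] != hypothesis[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion
                cost[i][j - 1] + 1,             # insertion
            )

    # Walk back from the end of both sequences, tallying each edit type.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = reference[i - 1] != hypothesis[j - 1]
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                subs += mismatch
                i -= 1
                j -= 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


if __name__ == "__main__":
    # The three rows of Table 1.
    print(count_errors("ABC", "DBC"))   # substitution example
    print(count_errors("ABC", "AC"))    # deletion example
    print(count_errors("ABC", "ABCD"))  # insertion example
```

When several alignments share the same minimum cost, the split between error types depends on the tie-breaking order in the backtrace; this sketch prefers match or substitution, then deletion, then insertion.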
To reduce recognition errors, the speech recognition system should compare the speech of the user against a representative speech model, so that the speech is correctly recognized. To obtain a representative speech model, a speech database provides the speech recognition system with a large amount of speech, collected from a plurality of people, for modifying (or training) the speech model so as to improve the maximum likelihood of the speech model. Then, discriminative training is applied to modify the speech model and improve its discrimination. Since the discrimination of the speech model is closely related to the recognition rate of the speech recognition system, improving the discrimination of the speech model may improve the recognition rate.
Presently, a preferred and commonly used modifying method for the speech model is to modify the speech model based on sequences generated by a fixed candidate sequence generator, for example, the modifying methods disclosed in U.S. Pat. No. 5,606,644 and U.S. Pat. No. 5,579,436. However, a fixed candidate sequence generator tends to generate sequences whose error types are unevenly distributed. Therefore, the speech model obtained by the conventional training method is not desirable. A detailed description is given below with reference to FIG. 1.
FIG. 1 is a curve diagram illustrating a training process of a speech model according to sequences with insertion errors generated by a conventional fixed sequence generator. Table 2 shows experimental data of a speech model modified by sequences with insertion errors generated by the conventional fixed sequence generator. Referring to FIG. 1 and Table 2, curves 101, 102 and 103 are respectively the error rate curves of the insertion error, the substitution error and the deletion error. When the number of modifications of the speech model reaches 20, the curves 101, 102 and 103 converge. According to Table 2, the conventional technique effectively reduces the insertion error rate. However, the conventional technique cannot reduce the substitution error rate, and may even worsen the deletion error rate.
TABLE 2
Experimental data of a speech model modified by sequences with insertion errors and generated by the conventional fixed sequence generator.

                          Insertion    Deletion     Distribution   Digit      Sentence
                          error rate   error rate   error rate     accuracy   accuracy
Baseline                  4.06         1.52         1.64           92.79      74.67
Conventional technique    1.33         1.70         1.64           95.33      79.67
Error reduction rate      67.24        −11.84       0.00           35.23      19.74
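The error reduction rates in Table 2 can be reproduced with simple arithmetic. The sketch below assumes, as an inference from the tabulated values rather than a definition given in the text, that the reduction for an error-rate column is measured relative to the baseline error rate, and that the reduction for an accuracy column is measured relative to the baseline error (100 minus the baseline accuracy). The function names are hypothetical.

```python
def error_rate_reduction(baseline_rate, new_rate):
    """Relative reduction of an error rate, in percent (assumed definition)."""
    return (baseline_rate - new_rate) / baseline_rate * 100.0


def accuracy_error_reduction(baseline_acc, new_acc):
    """Relative reduction of the error implied by an accuracy, in percent.

    The implied error is taken to be 100 - accuracy (assumed definition).
    """
    return (new_acc - baseline_acc) / (100.0 - baseline_acc) * 100.0


if __name__ == "__main__":
    # (baseline, conventional technique) pairs copied from Table 2.
    print(round(error_rate_reduction(4.06, 1.33), 2))        # insertion error rate
    print(round(error_rate_reduction(1.52, 1.70), 2))        # deletion error rate
    print(round(error_rate_reduction(1.64, 1.64), 2))        # third error-rate column
    print(round(accuracy_error_reduction(92.79, 95.33), 2))  # digit accuracy
    print(round(accuracy_error_reduction(74.67, 79.67), 2))  # sentence accuracy
```

A negative result, as for the deletion error rate, means the error rate increased relative to the baseline.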
In summary, the conventional technique uses a single fixed sequence generator to generate the sequences for modifying the speech model. Since the error types of the generated sequences are unevenly distributed, the error rates of only some of the error types are reduced, while the error rates of the other error types are worsened.