The invention relates to a method of automatically recognizing speech utterances, in which a recognition result is evaluated by means of a first confidence measure, and in which a plurality of second confidence measures determined for the recognition result are automatically combined to determine the first confidence measure.
The method according to the invention can be used in particular in the field of "command and control" applications, in which electric apparatuses are controlled by means of single speech utterances (usually single words). The method is also applicable in the field of dictation.
By evaluating speech recognition results by means of a confidence measure (i.e. a reliability measure), it is decided whether a recognition result represents the speech utterance actually presented in a sufficiently reliable manner for the relevant application. To this end, the determined confidence measure is compared with a threshold. If the recognition result is not considered sufficiently reliable, the user may be required to repeat the speech utterance.
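The threshold decision described above can be illustrated by the following minimal sketch. The function names, the default threshold of 0.5, and the choice of "greater than or equal to" as the acceptance condition are assumptions made for illustration only and are not taken from the method itself.

```python
def is_reliable(confidence: float, threshold: float) -> bool:
    """Return True if the confidence measure reaches the threshold.

    Assumption: a higher confidence value means a more reliable result,
    and a value equal to the threshold counts as accepted.
    """
    return confidence >= threshold


def handle_result(word: str, confidence: float, threshold: float = 0.5) -> str:
    """Accept the recognized word or ask the user to repeat the utterance."""
    if is_reliable(confidence, threshold):
        return f"execute: {word}"
    return "please repeat"
```

In a "command and control" setting, `handle_result("stop", 0.8)` would accept the command, whereas `handle_result("stop", 0.3)` would ask for a repetition.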
The basic idea of combining a plurality of confidence measures for determining a resultant confidence measure is known from T. Kemp, T. Schaaf, "Confidence measures for spontaneous speech recognition", Proc. ICASSP, vol. II, pp. 875-878, 1997. Different combination possibilities are indicated which are, however, not explained individually.
It is an object of the invention to reduce the resultant error rate in the assessment of the correctness of a recognition result in the method described above.
This object is achieved in that the parameters determining the combination of the second confidence measures are determined by minimizing a cross-entropy error measure.
In this way, parameter values are obtained which serve, in particular, as weights in a linear combination of the second confidence measures so as to obtain the first confidence measure.
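One possible way of determining such weights is sketched below. It is an illustration under stated assumptions, not the procedure of the invention: the linear combination is passed through a sigmoid so that it can be read as a probability of correctness, a bias term is added, and the cross-entropy error is minimized by plain gradient descent. The names `combine`, `cross_entropy` and `train`, the learning rate, and the number of epochs are all hypothetical.

```python
import math


def sigmoid(a: float) -> float:
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def combine(weights, bias, measures):
    """First confidence measure: sigmoid of a weighted sum of second measures.

    Assumption: the sigmoid and the bias are added here for illustration.
    """
    return sigmoid(bias + sum(w * m for w, m in zip(weights, measures)))


def cross_entropy(weights, bias, data):
    """Mean cross-entropy error over (measures, correct) pairs, correct in {0, 1}."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for measures, correct in data:
        y = combine(weights, bias, measures)
        total -= correct * math.log(y + eps) + (1 - correct) * math.log(1 - y + eps)
    return total / len(data)


def train(data, n_measures, rate=0.5, epochs=2000):
    """Minimize the cross-entropy error by batch gradient descent."""
    weights = [0.0] * n_measures
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_measures
        grad_b = 0.0
        for measures, correct in data:
            # Gradient of the cross-entropy with respect to the weighted sum
            err = combine(weights, bias, measures) - correct
            for i, m in enumerate(measures):
                grad_w[i] += err * m
            grad_b += err
        weights = [w - rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / len(data)
    return weights, bias
```

The training data would consist of second confidence measures for recognition results whose correctness is known, for example `train([([0.9, 0.8], 1), ([0.2, 0.1], 0)], n_measures=2)`.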
For a further reduction of the error rate, the method is characterized in that the confidence measure is adapted by means of a user and/or speech utterance-specific offset before comparison with a threshold value serving as the decision limit.
When the confidence measure, which may itself consist of a combination of confidence measures, is compared with a threshold value, an automatic adaptation to given applications is thus possible in a simple manner without the threshold value itself having to be adapted.
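The adaptation by means of an offset can be sketched as follows. The additive form of the offset, the separate user and utterance offsets, and the function names are assumptions chosen for illustration. The text above only states that the confidence measure is adapted by a user-specific and/or speech utterance-specific offset before it is compared with the threshold value.

```python
def adapted_confidence(confidence: float,
                       user_offset: float = 0.0,
                       utterance_offset: float = 0.0) -> float:
    """Shift the confidence measure by user- and utterance-specific offsets.

    Assumption: the offsets are simply added to the confidence measure.
    """
    return confidence + user_offset + utterance_offset


def is_accepted(confidence: float,
                threshold: float,
                user_offset: float = 0.0,
                utterance_offset: float = 0.0) -> bool:
    """Compare the adapted confidence measure with a fixed threshold."""
    return adapted_confidence(confidence, user_offset, utterance_offset) >= threshold
```

With a fixed threshold of 0.5, `is_accepted(0.45, 0.5)` rejects the result, while `is_accepted(0.45, 0.5, user_offset=0.1)` accepts it; the threshold value itself remains unchanged.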
The invention also relates to a speech recognition system comprising processing units for evaluating a recognition result by means of the method described hereinbefore.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiment(s) described hereinafter.