1. Technical Field
Several aspects of the present invention relate to speech recognition systems, speech recognition programs, recording media, and speech recognition methods.
2. Related Art
According to a method generally used for speech recognition, an inputted utterance is analyzed to obtain an input pattern consisting of parameters that represent features of the utterance, and this input pattern is converted into data. The unknown input pattern is then compared, through pattern matching, with registered patterns (for example, dictionary data) of a plurality of utterances compiled in advance into a database, and the registered pattern in the dictionary data having the greatest likelihood is outputted as the recognition result. Here, the likelihood is a parameter representing how probable a candidate speech recognition result is. It is obtained using Hidden Markov Models, which statistically model the spectral fluctuations and temporal fluctuations of utterances learned from numerous training samples. Note that, in many cases, an inputted utterance is divided into a plurality of frames and processed frame by frame.
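A minimal Python sketch of this matching step follows. It assumes that each registered utterance is modeled by a small discrete-observation Hidden Markov Model and that each frame of the input has already been vector-quantized into a symbol index; the likelihood is computed with the (scaled) forward algorithm and the registered patterns are ranked by it. `DiscreteHMM`, `log_likelihood`, `recognize` and the two-word `DICTIONARY` are hypothetical illustrations, not part of the described systems; practical recognizers typically use continuous emission densities instead of discrete symbols.

```python
import math
from dataclasses import dataclass


@dataclass
class DiscreteHMM:
    """Hypothetical HMM over vector-quantized frame symbols."""
    start: list[float]        # start[i]: probability of beginning in state i
    trans: list[list[float]]  # trans[i][j]: probability of moving from state i to j
    emit: list[list[float]]   # emit[i][k]: probability that state i emits symbol k


def log_likelihood(hmm: DiscreteHMM, frames: list[int]) -> float:
    """Scaled forward algorithm: log P(frames | hmm), summed over all state paths."""
    n = len(hmm.start)
    alpha = [hmm.start[i] * hmm.emit[i][frames[0]] for i in range(n)]
    log_prob = 0.0
    for t, symbol in enumerate(frames):
        if t > 0:
            alpha = [
                sum(alpha[i] * hmm.trans[i][j] for i in range(n)) * hmm.emit[j][symbol]
                for j in range(n)
            ]
        total = sum(alpha)
        log_prob += math.log(total)         # accumulate the per-frame scaling factors
        alpha = [a / total for a in alpha]  # rescale to avoid numerical underflow
    return log_prob


def recognize(dictionary: dict[str, DiscreteHMM], frames: list[int]) -> list[tuple[str, float]]:
    """Match the input pattern against every registered pattern and rank by likelihood."""
    scored = [(word, log_likelihood(hmm, frames)) for word, hmm in dictionary.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical two-word dictionary of left-to-right, two-state models over symbols {0, 1}.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
DICTIONARY = {
    "yes": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),
    "no": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),
}

if __name__ == "__main__":
    ranked = recognize(DICTIONARY, [0, 0, 1, 1])  # one quantized symbol per frame
    print("recognition result:", ranked[0][0])
    for word, score in ranked:
        print(f"  {word}: log-likelihood {score:.3f}")
```

The first element of the ranked list corresponds to the recognition result at the first rank; the remaining elements are the results at the second rank and below.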
Laid-open Japanese patent application HEI 10-207486 (Patent Document 1) describes a speech recognition method in which likelihoods are obtained using Hidden Markov Models; likelihood differences are computed between the likelihood of the first-ranked speech recognition result and the likelihood of each speech recognition result ranked second and below; and, based on a predetermined likelihood difference judgment threshold, only those results whose likelihood differences indicate that speech recognition has been properly performed are selected as candidates for correct recognition results.
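The likelihood-difference test can be sketched in Python as follows, under one assumed reading of Patent Document 1: the first-ranked result is judged to have been properly recognized when its likelihood exceeds that of every result ranked second and below by at least the judgment threshold. The function names `likelihood_differences` and `accept_first_rank` and the example scores in `RANKED` are hypothetical.

```python
def likelihood_differences(ranked: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Difference between the first-rank likelihood and each result ranked second and below."""
    top_score = ranked[0][1]
    return [(word, top_score - score) for word, score in ranked[1:]]


def accept_first_rank(ranked: list[tuple[str, float]], threshold: float) -> bool:
    """Assumed judgment: accept rank 1 only if it beats every lower rank by >= threshold."""
    return all(diff >= threshold for _, diff in likelihood_differences(ranked))


# Hypothetical log-likelihood scores, already sorted from the first rank downward.
RANKED = [("yes", -10.0), ("no", -14.0), ("maybe", -20.0)]

if __name__ == "__main__":
    print(likelihood_differences(RANKED))           # [('no', 4.0), ('maybe', 10.0)]
    print(accept_first_rank(RANKED, threshold=3.0))  # True
    print(accept_first_rank(RANKED, threshold=5.0))  # False
```

Because the list is sorted, the smallest difference is always the one to the second-ranked result, so under this reading the test effectively compares the first-to-second margin with the threshold.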
However, in speech recognition that performs matching using Hidden Markov Models, the slower the rate of speech (the utterance speed), the larger the number of frames becomes, in correspondence with the slowdown, so that the likelihoods of recognition results tend to take larger values. Accordingly, the likelihood differences also tend to widen. As a result, the slower the utterance speed, the larger the likelihood differences become, and a wrong first-ranked speech recognition result tends to be judged as a correct result.
Setting the likelihood difference judgment threshold higher secures reliable answers even when the utterance speed is slow but, conversely, lowers the recognition rate when the utterance speed is fast.
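The frame-count effect and the resulting threshold trade-off can be illustrated with a deliberately simplified, hypothetical model in which the per-frame log-likelihood margin between the first- and second-ranked results is constant, so that the accumulated likelihood difference grows linearly with the number of frames. The margins, frame counts, thresholds and the helper names `accumulated_difference` and `accepted` are made up for illustration only.

```python
def accumulated_difference(per_frame_margin: float, num_frames: int) -> float:
    """Likelihood difference summed over frames, assuming a constant per-frame margin."""
    return sum(per_frame_margin for _ in range(num_frames))


# Hypothetical utterances: the frame count grows as the speaker slows down.
FAST_CORRECT = accumulated_difference(0.5, 20)   # correct first-rank result, fast speech
SLOW_WRONG = accumulated_difference(0.25, 60)    # wrong first-rank result, slow speech


def accepted(difference: float, threshold: float) -> bool:
    """Judge the first-rank result correct when its likelihood difference reaches the threshold."""
    return difference >= threshold


if __name__ == "__main__":
    for threshold in (8.0, 16.0):
        print(f"threshold {threshold}: "
              f"fast/correct accepted={accepted(FAST_CORRECT, threshold)}, "
              f"slow/wrong accepted={accepted(SLOW_WRONG, threshold)}")
```

Although the wrong result has the smaller per-frame margin, its three-times-longer frame sequence gives it the larger accumulated difference (15.0 versus 10.0). With the lower threshold the slow, wrong result is accepted; raising the threshold far enough to reject it also rejects the fast, correct result.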