1. Field of Invention
The present invention relates to an HMM-output-probability calculating method that can rapidly calculate, with a reduced number of calculations, the output probability of an HMM (hidden Markov model) used in speech recognition. The invention also relates to a speech recognition apparatus.
2. Description of Related Art
The HMM is widely used as a phoneme model for speech recognition. While the HMM can provide high speech-recognition performance, it requires a great number of calculations, particularly to find its output probability. Here, when the output probability of input vector Y at a given time in a transition from state i to state j is represented by bij(Y), and bij(Y) is assumed to obey an uncorrelated normal distribution, bij(Y) can be expressed by expression (1), which is provided at the end of the specification.
Input vector Y consists of n-dimensional (n is a positive integer) components (LPC cepstrum coefficients, etc.) obtained by analyzing the input sound at each point of time (time t1, time t2, . . . ) over an interval of, for example, 20 msec. For example, when the input vectors at times t1, t2, t3, . . . are represented by Y1, Y2, Y3, . . . , input vector Y1 at time t1 is represented by (1y1, 1y2, . . . , 1yn), input vector Y2 at time t2 is represented by (2y1, 2y2, . . . , 2yn), and input vector Y3 at time t3 is represented by (3y1, 3y2, . . . , 3yn).
In expression (1), k is the index of a dimension of input vector Y at a given time and takes any value from 1 to n. Also, σij(k)² represents the variance in the k-th dimension for the transition from state i to state j, and μij(k) represents the mean in the k-th dimension for that transition.
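Expression (1) itself appears at the end of the specification rather than in this section. As a reading aid only, the following is a reconstruction from the definitions above, assuming the standard density of a normal distribution with no correlation between dimensions and writing y_k for the k-th component of Y. The exact notation of the specification may differ.

```latex
\begin{equation}
b_{ij}(Y) = \prod_{k=1}^{n} \frac{1}{\sqrt{2\pi\,\sigma_{ij}(k)^{2}}}
            \exp\!\left\{ -\frac{\{y_k - \mu_{ij}(k)\}^{2}}{2\,\sigma_{ij}(k)^{2}} \right\}
\tag{1}
\end{equation}
```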
Although the output probability can be found by expression (1), an underflow may occur when expression (1) is calculated in unchanged form, since the resulting value is too small. Accordingly, the output probability is normally found in logarithmic form. When expression (1) is expressed as a logarithm with base x, it becomes expression (2), which is provided at the end of the specification.
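The underflow problem can be illustrated with a minimal sketch. It assumes IEEE double-precision arithmetic and uses hypothetical values (ten dimensions, unit variance, and an input 13 standard deviations from the mean in every dimension); the function names are illustrative and do not come from the specification. The natural logarithm is used here for simplicity.

```python
import math

def density(yk, mu, sigma):
    """One-dimensional normal density for a single component."""
    return math.exp(-((yk - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

def direct_probability(y, mu, sigma):
    """Product of the per-dimension densities, computed in unchanged form."""
    p = 1.0
    for yk, mk, sk in zip(y, mu, sigma):
        p *= density(yk, mk, sk)
    return p

def log_probability(y, mu, sigma):
    """Natural logarithm of the same product, computed as a sum of logarithms."""
    total = 0.0
    for yk, mk, sk in zip(y, mu, sigma):
        total += -0.5 * math.log(2.0 * math.pi * sk ** 2) - ((yk - mk) ** 2) / (2.0 * sk ** 2)
    return total

# Hypothetical input far from the mean in all ten dimensions.
n = 10
y = [13.0] * n
mu = [0.0] * n
sigma = [1.0] * n

direct = direct_probability(y, mu, sigma)  # the product underflows to 0.0
logp = log_probability(y, mu, sigma)       # the sum of logarithms stays finite
print(direct, logp)
```

Each individual density is still representable, but their product falls below the smallest representable positive value, whereas the sum of logarithms remains an ordinary negative number.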
In expression (2), calculation term A can be calculated beforehand, so it is represented by constant A. Likewise, log_x e in calculation term B is a constant; representing it by Z, expression (2) can be rewritten as expression (3), which is provided at the end of the specification.
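Expressions (2) and (3) are likewise given only at the end of the specification. The following reconstruction is a sketch derived by taking the base-x logarithm of a normal density with no correlation between dimensions; the placement of the summation relative to the terms labeled A, B, and B′ is an assumption, and y_k denotes the k-th component of Y.

```latex
\begin{align}
\log_x b_{ij}(Y)
  &= \underbrace{\sum_{k=1}^{n} \log_x \frac{1}{\sqrt{2\pi\,\sigma_{ij}(k)^{2}}}}_{A}
     - \sum_{k=1}^{n} \underbrace{\frac{\{y_k - \mu_{ij}(k)\}^{2}}{2\,\sigma_{ij}(k)^{2}} \cdot \log_x e}_{B}
  \tag{2} \\
\log_x b_{ij}(Y)
  &= A - \sum_{k=1}^{n} \underbrace{\frac{\{y_k - \mu_{ij}(k)\}^{2}}{2\,\sigma_{ij}(k)^{2}} \cdot Z}_{B'},
  \qquad Z = \log_x e
  \tag{3}
\end{align}
```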
However, expression (3) still contains calculation term B′, that is, [{yk−μij(k)}²/2σij(k)²]·Z, which requires a large number of calculations. In particular, term B′ must be calculated for each dimension of input vector Y at time t. For example, when input vector Y at time t has ten-dimensional (n=10) components, ten subtractions, ten multiplications, ten divisions, and ten additions must be performed before the result is multiplied by constant Z, so that even in this example the number of calculations is extremely large.
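The per-dimension cost of term B′ can be made concrete with a short sketch. It assumes the logarithmic form A − Z·Σ[{yk−μij(k)}²/2σij(k)²] with base x = 10, and it precomputes A, Z, and the denominators 2σij(k)² because none of them depends on the input vector. The function names and the three-dimensional example values are hypothetical.

```python
import math

def precompute_constants(sigma, base=10.0):
    """Return (A, Z, two_var) for one transition; none depend on the input vector."""
    # Term A: sum over dimensions of log_base(1 / sqrt(2*pi*sigma_k^2)).
    a = sum(math.log(1.0 / math.sqrt(2.0 * math.pi * s * s), base) for s in sigma)
    # Z = log_base(e), a constant.
    z = math.log(math.e, base)
    # Denominators 2*sigma_k^2 of term B'.
    two_var = [2.0 * s * s for s in sigma]
    return a, z, two_var

def log_output_probability(y, mu, a, z, two_var):
    """Logarithm of the output probability in the form A - Z * sum(...)."""
    acc = 0.0
    for yk, mk, dk in zip(y, mu, two_var):
        diff = yk - mk             # one subtraction per dimension
        acc += (diff * diff) / dk  # one multiplication, one division, one addition
    return a - z * acc             # a single multiplication by Z at the end

# Hypothetical three-dimensional example.
y = [0.5, -1.0, 2.0]
mu = [0.0, 0.5, 1.5]
sigma = [1.0, 2.0, 0.5]
a, z, two_var = precompute_constants(sigma)
log_b = log_output_probability(y, mu, a, z, two_var)
print(log_b)
```

Even with every constant precomputed, the loop body still runs once per dimension for each input vector, which is the cost the passage above describes.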
This large number of calculations is a major impediment to providing small, lightweight, low-priced products. It has therefore been impossible to perform speech recognition using the HMM as described above with the limited hardware of such products.