The present invention generally relates to a speech recognition method and an apparatus capable of implementing the same, and in particular to pattern matching between an unknown input pattern and a known reference pattern. Further, the present invention relates to a method of creating a reference speech pattern and an apparatus capable of implementing the same.
Various speech recognition methods have been proposed. Among speech pattern matching methods, the dynamic programming method has been widely used. In the dynamic programming method, an unknown speech pattern is divided into a plurality of frames, and a local distance between the speech portion included in each frame and the corresponding speech portion of a reference pattern is calculated. The local distances thus obtained are then sequentially accumulated. However, the dynamic programming method has the disadvantage that the amount of calculation required to match an unknown input pattern against a known reference pattern is enormous. This disadvantage results from the fact that the number of lattice points at which local distances and accumulated distances are to be calculated is proportional to the product of the number of frames of the input pattern and the number of frames of the reference pattern. In addition, a reference pattern must have a frame length corresponding to the whole of the speech section. It is to be noted that one speech section normally contains information which is not necessarily important for obtaining a highly accurate recognition result.
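The frame-by-frame accumulation described above can be illustrated with a minimal sketch. It is not taken from any particular prior-art system: the function name `dp_match`, the city-block local distance, and the three-way recurrence over neighbouring lattice points are assumptions chosen for illustration only. The sketch also counts the lattice points it evaluates, to show that the count equals the product of the two frame counts.

```python
def dp_match(input_frames, reference_frames):
    """Accumulate local distances over the full input-by-reference lattice.

    Each frame is a sequence of feature values. The city-block local
    distance and the symmetric three-way recurrence are illustrative
    assumptions, not a method specified in the text.
    Returns (accumulated_distance, lattice_points_evaluated).
    """
    n, m = len(input_frames), len(reference_frames)
    inf = float("inf")
    # acc[i][j] holds the best accumulated distance ending at lattice point (i, j).
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    evaluated = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between input frame i and reference frame j.
            local = sum(abs(a - b) for a, b in
                        zip(input_frames[i - 1], reference_frames[j - 1]))
            evaluated += 1
            # Extend the cheapest of the three predecessor lattice points.
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m], evaluated
```

For a three-frame input matched against a four-frame reference, the counter returned by this sketch is 12, that is, one local distance and one accumulated distance per lattice point.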
From the above viewpoints, a compression dynamic programming method has been proposed, which is directed to reducing the number of lattice points. However, the compression dynamic programming method has the disadvantage that it is very difficult to determine which portions of the input speech pattern should be subjected to the compression process and what compression ratio should be used. Additionally, the processing for compressing the speech pattern is very complex. For example, the ratio of compressed data to original data of the speech pattern must be changed depending on the words to be identified. Further, the compression dynamic programming method cannot greatly reduce the amount of calculation required for the pattern matching.
The use of hidden Markov models (hereinafter simply referred to as HMM) for speech pattern matching has also been considered, in which a probability of state transition is calculated. The number of lattice points at each of which a transition probability is calculated corresponds to the product of the number of frames of an input speech pattern and the number of states defined in a model. The HMM can identify the unknown input pattern with an amount of calculation much smaller than that required for the dynamic programming method. In addition, the amount of calculation based on the HMM does not depend on the word to be identified, because the number of states is fixed. However, the HMM is a probability model, and therefore controlling the state transition in accordance with the variation of the speech as a function of time is very complex. Moreover, because the HMM is a probability model, the reference pattern must be created from a large number of patterns of the speech to be registered.
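The frames-by-states lattice mentioned above can likewise be sketched. The following is only an illustration under stated assumptions: a left-to-right model in which each state may either repeat or advance to the next state, a single pair of log transition probabilities shared by all states, and per-frame log output scores supplied by the caller. The function name `hmm_best_path_score` and all parameter names are hypothetical.

```python
import math


def hmm_best_path_score(output_logp, log_stay, log_next):
    """Best-path (Viterbi-style) log score through a left-to-right model.

    output_logp[t][s] is the assumed log score of frame t in state s.
    log_stay and log_next are assumed log probabilities of repeating a
    state and of advancing to the next state. The path is assumed to
    start in the first state and end in the last state.
    Returns (best_log_score, lattice_points_evaluated).
    """
    num_frames = len(output_logp)
    num_states = len(output_logp[0])
    neg_inf = -math.inf
    # First frame: only the first state is reachable.
    prev = [neg_inf] * num_states
    prev[0] = output_logp[0][0]
    evaluated = num_states
    for t in range(1, num_frames):
        cur = [neg_inf] * num_states
        for s in range(num_states):
            stay = prev[s] + log_stay
            move = prev[s - 1] + log_next if s > 0 else neg_inf
            # Keep the more probable of the two incoming transitions.
            cur[s] = max(stay, move) + output_logp[t][s]
            evaluated += 1
        prev = cur
    return prev[num_states - 1], evaluated
```

With four input frames and a two-state model, the sketch evaluates 8 lattice points, whatever word the model represents, which is the fixed-cost property noted above.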