(1) Field of the Invention
The present invention relates to an isolated word recognizer (also referred to as a speech recognizer) that determines the similarity between a pattern of an input speech signal and each of a plurality of reference patterns, and outputs the reference pattern having the highest similarity as the recognition result.
Speech recognition processing techniques have recently become established, and their utility has been realized in various fields. As a result, the circuits necessary for speech recognition processing have been implemented as large scale integrated circuits (LSIs). Among the various types of speech recognizers using LSIs, apparatuses that serve as a man-machine interface (MMI) accepting human speech as input have been developed especially actively. Examples of such man-machine interfaces include telephone number input and command input during program preparation. To be used as a man-machine interface, a speech recognizer must satisfy requirements such as small size, low cost, and high performance.
Speech signals, however, fluctuate even when they represent the same word. The fluctuations are caused by differences in voice between speakers, the emotional state of the speaker, the circumstances of the speaker, and the like. Therefore, the speech signals input to the speech recognizer are not always constant even for the same word, and recognition errors must be avoided despite such fluctuations.
(2) Description of the Related Art
In a conventional speech signal recognition processing sequence, features are extracted from the input speech signal, a pattern based on the extracted features is compared with a plurality of reference patterns, and a reference pattern which is most similar to the pattern based on the input speech signal is output as a recognition result. The shorter the distance between an input pattern and a reference pattern is, the higher the degree of similarity therebetween. The input feature extraction is described in the prior application Japanese Patent Application No. 62-33852 filed on Dec. 24, 1987. The corresponding U.S. patent application is Ser. No. 287,284, filed on Dec. 21, 1988.
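The comparison step described above can be sketched as follows. This is a minimal illustration, not the method of the cited application: it assumes each pattern is a fixed-length list of numeric feature values and uses the sum of absolute differences as the distance, neither of which is specified in the text. The names `distance` and `recognize` and the sample values are hypothetical.

```python
def distance(input_pattern, reference_pattern):
    """Sum of absolute differences between feature values (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(input_pattern, reference_pattern))


def recognize(input_pattern, references):
    """Return the word whose reference pattern is nearest to the input.

    `references` maps each word to its reference pattern. The shortest
    distance corresponds to the highest degree of similarity.
    """
    return min(references, key=lambda word: distance(input_pattern, references[word]))


# Hypothetical feature values, chosen only for illustration.
references = {
    "ichi": [1.0, 2.0, 3.0],
    "ni": [5.0, 5.0, 5.0],
}
result = recognize([1.1, 2.1, 2.9], references)
print(result)
```

A practical recognizer would also have to align patterns of different durations before comparing them; that step is omitted here.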
In the above-described conventional speech recognizer, each difference between the feature parameters of an input pattern and those of a reference pattern is multiplied by a weight coefficient of 1. All of the differences are therefore weighted by the same coefficient; in other words, the feature parameters are conventionally not weighted. Assume that, under this uniform weighting, the numeral "ichi (1)" is to be recognized, and the input pattern is compared with the reference patterns. The distance between the input pattern of "ichi (1)" and the reference pattern of "ichi (1)" is very close to the distance between the input pattern of "ichi (1)" and the reference pattern of "hachi (8)", because the part "chi" is the same in both reference patterns and the only difference lies between the part "i" of "ichi (1)" and the part "ha" of "hachi (8)". Therefore, when the input speech fluctuates, it is difficult to recognize the input spoken word correctly.
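The difficulty can be illustrated numerically. The sketch below assumes a distance formed as the weighted sum of absolute feature differences, with every weight equal to 1 as in the conventional recognizer. The four-element patterns are invented for illustration: the first element stands for the differing part ("i" versus "ha") and the remaining three for the shared part "chi". The function name and all values are hypothetical.

```python
def weighted_distance(input_pattern, reference_pattern, weights):
    """Weighted sum of absolute feature differences (assumed metric)."""
    return sum(
        w * abs(a - b)
        for w, a, b in zip(weights, input_pattern, reference_pattern)
    )


# Illustrative values only: element 0 models "i" / "ha", elements 1-3 model "chi".
ICHI_REF = [1.0, 5.0, 5.0, 5.0]
HACHI_REF = [2.0, 5.0, 5.0, 5.0]
UNIFORM_WEIGHTS = [1.0, 1.0, 1.0, 1.0]  # conventional case: no weighting

# A fluctuated utterance of "ichi".
spoken_ichi = [1.4, 5.3, 4.8, 5.2]

d_ichi = weighted_distance(spoken_ichi, ICHI_REF, UNIFORM_WEIGHTS)
d_hachi = weighted_distance(spoken_ichi, HACHI_REF, UNIFORM_WEIGHTS)
print(d_ichi, d_hachi)  # the two distances differ only through element 0
```

With these values the shared "chi" part contributes the same amount to both distances, so the decision rests entirely on the first element. If the first element of the utterance drifted to 1.6 instead of 1.4, the pattern would be nearer to the "hachi" reference and would be misrecognized.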
From another point of view, a multi-template method was proposed before the present invention to attain high recognition performance. In the multi-template method, a plurality of voices are used for one word, and a plurality of feature patterns corresponding to the plurality of voices are formed and registered. Namely, for the same word, for example "ichi (1)", feature patterns of both a short pronunciation [it] and a long pronunciation [i: t] are formed and registered. In this way, the possibility of a recognition error due to a fluctuation of the input speech at the time of recognition can be decreased.
This multi-template registering method, however, has a disadvantage. Because a plurality of feature patterns are formed for each word, the number of feature patterns registered in the speech dictionary becomes large, and a correspondingly large number of distance calculations is required at the time of recognition between the feature pattern formed from the input speech and the feature patterns read from the dictionary.
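The growth in computation can be made concrete with a small calculation. The sketch assumes one distance calculation per registered feature pattern for each utterance; the vocabulary size of ten words is a hypothetical figure, and two templates per word corresponds to the short and long pronunciations mentioned above.

```python
def count_distance_calculations(num_words, templates_per_word):
    """Distance calculations per utterance, assuming one per registered pattern."""
    return num_words * templates_per_word


# Hypothetical ten-word vocabulary (for example, ten numerals).
single_template = count_distance_calculations(num_words=10, templates_per_word=1)
multi_template = count_distance_calculations(num_words=10, templates_per_word=2)
print(single_template, multi_template)
```

Under this assumption the number of calculations, and the dictionary size, grow in direct proportion to the number of templates registered per word.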
A good response time requires high-speed calculation. If the number of calculations increases, however, a good response time can no longer be achieved, and small size and low cost cannot be realized.