1. Field of the Invention
This invention relates to a method of speech recognition.
2. Description of the Prior Art
Japanese published unexamined patent application 61-188599 discloses a prior art method of recognizing speech uttered by an unspecified speaker.
According to the prior art method of speech recognition which is disclosed in Japanese application 61-188599, the start point and the end point of input speech are detected, and the interval of the input speech is thereby decided. The input speech signal is subjected to a time base adjusting process in which the interval of the input speech is expanded or contracted to a fixed time length corresponding to I frames. Here, the letter I denotes a given natural number. The similarities between the resultant input speech and standard patterns of recognition-object words are calculated by a pattern matching process using a statistical measure. The recognition-object word which corresponds to the highest similarity is selected as a recognition result.
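The flow described above (normalize the detected speech interval to I frames, score it against each standard pattern, select the best word) can be sketched as follows. This is a minimal illustration and not the implementation of Japanese application 61-188599: the function names are hypothetical, the nearest-frame linear mapping is an assumed form of the time base adjustment, and a negative squared Euclidean distance stands in for the statistical measure, which is not specified in detail here.

```python
def normalize_length(frames, num_frames):
    """Linearly expand or contract a frame sequence to num_frames frames.

    Assumption: each output frame copies the nearest input frame
    (round half up); the actual time base adjustment may differ.
    """
    n = len(frames)
    if n == 0 or num_frames <= 0:
        raise ValueError("need at least one input frame and one output frame")
    if num_frames == 1:
        return [frames[0]]
    den = 2 * (num_frames - 1)
    return [frames[(2 * i * (n - 1) + (num_frames - 1)) // den]
            for i in range(num_frames)]


def similarity(pattern, template):
    """Stand-in similarity: negative squared Euclidean distance."""
    return -sum((a - b) ** 2
                for frame_p, frame_t in zip(pattern, template)
                for a, b in zip(frame_p, frame_t))


def recognize(frames, templates, num_frames):
    """Return the word whose standard pattern is most similar to the input."""
    normalized = normalize_length(frames, num_frames)
    return max(templates, key=lambda word: similarity(normalized, templates[word]))


# Toy example with one feature per frame and I = 3.
templates = {
    "yes": [[0.0], [1.0], [2.0]],
    "no": [[5.0], [5.0], [5.0]],
}
speech = [[0.0], [0.5], [1.0], [1.5], [2.0]]
result = recognize(speech, templates, 3)
```

In this toy case the five input frames are contracted to three frames before matching, so every standard pattern is compared at the same fixed length.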
The standard patterns of the recognition-object words in Japanese application 61-188599 are prepared as follows. First, the recognition-object words are uttered by many different speakers to collect speech samples. The speech samples are expanded or contracted to a fixed time length corresponding to I frames. For each of the recognition-object words, statistical quantities (a mean value vector and a covariance matrix) are calculated over the resultant fixed-length speech samples, and the statistical quantities are processed into a related standard pattern. Thus, the time lengths of all the standard patterns are equal to the fixed time length corresponding to I frames. In general, one standard pattern is prepared for each recognition-object word.
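The statistical quantities mentioned above can be illustrated with a short sketch. It assumes that each fixed-length speech sample has already been flattened into a single vector (I frames times the number of features per frame) and that the covariance is normalized by the number of samples; both are assumptions made for illustration, and the function names are hypothetical.

```python
def mean_vector(samples):
    """Mean value vector over equally sized sample vectors."""
    count = len(samples)
    dim = len(samples[0])
    return [sum(s[k] for s in samples) / count for k in range(dim)]


def covariance_matrix(samples, mean):
    """Covariance matrix of the samples about the given mean.

    Assumption: normalization by the sample count (not count - 1).
    """
    count = len(samples)
    dim = len(mean)
    cov = [[0.0] * dim for _ in range(dim)]
    for s in samples:
        diff = [s[k] - mean[k] for k in range(dim)]
        for r in range(dim):
            for c in range(dim):
                cov[r][c] += diff[r] * diff[c]
    return [[value / count for value in row] for row in cov]


# Two toy samples of one word, each already normalized and flattened.
samples = [[1.0, 2.0], [3.0, 6.0]]
mean = mean_vector(samples)
cov = covariance_matrix(samples, mean)
```

Because every sample has the same fixed length, the mean vector and covariance matrix have the same dimensions for all recognition-object words.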
Japanese published unexamined patent application 62-111293 discloses a prior art method of speech recognition which is improved over the prior art method in Japanese application 61-188599. While the prior art method in Japanese application 61-188599 needs a step of detecting the interval of input speech, the prior art method in Japanese application 62-111293 dispenses with such an interval detecting step and uses a word spotting technique.
According to the prior art method of speech recognition which is disclosed in Japanese application 62-111293, an input signal interval is set equal to a sufficiently long period during which speech to be recognized, noise preceding the speech, and noise following the speech occur. A temporal reference point is provided in the input signal interval. Proposed speech intervals are provided which start from the reference point and whose lengths differ sequentially by one frame. The shortest proposed speech interval has N₁ frames, and the longest proposed speech interval has N₂ frames, where N₁ and N₂ denote natural numbers. The total number of the proposed speech intervals is equal to N₂ - N₁ + 1. The input signals in the proposed speech intervals are collated with standard patterns of recognition-object words while the proposed speech intervals are expanded or contracted to a fixed time length. This collation provides the similarities or distances related to the respective recognition-object words. Such collation is reiterated while the reference point is moved from the start point to the end point of the input signal interval. Consequently, the similarities related to the respective recognition-object words are determined for all the proposed speech intervals and all the different reference points through a pattern matching process. The recognition-object word related to the maximum of the similarities is outputted as a recognition result. In the prior art method of Japanese application 62-111293, to realize the word spotting technique, the pattern matching process for the calculation of the similarities uses a statistical distance measure based on a posteriori probability.
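The search over reference points and proposed speech intervals described above can be sketched as a pair of nested loops. This is only an illustration under stated assumptions: the function names are hypothetical, the nearest-frame linear mapping is an assumed form of the expansion and contraction, and a negative squared Euclidean distance stands in for the statistical distance measure based on a posteriori probability.

```python
def normalize_length(frames, num_frames):
    """Linearly map a frame sequence onto num_frames frames (nearest frame)."""
    n = len(frames)
    if num_frames == 1:
        return [frames[0]]
    den = 2 * (num_frames - 1)
    return [frames[(2 * i * (n - 1) + (num_frames - 1)) // den]
            for i in range(num_frames)]


def similarity(pattern, template):
    """Stand-in similarity: negative squared Euclidean distance."""
    return -sum((a - b) ** 2
                for frame_p, frame_t in zip(pattern, template)
                for a, b in zip(frame_p, frame_t))


def spot_word(signal, templates, num_frames, n1, n2):
    """Try every reference point and every interval length n1..n2.

    Returns (word, start, length, score) for the maximum similarity.
    """
    best = None
    for start in range(len(signal)):              # move the reference point
        for length in range(n1, n2 + 1):          # n2 - n1 + 1 proposed intervals
            if start + length > len(signal):
                break                             # interval leaves the input signal
            segment = normalize_length(signal[start:start + length], num_frames)
            for word, template in templates.items():
                score = similarity(segment, template)
                if best is None or score > best[3]:
                    best = (word, start, length, score)
    return best


# Toy input: noise (9.0), a rising word, then noise again.
signal = [[9.0], [9.0], [0.0], [1.0], [2.0], [3.0], [9.0], [9.0]]
templates = {
    "up": [[0.0], [1.5], [3.0]],
    "down": [[3.0], [1.5], [0.0]],
}
word, start, length, score = spot_word(signal, templates, 3, 3, 5)
```

No separate interval detecting step is needed in this sketch: the position and length of the word fall out of the maximization itself, at the cost of evaluating every proposed interval at every reference point.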
The prior art methods in Japanese application 61-188599 and Japanese application 62-111293 tend to be low in recognition accuracy when there are many recognition-object words. The low recognition accuracy is caused by the following factors.
(1) In general, time length varies from word to word, and the time length of a word provides some information for discriminating between words. The prior art methods do not use such information, since the lengths of all recognition-object words (the time lengths of all standard patterns) are set in common to a fixed length corresponding to I frames.
(2) In the prior art methods, the input speech interval is expanded or contracted to the I-frame period, so that repeated or overlapped frames and omitted frames occur. The repeated or overlapped frames cause redundant calculation. The omitted frames cause some information to be lost. In both cases, important information representing temporal motion between adjacent frames is lost.
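Factor (2) can be made concrete by counting how many input frames are repeated or omitted when an interval is mapped onto I frames. The sketch below assumes a nearest-frame linear mapping, which is one plausible form of the expansion and contraction and is not taken from either application; the function names are hypothetical.

```python
def frame_mapping(num_input, num_output):
    """Input frame index used for each output frame (nearest frame, half up)."""
    if num_output == 1:
        return [0]
    den = 2 * (num_output - 1)
    return [(2 * i * (num_input - 1) + (num_output - 1)) // den
            for i in range(num_output)]


def repeated_and_omitted(num_input, num_output):
    """Count output frames that repeat an input frame, and input frames never used."""
    indices = frame_mapping(num_input, num_output)
    used = set(indices)
    repeated = len(indices) - len(used)
    omitted = num_input - len(used)
    return repeated, omitted


# Expanding a short word (4 frames) and contracting a long word (8 frames).
expanded = repeated_and_omitted(4, 8)
contracted = repeated_and_omitted(8, 4)
```

Under this assumed mapping, expansion produces duplicated frames that add calculation without adding information, while contraction discards input frames outright.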