1. Field of the Invention
This invention relates to a speech recognition method, and more particularly to a continuous speech recognition method for recognizing phonemes in continuous speech.
2. Description of the Prior Art
Heretofore, in order to recognize continuous speech in phonemic units, the following methods have been employed:
(1) As the standard patterns of phonemes, feature vectors which are composed of the typical spectral information of the corresponding phonemes or the information of "voiced"/"voiceless" are prepared for the respective phonemes, whereupon an input speech pattern is analyzed into frames at predetermined intervals (hereinbelow, called "frame intervals") so as to examine the matching of each frame with the standard patterns.
The phoneme corresponding to the standard pattern which is matched best as the result of the examination is presumed to exist at that point of time. Matching results at the preceding and succeeding points of time are also taken into account, whereupon the input speech pattern is finally decided.
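As an illustration only, the frame-by-frame matching of method (1) might be sketched as follows. The squared Euclidean distance, the majority vote over neighbouring frames, and the names `nearest_phoneme`, `label_frames` and `context` are assumptions made for this sketch; the text above does not specify the distance measure or how the preceding and succeeding matching results are combined.

```python
from collections import Counter


def nearest_phoneme(frame, standard_patterns):
    """Return the label of the standard pattern closest to one frame.

    standard_patterns maps a phoneme label to a single feature vector
    (hypothetical layout chosen for this sketch).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(standard_patterns,
               key=lambda label: sq_dist(frame, standard_patterns[label]))


def label_frames(frames, standard_patterns, context=1):
    """Label every frame, then smooth each label using its neighbours.

    The smoothing is a simple majority vote over `context` frames on each
    side (an assumed rule); a frame keeps its own label unless another
    label is strictly more frequent in the window.
    """
    raw = [nearest_phoneme(f, standard_patterns) for f in frames]
    smoothed = []
    for i, own in enumerate(raw):
        window = raw[max(0, i - context): i + context + 1]
        counts = Counter(window)
        best, best_count = counts.most_common(1)[0]
        smoothed.append(best if best_count > counts[own] else own)
    return smoothed
```

Each frame is labelled independently of how long a phoneme lasts, so the sketch uses no information on the time structure of the input beyond the short voting window.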
(2) The standard patterns of phonemes are expressed as time series of the feature vectors, thereby introducing the time structures of the respective phonemes into the standard patterns. An input speech pattern is analyzed into frames at predetermined intervals, processing (termed "segmentation") is carried out in which portions of similar characteristics are collectively regarded as one phonemic section, and each segment is examined for matching with the standard patterns into which the time structures are introduced.
The phonemic section corresponding to the time series of the standard pattern which is matched best as the result of the examination is decided to be the input speech pattern.
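The segmentation step of method (2) can be pictured with the following minimal sketch. It assumes that "similar" means the city-block distance between consecutive frames does not exceed a fixed `threshold`; that criterion, the function name `segment_frames` and the threshold value are hypothetical and are not taken from the text above.

```python
def segment_frames(frames, threshold):
    """Group consecutive similar frames into segments.

    Returns a list of (start, end) frame indices, with end exclusive.
    A new segment begins wherever two consecutive frames differ by more
    than `threshold` (an assumed boundary rule).
    """
    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    if not frames:
        return []
    segments = []
    start = 0
    for i in range(1, len(frames)):
        if dist(frames[i - 1], frames[i]) > threshold:
            segments.append((start, i))
            start = i
    segments.append((start, len(frames)))
    return segments
```

Every segment boundary is a hard decision made before any matching takes place, so a misplaced boundary propagates into the subsequent comparison with the time-series standard patterns.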
Method (1), however, has the disadvantage that information on the time structure which the input speech pattern possesses cannot be fully exploited. Method (2), on the other hand, is improved in this respect, but it has the disadvantage that it is difficult to execute the segmentation with high precision. It has consequently been impossible to attain a satisfactory recognition rate with either method (1) or method (2).
As a method for eliminating the disadvantages of the methods (1) and (2) and utilizing the merits thereof, there has been proposed a method in which standard patterns having time structures are prepared in advance as in the method (2), and in which, using the known continuous DP (Dynamic Programming) matching method (refer to the official gazette of Japanese Patent Application Publication No. 55-2205), the matching between a part of the input speech pattern and each of the prepared standard patterns is examined while the input speech pattern is matched continuously, without executing the segmentation (refer to Japanese Patent Application No. 54-91283).
In this case, the restriction on the nonlinearity of the time structure of speech is loose. Therefore, even a part which shows good similarity is sometimes processed so as to go beyond the reasonable variation range of the time axis. This has led to the problem that the misrecognition rate cannot be made sufficiently small.
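For illustration, matching without segmentation can be sketched as an endpoint-free dynamic programming recursion of the following kind. This is a simplified sketch and not the procedure of the cited gazette: the city-block distance, the three permitted path steps, the normalization by the reference length and the name `continuous_dp` are all assumptions made here.

```python
def continuous_dp(frames, reference):
    """Score a reference pattern against every end point of the input.

    For each input frame t, returns the smallest accumulated distance of
    any warping path that starts at an arbitrary input frame and ends with
    frame t aligned to the last reference frame, divided by the reference
    length. A low score at t suggests the pattern ends near frame t.
    """
    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    n = len(reference)
    inf = float("inf")
    prev = [inf] * n  # accumulated distances for the previous input frame
    scores = []
    for frame in frames:
        cur = [inf] * n
        for j in range(n):
            d = dist(frame, reference[j])
            if j == 0:
                # Free starting point: a match may begin at any input frame.
                cur[j] = d
            else:
                # Assumed path steps: stay on the same reference frame,
                # advance both axes, or advance the reference axis only.
                cur[j] = d + min(prev[j], prev[j - 1], cur[j - 1])
        scores.append(cur[-1] / n)
        prev = cur
    return scores
```

Nothing in this recursion bounds how long a path may remain on one reference frame or how many reference frames may be consumed on a single input frame, so a path can stretch or compress the time axis without limit. That is one way to picture the loose restriction on nonlinearity noted above.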