1. Field of the Invention
The present invention relates to a method and a device for learning reference pattern vectors for speech recognition, so as to effectively improve the recognition performance of a speech recognition device.
2. Description of the Prior Art
The recent development of techniques for pattern recognition, such as character recognition and speech recognition, has been remarkable. In the field of speech, too, devices for recognizing spoken words and the like are being put into practical use. The majority of speech recognition devices are constructed so as to warp the input speech pattern along the time axis by means of the dynamic programming method (DP matching method). These devices recognize the input speech pattern by matching the input speech pattern, normalized through the warping along the time axis, with reference patterns (standard patterns) that have been prepared in advance.
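The time-axis warping and matching described above can be sketched as follows. This is only a minimal illustration under stated assumptions: it uses the Euclidean distance between frames and the simplest three-way step pattern, and the names `dtw_distance` and `recognize` are hypothetical and do not correspond to any particular prior art device.

```python
from math import inf


def dtw_distance(a, b):
    """Accumulated frame distance between two sequences of feature
    vectors under the best time-axis warping (DP matching)."""
    n, m = len(a), len(b)
    # cost[i][j]: minimum accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local distance: Euclidean distance between frames
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Assumed step pattern: stretch a, stretch b, or advance both
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def recognize(input_pattern, reference_patterns):
    """Return the category whose reference pattern is closest to the
    input pattern after warping along the time axis."""
    return min(reference_patterns,
               key=lambda c: dtw_distance(input_pattern, reference_patterns[c]))


# Toy one-dimensional "speech patterns" (hypothetical data)
references = {"yes": [[0], [1], [2]], "no": [[2], [1], [0]]}
slow_yes = [[0], [0], [1], [1], [2]]  # same word uttered more slowly
result = recognize(slow_yes, references)
```

In this toy case the slower utterance is warped onto the "yes" reference, which shows how DP matching absorbs variations in utterance speed. It does nothing by itself about the other deformations of the speech pattern, such as level shifts or background noise.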
However, the prior art speech recognition device with the above construction has the weakness that its recognition capability, that is, its recognition rate, is reduced when the speech pattern undergoes various kinds of deformation under the influence of level shifts in the input speech pattern, variations in utterance speed, variations due to the speaker, variations introduced by the public telephone line, variations due to the pitch of the speech, variations due to background noise, and the like. In particular, this decrease in recognition accuracy reveals itself more conspicuously in a telephone word speech recognition device aimed at a large number of unspecified speakers, in a word speech recognition device with numerous categories of recognition objects, and further in a recognition device for phonemes and syllables. It thus remains a problem to be solved in speech recognition techniques.
Meanwhile, it is well known from statistical pattern recognition theory that the recognition capability (recognition accuracy) can be improved by learning the reference pattern vectors for speech recognition from a large number of speech patterns collected beforehand. In this learning method, the larger the number of collected speech patterns, the higher the recognition score, owing to the corresponding improvement in the reference pattern vectors. However, for a speech recognition device with a large number of categories to be recognized, or for a word speech recognition device in which frequent changes of vocabulary are required in practice, improving the recognition accuracy requires collecting a very large number of speech patterns, which has been difficult to accomplish in practice. In particular, in the case of a speech recognition device for unspecified speakers, there has been the problem that reference pattern vectors cannot be designed adequately from only a small number of speech patterns. Moreover, in the case of a speech recognition device for a specified speaker or a speech recognition device of the speaker-adapted type, the speech patterns are input by having the same speaker utter the same category a large number of times in order to allow for the variations due to the speaker. This not only places a burden on the user but also results in a significant loss of time.