The present invention relates to a voice recognition method, and more particularly to a method of recognizing the voices of many unspecified speakers.
Voice recognition methods are widely used, including in work where manual input means are not used, such as the sorting of parcels. One such method is intended in particular for many unspecified speakers, and is required to recognize speech with high accuracy at all times, whether the speaker is young or old, male or female. In this conventional method, a quantity representing the short-time average spectrum envelope characteristic of the speech signal at every prescribed time interval, such as the average output of a narrow-band filter bank, LPC coefficients, or LPC cepstrum coefficients, is used as a feature parameter of the speech signal. In addition, a regression coefficient representing the direction of change of the feature parameter over a number of analysis intervals is often used as a parameter representing how the spectrum of the speech signal changes, in order to improve recognition performance. To recognize an uttered word, the feature of the speech signal entered into an apparatus for practicing the method is matched by dynamic programming against the standard patterns stored in advance in the apparatus for the words, and the standard pattern whose distance from the feature is the smallest of all the standard patterns is found. The word to which that standard pattern belongs is regarded as the word applied to the apparatus.
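The conventional method described above can be illustrated by the following minimal sketch. It is not taken from the specification: the function names, the window width, the Euclidean frame distance, and the symmetric step pattern of the dynamic programming are all assumptions chosen for illustration, and the feature vectors could equally be filter bank outputs or LPC cepstrum coefficients.

```python
import numpy as np


def regression_coefficients(features, half_width=2):
    """Least-squares slope of each feature dimension over a window of
    2 * half_width + 1 frames centred on each frame.

    `features` has shape (n_frames, n_dims). Edge frames are padded by
    repetition (an assumption; other edge treatments are possible).
    """
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[0]
    padded = np.pad(features, ((half_width, half_width), (0, 0)), mode="edge")
    offsets = np.arange(-half_width, half_width + 1)
    slopes = np.zeros_like(features)
    for k in offsets:
        start = half_width + k
        slopes += k * padded[start:start + n_frames]
    return slopes / np.sum(offsets ** 2)


def dp_distance(a, b):
    """Accumulated distance between two feature sequences found by
    dynamic programming (hypothetical symmetric step pattern)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])  # frame distance
            cost[i, j] = local + min(cost[i - 1, j],      # input advances
                                     cost[i, j - 1],      # pattern advances
                                     cost[i - 1, j - 1])  # both advance
    return cost[n, m]


def recognize(input_features, standard_patterns):
    """Return the word whose standard pattern is nearest to the input.

    `standard_patterns` maps each word to a list of stored patterns,
    since several patterns may be prepared for one word.
    """
    best_word, best_distance = None, np.inf
    for word, patterns in standard_patterns.items():
        for pattern in patterns:
            distance = dp_distance(input_features, pattern)
            if distance < best_distance:
                best_word, best_distance = word, distance
    return best_word, best_distance
```

In such a sketch, the regression coefficients would typically be appended to the feature vectors of both the input and the standard patterns before matching, so that the distance reflects the change in the spectrum as well as its envelope.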
Since the voices of many unspecified speakers differ from one another in quality, it is difficult to cover the differences in their individual vocal characteristics fully, even if a plurality of standard patterns are prepared in advance for each word. For that reason, the distance between the feature of a speech signal for a word applied to the apparatus and a wrong standard pattern, that is, one belonging to another word, is likely to be judged through dynamic programming to be the smallest of all the patterns, so that the other word corresponding to the wrong standard pattern is regarded as the word applied to the apparatus. This is a problem.