The present invention relates to an improvement in pattern recognition using a dynamic programming method.
In general, even when the same speaker utters the same word, the duration of the word differs from one utterance to the next, expanding or contracting non-linearly along the time axis. In other words, each utterance contains an irregular but allowable distortion along the time axis. In voice recognition, for example, the time axis must therefore be expanded and contracted so that the same phonemes correspond between the standard pattern and the characteristic pattern of the input voice. This can be accomplished by a method called dynamic programming (DP). DP matching is a method in which DP is used for time-axis expansion matching between the characteristic pattern and the standard pattern, and it is an important technique in voice recognition.
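By way of illustration only, DP matching along the time axis may be sketched as follows. The function name `dp_match`, the use of scalar feature values, the absolute-difference local distance, and the three permitted path steps are all assumptions made for this sketch and are not taken from the cited work.

```python
def dp_match(standard, observed):
    """Accumulated DP-matching distance between two sequences of scalar features.

    Hypothetical helper: the local distance and the step pattern are
    illustrative assumptions, not those of any particular recognizer.
    """
    n, m = len(standard), len(observed)
    inf = float("inf")
    # cost[i][j] holds the smallest accumulated distance obtained when
    # standard[:i] is aligned with observed[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(standard[i - 1] - observed[j - 1])
            # Allow a frame to be stretched, compressed, or matched one-to-one.
            cost[i][j] = local + min(
                cost[i - 1][j],      # standard advances, observed frame is reused
                cost[i][j - 1],      # observed advances, standard frame is reused
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# The same pattern uttered more slowly still matches with zero distance.
print(dp_match([1, 2, 3], [1, 2, 2, 3]))
```

In this sketch the lengthened second frame of the observed sequence is absorbed by the warping, which is the non-linear time-axis expansion and contraction described above.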
In recent years the inventor and others have proposed a speaker adaptation method which applies DP matching to cope with characteristic pattern variations in the voice signal resulting from individual differences, and have confirmed the effectiveness of this method through testing (Nakagawa, Kamiya, Sakai: "Recognizing voiced single words of a non-specific speaker based on simultaneous non-linear expansion of time axis, frequency axis, and intensity axis in the voice spectrum," The Transactions of the Institute of Electronics and Communication Engineers of Japan, '81/2, Vol. J64-D, No. 2).
The above speaker adaptation method focuses on the fact that characteristic pattern variations resulting from individual differences appear primarily as irregular allowable distortion on the frequency axis, and uses dynamic programming for frequency expansion matching. Specifically, the single vowel /a/ is pronounced as a keyword, and the spectrum in the steady portion of this vowel /a/ is compared with the spectrum in the steady portion of the same vowel /a/ of the standard speaker by means of DP matching on the frequency axis. The direction in which the vowel /a/ spectrum of the input speaker is shifted on the frequency axis relative to that of the standard speaker is thereby detected, and this detected direction is used for speaker adaptation in actual word recognition.
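By way of illustration only, the detection of the direction of shift by DP matching on the frequency axis may be sketched as follows. The cited work is not quoted here as to how the direction is derived from the matching result; the sketch assumes that it is the sign of the summed channel offset along the optimal warping path. The function names `warp_path` and `shift_direction`, the channel-wise spectra, and the local distance are hypothetical.

```python
def warp_path(reference, observed):
    """Optimal DP alignment of two spectra as (reference_channel, observed_channel) pairs."""
    n, m = len(reference), len(observed)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(reference[i - 1] - observed[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Trace back from the final cell, always moving to the cheapest predecessor.
    # Ties are resolved in favor of the diagonal step by tuple ordering.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def shift_direction(standard_spectrum, input_spectrum):
    """Return +1, -1, or 0 for an upward, downward, or absent frequency shift.

    Assumption of this sketch: the direction is the sign of the summed
    channel offset (input channel minus standard channel) along the path.
    """
    offset = sum(j - i for i, j in warp_path(standard_spectrum, input_spectrum))
    return (offset > 0) - (offset < 0)


# Spectral peak at channel 2 for the standard speaker, channel 4 for the input speaker.
standard = [0, 0, 1, 0, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 0, 0, 0]
print(shift_direction(standard, shifted))
```

Only the sign of the offset is retained in this sketch, in keeping with the above method, which uses the direction of shift rather than its degree.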
However, if the above speaker adaptation method is extended to normalize the degree of shift as well as the direction of shift on the frequency axis of the single vowel /a/ spectrum, phoneme differences are normalized together with individual differences. This gives rise to the problem that word recognition may become impossible in some cases even though individual differences have been removed.