This invention relates to a pattern matching apparatus for comparing patterns, such as speech patterns, that are expressed as a sequence of feature vectors.
Pattern matching has found wide application as a pattern recognition method. In this method, each pattern to be recognized is registered in advance as a reference pattern. An unknown pattern is compared with each reference pattern, and the reference pattern having the highest similarity measure is taken as the recognition result.
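The selection rule described above can be sketched as follows. This is a minimal illustration, not the apparatus of this invention. It assumes a Euclidean distance between feature vectors and, for now, plain frame-by-frame (linear) matching of equal-length patterns without time normalization. The smallest distance stands in for the highest similarity. The names `frame_distance`, `linear_distance` and `recognize` are hypothetical.

```python
from typing import Callable, Dict, Sequence

Pattern = Sequence[Sequence[float]]  # a pattern is a sequence of feature vectors


def frame_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (an assumed choice)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def linear_distance(a: Pattern, b: Pattern) -> float:
    """Mean frame distance for equal-length patterns; no time normalization."""
    if len(a) != len(b):
        raise ValueError("linear matching needs patterns of equal length")
    return sum(frame_distance(x, y) for x, y in zip(a, b)) / len(a)


def recognize(unknown: Pattern,
              references: Dict[str, Pattern],
              distance: Callable[[Pattern, Pattern], float] = linear_distance) -> str:
    """Return the label of the reference pattern closest to the unknown pattern."""
    return min(references, key=lambda label: distance(unknown, references[label]))


if __name__ == "__main__":
    # Hypothetical two-word vocabulary with 2-dimensional feature vectors.
    refs = {"yes": [[1.0, 0.0], [2.0, 0.0]], "no": [[0.0, 1.0], [0.0, 2.0]]}
    print(recognize([[0.9, 0.1], [2.1, 0.0]], refs))
```

The distance function is passed in as a parameter. A time-normalizing distance such as DP matching can therefore replace `linear_distance` without changing the selection rule.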
In pattern matching, it is of the utmost importance to cope with variations such as variation in speaking speed in speech patterns. A time-axis normalization matching method utilizing dynamic programming (hereinafter referred to as the "DP method") is extremely effective as a countermeasure and has therefore been widely used. The DP method is discussed in detail in, for example, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition," Hiroaki SAKOE et al., IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-26, No. 1, February 1978, pages 43 to 49, and in U.S. Pat. Nos. 3,816,722 and 4,049,913.
Assume that the sequences of feature vectors of two patterns A and B are given, respectively, by the following formulas:
$$A = \{a(1), a(2), \ldots, a(i), \ldots, a(I)\}$$
$$B = \{b(1), b(2), \ldots, b(j), \ldots, b(J)\}$$
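The text does not specify the distance d(i, j) between the vectors a(i) and b(j) at this point. The sketch below assumes a Euclidean distance and computes the full I × J table of local distances that the DP method then integrates. The function name `local_distances` is hypothetical. The indices in the text are 1-based, while Python lists are 0-based.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def local_distances(A: Sequence[Vector], B: Sequence[Vector]) -> List[List[float]]:
    """Return the I x J table whose entry [i-1][j-1] is d(i, j) = ||a(i) - b(j)||.

    The Euclidean distance is an assumed choice of d(i, j).
    """
    return [[math.dist(a, b) for b in B] for a in A]


if __name__ == "__main__":
    A = [[0.0, 0.0], [3.0, 4.0]]              # I = 2 feature vectors
    B = [[0.0, 0.0], [0.0, 4.0], [3.0, 0.0]]  # J = 3 feature vectors
    for row in local_distances(A, B):
        print(row)
```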
Then, the distance between the patterns A and B can be determined in the following manner in accordance with the conventional DP method.
For the integration quantity g of the distance d(i, j) between the vectors a(i) and b(j), the following recurrence formula (1) is calculated sequentially from i=1, j=1 to i=I, j=J, with the initial condition g(1, 1)=d(1, 1):
##EQU1##
The distance D between the patterns A and B is determined from the finally obtained g(I, J) in accordance with the following equation (2):
##EQU2##
Equation (1) corresponds to integrating d(i, j) from the point (1, 1) to the point (I, J) on a lattice of I × J time points. The integration is carried out under the slope constraint (local constraint) on the matching path given by the limitation inside the braces {} in equation (1), and with weighting, which is not always necessary. In this case, the slope of the matching path may take any value within a 90-degree range that includes both the horizontal and the vertical directions, so the matching path can expand and compress the time axis. Because the DP method determines the distance between two patterns by non-linearly expanding and compressing the time axis along the matching path in this way, it can normalize variations in speaking speed when matching two patterns of the same category.
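Equations (1) and (2) are not reproduced in this text. The sketch below therefore assumes the simplest recurrence consistent with the description: horizontal, vertical and diagonal steps, no weighting, and the stated initial condition g(1, 1) = d(1, 1):

g(i, j) = d(i, j) + min{g(i-1, j), g(i-1, j-1), g(i, j-1)}

It also assumes the normalization D = g(I, J) / (I + J) and a Euclidean d(i, j). None of these choices should be read as the exact form of equations (1) and (2). The function name `dp_distance` is hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dp_distance(A: Sequence[Vector], B: Sequence[Vector]) -> float:
    """Time-normalized distance between patterns A and B by DP matching.

    Assumed recurrence (not the exact equation (1)):
        g(1, 1) = d(1, 1)
        g(i, j) = d(i, j) + min(g(i-1, j), g(i-1, j-1), g(i, j-1))
    Assumed normalization (not the exact equation (2)): D = g(I, J) / (I + J).
    """
    I, J = len(A), len(B)
    INF = math.inf
    # g[i][j] holds g(i+1, j+1); a missing predecessor counts as infinite.
    g: List[List[float]] = [[INF] * J for _ in range(I)]
    for i in range(I):
        for j in range(J):
            d = math.dist(A[i], B[j])  # local distance d(i, j), Euclidean (assumed)
            if i == 0 and j == 0:
                g[i][j] = d  # initial condition g(1, 1) = d(1, 1)
                continue
            best = min(
                g[i - 1][j] if i > 0 else INF,                # vertical step
                g[i - 1][j - 1] if i > 0 and j > 0 else INF,  # diagonal step
                g[i][j - 1] if j > 0 else INF,                # horizontal step
            )
            g[i][j] = d + best
    return g[I - 1][J - 1] / (I + J)


if __name__ == "__main__":
    A = [[0.0], [1.0], [2.0]]
    B = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]  # A spoken "twice as slowly"
    C = [[2.0], [1.0], [0.0]]                       # A reversed
    print(dp_distance(A, B), dp_distance(A, C))
```

In this example, B is A with every frame repeated. DP matching absorbs this change of speed completely and returns 0. C contains the same frames in a different order, and no monotonic path can hide the difference, so its distance is positive.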
When the two patterns to be matched belong to different categories, however, the DP method has a problem: the non-linear expansion and compression emphasizes the similar portions of the two patterns, so the matching is liable to be unnatural. This unnaturalness is not a critical problem in ordinary word recognition. It becomes serious, however, where the duration of a consonant or the transition time from a consonant to a vowel is important, as in monosyllable recognition. This is discussed again later with a specific example. For example, the problem arises when the input word "keep" is matched with the reference word "peak". In the utterances /ki:p/ and /pi:k/ of "keep" and "peak", the consonants /k/ and /p/ are highly similar in speech recognition processing, but the duration of /k/ is longer than that of /p/.
Consider the case where the vowel portion of the input pattern is lengthened, so that the input pattern is longer than the reference pattern. DP matching first matches /k/ of the input pattern with /p/ of the reference pattern, then matches the two vowel portions /i:/ with each other, and finally matches /p/ of the input pattern with /k/ of the reference pattern. Even though the duration of the consonant /k/ differs from that of /p/, the time normalization characteristic of DP matching described above expands or compresses the patterns so that they are matched. The distance (similarity) obtained at the final time point is therefore not much different from the distance obtained when the reference word is "keep", and a recognition error occurs. In other words, the difference in duration between the consonants /k/ and /p/ is neglected, although it could serve as an important feature for distinguishing them.
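The "keep"/"peak" situation can be reproduced with a toy model. The sketch below uses hypothetical one-dimensional features: /k/ = 1.0, /p/ = 1.1 (similar consonants) and /i:/ = 5.0. The input "keep" has a long /k/, a lengthened vowel and a short /p/. It uses the same assumed recurrence as a plain DP sketch, not the patent's exact equations.

```python
import math
from typing import Sequence


def dp_match(A: Sequence[float], B: Sequence[float]) -> float:
    """Unnormalized DP integration g(I, J) for scalar features.

    Assumed recurrence: g(i, j) = d(i, j) + min(g(i-1, j), g(i-1, j-1), g(i, j-1)),
    with g(1, 1) = d(1, 1) and d(i, j) = |a(i) - b(j)|.
    """
    I, J = len(A), len(B)
    # Row 0 and column 0 are an infinite border, so indices match the 1-based text.
    g = [[math.inf] * (J + 1) for _ in range(I + 1)]
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            d = abs(A[i - 1] - B[j - 1])
            if i == 1 and j == 1:
                g[i][j] = d  # initial condition g(1, 1) = d(1, 1)
            else:
                g[i][j] = d + min(g[i - 1][j], g[i - 1][j - 1], g[i][j - 1])
    return g[I][J]


# Hypothetical feature values chosen for illustration only.
K, P, VOWEL = 1.0, 1.1, 5.0
keep_input = [K] * 4 + [VOWEL] * 8 + [P] * 2  # long /k/, lengthened /i:/, short /p/
ref_keep = [K] * 4 + [VOWEL] * 4 + [P] * 2
ref_peak = [P] * 2 + [VOWEL] * 4 + [K] * 4    # short /p/, long /k/

if __name__ == "__main__":
    print("g vs keep:", dp_match(keep_input, ref_keep))
    print("g vs peak:", dp_match(keep_input, ref_peak))
```

Against the "keep" reference, the integrated distance is exactly 0. Against "peak", it is only about 0.8, which is eight consonant cells at 0.1 each. Matching a single consonant frame against a vowel frame costs about 4 in this model. The optimal path simply stretches /k/ onto /p/ and /p/ onto /k/, so the duration difference between the consonants contributes almost nothing to the score.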