The present invention relates to a pattern matching method and apparatus useful for recognizing an input string of spoken words and, more particularly, to a speech recognizing apparatus of high recognition accuracy for recognizing at a high speed an input string of words which are continuously uttered in compliance with a regular grammar program.
A variety of improvements that reduce the number of calculation steps in speech recognition based on DP matching with time normalization have been proposed to improve responsiveness. For example, Cory S. Myers and Lawrence R. Rabiner have contemplated reducing the calculation quantity by executing the DP matching algorithm between an input pattern and a reference pattern at each digit. Reference should be made to "A Level Building Dynamic Time Warping Algorithm for Connected Word Recognition", IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-29, No. 2, APRIL 1981, pp. 284-297. On the other hand, the present inventor and another have proposed a system for eliminating the problem intrinsic to the system of Myers et al., i.e., the slow responsiveness due to the DP matching processing in the direction of the reference pattern time axis. According to this system (U.S. patent application Ser. No. 447,829, now matured into U.S. Pat. No. 4,592,086, and European Patent Publication No. 0 081 390), the responsiveness is improved by conducting the matching algorithm processing along the input pattern time axis (which is called "clockwise processing" and will be referred to as the "first technique").
We have also proposed (in U.S. patent application Ser. No. 664,732) a system (which will be referred to as the "second technique") for drastically reducing the access time to a memory (i.e., the calculation time) by conducting the calculations for each block having a predetermined width along the input pattern time axis.
Among speech recognizing apparatuses, an apparatus for recognizing an input string of words uttered in compliance with a regular grammar can be used for a wide range of applications such as computer programs, limited business documents, directions for airplanes and control instructions of various devices. It is a known principle that erroneous recognition can be reduced by making use of the grammatical rules. Especially for continuous digit recognition where the number of digits of an input speech is restricted, the recognition accuracy can be improved by applying that restriction as a rule.
A method for recognizing an input string of spoken words in compliance with the grammar on the basis of the first technique has been proposed by Sakoe (U.S. patent application Ser. No. 448,088, now matured into U.S. Pat. No. 4,555,796, and European Patent Publication No. 0 082 000). Speech of that type can also be recognized on the basis of the second technique, but the following problem arises. If the second technique is used, the grammar is expressed by an automaton $\alpha$, which is defined by:
$$\alpha = \langle K, \Sigma, \Delta, P_0, F \rangle \qquad (1)$$
where
$K$: a set of states $p$, $\{p \mid p = 1, 2, \ldots, \pi\}$;
$\Sigma$: a set of reference words $n$, $\{n \mid n = 1, 2, \ldots, N\}$;
$\Delta$: a state transition table $\{(p, q, n)\}$, in which a combination $(p, q, n)$ represents a state transition $p \xrightarrow{n} q$ from the state $p$ to the state $q$ by the word $n$;
$P_0$: an initial state, hereinafter $p = 0$; and
$F$: a set of final states, $F \subseteq K$.
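By way of illustration only, the five components of the automaton of equation (1) may be held in a small data structure as sketched below. The class name `Automaton`, the method `accepts` and the three-state example grammar are hypothetical and do not appear in the cited applications; the states are numbered from 0 so that the initial state is $p = 0$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Automaton:
    """Hypothetical container for alpha = <K, Sigma, Delta, P_0, F>."""
    states: frozenset       # K: state numbers
    words: frozenset        # Sigma: reference word numbers n
    transitions: frozenset  # Delta: triples (p, q, n), read as p --n--> q
    initial: int            # P_0
    finals: frozenset       # F, a subset of K

    def accepts(self, word_string):
        """Return True if the word string leads from P_0 to a final state."""
        current = {self.initial}
        for n in word_string:
            # Follow every transition labelled n that leaves a current state.
            current = {q for (p, q, w) in self.transitions
                       if p in current and w == n}
            if not current:
                return False
        return bool(current & self.finals)


# Assumed example grammar: word 1, then word 2 one or more times
# (the triple (2, 2, 2) is a loop on state 2).
alpha = Automaton(
    states=frozenset({0, 1, 2}),
    words=frozenset({1, 2}),
    transitions=frozenset({(0, 1, 1), (1, 2, 2), (2, 2, 2)}),
    initial=0,
    finals=frozenset({2}),
)
```

With this assumed grammar, `alpha.accepts([1, 2, 2])` holds while `alpha.accepts([2, 1])` does not, which is the sense in which a word string is said to be uttered in compliance with the automaton.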
A speech pattern $A$ obtained by continuously uttering a plurality of words $n \in \Sigma$ according to the automaton $\alpha$ will be called an (unknown) input pattern and is expressed by:
$$A = a_1, a_2, \ldots, a_i, \ldots, a_I \qquad (2)$$
For each word $n \in \Sigma$, the following reference pattern is prepared and will be called a "word reference pattern":
$$B^n = b_1^n, b_2^n, \ldots, b_j^n, \ldots, b_{J^n}^n \qquad (3)$$
A string of speech reference patterns $C = B^{n_1}, B^{n_2}, \ldots, B^{n_x}$, which is obtained by connecting word reference patterns $B^n$ in compliance with the automaton $\alpha$, is subjected to DP matching with the input pattern $A$ to calculate a quantity representing the difference between those two patterns (which will be called a "dissimilarity" and is expressed by $g(m, j)$), and the word string giving the minimum dissimilarity is accepted as the result of recognition. Here, the dissimilarity $g(m, j)$ is the accumulated distance up to the time $m$ of the input pattern and the time $j$ of the reference pattern.
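The accumulated distance $g(m, j)$ may be illustrated, for a single reference pattern, by the following minimal sketch. It is not the recurrence of the cited applications, whose equations are not reproduced in this text. It assumes scalar feature values, an absolute-difference frame distance, and a commonly used step pattern in which $(m, j)$ is reached from $(m-1, j)$, $(m-1, j-1)$ or $(m-1, j-2)$. The function name `accumulated_distance` is hypothetical.

```python
import math


def accumulated_distance(A, B):
    """Return the table g, where g[m][j] is the accumulated distance
    for 1 <= m <= len(A) and 1 <= j <= len(B) (row 0 and column 0 are
    padding that holds the start condition)."""
    I, J = len(A), len(B)
    g = [[math.inf] * (J + 1) for _ in range(I + 1)]
    g[0][0] = 0.0  # assumed start condition
    for m in range(1, I + 1):          # input pattern time axis
        for j in range(1, J + 1):      # reference pattern time axis
            d = abs(A[m - 1] - B[j - 1])  # assumed frame distance
            best = min(
                g[m - 1][j],
                g[m - 1][j - 1],
                g[m - 1][j - 2] if j >= 2 else math.inf,
            )
            g[m][j] = d + best
    return g


# The dissimilarity of the whole match is read at the end of both patterns.
g = accumulated_distance([1, 1, 2, 3], [1, 2, 3])
dissimilarity = g[4][3]
```

In this example the input pattern is a time-stretched copy of the reference pattern, so the time normalization absorbs the difference in length and the dissimilarity is zero.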
Now, the minimum dissimilarity is determined by the following dynamic programming technique. The initial condition is set as: ##EQU2## and the asymptotic equations (7) are calculated for $i = 1$ to $I/IL$ (wherein $I$ is assumed to be evenly divisible by $IL$ for simplicity of discussion) and for all the pairs $(p, n)$ of $(p, q, n) \in \Delta$ consecutively on the basis of the boundary conditions of equations (5) and (6). Here, $T(m, q)$ represents either the minimum accumulated distance at the end time (or frame) $J^n$ of all the reference pattern words $n$ that reach the state $q$ at the time $m$ of the input pattern, or the initial value at the time $m$ in the state $q$; and $G(p, n, j)$ represents the accumulated distance at the time $j$ of the reference pattern on the boundary between the immediately preceding block and the present block in the procedure for determining the accumulated distance of the reference pattern word $n$ started from the state $p$. Specifically, the boundary conditions are given by: ##EQU3## and the asymptotic equations are given by: ##EQU4## The asymptotic equations (7) are calculated from the time $m = m_s$ to $m_e$ and then from the time $j = 1$ to $J^n$. Here, $h(m, j)$ represents the path value (i.e., the pointer at $(m, j)$ indicating where the best path to $(m, j)$ came from) at the time $m$ of the input pattern and the time $j$ of the reference pattern; and $H(p, n, j)$ represents the path value of the reference pattern at the time $j$ on the boundary between the immediately preceding block and the present block in the procedure for determining the accumulated distance of the reference pattern word $n$ started from the state $p$.
After the calculation of one block is completed, the values $g(m_e, j)$ and $h(m_e, j)$ are stored in the table memories $G(p, n, j)$ and $H(p, n, j)$, respectively. Here, $\underset{x \in X}{\operatorname{argmin}}\; y$ means the value of $x \in X$ which minimizes the value $y$.
Next, for the minimization over the words at the boundary, the following calculations are conducted: ##EQU6## Here, the values $N(m, q)$, $P(m, q)$ and $L(m, q)$ represent, respectively: the word number $n$ which gives the minimum accumulated distance among the reference pattern words $n$ having reached the state $q$ at the input pattern time $m$; the state $p$ from which the word $n$ giving that minimum is started; and the word path value $h(m, J^n)$ of the reference pattern giving that minimum (i.e., the input pattern time corresponding to the start time of the word $n$). In other words, the asymptotic equations (7) are calculated for each pair $(p, n)$ over $IL$ frames of the input pattern, and these calculations of one column block are performed along the input pattern axis until the end $m = I$.
The recognition result of the input pattern is obtained in a decision process as follows, which yields the recognized word number $n$, the start time $l$ of the recognized word $n$, and the state $q$ in which the recognized word $n$ is started: ##EQU7##
If $l > 0$, the equations (11) are repeated with the state $q$ replaced by the state just obtained and with $m = l$. If $l = 0$, the calculations are ended.
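The decision process can be sketched as a backward walk through the tables $N(m, q)$, $P(m, q)$ and $L(m, q)$. The sketch below is an illustration under assumptions, since equations (11) are not reproduced in this text: the tables are held as dictionaries keyed by $(m, q)$, the final state is supplied by the caller, and the function name `trace_back` and the table contents are hypothetical.

```python
def trace_back(N, P, L, I, q_final):
    """Recover the recognized word string by following the tables
    backwards from the input pattern end m = I in the state q_final."""
    words = []
    m, q = I, q_final
    while True:
        n = N[(m, q)]  # word that reached state q at time m
        p = P[(m, q)]  # state in which that word was started
        l = L[(m, q)]  # input pattern time at which that word was started
        words.append(n)
        if l == 0:     # the start of the input pattern has been reached
            break
        m, q = l, p    # repeat for the preceding word
    words.reverse()    # words were collected last to first
    return words


# Assumed tables for a 30-frame input: word 1 leads from state 0 to state 1
# and ends at frame 12; word 2 leads from state 1 to state 2 and ends at 30.
N = {(30, 2): 2, (12, 1): 1}
P = {(30, 2): 1, (12, 1): 0}
L = {(30, 2): 12, (12, 1): 0}
result = trace_back(N, P, L, I=30, q_final=2)
```

For these assumed tables the walk visits $(30, 2)$ and then $(12, 1)$, so the recognized word string is word 1 followed by word 2.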
Thus, the calculations of the asymptotic equations of one column block need be repeated only $I/IL$ times. As compared with the method of repeating the calculations of the asymptotic equations I times in accordance with the first technique, the number of read and write times of the table memories $G(p, n, j)$ and $H(p, n, j)$ is reduced to $1/IL$ so that the memory access time can be shortened.
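The saving can be made concrete with a short sketch that enumerates the column blocks. The end time $m_e = i \cdot IL$ follows the text; the start time $m_s = (i-1) \cdot IL + 1$ is an assumption, as are the function name `block_ranges` and the example values $I = 120$ and $IL = 10$.

```python
def block_ranges(I, IL):
    """Return the (m_s, m_e) frame range of each column block, assuming
    m_s = (i - 1) * IL + 1 and m_e = i * IL for i = 1 .. I / IL."""
    if IL <= 0 or I % IL != 0:
        raise ValueError("I must be a multiple of a positive IL")
    return [((i - 1) * IL + 1, i * IL) for i in range(1, I // IL + 1)]


blocks = block_ranges(120, 10)
first_technique_passes = 120           # one table read/write pass per frame
second_technique_passes = len(blocks)  # one table read/write pass per block
```

With these example values the tables $G$ and $H$ are read and written once per block rather than once per frame, i.e., in $1/IL$ of the passes.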
In case loops are included in the state transition of the automaton, however, the correct result cannot be attained because of an inconsistency: the result $T(m, p) = g(m, J^n)$, $m = m_s, \ldots, m_e - 1$, obtained at the word end of the asymptotic calculations of one column block in the state $p$ is used as the initial value $g(m-1, 0) = T(m-1, p)$, $m = m_s + 1, \ldots, m_e$, of the asymptotic calculations in that same state $p$. The second technique also discloses a method of calculating the boundary conditions, i.e., the table memories $T(m, q)$, $N(m, q)$, $P(m, q)$ and $L(m, q)$, only at the time $m = i \cdot IL = m_e$. This method is advantageous in that loops may be contained in the state transition of the automaton. According to this method, however, the precision in determining the boundary between words is worse, thus causing erroneous recognitions, because the calculations of the boundary conditions are conducted at coarse intervals.
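Whether a given state transition table contains loops, and is therefore exposed to the inconsistency described above, can be checked mechanically. The following is a minimal sketch using an ordinary depth-first search; it is offered only as an illustration and is not part of either cited technique. The function name `has_loop` and the example tables are hypothetical.

```python
def has_loop(transitions):
    """Return True if the triples (p, q, n) contain a cycle of states,
    including a transition from a state to itself."""
    successors = {}
    for p, q, _n in transitions:
        successors.setdefault(p, set()).add(q)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}

    def visit(p):
        colour[p] = GREY
        for q in successors.get(p, ()):
            c = colour.get(q, WHITE)
            if c == GREY:  # q is already on the current path: a loop
                return True
            if c == WHITE and visit(q):
                return True
        colour[p] = BLACK
        return False

    return any(colour.get(p, WHITE) == WHITE and visit(p)
               for p in list(successors))


chain = {(0, 1, 1), (1, 2, 2)}       # no loop
repeating = {(0, 1, 1), (1, 1, 2)}   # word 2 may repeat in state 1
```

Here `has_loop(chain)` is false and `has_loop(repeating)` is true; a grammar of the second kind is the case in which the block-wise calculation uses the word-end result of a state as the initial value of that same state.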
As has been described hereinbefore, the methods according to the prior art have the defect that either loops cannot be contained in the automaton state transition, or the loops contained deteriorate the time precision of the word boundary determination, so that the number of erroneous recognitions is increased.