The present invention relates to an improved apparatus for recognizing continuous speech or the like by pattern matching using a dynamic programming method (hereinafter called DP matching). Hereinafter, recognition of word speech that requires a pause of at least a predetermined length between the input words is called isolated word speech recognition, while recognition that does not require such pauses is called continuous word speech recognition.
Conventionally, well-known methods for recognizing continuous speech by pattern matching using the dynamic programming method include a two-level DP matching method (2 level DP method), a level building method (LB method), a clockwise dynamic programming method (CWDP method), and an order n dynamic programming method (O(n) DP method). All of these methods register in advance the individual reference patterns corresponding to the words to be recognized and concatenate these patterns optimally, thereby obtaining the combination of registered reference patterns closest to the pattern of the continuously pronounced input speech; the sequence of words corresponding to the combined patterns is output as the recognition result. Methods have been proposed that obtain the best combination of reference patterns for the cases in which (a) the number of input words is not known (information on the number of input words is not used), (b) the number of input words is known (information on the number of input words is used), (c) the order of appearance of the input words can be represented by an automaton or the like (restrictions on the order of appearance of the input words are used), etc.
One conventional method, however, requires only a small amount of calculation but is applicable to case (a) only and not to cases (b) and (c). Another conventional method is applicable to cases (a) through (c) but requires a large amount of calculation or memory. The present invention has been designed to eliminate the above problem: the apparatus of the invention is applicable to all of cases (a) through (c), requires an amount of calculation equivalent to the smallest among the conventional examples, and requires an amount of memory midway between the minimum and the maximum among the conventional examples.
In order to understand the present invention, it is necessary to understand what pattern matching is, how the dynamic programming method is applied to it, and what condition is required for applying this method. Hence, isolated word speech recognition using DP matching will be described first, and continuous word speech recognition will be described thereafter.
A speech recognition apparatus based on pattern matching generally comprises: a feature extracting means for converting input speech signals into a series of feature vectors (the input pattern) by a filter bank, Fourier analysis, LPC analysis or the like; a reference pattern memory means for storing in advance, for every word to be recognized, the series of feature vectors (called the reference pattern) extracted, by the same means as the feature extracting means, from speech of each word of the recognition vocabulary pronounced beforehand; pattern comparing means for computing a similarity or a distance between the input pattern, extracted by the feature extracting means from the speech to be recognized, and each reference pattern stored in the reference pattern memory means; and judging means for delivering, as the recognition result, the word corresponding to the reference pattern of the highest similarity (the smallest distance) found by the pattern comparison.
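The role of the judging means described above can be illustrated with a minimal sketch. The function names `recognize` and `toy_distance`, the two-word vocabulary, and the one-dimensional feature values are all hypothetical, and the stand-in distance is deliberately simplistic; it is not the DP matching distance discussed later.

```python
def recognize(input_pattern, reference_patterns, distance):
    """Return the word whose reference pattern is closest to the input pattern.

    reference_patterns maps each vocabulary word to its stored pattern
    (the role of the reference pattern memory means); distance plays the
    role of the pattern comparing means.
    """
    return min(
        reference_patterns,
        key=lambda word: distance(input_pattern, reference_patterns[word]),
    )


def toy_distance(pattern_a, pattern_b):
    """Stand-in distance: difference of the mean feature value (NOT DP matching)."""
    mean_a = sum(pattern_a) / len(pattern_a)
    mean_b = sum(pattern_b) / len(pattern_b)
    return abs(mean_a - mean_b)


# Hypothetical one-dimensional features for a two-word vocabulary.
# The patterns intentionally differ in length.
references = {"yes": [1.0, 2.0, 3.0], "no": [7.0, 8.0, 9.0, 8.0]}
result = recognize([6.5, 8.5, 7.5], references, toy_distance)
```

Any function that accepts two patterns of possibly different lengths can be passed as `distance`, which is where a DP matching distance would be substituted.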
In the aforesaid apparatus construction, the problems for pattern matching are how to compare patterns that generally differ in length (in the number of vectors in the sequence) and how to define the distance between the two patterns (hereinafter simply called the distance).
One solution to the above is as follows. Let $a_i$ denote the $i$th feature vector constituting the input pattern $T$, $I$ the number of feature vectors in $T$, $R^n$ the $n$th reference pattern, $b_j^n$ the $j$th feature vector constituting $R^n$, and $J^n$ the total number of feature vectors constituting $R^n$, so that
$$T = a_1 a_2 \ldots a_i \ldots a_I \tag{1}$$
$$R^n = b_1^n b_2^n \ldots b_j^n \ldots b_{J^n}^n \tag{2}$$
With the distance between the two patterns represented by $D(T, R^n)$, the following formula is defined: ##EQU1## where $c(k) = (i(k), j(k))$, $k = 1, 2, \ldots, K$, is a vector-valued function relating the feature vector $a_{i(k)}$ of pattern $T$ to the feature vector $b_{j(k)}^n$ of pattern $R^n$. Accordingly, assuming that $a_1$ always corresponds to $b_1^n$ and $a_I$ to $b_{J^n}^n$, we have $i(1) = j(1) = 1$, $i(K) = I$ and $j(K) = J^n$. $d^n(c(k)) = d^n(i(k), j(k))$ represents the distance between the feature vector $a_{i(k)}$ of pattern $T$ and the feature vector $b_{j(k)}^n$ of pattern $R^n$. Various definitions of the distance between vectors have been proposed, of which the city block distance is the simplest. When the vectors are expressed as
$$a_i = (a_{i1}, a_{i2}, \ldots, a_{ip}) \tag{4}$$
$$b_j^n = (b_{j1}^n, b_{j2}^n, \ldots, b_{jp}^n) \tag{5}$$
where $p$ is the dimension of each vector, the city block distance between the vector $a_i$ and the vector $b_j^n$ is defined as follows: ##EQU2## $w(k)$ is a weighting coefficient that may be chosen in various ways, but it is chosen so that formula (3) can be solved by the dynamic programming method.
Formula (3) means that the correspondence between the feature vectors $a_i$ ($i = 1, 2, \ldots, I$) of pattern $T$ and the feature vectors $b_j^n$ ($j = 1, 2, \ldots, J^n$) of pattern $R^n$ is optimized so as to minimize the weighted average of the distances between corresponding vectors, and that this minimum value is taken as the distance between pattern $T$ and pattern $R^n$.
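Read together with the definitions in the text, formula (3) and the city block distance of formula (6) presumably take the following form. This is a sketch inferred from the surrounding description (a minimum over correspondences of a weighted average, and a sum of absolute component differences); the summation index $m$ is introduced here as an assumption.

```latex
D(T, R^n) = \min_{c}
  \left[ \frac{\sum_{k=1}^{K} w(k)\, d^n(c(k))}{\sum_{k=1}^{K} w(k)} \right]
  \tag{3}

d^n(i, j) = \sum_{m=1}^{p} \left| a_{im} - b_{jm}^{n} \right|
  \tag{6}
```

Under this reading, the denominator of (3) is the sum of the weighting coefficients along the chosen correspondence, which is the quantity whose dependence on the path matters for the principle of optimality.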
FIG. 1 is a lattice graph illustrating the above, in which the abscissa represents the coordinates corresponding to the respective vectors in the series of feature vectors of the input pattern $T$ and the ordinate represents the coordinates corresponding to the respective vectors in the series of feature vectors of the reference pattern $R^n$. The correspondence between the vectors can be shown by lattice points on the graph. Numeral 1 designates a line connecting the lattice points in time-series order. Hereinafter, such a line, which gives the correspondence between the feature vectors of pattern $T$ and pattern $R^n$, is called a "path".
When the distance between the patterns is defined as above, the problem is how to solve formula (3). Referring to FIG. 1, the problem is to find the optimum path giving the minimum value of the weighted average. In theory this can of course be solved by evaluating every path from the lattice point $(1, 1)$ to the lattice point $(I, J^n)$, but the amount of computation required is so massive as to be impractical. The problem, however, resembles the shortest path problem of dynamic programming, so it can be expected to be solved efficiently by applying that method; this is called DP matching.
The theory of dynamic programming requires that the principle of optimality hold for the method to be applicable. That is, referring to FIG. 1, assuming that the optimum path 1 from the lattice point $(1, 1)$ to the lattice point $(I, J^n)$ has been found, the optimum path from $(1, 1)$ to any point $(i, j)$ on path 1 must be identical with the portion of path 1 from $(1, 1)$ to $(i, j)$. If this holds, the optimum path from the lattice point $(1, 1)$ to a point $p_0$ is obtained as follows. Hereinafter, the weighted sum of the distances between the vectors along a path from one point to another is referred to as "the cumulative distance", a path giving its minimum value as "the optimum path", and the cumulative distance along the optimum path as "the minimum cumulative distance". Let the points that can immediately precede $p_0$ be $p_1, p_2, \ldots$, and suppose that the optimum path from $(1, 1)$ to each of them and the cumulative distance along it have been obtained. When the minimum cumulative distance to the candidate point $p_u$ is represented by $G_u$ and the weighting coefficient along the path from $p_u$ to $p_0$ by $w_{u0}$, the minimum cumulative distance $G_0$ to $p_0$ is obtained by the following formula: ##EQU3## In other words, to obtain $G_0$ there is no need to compute the cumulative distance along every possible path from $(1, 1)$ to $p_0$; the already computed minimum cumulative distances from $(1, 1)$ to $p_1, p_2, \ldots$ can be used to obtain $G_0$. Accordingly, repeating this computation step by step from $(1, 1)$ to $(I, J^n)$ yields $D(T, R^n)$. Also, evidently, not all of the minimum cumulative distances calculated so far need to be stored, but only those needed for the next step.
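The stepwise computation described above can be sketched as follows, assuming the recurrence has the form $G_0 = \min_u (G_u + w_{u0} d_0)$ with $d_0$ the vector distance at $p_0$. The three-predecessor step set and the weights 1, 2, 1 used here are assumptions chosen so that the sum of the weights is the same ($I + J$) along every path; they are not the path constraint of FIG. 2a. The function names are hypothetical, and indices are 0-based in the code whereas the text counts from 1.

```python
def city_block(a, b):
    """City block distance between two feature vectors of equal dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dp_match(T, R):
    """Return the normalized DP matching distance between patterns T and R.

    T and R are lists of feature vectors. G[i][j] holds the minimum
    cumulative distance from lattice point (0, 0) to (i, j).
    """
    I, J = len(T), len(R)
    INF = float("inf")
    G = [[INF] * J for _ in range(I)]
    G[0][0] = 2 * city_block(T[0], R[0])  # assumed start weight of 2
    for i in range(I):
        for j in range(J):
            if i == 0 and j == 0:
                continue
            d = city_block(T[i], R[j])
            candidates = []
            if i > 0:
                candidates.append(G[i - 1][j] + d)          # horizontal step, weight 1
            if j > 0:
                candidates.append(G[i][j - 1] + d)          # vertical step, weight 1
            if i > 0 and j > 0:
                candidates.append(G[i - 1][j - 1] + 2 * d)  # diagonal step, weight 2
            G[i][j] = min(candidates)
    # Every path from (0, 0) to (I-1, J-1) carries a total weight of I + J.
    return G[I - 1][J - 1] / (I + J)


same = dp_match([[0], [1], [2]], [[0], [1], [2]])
warped = dp_match([[0], [0], [1]], [[0], [1]])
offset = dp_match([[0], [2]], [[1], [1]])
```

The full table `G` is kept here for readability; since each entry depends only on the current and the previous column, two columns would suffice, in line with the remark that not all cumulative distances need to be stored.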
The next question is whether formula (3) satisfies the principle of optimality and, if it does not, what condition is required for it to do so. In conclusion, the formula does not satisfy it in general. The reason for this, and a condition under which the principle of optimality holds, are derived next.
In the aforesaid example, assume that the point $p_u$ is selected as the point preceding $p_0$. Let the sum of the weighting coefficients along the optimum path $L_u$ from $(1, 1)$ to $p_u$ be represented by $W_u$ and the minimum cumulative distance by $G_u$, and let the sum of the weighting coefficients along any other path $L_u'$ from $(1, 1)$ to $p_u$ be represented by $W_u'$ and the cumulative distance by $G_u'$. Then, evidently from the assumption,
$$G_u / W_u < G_u' / W_u' \tag{8}$$
holds. For the principle of optimality to hold, whenever formula (8) holds the following inequality must also hold: ##EQU4## where $w_{u0}$ is the weighting coefficient along the path from $p_u$ to $p_0$. However, substituting actual numbers shows that it does not hold in general. Inequality (9) is guaranteed to hold only when $W_u = W_u'$, which means that, for formula (3) to be solved by dynamic programming, the sum of the weighting coefficients along a path connecting two points of the lattice graph must be constant regardless of the path taken.
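The failure can be checked with concrete numbers. The sketch below assumes that inequality (9) compares the weighted averages after both paths are extended by the same final step, that is, $(G_u + w_{u0} d_0)/(W_u + w_{u0}) < (G_u' + w_{u0} d_0)/(W_u' + w_{u0})$, where $d_0$ denotes the vector distance at $p_0$. That form, the symbol $d_0$, and all numerical values are assumptions made for illustration.

```python
from fractions import Fraction


def weighted_average(G, W):
    """Cumulative distance G divided by the sum of weights W, kept exact."""
    return Fraction(G, W)


def extend(G, W, w_step, d_step):
    """Extend a path by one step of weight w_step ending at vector distance d_step."""
    return G + w_step * d_step, W + w_step


# Hypothetical values: L_u is the better path to p_u (1/1 < 11/10),
# but the two paths carry different sums of weights (1 versus 10).
G_u, W_u = 1, 1
G_alt, W_alt = 11, 10
w_u0, d_0 = 1, 10  # weight of the step p_u -> p_0 and vector distance at p_0

holds_8 = weighted_average(G_u, W_u) < weighted_average(G_alt, W_alt)
holds_9 = weighted_average(*extend(G_u, W_u, w_u0, d_0)) < weighted_average(
    *extend(G_alt, W_alt, w_u0, d_0)
)

# The same comparison with equal sums of weights (both equal to 10).
G_eq, W_eq = 10, 10
holds_8_equal = weighted_average(G_eq, W_eq) < weighted_average(G_alt, W_alt)
holds_9_equal = weighted_average(*extend(G_eq, W_eq, w_u0, d_0)) < weighted_average(
    *extend(G_alt, W_alt, w_u0, d_0)
)
```

With unequal sums of weights the extended averages are 11/2 and 21/11, so the path that was better up to $p_u$ becomes the worse one at $p_0$; with equal sums they are 20/11 and 21/11 and the ordering is preserved.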
When pattern matching is actually solved by the dynamic programming method, various constraints other than the above condition are usually imposed on the selectable paths, or on their range, in view of the properties of the speech signal. FIG. 2a shows one example of such a constraint on path selection: a path reaching the point $(i, j)$ must be path 2, coming from the point $(i-2, j-1)$ through $(i-1, j)$; path 3, coming from the point $(i-1, j-1)$; or path 4, coming from the point $(i-1, j-2)$ through $(i, j-1)$. In this case the maximum inclination of a selectable path is 2 and the minimum is 1/2. Assuming that the starting ends and the terminal ends of the input pattern and the reference pattern are required to correspond to each other, the path from the point $(1, 1)$ to the point $(I, J^n)$ is confined to the hatched portion shown in FIG. 1. The reason for this restriction is to avoid excessively distorted correspondences: although the time axis expands and contracts as the length of the input pattern varies from utterance to utterance, it does not do so to an extreme degree for the same word.
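The effect of the inclination limits on the admissible region can be sketched as follows. The test below uses only the global bounds of 1/2 and 2 on the inclination measured from the start point $(1, 1)$ and toward the end point $(I, J)$; it is an approximation of the hatched portion of FIG. 1, and the function name `in_allowed_region` is hypothetical.

```python
def in_allowed_region(i, j, I, J):
    """Return True if lattice point (i, j), counted from 1, can lie on a path
    from (1, 1) to (I, J) whose overall inclination stays between 1/2 and 2."""
    di, dj = i - 1, j - 1  # displacement from the start point
    ri, rj = I - i, J - j  # displacement remaining to the end point
    return (
        dj <= 2 * di       # inclination from the start is at most 2
        and di <= 2 * dj   # inclination from the start is at least 1/2
        and rj <= 2 * ri   # inclination toward the end is at most 2
        and ri <= 2 * rj   # inclination toward the end is at least 1/2
    )


# Admissible points of a hypothetical 5 x 5 lattice.
region = [
    (i, j)
    for i in range(1, 6)
    for j in range(1, 6)
    if in_allowed_region(i, j, 5, 5)
]
```

Restricting the DP computation to such a region also reduces the number of lattice points at which the cumulative distance has to be evaluated.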
Letters a to e in FIG. 2a denote the weighting coefficients applied when the respective paths are selected. These weighting coefficients may be chosen freely as long as the principle of optimality is satisfied, but they are usually decided as follows: