The present invention relates to a method and an apparatus for label-sequence DP (dynamic programming) matching usable in the fields of character recognition, speech recognition, spelling checking, data base queries for synonyms or the like, and more particularly to a method and an apparatus for DP matching for the case where an input label sequence is to be compared with multiple label sequences.
In the fields of speech recognition or character recognition, one recognition method in general use consists of extracting features from inputted information, representing the features as an ordered label sequence, comparing the extracted label sequence with reference label sequences previously stored in a recognition dictionary (referred to as templates hereafter), and outputting the most similar (minimum distance) template as the recognition result. In cases of spelling checking, data base queries for synonyms, or the like, a character is regarded as one label and the label sequence (namely a single word) having a minimum distance to an inputted sequence is selected from a word dictionary or a data base, so that the approximate matching of the label sequence is required in the same manner as set forth above.
In this case, since the correspondence between the respective labels is not generally known (except that the relative order of the labels cannot be reversed), the distance must be defined as the minimum value over all possible mappings. One of the methods for calculating this distance at high speed is dynamic programming (DP). DP has many modified forms and is most typically formulated as follows.
Two label sequences to be compared are assumed to be A=a.sub.1, a.sub.2, . . . a.sub.n, B=b.sub.1, b.sub.2, . . . b.sub.m. If the label mapping or joining function is expressed by c.sub.p (k) (k=1, . . . , K), where c.sub.p (k) is the mapping or joining function for a pair of labels (a.sub.ik, b.sub.jk), the distance D between A and B can be defined as ##EQU1##
In equation (1), d(c.sub.p (k)) indicates the distance between a.sub.ik and b.sub.jk. w(k) indicates the distance along the path of the mapping function.
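A form of equation (1) consistent with the definitions of d and w given here would be the following. It assumes the conventional normalized DP-matching distance, in which the weighted sum of label distances along a mapping is divided by the sum of the weights. The normalization term is an assumption and is not confirmed by the present text.

```latex
D(A, B) \;=\; \min_{c_p}
  \left[
    \frac{\displaystyle\sum_{k=1}^{K} d\bigl(c_p(k)\bigr)\, w(k)}
         {\displaystyle\sum_{k=1}^{K} w(k)}
  \right],
\qquad
c_p(k) = \bigl(a_{i_k},\, b_{j_k}\bigr)
```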
There are various types of w(k). For instance, when DP is applied with w(k) defined as w(k)=(i.sub.k -i.sub.k-1)+(j.sub.k -j.sub.k-1), D(A, B) can be calculated as g(n, m)/(n+m) by the following recurrence formula for g(i, j). ##EQU2##
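A recurrence of the kind referred to as formula (2) can be sketched as follows. It assumes the usual symmetric form for this choice of w(k), the starting point (i.sub.0, j.sub.0)=(0, 0), and the shorthand d(i, j) for the distance between a.sub.i and b.sub.j. The boundary value g(1, 1) is likewise an assumption. Under these assumptions the weights along any mapping from (1, 1) to (n, m) sum to n+m, which accounts for the normalization g(n, m)/(n+m).

```latex
g(i, j) \;=\; \min
\begin{cases}
  g(i-1,\, j)   + d(i, j) \\
  g(i-1,\, j-1) + 2\, d(i, j) \\
  g(i,\, j-1)   + d(i, j)
\end{cases}
\qquad
g(1, 1) = 2\, d(1, 1),
\qquad
\sum_{k=1}^{K} w(k) = i_K + j_K = n + m
```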
In order to obtain g(n, m), g(i, j) is calculated for all lattice points (m.times.n points) satisfying (1.ltoreq.i.ltoreq.n, 1.ltoreq.j.ltoreq.m) on the plane of (i, j), as shown in FIG. 3. Generally, the calculation of the formula (2) is executed with either i or j fixed and the other incremented. When the calculation is completed, the fixed one is increased by 1 and a similar calculation is repeated. Each calculation executed with only one variable incremented is called a one-stage calculation, and the calculation itself will be called a stage calculation hereafter. Either variable may be fixed. In the explanation hereafter, however, the i axis will be regarded as the input label sequence, and the j axis will be regarded as the reference label sequence in the dictionary to be compared. Thus, the stage calculation will be executed with j fixed (FIG. 3).
In this DP method, the distance can be obtained by performing at most (m.times.n) steps. Therefore, the amount of calculation is greatly reduced compared with examining every possible mapping. However, since the number of templates to be compared or of words to be retrieved in the dictionary is generally very large, and the distance between each of them and the input label sequence must be calculated, the problem of calculation speed still remains.
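By way of illustration only, the stage calculation with j fixed can be sketched in Python as follows. The symmetric recurrence, the boundary value g(0, 0)=0, the 0/1 label distance, and the names label_distance, next_stage and dp_distance are all assumptions made for this sketch and do not appear in the source. Both sequences are assumed to be non-empty.

```python
INF = float("inf")


def label_distance(a, b):
    """Assumed 0/1 distance between two labels (hypothetical choice)."""
    return 0 if a == b else 1


def next_stage(prev, inputs, label, d=label_distance):
    """One stage calculation: j is fixed, i runs from 1 to n.

    prev[i] holds g(i, j-1); index 0 is a boundary cell.
    Returns the list holding g(i, j) for the same i.
    """
    n = len(inputs)
    cur = [INF] * (n + 1)
    for i in range(1, n + 1):
        cost = d(inputs[i - 1], label)
        cur[i] = min(
            cur[i - 1] + cost,       # from g(i-1, j)
            prev[i - 1] + 2 * cost,  # from g(i-1, j-1)
            prev[i] + cost,          # from g(i, j-1)
        )
    return cur


def dp_distance(inputs, template, d=label_distance):
    """Return D(A, B) = g(n, m) / (n + m) using one stage per template label."""
    n, m = len(inputs), len(template)
    stage = [0] + [INF] * n  # assumed boundary: g(0, 0) = 0
    for label in template:
        stage = next_stage(stage, inputs, label, d)
    return stage[n] / (n + m)
```

Only one stage of results is kept at a time, since each stage depends only on the stage immediately preceding it.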
In order to overcome this problem, methods roughly classifiable into two groups have been developed. These methods and their problems are described below.
(a) Beam Search
As is apparent from the foregoing description, the calculation of m stages is required in order to carry out the DP calculation between the input label sequence and one label sequence in the dictionary. In a beam search, by contrast, the calculation for a given stage excludes those templates found to be unlikely minimum distance candidates. Consequently, the number of calculations of the recurrence formula is reduced and the calculation speed is improved, because the calculation of a given stage is carried out only for the remaining templates. The unlikely minimum distance candidates may be the N templates having the highest accumulated distances. Alternatively, the stage calculation may be aborted when g(i, j) exceeds a threshold function t(j)=a.multidot.j+b (a, b are constant coefficients), since g(i, j) is an amount accumulated over j. In either case, however, the stage calculation is aborted merely because the template appears unlikely to be optimum, so that the minimum value of the distance obtained as a result is only an approximate solution and not necessarily the optimum.
It is understood that when the threshold for aborting the stage calculation is set high, the obtained solution becomes the optimum. However, this is contrary to the aim of the beam search, and a significant improvement in speed cannot be expected.
With reference to FIG. 4, an example of the beam search will be explained. The threshold function is t=(1/5)j+2. The distance functions between the labels are ##EQU3## and the equation (2) is used as the recurrence formula. In the DP calculation between an input beopqr and a template aeodef, the minimum value within the fourth stage (3) exceeds the threshold for that stage (2.8), so the DP calculation is aborted at that stage. The DP calculation between the input beopqr and a template yyzww is similarly aborted at the second stage.
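The example above can be sketched in Python as follows, assuming a 0/1 label distance (0 for identical labels, 1 otherwise), the symmetric form of the recurrence, and the boundary value g(0, 0)=0. None of these is given explicitly in the present text, and the function names are hypothetical. Under these assumptions the sketch reproduces the stages and values quoted above.

```python
INF = float("inf")


def label_distance(a, b):
    """Assumed 0/1 label distance (not confirmed by the source)."""
    return 0 if a == b else 1


def threshold(j):
    """Threshold function t = (1/5) j + 2 from the example."""
    return j / 5 + 2


def beam_dp(inputs, template):
    """Run the stage calculations, aborting when a stage exceeds the threshold.

    Returns (g(n, m), None) if every stage is completed, or (None, j) if the
    calculation is aborted at stage j.
    """
    n = len(inputs)
    stage = [0] + [INF] * n  # assumed boundary: g(0, 0) = 0
    for j, label in enumerate(template, start=1):
        cur = [INF] * (n + 1)
        for i in range(1, n + 1):
            cost = label_distance(inputs[i - 1], label)
            cur[i] = min(cur[i - 1] + cost,
                         stage[i - 1] + 2 * cost,
                         stage[i] + cost)
        if min(cur[1:]) > threshold(j):  # smallest value in this stage
            return None, j
        stage = cur
    return stage[n], None


result_aeodef = beam_dp("beopqr", "aeodef")  # aborted at the fourth stage
result_yyzww = beam_dp("beopqr", "yyzww")    # aborted at the second stage
```

In the fourth stage for aeodef the smallest accumulated value is 3, against a threshold of 2.8. In the second stage for yyzww it is 3, against a threshold of 2.4.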
The optimum template may possibly be one of the templates for which the DP calculation is aborted, if the optimum template is not matched well with the beginning part of the input label sequence. This is the principal disadvantage of the beam search.
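This disadvantage can be illustrated with a small hypothetical dictionary. The templates zzbeopqr and beoxxx are invented for this sketch. The 0/1 label distance, the symmetric recurrence and the normalization by n+m are assumptions, as are all function names.

```python
INF = float("inf")


def label_distance(a, b):
    """Assumed 0/1 label distance."""
    return 0 if a == b else 1


def match(inputs, template, threshold=None):
    """Return D(A, B), or None if the beam threshold aborts the calculation."""
    n, m = len(inputs), len(template)
    stage = [0] + [INF] * n  # assumed boundary: g(0, 0) = 0
    for j, label in enumerate(template, start=1):
        cur = [INF] * (n + 1)
        for i in range(1, n + 1):
            cost = label_distance(inputs[i - 1], label)
            cur[i] = min(cur[i - 1] + cost,
                         stage[i - 1] + 2 * cost,
                         stage[i] + cost)
        if threshold is not None and min(cur[1:]) > threshold(j):
            return None  # template dropped from the beam
        stage = cur
    return stage[n] / (n + m)


def best(inputs, templates, threshold=None):
    """Return the surviving template with the minimum distance, if any."""
    scored = [(match(inputs, t, threshold), t) for t in templates]
    scored = [(dist, t) for dist, t in scored if dist is not None]
    return min(scored)[1] if scored else None


templates = ["zzbeopqr", "beoxxx", "aeodef"]
exact = best("beopqr", templates)                                # full DP
beam = best("beopqr", templates, threshold=lambda j: j / 5 + 2)  # beam search
```

With full DP the template zzbeopqr has the minimum distance (3/14). With the threshold t=(1/5)j+2 it is aborted at the second stage because its first two labels do not match, and the beam search instead returns beoxxx (distance 6/12).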
This method is disclosed in the study by H. Ney et al. entitled "A Data Driven Organization of the Dynamic Programming Beam Search for Continuous Speech Recognition", (Proc. ICASSP 87, pages 833-836, 1987).
(b) Stack DP
When there are two templates, if the results of the j.sub.p stage of one template and the j.sub.q stage of the other are the same, and the j.sub.p+1 and j.sub.q+1 labels are the same, then the results of the stages for these labels are also the same because of the characteristic of the recurrence formula (2). This indicates that when multiple templates share a common label sequence, it is not necessary to calculate the corresponding stage for each of the multiple templates. Thus, a method has been proposed in which the label sequences shared by the templates are grouped together, the template group is expressed by an acyclic planar graph, and the recurrence formula calculation for the shared part is executed only once. This is stack DP (called by this name since a stack is used for the recurrence formula calculation).
Examples of stack DP are shown in FIGS. 5a-5c. FIG. 5a shows an example in which the templates are represented by one graph, each node of which corresponds to one label. The number of branches is arbitrary; in this example, only two branches are considered for simplicity. FIG. 5b shows a stack used for the calculation, which has a capacity (width) capable of storing the calculation result of one stage. FIG. 5c shows the states of the stack at successive steps of the calculation. Parts shown with slanting lines indicate stack levels in which the stage calculation is carried out.
In stack DP, the recurrence formula calculation is performed using the stack for the respective stages sequentially along the direction of the graph (A to B). When a branch point of the graph is reached, two new stack levels are used. The result of the immediately preceding stack level is copied into each of them (B), and then the respective stage calculations are carried out corresponding to the label sequences of the respective branches (B to C). Then, when a junction point at which the branches rejoin is reached, the calculation results of the two stack levels are compared and the smaller one is written in the original stack level (used for the stage calculation before the branch) (C). The recurrence formula calculation is continued (C to D), using the written stack level and its values, until the graph branches again or reaches a termination. Each time the graph branches or rejoins, similar operations and calculations are performed. When the termination of the graph is reached, assuming that the values of the stage stored in the first level of the stack are G(i) (i=1, . . . n), G(n) indicates the distance to the optimum template (the minimum distance).
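A minimal sketch of this procedure in Python is given below. It assumes a simplified graph representation invented for this sketch: a list whose elements are either a shared label sequence (a string) or a tuple of alternative non-empty branches. The 0/1 label distance, the symmetric recurrence and all names are likewise assumptions. The smaller result at a junction is taken element by element over the stage values. A helper that expands the graph into individual templates is included only so that the result can be checked against ordinary DP.

```python
INF = float("inf")


def label_distance(a, b):
    """Assumed 0/1 label distance."""
    return 0 if a == b else 1


def run_stages(stage, inputs, labels):
    """Apply one stage calculation per label, starting from the given stage."""
    for label in labels:
        cur = [INF] * len(stage)
        for i in range(1, len(stage)):
            cost = label_distance(inputs[i - 1], label)
            cur[i] = min(cur[i - 1] + cost,
                         stage[i - 1] + 2 * cost,
                         stage[i] + cost)
        stage = cur
    return stage


def stack_dp(inputs, graph):
    """Return G(n), the minimum g over all templates encoded by the graph."""
    stage = [0] + [INF] * len(inputs)  # assumed boundary: g(0, 0) = 0
    for segment in graph:
        if isinstance(segment, str):
            # Shared part: calculated once for all templates.
            stage = run_stages(stage, inputs, segment)
        else:
            # Branch point: copy the current stage for each branch, then
            # keep the element-wise smaller values at the junction.
            branches = [run_stages(list(stage), inputs, b) for b in segment]
            stage = [min(values) for values in zip(*branches)]
    return stage[-1]


def expand(graph):
    """List every template encoded by the graph (used only for checking)."""
    templates = [""]
    for segment in graph:
        options = [segment] if isinstance(segment, str) else list(segment)
        templates = [t + o for t in templates for o in options]
    return templates


graph = ["ae", ("od", "xy"), "ef"]  # encodes aeodef and aexyef
shared = stack_dp("beopqr", graph)
separate = min(run_stages([0] + [INF] * 6, "beopqr", t)[-1]
               for t in expand(graph))
```

Here shared and separate hold the same value, while the stages for the shared parts ae and ef are calculated only once in stack_dp.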
A detailed explanation of this method is disclosed in the study by H. Sakoe entitled "A Generalization of Dynamic Programming Based Pattern Matching Algorithm Stack DP-Matching" (Transactions of the Committee on Speech Research, The Acoustic Society of Japan, S83-23, 1983).
According to this method, it is assured that an optimum solution is always obtained. However, since this method presupposes that the DP calculation is basically performed for every part of the graph, the improvement in the speed of calculation comes only from the omission of the repeated calculation for the common subsequences. Moreover, in order to form the acyclic planar graph having the shared parts from the respective templates, it is necessary to obtain the shared partial label sequences between each pair of the templates and to arrange them sequentially, starting from the longer partial sequences, without contradiction. Therefore, a high calculation cost is incurred for a large number of templates. Further, it is difficult to add or delete a template.