In many applications of pattern recognition, there is a need to match a time-varying pattern against each of a collection of stored prototype patterns. A significant problem arises because a given pattern does not necessarily recur at a uniform rate. For short-duration patterns, a simple comparison between an observed pattern and a stored prototype may be made by such well-known techniques as cross-correlation, matched filters, or minimum distance in an appropriate metric.
For longer-duration patterns, it is necessary to adjust the time alignment between the individual pieces of the observed pattern and the stored prototype. For example, U.S. Pat. No. 3,700,815 to G. R. Doddington discloses a system for speaker verification by matching a sample of a person's speech with a reference version of the same text derived from prerecorded samples of the same speaker. Acceptance or rejection of the person as the claimed individual is based on the concordance of a number of acoustic parameters, for example, formant frequencies, pitch period, and speech energy. The degree of match is assessed by time aligning the sample and reference utterance. Time alignment is achieved by a nonlinear process which attempts to maximize the similarity between the sample and reference through a piece-wise linear continuous transformation of the time scale. The extent of time transformation that is required to achieve maximum similarity also influences the decision to accept or reject the identity claim.
The time alignment problem can be illustrated by a simple example. Let the patterns consist of strings of letters of the alphabet. An elementary portion of a pattern is represented by a single letter. The amount of disagreement between an elementary portion of the observed pattern and an elementary portion of a stored prototype is represented by the distance between the two letters' positions in the alphabet.
TABLE I
______________________________________
A  Y  M  B  P  W  C    observed pattern
D  W  R  E  Q  Z  H    prototype (stored)
3  2  5  3  1  3  5    distance
TOTAL DISTANCE: 22
______________________________________
In the example given in Table I, there is no time alignment problem, and the total "distance" between the observed pattern and the stored prototype is easily seen to be 22.
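The arithmetic of Table I can be sketched in a few lines. This is a minimal illustration, assuming uppercase letters A to Z and taking the distance between two letters to be the absolute difference of their alphabet positions; the function names are illustrative and do not come from the source.

```python
def letter_distance(x: str, y: str) -> int:
    """Absolute difference between the alphabet positions of two letters."""
    return abs(ord(x) - ord(y))


def total_distance(observed: str, prototype: str) -> int:
    """Sum of per-position letter distances for two equal-length strings."""
    if len(observed) != len(prototype):
        raise ValueError("position-by-position comparison needs equal lengths")
    return sum(letter_distance(x, y) for x, y in zip(observed, prototype))


# Strings taken from Table I.
per_letter = [letter_distance(x, y) for x, y in zip("AYMBPWC", "DWREQZH")]
total = total_distance("AYMBPWC", "DWREQZH")
print(per_letter, total)  # [3, 2, 5, 3, 1, 3, 5] 22
```

The equal-length check reflects the point of the example: a position-by-position comparison is only meaningful when no alignment is needed.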
TABLE II
______________________________________
AYMBPWC      observed pattern
AMBAAPGWC    prototype (stored)
##STR1##
Alignment: 1 deletion, 3 insertions
SUBSTITUTION DISTANCE: 0
______________________________________
In the example shown in Table II, there is an alignment problem with inserted and missing characters. Since there are no substitutions (changed letters) in Table II, it is easy to find the correct realignment.
TABLE III
______________________________________
AYMBPWC      observed pattern
DRECDQGZH    prototype (stored)
##STR2##
Alignment: 1 deletion, 3 insertions
SUBSTITUTION DISTANCE: 18
______________________________________
In the top half of Table III, there are substitutions as well as insertions and deletions. The correct realignment is no longer obvious at a glance. With a little analysis and searching, the alignment given in the bottom half of the table can be found. For longer, less well-behaved patterns, however, the problem can be much more difficult.
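The bookkeeping behind Tables II and III can be sketched as a function that scores a proposed alignment. The alignment diagrams themselves are not reproduced in the text, so the two alignments below are assumptions: each is one alignment consistent with the strings and totals stated in the tables, not necessarily the one originally drawn. Following the tables, a "deletion" is taken to be an observed letter with no prototype partner, and an "insertion" a prototype letter with no observed partner. All names are illustrative.

```python
from typing import List, Optional, Tuple

# One alignment column: (observed letter, prototype letter); None marks a gap.
Column = Tuple[Optional[str], Optional[str]]


def score_alignment(columns: List[Column]) -> Tuple[int, int, int]:
    """Return (substitution distance, deletions, insertions) for an alignment."""
    substitution = deletions = insertions = 0
    for observed, prototype in columns:
        if observed is None and prototype is None:
            raise ValueError("an alignment column cannot be empty on both sides")
        if prototype is None:      # observed letter with no prototype partner
            deletions += 1
        elif observed is None:     # prototype letter with no observed partner
            insertions += 1
        else:                      # paired letters: add their alphabet distance
            substitution += abs(ord(observed) - ord(prototype))
    return substitution, deletions, insertions


# Assumed alignment of AYMBPWC with AMBAAPGWC (Table II).
table_ii = [("A", "A"), ("Y", None), ("M", "M"), ("B", "B"), (None, "A"),
            (None, "A"), ("P", "P"), (None, "G"), ("W", "W"), ("C", "C")]

# Assumed alignment of AYMBPWC with DRECDQGZH (Table III).
table_iii = [("A", "D"), ("Y", None), ("M", "R"), (None, "E"), ("B", "C"),
             (None, "D"), ("P", "Q"), (None, "G"), ("W", "Z"), ("C", "H")]

print(score_alignment(table_ii))   # (0, 1, 3)
print(score_alignment(table_iii))  # (18, 1, 3)
```

Scoring a given alignment is cheap; the difficulty described above lies in finding a good alignment among the many possible ones.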
In the prior art, such alignment problems are usually tackled by a trial and error procedure. A guess is made for the alignment of each piece, then the alignment is readjusted to take into account the constraints on adjacent pieces, perhaps repeatedly. Other alignment techniques include linear, or piecewise linear, stretching or shrinking of a pattern, segmentation of the pattern into blocks and block matching, and various ad hoc procedures based on peculiarities of individual patterns. All of these techniques greatly increase in complexity and decrease in accuracy as the patterns get longer and/or more complex.
The alignment problem in fact has a general, optimal solution. As explained in "Optimal Stochastic Modeling as a Basis for Speech Understanding Systems", by J. K. Baker in Invited Papers of the IEEE Symposium on Speech Recognition, Apr. 15-19, 1974, Academic Press 1975, the well known technique of dynamic programming may be applied to search the space of all possible realignments to find the alignment which gives the best match. The term "best" as used here and hereinafter means the most probable or the one with the highest correlation score. The fundamental formula of this dynamic programming procedure is given in equation (1).

$$\gamma(j,t) = \max_{i}\; \gamma(i,t-1)\, a(i,j)\, b[i,j,p(t)] \tag{1}$$
where $\gamma(j,t)$ is a score for the partial match of position $j$ in the prototype and position $t$ in the observed pattern. The term $a(i,j)$ is the probability of going from position $i$ to position $j$ in the prototype for a single position step in the observed pattern. If $i=j$, there is a deletion; if $j>i+1$, there is an insertion. The term $b[i,j,p(t)]$ is the conditional probability of observing $p(t)$ in position $t$ of the observed pattern when going from position $i$ to position $j$ in the prototype.
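The recursion of equation (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the source: the initialization (score 1 at a start position before the first observation, 0 elsewhere) is assumed because the text does not give one, and the toy prototype, the transition table `A`, and the scoring function `b_toy` are hypothetical values chosen only to exercise the formula. In particular, `b_toy` is a simple match/mismatch score rather than a normalized probability distribution.

```python
from typing import Callable, List, Sequence


def partial_match_scores(
    a: Sequence[Sequence[float]],
    b: Callable[[int, int, str], float],
    observed: str,
    start: int = 0,
) -> List[List[float]]:
    """Return gamma[t][j] computed by the recursion of equation (1).

    Assumption (not from the source): gamma[0][start] = 1 and every other
    gamma[0][j] = 0, i.e. matching begins at a known start position.
    """
    n = len(a)
    gamma = [[0.0] * n for _ in range(len(observed) + 1)]
    gamma[0][start] = 1.0
    for t, symbol in enumerate(observed, start=1):
        for j in range(n):
            # Best predecessor position i for arriving at j on this step.
            gamma[t][j] = max(
                gamma[t - 1][i] * a[i][j] * b(i, j, symbol) for i in range(n)
            )
    return gamma


# Toy model (all numbers hypothetical): position 0 is a start position,
# positions 1 and 2 stand for the prototype letters "A" and "B".
PROTOTYPE = "AB"
A = [
    [0.1, 0.8, 0.1],  # from 0: stay, advance one, skip one
    [0.0, 0.2, 0.8],  # from 1: stay, advance one
    [0.0, 0.0, 1.0],  # from 2: stay
]


def b_toy(i: int, j: int, symbol: str) -> float:
    """0.9 if the letter at prototype position j matches the symbol, else 0.1."""
    if j >= 1 and PROTOTYPE[j - 1] == symbol:
        return 0.9
    return 0.1


gamma = partial_match_scores(A, b_toy, "AB")
final_score = gamma[-1][-1]  # best score for ending at the last prototype position
print(final_score)
```

With these toy values the best path advances one prototype position per observed letter, so `final_score` is approximately 0.5184 (0.8 × 0.9 for each of the two steps). A practical version would typically also record the maximizing `i` at each step so that the alignment can be traced back, and would work with log scores to avoid underflow on long patterns.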
It is an object of the present invention to provide a speech recognition system which gives the optimal time alignment of the observed speech pattern and the stored prototypes.
It is another object to provide a pattern recognition system which matches a relatively long duration, time-varying input pattern against stored prototypes with optimal time alignment.
It is another object to provide a speech recognition system which employs dynamic programming for optimal time alignment of the observed speech pattern and the stored prototypes.