1. Field of the Invention
The present invention relates to a method and a system for automatically recognizing a pattern expressed as a time sequence of feature vectors, such as a voice signal.
2. Description of Related Art
Various techniques have been developed for pattern recognition systems that recognize time sequence patterns. Among the best established and most frequently used is the "Hidden Markov Model (HMM)" method. The principle of the HMM is discussed hereinafter.
Assume that each word name is designated by a number w. The set of words to be recognized can then be expressed as:

$$\{\, w \mid w = 1, 2, \ldots, W \,\} \tag{1}$$

where W is the number of words.
The reference pattern of each word is expressed as a sequence of states. The nth state of word w has a vector output probability distribution $b_n^w(x)$, a multi-dimensional Gaussian distribution determined by the average vector $\mu_n^w$ and the covariance matrix $\Sigma_n^w$:

$$b_n^w(x) = \frac{1}{(2\pi)^{P/2}\,\lvert\Sigma_n^w\rvert^{1/2}} \exp\!\left(-\frac{1}{2}\,(x-\mu_n^w)^{\dagger}\,(\Sigma_n^w)^{-1}\,(x-\mu_n^w)\right) \tag{2}$$

where P is the dimension of vector x,
$\mu_n^w$ is the average vector,
$(\Sigma_n^w)^{-1}$ is the inverse of the covariance matrix of P rows and P columns,
$\lvert\Sigma_n^w\rvert$ is the determinant of the covariance matrix, and
the superscript $\dagger$ represents transposition.
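As a sketch, the Gaussian density $b_n^w(x)$ can be evaluated in the log domain, which avoids numerical underflow when P is large, by factoring the covariance matrix with a Cholesky decomposition instead of forming its inverse explicitly. The function names and the plain-list representation of vectors and matrices are illustrative assumptions, not part of the described system.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cholesky(sigma: Matrix) -> Matrix:
    """Return lower-triangular L with sigma = L L^T (sigma must be symmetric positive definite)."""
    p = len(sigma)
    lower = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sigma[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return lower


def gaussian_log_density(x: Vector, mu: Vector, sigma: Matrix) -> float:
    """log b(x) for a P-dimensional Gaussian with average vector mu and covariance sigma."""
    p = len(x)
    lower = cholesky(sigma)
    # Forward substitution solves L y = (x - mu); then (x - mu)^T sigma^-1 (x - mu) = y . y
    y = [0.0] * p
    for i in range(p):
        y[i] = (x[i] - mu[i] - sum(lower[i][k] * y[k] for k in range(i))) / lower[i][i]
    mahalanobis = sum(v * v for v in y)
    # log|sigma| = 2 * sum(log L_ii) because |sigma| = |L|^2
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(p))
    return -0.5 * (p * math.log(2.0 * math.pi) + log_det + mahalanobis)
```

For example, with $x = (1, 0)$, $\mu = (0, 0)$ and $\Sigma = \begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix}$, the determinant is 3 and the quadratic form is 2/3, so the result equals $-\tfrac{1}{2}(2\log 2\pi + \log 3 + 2/3)$.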
A transition probability is associated with each transition between states. FIG. 6 shows an example of the reference pattern of a word having $N_w$ states. In FIG. 6, the nth state has a transition probability $a^w_{n,n}$ of returning to itself and a transition probability $a^w_{n,n+1}$ of moving to the adjacent (n+1)th state.
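The left-to-right topology of FIG. 6 can be sketched as an $N_w \times N_w$ transition matrix in which only the self-transition $a^w_{n,n}$ and the forward transition $a^w_{n,n+1}$ are nonzero. In this sketch only the self-transition probabilities are supplied; as an assumption, the remaining probability mass of each state goes to the next state, and for the last state it is treated as leaving the word and is not stored.

```python
from typing import List


def left_to_right_matrix(stay: List[float]) -> List[List[float]]:
    """N x N transition matrix where state n may only stay (a_{n,n}) or advance (a_{n,n+1})."""
    n_states = len(stay)
    a = [[0.0] * n_states for _ in range(n_states)]
    for n, p_stay in enumerate(stay):
        if not 0.0 <= p_stay <= 1.0:
            raise ValueError("self-transition probability must lie in [0, 1]")
        a[n][n] = p_stay
        if n + 1 < n_states:
            # Assumption: all remaining probability mass moves to the adjacent state n+1.
            a[n][n + 1] = 1.0 - p_stay
    return a
```

For instance, `left_to_right_matrix([0.6, 0.7, 0.5])` has 0.6, 0.7 and 0.5 on the diagonal, 0.4 and 0.3 just above it, and zeros everywhere else.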
The word output probability $P(a_1, \ldots, a_T \mid w)$ with which the reference pattern of word w outputs the feature vector sequence $a_1, \ldots, a_T$ is defined by the following equations:

$$P(a_1, \ldots, a_T \mid w) = \sum_{n_1, \ldots, n_T} P(a_1, \ldots, a_T, n_1, \ldots, n_T \mid w) \tag{3}$$

$$= \sum_{n_1, \ldots, n_T} P(a_1, \ldots, a_T \mid n_1, \ldots, n_T, w)\, P(n_1, \ldots, n_T \mid w) \tag{4}$$

Here, $n_1, \ldots, n_T$ represents a state transition that is in state $n_1$ at time t = 1 and reaches state $n_T$ at time t = T. In the foregoing equations, $\Sigma$ represents the sum over all possible state transitions, which for the model of FIG. 6 start in the first state ($n_1 = 1$) and end in the last state ($n_T = N_w$). The state transitions can be discussed on the trellis shown in FIG. 7. In FIG. 7, the horizontal axis corresponds to the feature vectors (time t) and the vertical axis corresponds to the state sequence of the reference pattern of the word (see FIG. 6). A state transition is drawn as a path (thick line) on the trellis, and $n_1, \ldots, n_T$ expresses this path in the above equations. In equation (4), $P(a_1, \ldots, a_T \mid n_1, \ldots, n_T, w)$ is the probability of outputting the feature vector sequence given that the state transition has occurred, and $P(n_1, \ldots, n_T \mid w)$ is the probability that the state transition occurs. These probabilities are calculated from the vector output probability distributions and the transition probabilities as follows:

$$P(a_1, \ldots, a_T \mid n_1, \ldots, n_T, w) = \prod_{t=1}^{T} b_{n_t}^w(a_t) \tag{5}$$

$$P(n_1, \ldots, n_T \mid w) = \prod_{t=2}^{T} a_{n_{t-1}, n_t}^w \tag{6}$$
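For small examples, the sum over state transitions in equation (4) can be evaluated literally by enumerating every path through the trellis of FIG. 7. This sketch assumes that the output probabilities $b^w_n(a_t)$ have already been computed into a table `b[t][n]`, that `a[i][j]` holds $a^w_{i,j}$, and that paths start in the first state and end in the last state, consistent with equation (9), which reads the result from state $N_w$. States are 0-indexed in the code. The cost grows as $N_w^T$, so this is only useful as a reference check, not as a practical method.

```python
import itertools
from typing import List


def word_output_probability_bruteforce(b: List[List[float]], a: List[List[float]]) -> float:
    """Evaluate P(a_1..a_T | w) by summing over every state transition n_1..n_T explicitly.

    b[t][n] is the precomputed output probability b_n^w(a_t); a[i][j] is a^w_{i,j}.
    States are 0-indexed, so the first state is 0 and the last is N - 1.
    """
    T, N = len(b), len(a)
    total = 0.0
    for path in itertools.product(range(N), repeat=T):
        # Only paths that start in the first state and end in the last state contribute.
        if path[0] != 0 or path[-1] != N - 1:
            continue
        p_output = 1.0  # P(a_1..a_T | n_1..n_T, w): product of output probabilities
        for t, n in enumerate(path):
            p_output *= b[t][n]
        p_transition = 1.0  # P(n_1..n_T | w): product of transition probabilities
        for t in range(1, T):
            p_transition *= a[path[t - 1]][path[t]]
        total += p_output * p_transition
    return total
```

With two states, `a = [[0.6, 0.4], [0.0, 0.5]]` and `b = [[0.5, 0.1], [0.4, 0.3], [0.2, 0.6]]`, only the paths (0, 0, 1) and (0, 1, 1) contribute, giving 0.0288 + 0.018 = 0.0468.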
An efficient method of calculating the word output probability $P(a_1, \ldots, a_T \mid w)$ given by equation (4) is known as the "forward calculation method". It is discussed as the "forward algorithm" in the publication "Speech Recognition by Probability Model" by Seiichi Nakagawa, page 42, algorithm 3.2, first published on Jul. 1, 1988 by the Electronic Information Telecommunication Society. In this method, the accumulated probability $\alpha^w_t(n)$ of being in state n at time t is initialized according to the following equation:

$$\alpha_1^w(1) = b_1^w(a_1), \qquad \alpha_1^w(n) = 0 \quad (n = 2, \ldots, N_w) \tag{7}$$
By performing the calculation sequentially from time t = 2 to time t = T according to the following equation:

$$\alpha_t^w(n) = \left[\alpha_{t-1}^w(n)\, a_{n,n}^w + \alpha_{t-1}^w(n-1)\, a_{n-1,n}^w\right] b_n^w(a_t), \qquad n = 1, \ldots, N_w \tag{8}$$

with $\alpha_{t-1}^w(0) = 0$, the word output probability $P(a_1, \ldots, a_T \mid w)$ is obtained as:

$$P(a_1, \ldots, a_T \mid w) = \alpha_T^w(N_w) \tag{9}$$
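A minimal sketch of the forward calculation method follows. It assumes the output probabilities are precomputed as `b[t][n]` $= b^w_n(a_t)$, that `a[i][j]` holds $a^w_{i,j}$, and that states are 0-indexed. Because the recursion of equation (8) only looks at the self transition and the transition from the preceding state, the cost is proportional to $T \cdot N_w$ instead of growing exponentially with T.

```python
from typing import List


def forward_probability(b: List[List[float]], a: List[List[float]]) -> float:
    """Word output probability P(a_1..a_T | w) by the forward calculation method.

    b[t][n] = b_n^w(a_t) (precomputed), a[i][j] = a^w_{i,j}; states are 0-indexed.
    Only self transitions a[n][n] and forward transitions a[n-1][n] are used,
    matching the left-to-right model of FIG. 6.
    """
    T, N = len(b), len(a)
    # Initialization: the path must begin in the first state at t = 1.
    alpha = [b[0][0]] + [0.0] * (N - 1)
    for t in range(1, T):
        new_alpha = [0.0] * N
        for n in range(N):
            stay = alpha[n] * a[n][n]
            advance = alpha[n - 1] * a[n - 1][n] if n > 0 else 0.0
            new_alpha[n] = (stay + advance) * b[t][n]
        alpha = new_alpha
    # Termination: probability of being in the last state at t = T, as in equation (9).
    return alpha[N - 1]
```

With `a = [[0.6, 0.4], [0.0, 0.5]]` and `b = [[0.5, 0.1], [0.4, 0.3], [0.2, 0.6]]`, the accumulated probabilities are [0.5, 0], then [0.12, 0.06], and finally 0.0468 in the last state, the same value an exhaustive sum over all paths gives.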
Also, the "Viterbi calculation method", in which the sum over all possible state transitions is approximated by the single state transition that gives the maximum probability, is discussed in the above-identified publication on page 46, algorithm 3.4. In the "Viterbi calculation method", the following equation is employed in place of the foregoing equation (4):

$$P(a_1, \ldots, a_T \mid w) = \max_{n_1, \ldots, n_T} P(a_1, \ldots, a_T \mid n_1, \ldots, n_T, w)\, P(n_1, \ldots, n_T \mid w) \tag{10}$$

In this case, the sum in equation (8) of the forward calculation method is replaced by a maximum (the other variables are the same), giving:

$$\alpha_t^w(n) = \max\left[\alpha_{t-1}^w(n)\, a_{n,n}^w,\; \alpha_{t-1}^w(n-1)\, a_{n-1,n}^w\right] b_n^w(a_t) \tag{11}$$
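The Viterbi calculation method can be sketched by taking the larger of the two candidates in place of the sum of equation (8). As an addition not stated in the text, this sketch also records which predecessor won at each step, so that the maximizing state transition can be traced back. The table conventions (`b[t][n]` $= b^w_n(a_t)$, `a[i][j]` $= a^w_{i,j}$, 0-indexed states) are assumptions of the sketch.

```python
from typing import List, Tuple


def viterbi(b: List[List[float]], a: List[List[float]]) -> Tuple[float, List[int]]:
    """Viterbi approximation of P(a_1..a_T | w) and the maximizing state transition.

    b[t][n] = b_n^w(a_t) (precomputed), a[i][j] = a^w_{i,j}; states are 0-indexed
    and the model is left-to-right (stay in a state or advance by one state).
    """
    T, N = len(b), len(a)
    delta = [b[0][0]] + [0.0] * (N - 1)  # best-path probability per state at t = 1
    backpointers: List[List[int]] = []     # backpointers[t-1][n]: best predecessor of n at time t
    for t in range(1, T):
        new_delta = [0.0] * N
        pointer = [0] * N
        for n in range(N):
            stay = delta[n] * a[n][n]
            advance = delta[n - 1] * a[n - 1][n] if n > 0 else 0.0
            # Keep the larger of the two candidates instead of their sum.
            if advance > stay:
                new_delta[n], pointer[n] = advance * b[t][n], n - 1
            else:
                new_delta[n], pointer[n] = stay * b[t][n], n
        delta = new_delta
        backpointers.append(pointer)
    # Trace back from the last state at t = T to recover the maximizing path.
    path = [N - 1]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return delta[N - 1], path
```

With `a = [[0.6, 0.4], [0.0, 0.5]]` and `b = [[0.5, 0.1], [0.4, 0.3], [0.2, 0.6]]`, the best path is [0, 0, 1] with probability 0.0288, compared with 0.0468 summed over all paths by the forward calculation method.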
By employing either the "forward calculation method" or the "Viterbi calculation method", the word output probability with which the reference pattern of each word to be recognized outputs the feature vector sequence of the input signal can be calculated. Recognition is then performed by selecting, as the recognition result, the word name having the maximum word output probability among all the words.
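An end-to-end sketch of recognition by maximum word output probability is given below under simplifying assumptions: one-dimensional feature vectors (P = 1), so each state's Gaussian is just an average and a variance; $a^w_{n,n+1} = 1 - a^w_{n,n}$; and the forward calculation method for scoring. The word names "up" and "down" and their parameters are hypothetical. Practical systems would accumulate log probabilities to avoid underflow on long inputs; plain probabilities are kept here to mirror the equations.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical word model for scalar features (P = 1):
# per-state (average, variance) pairs and per-state self-transition probabilities a_{n,n}.
WordModel = Tuple[List[Tuple[float, float]], List[float]]


def gaussian_1d(x: float, mean: float, var: float) -> float:
    """One-dimensional Gaussian density, the P = 1 case of b_n^w(x)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def word_output_probability(features: Sequence[float], model: WordModel) -> float:
    """Forward calculation, assuming a_{n,n+1} = 1 - a_{n,n}."""
    states, stay = model
    n_states = len(states)
    b = [[gaussian_1d(x, mean, var) for mean, var in states] for x in features]
    alpha = [b[0][0]] + [0.0] * (n_states - 1)
    for t in range(1, len(features)):
        alpha = [
            (alpha[n] * stay[n] + (alpha[n - 1] * (1.0 - stay[n - 1]) if n > 0 else 0.0)) * b[t][n]
            for n in range(n_states)
        ]
    return alpha[-1]


def recognize(features: Sequence[float], models: Dict[str, WordModel]) -> str:
    """Return the word name whose reference pattern gives the maximum word output probability."""
    return max(models, key=lambda word: word_output_probability(features, models[word]))
```

With a two-state "up" model (averages 0 then 5) and a "down" model (averages 5 then 0), the input [0, 0, 5, 5] is recognized as "up" and [5, 5, 0, 0] as "down". Swapping in the Viterbi recursion would only change `word_output_probability`.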
In the HMM method set forth above, the reference pattern of a word is expressed as a sequence of states, each having a multi-dimensional Gaussian distribution. Namely, if the reference pattern of word w is a sequence of $N_w$ states, the input is divided into $N_w$ sections, and each section is modeled by one Gaussian distribution. This is illustrated in FIG. 8, in which the horizontal axis represents the time of the input signal and the vertical axis represents the value $a_t$ of the feature vector; for simplicity, the feature vector is drawn as one-dimensional. $\mu^w_n$ is the average vector of the Gaussian distribution of state n, and $\mu^w_{n+1}$ is the average vector of the Gaussian distribution of state n+1. FIG. 8 illustrates how the reference pattern is matched with the input signal along a state transition that stays in state n from time $t_n$ to time $t_{n+1}$ and in state n+1 from time $t_{n+1}$ to time $t_{n+2}$. As is clear from FIG. 8, the HMM approximates the input signal in the section corresponding to each state by an average value and the distribution around it. For instance, the smoothly varying portion of the input signal from time $t_n$ to $t_{n+1}$ is approximated by the constant average value $\mu^w_n$. Modeling a dynamically varying signal such as a voice signal with a constant average value (average vector) for each state therefore requires a large number of states. However, increasing the number of states increases the number of parameters (the average vectors and covariance matrices of the states). Furthermore, a large amount of training data becomes necessary to estimate these parameters reliably and with high precision.
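The trade-off described above can be made concrete with a small sketch. A smoothly varying signal (here an assumed linear ramp on [0, 1)) is split into N equal sections, each replaced by its own average value as one state would do. The squared error falls only as N grows (roughly as $1/(12N^2)$ for the ramp), while the parameter count grows linearly in N. The counting convention (full covariance matrices and one free transition probability per state) is an assumption of this sketch.

```python
from typing import List


def piecewise_constant_mse(n_states: int, n_samples: int = 1000) -> float:
    """Mean squared error when a linear ramp on [0, 1) is split into n_states equal
    sections and each section is replaced by its own average value (one state each)."""
    xs = [(i + 0.5) / n_samples for i in range(n_samples)]  # sampled ramp x(t) = t
    sections: List[List[float]] = [[] for _ in range(n_states)]
    for x in xs:
        sections[min(int(x * n_states), n_states - 1)].append(x)
    sq_err = 0.0
    for section in sections:
        if section:
            mean = sum(section) / len(section)
            sq_err += sum((x - mean) ** 2 for x in section)
    return sq_err / n_samples


def hmm_parameter_count(n_states: int, dim: int) -> int:
    """Free parameters of a left-to-right HMM with full-covariance Gaussians, assuming
    one free transition probability per state (a_{n,n}; a_{n,n+1} = 1 - a_{n,n})."""
    mean = dim                          # average vector
    covariance = dim * (dim + 1) // 2   # symmetric P x P covariance matrix
    return n_states * (mean + covariance + 1)
```

For example, with P = 10 each state carries 10 + 55 + 1 = 66 parameters under this convention, so 5 states need 330 parameters and 20 states need 1320, while halving the section length only cuts the ramp error by about a factor of four.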