1. Field of the Invention
The present invention relates to an apparatus using the Hidden Markov Model (HMM) in pattern recognition of voice or the like, and more particularly to an apparatus capable of fabricating an HMM of high recognition precision.
2. Description of the Prior Art
FIG. 1 is a block diagram of a conventional voice recognition apparatus employing an HMM. In the diagram, numeral 101 denotes a voice analyzing part, in which an input voice signal is converted into a feature vector for each specified time interval (called a frame), for example every 10 msec, by a well-known method such as filter-bank analysis, Fourier transform, or LPC analysis. The input voice signal is thereby converted into a sequence X = x_1, x_2, ..., x_T of feature vectors, in which T is the number of frames. Numeral 102 is called a code book, which holds labeled representative vectors. Numeral 103 is a vector quantizing part, which replaces each vector in the sequence X with the label of the closest representative vector.

Numeral 104 is an HMM fabricating part, in which an HMM corresponding to each word in the recognition vocabulary is created from training data. That is, in order to make an HMM corresponding to word n, the structure of the HMM (the number of states and the transitions permitted between states) is first determined appropriately; then, from the label sequences obtained from multiple utterances of the word n by the above method, the probability of each label occurring along with a state transition and the state transition probabilities of the model are estimated so that the probability of occurrence of those label sequences is maximized. Numeral 105 is an HMM memory unit, which stores the HMM thus obtained for each word.

Numeral 106 denotes a likelihood calculating part, which, for the label sequence of an unknown input voice to be recognized, calculates the probability that the label sequence is generated by each of the models stored in the HMM memory unit 105. Numeral 107 is a comparator, which judges the word corresponding to the model giving the maximum likelihood among those obtained in the likelihood calculating part 106 to be the result of recognition.
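The labeling step performed by the vector quantizing part 103 can be sketched as follows. This is a minimal illustration only; the code book contents and dimensions here are hypothetical, not those of the apparatus.

```python
import numpy as np

def quantize(X, codebook):
    """Replace each feature vector in X with the label (index) of the
    closest representative vector in the code book (part 103)."""
    labels = []
    for x in X:
        # Euclidean distance from x to every representative vector.
        dists = np.linalg.norm(codebook - x, axis=1)
        labels.append(int(np.argmin(dists)))
    return labels

# Hypothetical 3-entry code book of 2-dimensional feature vectors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
X = np.array([[0.1, -0.1], [0.9, 1.2], [1.9, 0.1]])
print(quantize(X, codebook))  # → [0, 1, 2]
```

In this way the feature vector sequence X is reduced to a label sequence before HMM training or recognition.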
The recognition by HMM is effected in the following manner. Supposing the label sequence obtained for an unknown input to be o = o_1, o_2, ..., o_T, and an arbitrary state sequence of length T generated by model λ^w to be S = s_1, s_2, ..., s_T, the probability that the label sequence o was generated from λ^w is given as

P(o | λ^w) = Σ_S P(o, S | λ^w)   (1)

or, approximately,

P(o | λ^w) ≈ max_S P(o, S | λ^w)   (2)

In the case of equation (2), using the logarithm, it is often rewritten as

L(o | λ^w) = max_S log P(o, S | λ^w)   (3)

where P(o, S | λ^w) is the joint probability of o and S in model λ^w.
Therefore, for example, by using equation (1) and supposing

ŵ = argmax_w P(o | λ^w)   (4)

the word ŵ is the result of recognition. The same holds true when equations (2) and (3) are employed.
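The decision rule above, as carried out by the comparator 107, simply selects the word whose model gave the greatest likelihood. A minimal sketch, in which the per-word likelihood values are made-up placeholders standing in for the outputs of the likelihood calculating part 106:

```python
def recognize(likelihoods):
    """Return the word whose model gave the maximum likelihood,
    i.e. w-hat = argmax_w P(o | lambda^w) (comparator 107)."""
    return max(likelihoods, key=likelihoods.get)

# Hypothetical per-word likelihoods from the likelihood calculating part.
likelihoods = {"zero": 1.2e-8, "one": 3.5e-6, "two": 7.9e-9}
print(recognize(likelihoods))  # → one
```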
P(o, S | λ) is, in the case of equation (1), obtained as follows.
With respect to the states q_i (i = 1, ..., I) of an HMM λ, when the occurrence probability b_i(o) of label o and the transition probability a_ij from state q_i to state q_j are given for every state q_i, the probability of the label sequence O = o_1, o_2, ..., o_T occurring from HMM λ along the state sequence S = s_1, s_2, ..., s_{T+1} is defined as follows:

P(o, S | λ) = π_{s_1} ∏_{t=1}^{T} b_{s_t}(o_t) a_{s_t s_{t+1}}   (5)

where π_{s_1} is the initial probability of state s_1, and s_{T+1} = q_F is the final state, in which no label is generated.
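Equation (5) can be evaluated directly for any one state sequence, and the sum over all state sequences required by equation (1) is computed efficiently by the well-known forward algorithm. The sketch below assumes a small model whose parameters (π, A, B, the final-state index) are purely illustrative:

```python
import numpy as np

def joint_prob(o, S, pi, A, B):
    """P(o, S | lambda) per equation (5): the initial probability of s_1
    times b_{s_t}(o_t) * a_{s_t, s_{t+1}} over t = 1..T.  S has length
    T+1 and ends in the non-emitting final state."""
    p = pi[S[0]]
    for t in range(len(o)):
        p *= B[S[t], o[t]] * A[S[t], S[t + 1]]
    return p

def total_prob(o, pi, A, B, final):
    """P(o | lambda) of equation (1): the sum of equation (5) over all
    state sequences, computed with the forward algorithm."""
    alpha = pi * B[:, o[0]]                    # emit o_1 in each start state
    for t in range(1, len(o)):
        alpha = (alpha @ A[:, :final]) * B[:, o[t]]  # transition, then emit
    return float(alpha @ A[:, final])          # transition into final state

# Illustrative 2-emitting-state model; state index 2 is the final state q_F.
pi = np.array([1.0, 0.0])                      # always start in q_1
A  = np.array([[0.6, 0.3, 0.1],                # rows: from q_1, from q_2
               [0.0, 0.7, 0.3]])
B  = np.array([[0.8, 0.2],                     # label probabilities per state
               [0.3, 0.7]])
o  = [0, 1, 1]                                 # observed label sequence
print(total_prob(o, pi, A, B, final=2))
```

Summing joint_prob over all eight emitting-state sequences of length 3 reproduces the value returned by total_prob, which is the point of the forward recursion.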
In this example, the input feature vector x is converted into a label, but it is also possible to use the feature vector x directly; instead of the occurrence probability of a label in each state, a probability density function of the feature vector x is given. In this case, instead of the occurrence probability b_i(o) of label o in state q_i in equation (5), the probability density b_i(x) of the feature vector x is used. Equations (1) and (2) are then rewritten for the feature vector sequence X as follows:

P(X | λ^w) = Σ_S P(X, S | λ^w)   (1')

P(X | λ^w) ≈ max_S P(X, S | λ^w)   (2')

and equation (2') may be rewritten as follows by using the logarithm, as in the case of equation (2):

L(X | λ^w) = max_S log P(X, S | λ^w)   (3')
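A common minimal choice for the density b_i(x) in this continuous case is a single Gaussian with diagonal covariance per state; the sketch below uses illustrative mean and variance values, not parameters of the described apparatus.

```python
import math

def gaussian_density(x, mean, var):
    """Diagonal-covariance Gaussian b_i(x): the product of
    one-dimensional normal densities over the vector components."""
    p = 1.0
    for xk, mk, vk in zip(x, mean, var):
        p *= math.exp(-(xk - mk) ** 2 / (2.0 * vk)) / math.sqrt(2.0 * math.pi * vk)
    return p

# Illustrative parameters of one state, for a 2-dimensional feature vector.
b1 = gaussian_density([0.5, -0.2], mean=[0.4, 0.0], var=[0.1, 0.2])
print(b1)
```

The density is largest when x equals the state's mean vector and decays for feature vectors far from it, which is what replaces the discrete label probability table.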
The typical HMM used hitherto in voice recognition is as shown in FIG. 2. In the diagram, q_i denotes the i-th state, a_ij is the transition probability of changing from state q_i to state q_j, and b_i(x) is the probability, or probability density, with which the label or feature vector x is observed in state q_i. Hereinafter, x is a vector having continuous values.
At this time, the state q_i of the HMM is considered to correspond to a partial segment i of the voice corresponding to that HMM. Therefore, the probability density b_i(x) of observing x in state q_i is the probability density with which x occurs in segment i, and the transition probability a_ii is understood as the probability that x_{t+1} at time t+1 is again contained in segment i when x_t at time t is contained in segment i. From this point of view, the following two points may be indicated as problems in the conventional HMM.
(1) Since the parameters defining the function b_i(x) are constant with respect to the state q_i, the probability distribution may be regarded as constant within each segment. Therefore, although for some phonemes the time-wise change (dynamic feature) of the feature vector is important, this feature cannot be expressed adequately in the conventional model.
(2) The length τ of a segment is considered to conform to a certain probability distribution, but in the conventional model, since the transition probabilities a_ii and a_ij are constant regardless of how long the model has stayed in state q_i, the length of segment i substantially follows an exponential (geometric) distribution, and this distribution profile does not always express the reality correctly.
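This duration behaviour follows directly from the constant self-transition probability: staying in state q_i for exactly τ frames means τ - 1 self-loops followed by one exit, giving probability a_ii^(τ-1) (1 - a_ii), a geometric distribution whose most probable length is always τ = 1 whatever the typical segment length. A small numeric check, with a_ii = 0.8 as an arbitrary example value:

```python
def duration_pmf(a_ii, tau):
    """Probability of staying in state q_i for exactly tau frames when
    the self-transition probability a_ii is constant:
    (tau - 1) self-loops followed by one exit transition."""
    return a_ii ** (tau - 1) * (1.0 - a_ii)

a_ii = 0.8
pmf = [duration_pmf(a_ii, t) for t in range(1, 6)]
print(pmf)  # monotonically decreasing: the mode is always tau = 1
```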
Concerning these two problems, as for point (2), it is already known to use the Poisson distribution or Γ-distribution as the probability density function d_i(τ) relating to the length τ of stay in state q_i.
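As a sketch of why such a duration density helps, a Poisson-shaped d_i(τ) peaks near its mean duration rather than at τ = 1. The shift by one frame (so that τ ≥ 1) and the mean of 5 frames below are illustrative assumptions, not values from the prior art:

```python
import math

def poisson_duration(mean, tau):
    """Poisson-shaped duration probability, shifted so that tau >= 1:
    d_i(tau) = e^(-m) * m^(tau-1) / (tau-1)!"""
    k = tau - 1
    return math.exp(-mean) * mean ** k / math.factorial(k)

d = [poisson_duration(5.0, t) for t in range(1, 12)]
print(d)  # peaks near tau = 5..6, unlike the geometric distribution
```

Unlike the geometric distribution above, this density can place its maximum at any desired typical segment length.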