1. Field of the Invention
The present invention relates to continuous speech recognition devices. More particularly, the present invention relates to a continuous speech recognition device that unifies speech recognition and language processing by using an LR table to predict phonemes in the input speech data and verifying each prediction with the phoneme verification function of an HMM phoneme recognition device.
2. Description of the Background Art
An HMM-LR method is known as a continuous speech recognition method for carrying out speech recognition and language processing efficiently in a single integrated process. By unifying speech recognition and language processing, which were previously carried out separately, this method achieves efficient and highly reliable processing without intermediate data such as a phrase lattice.
In the HMM-LR method, the probability of a phoneme predicted by a language processing method called the LR method (LR parser) is calculated by a speech recognition method called the HMM (Hidden Markov Model) method. The LR method and the HMM method will be explained prior to the explanation of the HMM-LR method.
In the field of computer science, particularly in processing systems for programming languages, techniques of syntax analysis have been studied extensively. One such technique is called the LR parser. The LR parser is a type of so-called SHIFT-REDUCE parser, in which analysis is carried out by reading the input symbols from left to right. The LR parser internally holds a "state" and determines the next action to be taken according to the current state and the input symbol. The following four actions are allowed in the LR parser:
(1) ACCEPT
(2) ERROR
(3) SHIFT
(4) REDUCE

ACCEPT indicates that the LR parser accepts the input symbol string. ERROR indicates that the LR parser does not accept the input symbol string. SHIFT pushes the current input symbol read by the LR parser and the current state onto a stack. REDUCE reduces the topmost symbols on the stack to a larger unit using a grammar rule. In REDUCE, as many state symbols and input symbols are removed from the stack as there are symbols on the right-hand side of the grammar rule used.

The symbol definitions, formulas, and lists referred to by the procedures described later in this section are as follows, where $1.0e^{-\infty}$ denotes a probability of zero.

Symbols used in phoneme verification:
N: Length of the symbol string of the unknown phoneme data
$O_i$: The i-th symbol in the unknown phoneme data symbol string
M: The number of states of the verified phoneme HMM
$a(i, j)$: Transition probability of the arc connecting state i and state j in the verified phoneme HMM
$b(i, j, k)$: Probability that the arc connecting state i and state j in the verified phoneme HMM outputs symbol k

Initialization in phoneme verification:
$$P(0, 0) = 1.0$$
$$P(0, j) = 1.0e^{-\infty} \quad (j = 1 \ldots M)$$
$$P(i, 0) = 1.0e^{-\infty} \quad (i = 1 \ldots N)$$

Recursion calculation in phoneme verification:
$$P(i, j) = P(i-1, j) \times a(j, j) \times b(j, j, O_i) + P(i-1, j-1) \times a(j-1, j) \times b(j-1, j, O_i)$$
$$Q(i) = P(i, M) \quad (i = 1 \ldots N)$$

Information stored in a cell:
(1) A state stack of the LR parser
(2) The values of the probability table Q(1) . . . Q(N) calculated in the prior phoneme verification

Initial values of the probability table Q of the first cell:
$$Q(0) = 1.0$$
$$Q(i) = 1.0e^{-\infty} \quad (i = 1 \ldots N)$$

Update of the probability table in a SHIFT action:
$$P(0, j) = 1.0e^{-\infty} \quad (j = 1 \ldots M')$$
$$P(i, 0) = Q(i) \quad (i = 1 \ldots N)$$
$$P(i, j) = P(i-1, j) \times a(j, j) \times b(j, j, O_i) + P(i-1, j-1) \times a(j-1, j) \times b(j-1, j, O_i) \quad (i = 1 \ldots N,\ j = 1 \ldots M')$$
$$Q(i) = P(i, M') \quad (i = 1 \ldots N)$$
A table called the LR table is referred to for determining the action of the LR parser from the current state and the input symbol. An LR table must be prepared in advance for analysis with the LR parser. The LR table can be generated mechanically from the grammar rules.
FIG. 4 shows an example of a grammar rule, and FIG. 5 shows an example where the grammar rule of FIG. 4 is converted into an LR table.
It can be appreciated from FIG. 5 that the LR table is formed of two tables called the ACTION table and the GOTO table. In the ACTION table, the states of the LR parser are indicated along the ordinate and the input symbols are arranged along the abscissa, with the action to be taken by the LR parser denoted in each entry of the table. Referring to FIG. 5, the action denoted "acc" indicates ACCEPT, and an empty entry in the table indicates ERROR. A symbol with the prefix s indicates SHIFT, and the number following s indicates the state to be taken by the LR parser after the SHIFT action. A symbol with the prefix r indicates REDUCE, and the number n following r indicates that the reduce action is executed using the n-th grammar rule.
The LR parser refers to the GOTO table after a REDUCE action. In the GOTO table, the states of the LR parser are indicated along the ordinate, with non-terminal symbols shown along the abscissa. The LR parser determines a new state from the GOTO table according to the non-terminal symbol obtained by the REDUCE action and the current state. The state of the LR parser at the initiation of the analysis is 0. The analysis terminates when the LR parser carries out an ACCEPT action to accept the input symbol string, or an ERROR action to reject it.
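The table-driven operation described above can be illustrated with a minimal sketch. The grammar, the table contents, and the names `RULES`, `ACTION`, `GOTO`, and `lr_parse` are hypothetical and chosen only for illustration; they are not the grammar of FIG. 4 or the table of FIG. 5. For brevity the stack holds only states, since the symbols are not needed to decide the next action.

```python
# Hypothetical toy grammar (NOT the grammar of FIG. 4):
#   rule 1: S -> a S
#   rule 2: S -> b
# rule number -> (left-hand side, number of symbols on the right-hand side)
RULES = {1: ("S", 2), 2: ("S", 1)}

# ACTION table: (state, input symbol) -> action. A missing entry means ERROR.
# "$" marks the end of the input.
ACTION = {
    (0, "a"): "s1", (0, "b"): "s2",
    (1, "a"): "s1", (1, "b"): "s2",
    (2, "$"): "r2",
    (3, "$"): "acc",
    (4, "$"): "r1",
}

# GOTO table: (state, non-terminal symbol) -> new state.
GOTO = {(0, "S"): 3, (1, "S"): 4}


def lr_parse(symbols):
    """Return True if the symbol string is accepted, False on ERROR."""
    tokens = list(symbols) + ["$"]
    stack = [0]  # the state at the initiation of the analysis is 0
    pos = 0
    while True:
        action = ACTION.get((stack[-1], tokens[pos]))
        if action is None:       # empty table entry: ERROR
            return False
        if action == "acc":      # ACCEPT
            return True
        if action[0] == "s":     # SHIFT: push the new state, read next symbol
            stack.append(int(action[1:]))
            pos += 1
        else:                    # REDUCE by the n-th grammar rule
            lhs, rhs_len = RULES[int(action[1:])]
            del stack[len(stack) - rhs_len:]      # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])  # consult the GOTO table
```

With this toy table, `lr_parse("aab")` follows the sequence s1, s1, s2, r2, r1, r1, acc, while `lr_parse("ba")` stops at an empty ACTION entry.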
A recognition processing method called the HMM method, which regards utterance as a probabilistic state transition, is known in the field of speech recognition.
FIG. 6 is a typical phoneme model diagram used in the HMM method. The method of phoneme recognition by HMM will be explained hereinafter with reference to FIG. 6. The probability of transition between states and the output probability of each symbol are given for each arc of the HMM, whereby a symbol string is output probabilistically according to these values. In speech recognition by the HMM method, as many HMMs as there are phoneme types are prepared in advance. The parameters of each phoneme HMM are obtained from training phoneme data so that the HMM outputs the symbol strings of the training phoneme data with the highest probability. The probability that each HMM outputs an unknown phoneme data symbol string is then calculated, and the phoneme corresponding to the HMM having the highest probability is established as the recognition result.
The operation of calculating the probability of the unknown phoneme data is called phoneme verification. For the HMM of FIG. 6, this operation is carried out by the following procedure.
(Definitions of Symbols)
(Initialization)
(Recursion Calculation (i=1 . . . N, j=1 . . . M))
The result of the phoneme verification is given in probability table Q(1) . . . Q(N).
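The phoneme verification procedure can be sketched as follows. This is a minimal illustration, assuming the recursion P(i, j) = P(i-1, j) a(j, j) b(j, j, O_i) + P(i-1, j-1) a(j-1, j) b(j-1, j, O_i) with P(0, 0) = 1.0 and every other boundary entry equal to zero. The function name `verify_phoneme`, the nested-list layout of `a` and `b`, and the one-state toy model are hypothetical.

```python
def verify_phoneme(obs, a, b, num_states):
    """Return the list Q, where Q[i] = P(i, M) for i = 0 .. N.

    obs        -- symbol string O_1 .. O_N as a list of integer symbol indices
    a[i][j]    -- transition probability of the arc from state i to state j
    b[i][j][k] -- probability that the arc from state i to state j outputs k
    num_states -- M, the number of states of the verified phoneme HMM
    """
    n, m = len(obs), num_states
    # P(0, j) and P(i, 0) are zero (1.0e^-inf), except P(0, 0) = 1.0.
    p = [[0.0] * (m + 1) for _ in range(n + 1)]
    p[0][0] = 1.0
    for i in range(1, n + 1):
        o = obs[i - 1]  # O_i (the list is 0-indexed)
        for j in range(1, m + 1):
            stay = p[i - 1][j] * a[j][j] * b[j][j][o]
            advance = p[i - 1][j - 1] * a[j - 1][j] * b[j - 1][j][o]
            p[i][j] = stay + advance
    return [p[i][m] for i in range(n + 1)]


# Hypothetical toy model: M = 1 state, two output symbols (0 and 1).
TOY_A = [[0.0, 1.0],
         [0.0, 0.5]]
TOY_B = [[[0.0, 0.0], [0.5, 0.5]],
         [[0.0, 0.0], [0.8, 0.2]]]

q = verify_phoneme([0, 1], TOY_A, TOY_B, 1)
```

For the toy model, Q(1) = 1.0 × 1.0 × 0.5 = 0.5 and Q(2) = 0.5 × 0.5 × 0.2 = 0.05.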
Hence, the HMM-LR method unifies the LR method and the HMM method in performing analysis. In the HMM-LR method, the LR parser predicts the next phoneme in the uttered speech data, and HMM phoneme verification is then invoked to calculate the probability of the predicted phoneme. This allows speech recognition and language processing to be performed simultaneously, so that efficient and highly reliable processing can be carried out without intermediate data linking speech recognition and language processing. The parser of the HMM-LR method will simply be called the parser hereinafter.
The parser grows simultaneously various potential parsing trees. A parsing tree represents a sentence as a string of words, where the relationships thereof are illustrated in the form of a tree. A parsing tree is supplied with a value indicating the probability of that parsing tree to be received. The parsing tree is regarded not worthy of being grown and is rejected when the probability value becomes lower than a predetermined threshold value. The parser comprises a plurality of regions for storing information associated with the currently grown parsing tree. This region is called a cell hereinafter. One parsing tree corresponds to one cell. A cell corresponding to an already received parsing tree is called an active cell. The information stored in the cell includes the following:
N is the length of the symbol string corresponding to the input phoneme data.
At the initiation of the analysis, only one cell C exists, with state 0 at the top of the LR state stack of that cell. The following values are provided as initial values in the probability table Q of this cell C.
The parser then selects one active cell, reads state s from the top of the LR state stack of that cell, and consults the ACTION table entries for state s in the LR table. When the selected action is SHIFT, the input symbol A to be shifted is verified by HMM phoneme verification, and the values in the probability table of the cell are updated as below.
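A cell and its initial contents might be represented as in the following sketch. It assumes that a cell holds an LR state stack and a probability table Q, and that the first cell starts with state 0 on its stack, Q(0) = 1.0, and every other entry of Q equal to zero. The names `Cell` and `initial_cell` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Information kept for one parsing tree being grown (hypothetical layout)."""
    state_stack: list = field(default_factory=lambda: [0])  # LR state stack
    q: list = field(default_factory=list)                   # table Q(0) .. Q(N)


def initial_cell(n):
    """Create the single cell that exists at the initiation of the analysis.

    n is N, the length of the symbol string of the input phoneme data.
    """
    return Cell(state_stack=[0], q=[1.0] + [0.0] * n)


cell = initial_cell(3)
```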
(Recursion calculation)
where M' is the number of states of symbol A in the HMM.
If the largest value Q(i) in the probability table Q(1) . . . Q(N) updated by the above calculation is smaller than the threshold value, this cell is discarded. Otherwise, the new state is pushed onto the LR state stack.
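The SHIFT update and the threshold test can be sketched as follows. This is an illustration under stated assumptions: P(i, 0) is seeded with the previous table value Q(i), P(0, j) is zero for j = 1 . . . M', and P(0, 0) is taken to be Q(0), which the text does not state explicitly. The name `shift_update`, the nested-list layout, and the toy model are hypothetical.

```python
def shift_update(q, obs, a, b, num_states, threshold):
    """Update a cell's probability table for a shifted symbol A.

    q          -- previous table Q(0) .. Q(N) of the cell
    obs        -- symbol string O_1 .. O_N as integer symbol indices
    a, b       -- arc transition / output probabilities of the HMM for A
    num_states -- M', the number of states of the HMM for A
    Returns the new table, or None if the cell is to be discarded.
    """
    n, m = len(obs), num_states
    p = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        p[i][0] = q[i]  # P(i, 0) = Q(i); using Q(0) for P(0, 0) is an assumption
    for i in range(1, n + 1):
        o = obs[i - 1]
        for j in range(1, m + 1):
            stay = p[i - 1][j] * a[j][j] * b[j][j][o]
            advance = p[i - 1][j - 1] * a[j - 1][j] * b[j - 1][j][o]
            p[i][j] = stay + advance
    new_q = [p[i][m] for i in range(n + 1)]
    if max(new_q[1:]) < threshold:  # best Q(i) below threshold: discard cell
        return None
    return new_q


# Hypothetical toy model: M' = 1 state, two output symbols (0 and 1).
TOY_A = [[0.0, 1.0],
         [0.0, 0.5]]
TOY_B = [[[0.0, 0.0], [0.5, 0.5]],
         [[0.0, 0.0], [0.8, 0.2]]]

first = shift_update([1.0, 0.0, 0.0], [0, 1], TOY_A, TOY_B, 1, 0.1)
second = shift_update(first, [0, 1], TOY_A, TOY_B, 1, 0.3)
```

In this toy run the first SHIFT keeps the cell, because its best value 0.5 is not below the threshold 0.1. The second SHIFT yields a best value of 0.25, which is below the threshold 0.3, so the cell is discarded.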
When the selected action is REDUCE, a reduce action by the grammar rule will be executed. This action is identical to that of a normal LR parser. When the selected action is ACCEPT, the analysis will be completed after all the input phoneme data are processed.
There is a method of describing the phoneme models used in speech recognition by incorporating information on the environment surrounding each phoneme. This is called phoneme environment clustering (PEC). This method derives clusters of environment-dependent phonemes by minimizing the total distortion in the mapping between the phoneme pattern space and the phoneme environment space. Phoneme context, pitch, power, speaker, utterance speed, and language are some of the factors of the phoneme environment. The information of phoneme context is considered to be particularly critical for the phoneme environment. Particularly in the HMM-LR method, where the phoneme context is already known, high-precision recognition can be expected by using phoneme models of high phoneme separability obtained by PEC.
If phoneme context is taken as the factor of phoneme environment, the phoneme model determined by phoneme environment clustering is dependent upon phoneme context. Therefore, the parser must carry out actions dependent upon phoneme context in order to carry out continuous speech recognition using this phoneme model. However, the LR parser of the conventional HMM-LR method could not carry out actions according to the phoneme context, so that the above-described phoneme-context-dependent phoneme model could not be used.