The present invention relates to an improvement in a continuous speech recognition apparatus for recognizing continuously uttered speech.
As a conventional method of recognizing continuously uttered speech in accordance with a predetermined grammar, the method described in the paper "Structural Methods in Automatic Speech Recognition" (Stephen E. Levinson, Proceedings of the IEEE, Vol. 73, No. 11, Nov. 1985, pp. 1625-1650) is known (hereinafter referred to as "reference 1"). In this method, continuous speech is recognized by Dynamic Programming (DP) matching on the basis of word-unit standard patterns that are coupled according to a finite-state automaton representing a regular grammar. This method allows continuous speech to be recognized with a moderate amount of calculation. As another method of recognizing continuous speech according to a finite-state automaton, a method using a "Hidden Markov Model" (hereinafter referred to as an "HMM") described in "D. PARSING, D1. Overview of Parsing Techniques" (The Handbook of Artificial Intelligence, Vol. I, edited by A. Barr et al., Heuris Tech Press, pp. 256-262) is also known (hereinafter referred to as "reference 2"). With the HMM, continuous speech recognition can be realized by using the Viterbi algorithm, as described on page 46 of reference 2.
A case will be described below wherein continuous speech is recognized by frame-synchronous DP matching using a grammar expressed by the finite-state automaton described in reference 1. The basic processing sequence of the HMM-based method described in reference 2 is the same as that of reference 1, and can be executed in the same manner. In the following description, a word is used as the recognition unit. However, a unit other than a word, e.g., a phoneme, may be used as a matter of course.
An input speech pattern (input pattern) can be expressed by a time series of features:
$$A = a_1, a_2, \ldots, a_i, \ldots, a_I \tag{1}$$
If a word to be recognized is represented by n, a standard pattern can be expressed by:
$$B_n = b_{n1}, b_{n2}, \ldots, b_{nj}, \ldots, b_{nJ_n} \tag{2}$$
A distance between a feature $a_i$ of the input pattern and a feature $b_{nj}$ of the standard pattern is given by $d(n;i,j)$. In word-level processing, a DP recurrence formula for the following accumulation value g is solved to calculate an inter-word distance. At the same time, a path value L is calculated to back-trace a recognition result obtained when continuous speech recognition is performed. ##EQU1##
In formulas (5), [x] indicates that the corresponding formula is selected when the x-th accumulation value inside min[] in formula (4) is the minimum. The inter-word distance at a frame i of the input pattern is obtained as $g(n;i,J_n)$. The frame of the input pattern at which the match to the standard pattern starts is obtained at the same time as the path value $L(n;i,J_n)$. In formulas (3), 0 is given as the initial value of the accumulation value. In sentence-level processing, however, continuous speech recognition can be performed if the accumulation value of the immediately preceding word is given instead, according to the finite-state automaton, and the word-level recognition results are preserved.
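The word-level processing described above can be sketched in Python as follows. Formulas (3) to (5) are not reproduced in this text, so the sketch assumes a commonly used local constraint in which the accumulation value at (i, j) is taken from (i-1, j), (i-1, j-1), or (i-1, j-2); the function name `word_dp` and the virtual start state j = 0 are hypothetical and serve only as an illustration.

```python
import math

def word_dp(input_frames, template, dist):
    """Frame-synchronous DP matching against one word template.

    Returns two lists indexed by input frame (0-based storage of frame i = 1..I):
    g_end[i-1] is the accumulation value at the last template frame J_n, and
    L_end[i-1] is the path value, i.e. the input frame where the match started.
    ASSUMPTION: predecessors (i-1, j), (i-1, j-1), (i-1, j-2); the source's
    formulas (3)-(5) may use a different local constraint.
    """
    J = len(template)
    INF = math.inf
    prev_g = [INF] * (J + 1)   # index 0 is a virtual start state
    prev_L = [0] * (J + 1)
    g_end, L_end = [], []
    for i, a in enumerate(input_frames, start=1):
        # Initial value: a word may start at any frame with accumulation 0,
        # and the path value remembers that start frame.
        prev_g[0] = 0.0
        prev_L[0] = i
        cur_g = [INF] * (J + 1)
        cur_L = [0] * (J + 1)
        for j in range(1, J + 1):
            candidates = [(prev_g[j], prev_L[j]), (prev_g[j - 1], prev_L[j - 1])]
            if j >= 2:
                candidates.append((prev_g[j - 2], prev_L[j - 2]))
            best_g, best_L = min(candidates, key=lambda c: c[0])
            if best_g < INF:
                cur_g[j] = dist(a, template[j - 1]) + best_g
                cur_L[j] = best_L   # carry the start frame along the best path
        g_end.append(cur_g[J])
        L_end.append(cur_L[J])
        prev_g, prev_L = cur_g, cur_L
    return g_end, L_end

# Toy usage with scalar "features" and absolute difference as the distance.
g_end, L_end = word_dp([1, 2, 3], [1, 2, 3], lambda a, b: abs(a - b))
```

For sentence-level processing, the zero initial value would be replaced by the accumulation value of the preceding word permitted by the automaton, which is why only the start-state initialization would need to change in such a sketch.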
A regular grammar is insufficient to handle the complexity of a natural language, and it is preferable to use a context-free grammar, which has a higher power of expression. In a context-free grammar, the left-hand side of each generation rule consists of one nonterminal symbol, as described in "C. GRAMMARS, C1. Formal Grammars" (The Handbook of Artificial Intelligence, Vol. I, edited by A. Barr et al., Heuris Tech Press, pp. 239-244) (hereinafter referred to as "reference 3"). For example, a context-free grammar for generating a certain sentence is shown below. ##EQU2## With this grammar, the sentence "boys eat apples" can be generated. Compared with the regular grammar, the context-free grammar is characterized by its ability to use recurrent (recursive) generation rules.
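As an illustration of how generation rules produce a sentence, the following Python sketch enumerates the sentences of a small grammar. The grammar shown in the original figure is not reproduced in this text, so the rules in `GRAMMAR` are an assumed example chosen only because they can generate "boys eat apples"; the names `GRAMMAR` and `expand` are hypothetical.

```python
from itertools import product

# ASSUMED example rules; each left-hand side is a single nonterminal symbol.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["N"]],
    "N": [["boys"], ["apples"]],
    "V": [["eat"]],
}

def expand(symbol):
    """Return every word sequence derivable from `symbol`.

    Terminates only because this example grammar has no recurrent rules.
    """
    if symbol not in GRAMMAR:          # terminal symbol (a word)
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        parts = [expand(s) for s in rhs]
        for combo in product(*parts):  # one choice per right-hand-side symbol
            results.append([w for part in combo for w in part])
    return results

sentences = [" ".join(words) for words in expand("S")]
```

Because this assumed grammar contains no recurrent rule, the enumeration is finite; a grammar with a rule that refers back to its own left-hand side would make the same procedure run without end.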
Methods of recognizing continuous speech using the context-free grammar are already available. For example, the CYK method, the Earley method, and the like are described on p. 128 and subsequent pages of reference 2.
When continuous speech is to be recognized by the method using the finite-state automaton described in reference 1, the grammar that can be expressed is limited to the regular grammar. When a context-free grammar is expanded into a finite-state automaton, if the generation rules include recurrent expressions such as the following, the network is generated infinitely, and such expressions cannot be processed:
$$S \rightarrow aSb$$
$$S \rightarrow ab$$
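The unbounded expansion can be seen in a short Python sketch that applies the two rules above by string rewriting. The function name `derive` and the bound `max_recursions` are hypothetical; the bound exists only so that the illustration terminates.

```python
def derive(max_recursions):
    """List the strings obtained by applying S -> aSb up to
    `max_recursions` times and then finishing with S -> ab."""
    results = []
    form = "S"
    for _ in range(max_recursions + 1):
        results.append(form.replace("S", "ab"))  # terminate with S -> ab
        form = form.replace("S", "aSb")          # recur with S -> aSb
    return results

# Each extra level of recursion yields a new, longer string, so a network
# that enumerates every path would need one more branch per level.
strings = derive(2)
```

Every increase of the bound adds a string that no shallower expansion covers, so no finite bound captures the whole language; this is the sense in which the expanded network grows without limit.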
In this manner, in the conventional method using the finite-state automaton, the context-free grammar having a higher power of expression cannot be used to express a natural language.
According to the CYK method or the Earley method described in the reference 2, the context-free grammar can be processed. However, the amount of calculation required is considerably large.
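To give a sense of where the calculation amount comes from, the following Python sketch implements a textbook CYK recognizer for symbol strings. It is not the speech recognition procedure of reference 2: it assumes the input symbols are already known, and it uses an assumed Chomsky-normal-form grammar for the recurrent rules S → aSb and S → ab. The names `cyk`, `UNARY`, and `BINARY` are hypothetical.

```python
# ASSUMED Chomsky-normal-form grammar equivalent to S -> aSb | ab.
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", ("A", "B")), ("S", ("A", "X")), ("X", ("S", "B"))]

def cyk(tokens, unary=UNARY, binary=BINARY, start="S"):
    """Return True if `tokens` can be derived from `start`."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][l] holds the nonterminals that derive tokens[i:i+l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][1] = {lhs for lhs, t in unary if t == tok}
    # Three nested loops over span length, start position, and split point:
    # the work grows with the cube of the input length times the rule count.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for lhs, (b, c) in binary:
                    if b in left and c in right:
                        table[i][length].add(lhs)
    return start in table[0][n]

accepted = cyk(list("aabb"))
rejected = cyk(list("aab"))
```

Even in this simplified symbolic setting, every span and every split point must be examined, which illustrates why the required calculation amount is large.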