1) Field of the Invention
This invention relates to a speech recognition system and, in particular, to a device for producing a reference pattern for use in the system.
2) Description of the Prior Art
In speech recognition systems, a speech signal having a pattern is analyzed by a feature analyzer to produce a time sequence of feature vectors. The time sequence of feature vectors is compared with reference patterns and is thereby identified as one of the reference patterns.
To account for variation of the pattern of the speech signal from one utterance to another, each reference pattern is generated from a number of training utterances.
One of the known speech recognition systems has a table memorizing a plurality of code vectors and a plurality of feature codes corresponding thereto for vector quantizing the time sequence of feature vectors. For example, such a speech recognition system using the table is described in an article contributed by S. E. Levinson, L. R. Rabiner, and M. M. Sondhi to the Bell System Technical Journal, Volume 62, No. 4 (April 1983), pages 1035 to 1074, under the title of "An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition".
According to the Levinson et al. article, the speech recognition system comprises a code vector table which memorizes a plurality of code vectors and the feature codes corresponding thereto.
In generating the reference pattern, a plurality of speech signals are used which are produced by a plurality of utterances and which are representative of a predetermined input pattern with variations. The feature analyzer analyzes these speech signals into a plurality of feature vector time sequences. A converter, connected to the feature analyzer and to the code vector table, converts the feature vector time sequences into respective time sequences of feature codes with reference to the code vectors. A forming circuit is connected to the converter and has a state transition network or table.
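The conversion performed by the converter can be illustrated with a minimal sketch. It assumes a nearest-neighbor search under Euclidean distance, which the description above does not specify; the function name `quantize` and the sample code vectors are hypothetical and chosen only for illustration.

```python
from math import dist

def quantize(sequence, code_vectors):
    """Map each feature vector to the feature code (index) of its nearest code vector.

    Euclidean distance is an assumption made for this sketch.
    """
    codes = []
    for vec in sequence:
        code = min(range(len(code_vectors)), key=lambda k: dist(vec, code_vectors[k]))
        codes.append(code)
    return codes

# Hypothetical code vector table: the feature code is the position in the list.
code_vectors = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]

# Hypothetical time sequence of two-dimensional feature vectors.
sequence = [(0.1, -0.2), (0.9, 1.2), (3.5, 0.3), (3.9, -0.1)]

feature_codes = quantize(sequence, code_vectors)
print(feature_codes)
```

Each feature vector is replaced by a single integer code, so the difference between the vector and its code vector is discarded at this step.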
The state transition network has a plurality of states, and a transition from one state to another occurs with a state transition probability as time elapses. Accordingly, for the feature code time sequences, the feature codes appear in each state of the state transition network. When attention is directed to a particular code among the feature codes, the particular code has a probability of occurrence in each state of the state transition network.
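To make the roles of the two probabilities concrete, the following sketch computes the probability that a state transition network produces a given feature code time sequence, using the standard forward computation for a discrete hidden Markov model. The two-state network, the initial state probabilities, and all numerical values are hypothetical and are not taken from the Levinson et al. article.

```python
def sequence_probability(codes, initial, transition, occurrence):
    """Probability that the network emits the feature code sequence `codes`.

    initial[i]       : probability of starting in state i (assumed for this sketch)
    transition[i][j] : state transition probability from state i to state j
    occurrence[i][k] : probability of occurrence of feature code k in state i
    """
    n_states = len(initial)
    # alpha[i]: probability of the codes seen so far with the network now in state i
    alpha = [initial[i] * occurrence[i][codes[0]] for i in range(n_states)]
    for code in codes[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states)) * occurrence[j][code]
            for j in range(n_states)
        ]
    return sum(alpha)

# Hypothetical two-state, left-to-right network over three feature codes.
initial = [1.0, 0.0]
transition = [[0.6, 0.4],
              [0.0, 1.0]]
occurrence = [[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]]

p = sequence_probability([0, 0, 2], initial, transition, occurrence)
print(p)
```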
The forming circuit is responsive to the feature code time sequences and calculates the state transition probability distribution and the occurrence probability distribution of the feature codes for each state to generate a reference pattern comprising both probability distributions.
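As a rough illustration of how both probability distributions might be calculated from feature code time sequences, the sketch below uses simple relative-frequency counting. It assumes that the state occupied at each time instant is already known for every training sequence; this is a simplification introduced here, not the procedure of the Levinson et al. article, in which the states are not observed directly. The function name and the sample data are hypothetical.

```python
def estimate_reference_pattern(code_seqs, state_seqs, n_states, n_codes):
    """Estimate transition and occurrence probabilities by counting.

    Assumes a known state label per time instant (a simplification for this sketch).
    """
    trans_counts = [[0] * n_states for _ in range(n_states)]
    occ_counts = [[0] * n_codes for _ in range(n_states)]

    for codes, states in zip(code_seqs, state_seqs):
        for t, (code, state) in enumerate(zip(codes, states)):
            occ_counts[state][code] += 1
            if t + 1 < len(states):
                trans_counts[state][states[t + 1]] += 1

    def normalize(row):
        # Convert counts to probabilities; an unvisited state yields all zeros.
        total = sum(row)
        return [c / total if total else 0.0 for c in row]

    transition = [normalize(row) for row in trans_counts]
    occurrence = [normalize(row) for row in occ_counts]
    return transition, occurrence

# Hypothetical training data: two utterances of the same input pattern.
code_seqs = [[0, 0, 2, 2], [0, 1, 2, 2]]
state_seqs = [[0, 0, 1, 1], [0, 1, 1, 1]]

transition, occurrence = estimate_reference_pattern(code_seqs, state_seqs, 2, 3)
print(transition)
print(occurrence)
```

The returned pair corresponds to a reference pattern in the sense used above: one distribution over state transitions and one distribution over feature codes for each state.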
In the Levinson et al. speech recognition system, the reference pattern is generated in this manner for each predetermined input pattern by a reference pattern generating device which comprises the code vector table, the converter, and the forming circuit. The reference pattern generating device operates rapidly because the reference pattern can be obtained with a relatively small amount of calculation. The reference pattern is, however, liable to cause erroneous speech recognition because of the quantization error introduced by the vector quantization.
Another speech recognition system is disclosed in U.S. Pat. No. 4,783,804 issued to Biing-Hwan Juang et al. According to the Juang et al. patent, a reference pattern generating device comprises a speech analyzer and a function generator. The speech analyzer produces a plurality of feature vector time sequences representative of a predetermined input pattern with a plurality of variations. The function generator is coupled to the speech analyzer and calculates, in response to the feature vector time sequences, a state transition probability distribution in the state transition network and a probability density function which approximates the probability distribution of occurrence of the feature vectors for each state. The function generator generates a reference pattern from the state transition probability distribution and the probability density function.
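The idea of approximating the occurrence of feature vectors in a state by a probability density function, without vector quantization, can be sketched as follows. The sketch assumes a single Gaussian density with a diagonal covariance per state, purely for illustration; the form of the density function used in the Juang et al. patent is not stated above. The function names and sample vectors are hypothetical.

```python
from math import log, pi

def fit_diagonal_gaussian(vectors):
    """Estimate per-dimension mean and variance of the feature vectors of one state.

    A single diagonal Gaussian is an assumption made for this sketch.
    """
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # A small floor keeps the variance strictly positive.
    var = [
        max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, 1e-6)
        for d in range(dim)
    ]
    return mean, var

def log_density(x, mean, var):
    """Log of the diagonal Gaussian density evaluated at feature vector x."""
    return sum(
        -0.5 * (log(2 * pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

# Hypothetical feature vectors observed in one state.
state_vectors = [(1.0, 2.0), (3.0, 2.0), (2.0, 5.0)]
mean, var = fit_diagonal_gaussian(state_vectors)

near = log_density((2.0, 3.0), mean, var)  # feature vector at the state mean
far = log_density((9.0, 9.0), mean, var)   # feature vector far from the state mean
print(mean, var, near, far)
```

Because the density is evaluated directly on the feature vector, no information is lost to a code vector table, at the cost of estimating and evaluating a continuous function for every state.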
The Juang et al. reference pattern generating device can generate a reference pattern which enables speech recognition with reduced error because no vector quantization is used. The device is, however, incapable of rapidly generating the reference pattern because calculating the reference pattern requires an increased amount of processing.