1. Field of the Invention
The present invention relates to a speech recognition system which automatically recognizes input speech, and more particularly to a speech recognition system which recognizes a continuous input speech signal in real time.
2. Description of the Prior Art
In a typical practical speech recognition system, an input speech signal is analyzed and compared with preregistered standard patterns, and the input signal is discriminated according to the detected degree of match. When words are used as the standard patterns, the result is a word recognition system, and the set of recognizable words can be changed by exchanging the standard patterns. When phonemes, which offer a higher degree of freedom, are used as the standard patterns, the result is a phoneme recognition system, which outputs the series of phonemes constituting the speech.
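The matching step described above amounts to nearest-template classification. The following is a minimal sketch, assuming each frame is a small vector of analysis parameters and using dynamic time warping (DTW) with a Euclidean frame distance as a hypothetical measure of the degree of match. The source specifies neither the analysis parameters nor the distance measure, and the labels in `standards` could equally be words or phonemes.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one frame of analysis parameters (hypothetical representation)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two frames of analysis parameters."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(inp: List[Frame], ref: List[Frame]) -> float:
    """Accumulated DTW distance between an input pattern and a standard pattern."""
    n, m = len(inp), len(ref)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(inp[i - 1], ref[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(inp: List[Frame], standards: Dict[str, List[Frame]]) -> str:
    """Return the label whose standard pattern has the highest degree of match,
    i.e. the smallest accumulated distance."""
    return min(standards, key=lambda label: dtw_distance(inp, standards[label]))


# Hypothetical two-dimensional standard patterns, for illustration only.
standards = {
    "a": [[1.0, 0.0], [1.0, 0.0]],
    "i": [[0.0, 1.0], [0.0, 1.0]],
}
print(recognize([[0.9, 0.1], [1.0, 0.0], [1.1, 0.0]], standards))  # a
```

Every standard pattern is compared in full, so the matching time grows linearly with the number of patterns, which is the cost the next paragraph addresses. No path-length normalization is applied here; a practical system would likely need one when standard patterns differ in length.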
In such a system, however, when the number of items to be recognized is large, matching against the standard patterns takes a long time. To overcome this problem, additional matching circuits may be provided to carry out parallel processing and reduce the matching time. However, because no processing delay is permitted when a real time speech signal is handled, this approach increases the circuit scale. Another approach is the hierarchical recognition system disclosed in Japanese Utility Model Application Nos. 54-91283 and 54-121819.
For example, in a phoneme recognition system for Japanese, there are approximately 110 syllables, such as [a], [i], . . . [n], . . . [gyo], . . . , each consisting of a consonant-vowel combination or of a vowel alone. Seven vowel-class units are included: the five vowels [a], [i], [u], [e], [o], the syllabic nasal [n], and a pause. In a first layer of processing, the vowels, which can be recognized relatively stably, are recognized. In a second layer of processing, the consonants between the vowels are recognized. Approximately seventeen consonant categories are included: [p, t, k], [b, d, g], [s], [z], [h], [w, r, j], [m, n], an assimilated sound, a contracted sound, and no consonant. Since these are located between vowels that have already been recognized, they can be recognized quickly and stably. The problem, however, is that the second layer must process the speech data retroactively. Because the real time input speech signal is a time-serial signal, the portion of the signal that the second layer needs has already passed by the time the second layer processing can begin. If a general purpose computer is used, batch processing may be employed, in which the signals are temporarily stored in an internal memory or on magnetic tape and subsequently read out section by section for processing; however, this does not provide a real time speech recognition system that processes sequential input speech signals as they arrive.
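The two-layer hierarchy and its need for retroactive access can be sketched as follows. This is an illustrative sketch only, not the method of the cited applications: the whole utterance is held in a list (in effect the batch approach), and the threshold classifiers `toy_vowel` and `toy_consonant` are hypothetical stand-ins for real vowel and consonant recognizers.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = Sequence[float]  # one frame of analysis parameters (hypothetical representation)
Label = Optional[str]    # vowel-class label, or None for a non-vowel frame


def first_layer(frames: List[Frame], vowel_classifier: Callable[[Frame], Label]) -> List[Label]:
    """Layer 1: label every frame with a vowel-class unit, or None if it is not one."""
    return [vowel_classifier(f) for f in frames]


def consonant_segments(labels: List[Label]) -> List[Tuple[int, int]]:
    """Return [start, end) runs of unlabeled frames with vowel-class frames on both sides."""
    segments: List[Tuple[int, int]] = []
    i, n = 0, len(labels)
    while i < n:
        if labels[i] is not None:
            i += 1
            continue
        start = i
        while i < n and labels[i] is None:
            i += 1
        if start > 0 and i < n:  # bounded by a vowel-class frame on each side
            segments.append((start, i))
    return segments


def second_layer(frames: List[Frame], labels: List[Label],
                 consonant_classifier: Callable[[List[Frame]], str]) -> List[Tuple[int, int, str]]:
    """Layer 2: reach back into the stored frames between vowels and classify each run.
    The slice frames[s:e] is past data that a pure real time stream no longer holds."""
    return [(s, e, consonant_classifier(frames[s:e]))
            for s, e in consonant_segments(labels)]


# Hypothetical toy classifiers, for illustration only.
def toy_vowel(f: Frame) -> Label:
    return "a" if f[0] > 0.5 else None


def toy_consonant(segment: List[Frame]) -> str:
    return "k" if len(segment) <= 2 else "s"


frames = [[1.0], [1.0], [0.1], [0.2], [1.0]]
labels = first_layer(frames, toy_vowel)
print(labels)                                       # ['a', 'a', None, None, 'a']
print(second_layer(frames, labels, toy_consonant))  # [(2, 4, 'k')]
```

A consonant run is classified only once the vowel after it has been found, so the frames of that run must still be available at that moment; this is why the sketch keeps the entire utterance in memory.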
In one prior art speech recognition system, a speech signal from a microphone or a tape recorder is converted frame by frame by an analyzer into analysis parameters, the resulting input pattern is compared with the standard patterns in a discriminator, and the standard pattern with the highest degree of match is selected as the discrimination output. However, when the number of standard patterns is large or the hierarchical matching system is used, the data of a new frame may arrive before the comparison for the previous frame has been completed.
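The timing problem can be illustrated with a small simulation. This is a sketch under stated assumptions: a single discriminator handles one frame at a time with no input buffer, comparison time grows linearly with the number of standard patterns, and the 10 ms frame period and 50 µs per-pattern cost are hypothetical figures not taken from the source. Times are kept in integer microseconds to avoid floating-point rounding.

```python
def count_overruns(num_frames: int, frame_period_us: int, match_time_us: int) -> int:
    """Count frames that arrive while the discriminator is still comparing an earlier frame.

    Assumes a single discriminator and no input buffer, so such frames are lost.
    """
    busy_until = 0
    lost = 0
    for k in range(num_frames):
        arrival = k * frame_period_us  # arrival time of frame k
        if arrival < busy_until:
            lost += 1  # previous comparison not yet finished
        else:
            busy_until = arrival + match_time_us
    return lost


FRAME_PERIOD_US = 10_000  # hypothetical 10 ms analysis frame
PER_PATTERN_US = 50       # hypothetical cost of comparing one standard pattern

for patterns in (100, 200, 300):
    match_time = patterns * PER_PATTERN_US
    print(patterns, count_overruns(10, FRAME_PERIOD_US, match_time))
# 100 0
# 200 0
# 300 5
```

Once the comparison time exceeds the frame period, frames are lost (here every other frame at 300 patterns). Adding an input buffer alone only postpones the problem while the average comparison time stays above the frame period.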