This invention relates to a continuous speech recognition system for automatically recognizing continuous speech, namely, one or more continuously spoken words.
A continuous speech recognition system has various merits as a device for supplying data and/or programs to an electronic digital computer and as a device for supplying control data to various apparatus. Speech recognition has been approached in various ways. The simplest and most effective way is to resort to pattern matching. In pattern matching applied to recognition of a single word, standard or reference patterns are provided, one for each word of a vocabulary to be recognized. A comparison is made between an unknown pattern of an input speech or voice signal (briefly called an input pattern) and each reference pattern to derive a quantity representative of a degree of similarity, or a similarity measure, between the two compared patterns. The input pattern is recognized as the word whose reference pattern provides the maximum of the similarity measures calculated for the respective reference patterns. Pattern matching, however, is not directly applicable to recognition of continuous speech. This is because it is difficult, prior to recognition of the respective words, to optimally segment the continuous speech into word units, namely, to decide a point of segmentation between each pair of consecutive words as, for example, by detecting variations in amplitude and/or pitch of the speech.
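The single-word pattern matching described above can be sketched as follows. This is only an illustrative sketch: the text does not specify the similarity measure, so the example assumes patterns are sequences of feature vectors and uses the negative of a simple dynamic-time-warping distance as a stand-in. The names `dtw_similarity` and `recognize_word` are hypothetical.

```python
from math import dist, inf


def dtw_similarity(a, b):
    """Assumed similarity measure: negative DTW distance (higher is more similar)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])  # Euclidean distance between two frames
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return -cost[n][m]


def recognize_word(input_pattern, references):
    """Return the vocabulary word whose reference pattern maximizes the similarity."""
    return max(references, key=lambda w: dtw_similarity(input_pattern, references[w]))
```

Here `references` is assumed to map each vocabulary word to one reference pattern, mirroring the "one for each word" arrangement in the text. The sketch presupposes that the input pattern contains exactly one word, which is why it does not carry over directly to continuous speech.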
A pattern recognition system applicable to recognition of continuous speech is disclosed in U.S. Pat. No. 3,816,722 issued to Hiroaki SAKOE, the present applicant, and Seibi CHIBA. An improvement in the continuous speech recognition system is described in a prior patent application filed by the instant applicant (Ser. No. 665,759 filed Mar. 11, 1976, now abandoned, in the United States; Application No. 1,009 of 1976 in the United Kingdom; No. P 26 10 439.2 in Germany; and Application No. 7602579 in the Netherlands). According to the prior patent application, pattern matching is carried out between an input pattern as a whole and concatenated reference patterns obtained by concatenating the reference patterns in all possible permutations with repetition allowed. The decision is made by finding the number of words and the concatenation of reference patterns that maximize the similarity measure as a whole. With this system, it is unnecessary to preliminarily segment the input pattern into word units. In practice, the maximum is found in two steps, one on the word basis and the other for the whole. It is possible to apply dynamic programming in finding each maximum to achieve a practical speed of recognition. Although quite effective, the system according to the prior patent application is still defective in that it is liable in specific cases to misrecognize the number of words and consequently to misrecognize the whole input pattern. For example, an input pattern of two words might be misrecognized as a pattern of three words, and vice versa.
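The two-step search over concatenated reference patterns can be illustrated with a small sketch. It is not the procedure of the prior application, whose similarity measure and recurrence are not given here. As an assumption, it scores each candidate segment of the input with a plain dynamic-time-warping distance (word step), then uses a second dynamic program over segment boundaries to pick the word sequence with the smallest total distance (whole step), which is equivalent to maximizing the negated distance. The names `dtw_distance` and `recognize_sequence` are hypothetical.

```python
from math import dist, inf


def dtw_distance(a, b):
    """Assumed dissimilarity: accumulated DTW distance between two frame sequences."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize_sequence(input_pattern, references):
    """Return the word sequence whose concatenated references best match the input."""
    n = len(input_pattern)

    # Word step: for every segment input[s:e], keep the best word and its distance.
    seg = {}
    for s in range(n):
        for e in range(s + 1, n + 1):
            segment = input_pattern[s:e]
            scores = {w: dtw_distance(segment, r) for w, r in references.items()}
            word = min(scores, key=scores.get)
            seg[(s, e)] = (scores[word], word)

    # Whole step: best[e] = minimal total distance covering input[:e] with words.
    best = [inf] * (n + 1)
    back = [None] * (n + 1)
    best[0] = 0.0
    for e in range(1, n + 1):
        for s in range(e):
            d, word = seg[(s, e)]
            if best[s] + d < best[e]:
                best[e] = best[s] + d
                back[e] = (s, word)

    # Trace back the segment boundaries to recover the word sequence.
    words = []
    e = n
    while e > 0:
        s, word = back[e]
        words.append(word)
        e = s
    return words[::-1]
```

No segmentation points are supplied in advance; the boundaries and the number of words fall out of the whole-step maximization. Nothing in this sketch constrains the number of words, so closely scoring alternatives with different word counts can compete, which may be read as a simplified picture of the word-count errors mentioned above.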