This invention relates to a continuous speech recognition system for automatically recognizing the speech sound of one or more continuously spoken words. A system according to this invention is specifically, although not exclusively, adapted to recognition of at least two input words continuously spoken according to a format.
A continuous speech recognition system is advantageous for use as a device for supplying data and/or program words to an electronic digital computer and as a device for supplying control data to various apparatus. It has, however, been difficult even with a sophisticated speech recognition system to recognize continuously spoken words. For correct recognition, data and/or program words or control data had to be pronounced or uttered word by word. On supplying a computer with, for example, data consisting of a plurality of digits, it was necessary to pronounce the data on a digit by digit basis. Consequently, even a sophisticated system was slow in operation and inconvenient for users.
Speech recognition has been approached in various ways. The simplest and most effective way is to resort to the technique of pattern matching. According to the pattern matching technique applied to recognition of a discrete input word, a vocabulary consisting of a plurality of reference words is selected. The input word should be one of the reference words. Master or reference patterns are provided from the reference words individually spoken, each reference word in at least one manner of pronunciation. An appreciable number of reference patterns are thus used to represent the reference words of the vocabulary. Comparison or pattern matching is carried out between a pattern of input speech sound (hereafter called an input pattern) of the input word to be recognized by the system and every reference pattern. For each reference pattern, a quantity is derived as a result of the comparison, which quantity represents a degree of likelihood or similarity (hereafter referred to as a similarity measure) between the input pattern and the reference pattern under consideration. The input pattern is recognized to be that reference word whose reference pattern gives the maximum of the similarity measures derived for the respective reference patterns. In this manner, it is possible with the system to recognize an input pattern representative of any reference word in the vocabulary by the use of the reference patterns.
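The discrete-word pattern matching described above may be illustrated by the following minimal sketch. It is not taken from any cited patent. The representation of a pattern as a sequence of scalar features, the use of a negated dynamic time warping distance as the similarity measure, and the names similarity and recognize_word are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Pattern = Sequence[float]  # assumption: one scalar feature per time frame


def similarity(input_pattern: Pattern, reference: Pattern) -> float:
    """Assumed similarity measure: negated dynamic time warping distance."""
    n, m = len(input_pattern), len(reference)
    inf = float("inf")
    # cost[i][j]: smallest accumulated distance when the first i input
    # frames are aligned with the first j reference frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(input_pattern[i - 1] - reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return -cost[n][m]


def recognize_word(input_pattern: Pattern,
                   references: List[Tuple[str, Pattern]]) -> str:
    """Return the reference word whose pattern gives the maximum similarity."""
    best_word, best_score = "", float("-inf")
    for word, reference in references:
        score = similarity(input_pattern, reference)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Hypothetical vocabulary: "one" is given in two manners of pronunciation.
references = [
    ("one", [1.0, 1.0, 2.0, 2.0]),
    ("one", [1.0, 2.0, 2.0]),
    ("two", [5.0, 5.0, 6.0]),
]
print(recognize_word([1.0, 1.0, 1.0, 2.0, 2.0], references))  # prints "one"
```

A reference word may appear several times in the list, once for each manner of pronunciation, which corresponds to the provision of more than one reference pattern per reference word.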
In U.S. Pat. No. 4,059,725 (United Kingdom Patent Application No. 1,009 of 1976) issued to the present applicant, assignor to the instant assignee, a much improved continuous speech recognition system is disclosed wherein the pattern matching technique is carried out between an input pattern as a whole and a plurality of reference pattern concatenations obtained by concatenating reference patterns of all allowable or possible numbers. A concatenation of certain patterns is a permutation with repetition of the patterns in question. Sums of what may be named partial similarity measures are calculated as a result of comparison of the whole input pattern with the reference pattern concatenations. A decision is made by finding that number of words and that concatenation of reference pattern or patterns which give a maximum of the partial similarity measure sums. In practice, the maximum partial similarity measure sum is found in two steps, on the word basis at first and then for the whole. It is possible to apply the technique of dynamic programming to finding the maximum in each step to reduce the amount of calculation and thereby to raise the speed of recognition.
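The two-step search for the maximum partial similarity measure sum may be sketched as follows. This is an illustrative sketch under stated assumptions and not the implementation of the cited patent. The scalar-feature patterns, the negated dynamic time warping distance used as the partial similarity measure, and the names similarity and recognize_continuous are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Sequence[float]  # assumption: one scalar feature per time frame


def similarity(a: Pattern, b: Pattern) -> float:
    """Assumed partial similarity measure: negated DTW distance."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return -cost[len(a)][len(b)]


def recognize_continuous(input_pattern: Pattern,
                         references: Dict[str, Pattern]) -> List[str]:
    n = len(input_pattern)
    # Step 1 (word basis): for every segment of frames p..q-1, keep the
    # maximum partial similarity measure and the word that produced it.
    partial: Dict[Tuple[int, int], Tuple[float, str]] = {}
    for p in range(n):
        for q in range(p + 1, n + 1):
            segment = input_pattern[p:q]
            partial[(p, q)] = max(
                (similarity(segment, ref), word) for word, ref in references.items()
            )
    # Step 2 (whole): best[q] is the maximum sum of partial similarity
    # measures over all ways of dividing the first q frames into words.
    best = [float("-inf")] * (n + 1)
    back: List[Tuple[int, str]] = [(0, "")] * (n + 1)
    best[0] = 0.0
    for q in range(1, n + 1):
        for p in range(q):
            score, word = partial[(p, q)]
            if best[p] + score > best[q]:
                best[q] = best[p] + score
                back[q] = (p, word)
    # Trace the decision back from the end of the input pattern.
    words: List[str] = []
    q = n
    while q > 0:
        q, word = back[q]
        words.append(word)
    return words[::-1]


references = {"a": [1.0, 1.0, 1.0], "b": [5.0, 5.0, 5.0]}
print(recognize_continuous([1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 1.0, 1.0], references))
```

In this sketch the number of words is not given in advance; it results from the segmentation that maximizes the sum. Dynamic programming is used in both steps, once inside the similarity calculation and once over the segment boundaries, so that the reference pattern concatenations need not be enumerated one by one.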
In U.S. Pat. No. 4,049,913 (United Kingdom Patent Application No. 44,643 of 1976) issued also to the present applicant and assigned to the instant assignee, another improved continuous speech recognition system is revealed wherein the above-described pattern matching technique and the decision process are carried out with the number of words preliminarily specified either by a single integer or by a set of integers. This system operates very accurately in specific fields of application. This system and the system disclosed in U.S. Pat. No. 4,059,725 are believed to be the best available continuous speech recognition systems at present.
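The effect of preliminarily specifying the number of words may be illustrated by the sketch below. It is an assumption-laden illustration and not the method of the cited patent. The table partial of precomputed partial similarity measures, its hand-made values, and the name best_concatenation are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical input: partial[(p, q)] = (partial similarity measure, word)
# for the input segment spanning frames p..q-1.
Partial = Dict[Tuple[int, int], Tuple[float, str]]


def best_concatenation(partial: Partial, n_frames: int,
                       allowed_counts: Iterable[int]) -> Optional[List[str]]:
    """Maximize the partial similarity sum using only the allowed word counts."""
    allowed = {k for k in allowed_counts if k >= 1}
    if not allowed:
        return None
    max_k = max(allowed)
    neg_inf = float("-inf")
    # total[k][q]: maximum sum when the first q frames form exactly k words.
    total = [[neg_inf] * (n_frames + 1) for _ in range(max_k + 1)]
    back: List[List[Optional[Tuple[int, str]]]] = [
        [None] * (n_frames + 1) for _ in range(max_k + 1)
    ]
    total[0][0] = 0.0
    for k in range(1, max_k + 1):
        for q in range(k, n_frames + 1):
            for p in range(k - 1, q):
                if total[k - 1][p] == neg_inf or (p, q) not in partial:
                    continue
                score, word = partial[(p, q)]
                if total[k - 1][p] + score > total[k][q]:
                    total[k][q] = total[k - 1][p] + score
                    back[k][q] = (p, word)
    # Decide among the specified word counts only.
    feasible = [k for k in allowed if total[k][n_frames] > neg_inf]
    if not feasible:
        return None
    k = max(feasible, key=lambda c: total[c][n_frames])
    words: List[str] = []
    q = n_frames
    while k > 0:
        q, word = back[k][q]
        words.append(word)
        k -= 1
    return words[::-1]


# Hand-made table for a four-frame input pattern (illustrative values only).
partial = {
    (0, 4): (0.5, "eight"),
    (0, 2): (0.9, "three"), (2, 4): (0.8, "one"),
    (0, 1): (0.7, "two"), (1, 2): (0.7, "two"),
    (2, 3): (0.6, "one"), (3, 4): (0.6, "one"),
}
print(best_concatenation(partial, 4, [2]))     # ['three', 'one']
print(best_concatenation(partial, 4, [1]))     # ['eight']
print(best_concatenation(partial, 4, [1, 2]))  # ['three', 'one']
```

With these illustrative values an unrestricted search would favor a division into four short words, since every additional word adds a positive term to the sum. Restricting the decision to the specified count or counts excludes such concatenations, which is one way to picture why the specification helps to avoid misrecognition.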
It is common to practical continuous speech recognition systems that misrecognition is liable to occur to some extent. This is because the speech sound supplied to the system for recognition is not always completely congruent in timbre and other respects with the speech sound used to provide the reference patterns. It is therefore very important to avoid the possible misrecognition even at the cost of some restrictions on users in speaking the input word or words. The restriction or restrictions, however, should not reduce the speed of operation and should be tolerable in practice when using the system. An example of a tolerable restriction is to preliminarily specify the number of input words, as is the case with the system disclosed in U.S. Pat. No. 4,049,913.