This invention relates to a device for recognizing an input pattern with reference to a predetermined number of reference patterns. Although equally well applicable to recognition of various patterns, such as type-printed or hand-printed letters, a device according to this invention will be described in the following mainly in connection with a speech recognition device.
A device for recognizing continuous speech sounds of one or more actually spoken words and for encoding the result of recognition is advantageous as a device for supplying data and/or program words to an electronic digital computer and as a device for supplying control data to various apparatus. The reasons are as follows. First, the input operation may be carried out by any untrained person because it is only necessary to pronounce the input data rather than to manipulate a keyboard or a like facility. Secondly, the input operation is possible even while the hands and feet are occupied with other tasks. Thirdly, the input data may be supplied to the computer or the like even from a remote location merely through an ordinary telephone network. Because of these merits, speech recognition devices are widely in demand and have been developed into practical use at various places in the world.
In a speech recognition device, it is generally preferred to carry out pattern matching by resorting to the technique of dynamic programming as described in, for example, U.S. Pat. No. 3,816,722 issued to Hiroaki Sakoe and Seibi Chiba, assignors to the present assignee. In a speech recognition device of this type, speech sound is subjected to spectrum analysis, sampling, and digitization so as to be transformed into a time sequence of vectors representative of features of the speech sound at the respective sampling instants (hereafter referred to as a time sequence of feature vectors). The speech sound is representative of one or more continuously spoken words of a preselected vocabulary. The time sequence is representative of a speech sound pattern of the continuously spoken word or words. Prior to recognition of each speech sound pattern supplied to the device (hereafter named an input pattern), which is unknown to the device, at least one standard speech sound pattern for each word of the vocabulary (hereafter called a reference pattern) is supplied to the device and memorized therein. Comparison, namely, pattern matching, is carried out between the input pattern and every reference pattern by resorting to the dynamic programming technique. The reference pattern that is most similar to the input pattern is selected. The word represented by the selected reference pattern gives the result of recognition.
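By way of illustration only, the pattern matching outlined above may be sketched as follows. The sketch is not the circuitry of the above-cited patent. It assumes a Euclidean distance between feature vectors, a simple symmetric recurrence without slope constraint or adjustment window, and a single word per input pattern rather than continuously spoken words. The names frame_distance, dtw_distance, and recognize are hypothetical and are introduced merely for this example.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (an assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(input_pattern, reference_pattern):
    """Accumulated distance along the best time-axis alignment.

    Both arguments are time sequences of feature vectors. The recurrence
    below is a simplified one chosen for illustration, not the specific
    recurrence of the cited patent.
    """
    n_in, n_ref = len(input_pattern), len(reference_pattern)
    inf = float("inf")
    # g[i][j] holds the smallest accumulated distance that aligns the first
    # i input frames with the first j reference frames.
    g = [[inf] * (n_ref + 1) for _ in range(n_in + 1)]
    g[0][0] = 0.0
    for i in range(1, n_in + 1):
        for j in range(1, n_ref + 1):
            d = frame_distance(input_pattern[i - 1], reference_pattern[j - 1])
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[n_in][n_ref]


def recognize(input_pattern, reference_patterns):
    """Return the word whose reference pattern is most similar to the input.

    reference_patterns maps each word of the vocabulary to one memorized
    reference pattern (hypothetical data layout).
    """
    return min(
        reference_patterns,
        key=lambda word: dtw_distance(input_pattern, reference_patterns[word]),
    )


if __name__ == "__main__":
    # Toy one-dimensional feature vectors, purely illustrative.
    references = {
        "a": [[0.0], [1.0], [2.0]],
        "b": [[5.0], [5.0], [5.0]],
    }
    slow_utterance = [[0.0], [0.0], [1.0], [1.0], [2.0]]
    print(recognize(slow_utterance, references))
```

In this toy example, the input pattern is a slower utterance of the word "a". The alignment absorbs the difference in length, so that the accumulated distance to the reference pattern for "a" remains zero and that word is selected.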
It is to be noted here that the input pattern is subject to a complicated and nonlinear deformation along the time axis as a result of variations in the speed of utterance, as pointed out in the above-cited patent and also in an article contributed by Hiroaki Sakoe and Seibi Chiba to IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-26, No. 1 (February 1978), pages 43-49, under the title of "Dynamic Programming Algorithm Optimization for Spoken Word Recognition." Optimum pattern matching is achieved only after nonlinearly compensating for fluctuations or shifts between the time axes of the input pattern and the respective reference patterns. As will be discussed in more detail in the following, a considerable amount of calculation is necessary even when the dynamic programming technique is applied. High-priced calculators are indispensable in accomplishing the calculation within a reasonable interval of time.
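The order of magnitude of the calculation may be appreciated from a rough count of the distance calculations between feature vectors. The following sketch assumes that one such distance is calculated for every pair of an input-pattern frame and a reference-pattern frame, optionally restricted to an adjustment window around the diagonal, and that the matching is repeated for every reference pattern. The function name count_frame_distances and all numerical values are hypothetical and serve only as an illustration.

```python
def count_frame_distances(n_input, n_reference, n_patterns, window=None):
    """Count frame-to-frame distance calculations for one input pattern.

    n_input      number of feature vectors in the input pattern
    n_reference  number of feature vectors in each reference pattern
                 (assumed equal for all reference patterns)
    n_patterns   number of reference patterns
    window       optional half-width r; only pairs with |i - j| <= r count
    """
    per_pattern = 0
    for i in range(n_input):
        for j in range(n_reference):
            if window is None or abs(i - j) <= window:
                per_pattern += 1
    return per_pattern * n_patterns


if __name__ == "__main__":
    # Illustrative figures only; they are not taken from any actual device.
    print(count_frame_distances(100, 50, 200))
    print(count_frame_distances(100, 50, 200, window=10))
```

With the illustrative figures above, the unrestricted count is 100 x 50 x 200 = 1,000,000 distance calculations for a single input pattern. Each of these calculations is itself a sum over the components of the feature vectors. A window reduces the count but leaves it proportional to the number of reference patterns.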
Accurate or reliable speech recognition devices are therefore expensive. Low-priced ones are unsatisfactory in performance. Conventional speech recognition devices are thus still deficient with respect to the performance-to-price ratio.