This invention relates to a device for recognizing by pattern matching an input pattern representative of input words which are substantially continuously spoken or uttered and whose sequence is determined in compliance with a finite state grammar. The pattern matching is carried out between the input pattern and a plurality of reference patterns representative of reference words, respectively, by resorting to a dynamic programming (DP) technique or algorithm.
A device for recognizing an input pattern representative of continuously spoken words is usually called either a continuous speech recognition device or a connected word recognizing device and has a wide range of applications. The continuously spoken words may, for example, be computer programs, sentences in business documents, directions for flight or navigation control, and instructions for various apparatus. It is known in principle that high reliability is achieved in recognition of an input pattern obtained according to a finite state grammar when the pattern matching is restricted by rules of the finite state grammar. In a relatively simple case, errors are avoided in recognition of an input pattern when a rule is used as a restriction on the number of words of the input pattern in the manner revealed in U.S. Pat. No. 4,049,913 issued to Hiroaki Sakoe and assigned to the present assignee.
A considerable improvement is introduced to such continuous speech recognition devices by a method and an apparatus disclosed in U.S. patent application Ser. No. 719,603 previously filed Apr. 3, 1985, by Masao Watari, the present applicant, based on Japanese patent application No. 68,015 of 1984. The improvement is directed mainly to a system revealed in U.S. patent application Ser. No. 448,088 filed Dec. 9, 1982, by the above-named Hiroaki Sakoe based on Japanese patent application No. 199,098 of 1981. The system is for recognizing an input pattern which represents input words continuously spoken according to a finite state grammar. The input pattern has a certain input pattern length. More particularly, the continuously spoken words are represented as the input pattern by a sequence of input pattern feature vectors arranged along a first time axis at consecutive input pattern frame periods, respectively. Each input pattern frame period is herein referred to simply as a frame. The input pattern length therefore consists of a plurality of frames which are consecutively arranged along the first time axis. Although it has already issued as U.S. Pat. No. 4,555,796, the Sakoe patent application will be referred to as such throughout the following to distinguish it from the first-cited Sakoe patent.
As will be described in somewhat more detail later, the improved apparatus of the previous Watari patent application is operable according to a slant-blockwise DP algorithm wherein each slant parallelogrammic block has a width which is equal to a predetermined number of the frames. The apparatus comprises memory means, concatenating means, matching means, and deciding means.
The memory means is for memorizing first through N-th reference patterns representative of first through N-th reference words, respectively, where N represents a predetermined natural number. An n-th one of the reference patterns has an n-th reference pattern length where n represents each of one through N. The reference pattern lengths of the respective reference patterns are measured in terms of the frames in the manner which will later become clear.
The concatenating means is for concatenating the reference patterns into a plurality of concatenations. Each concatenation consists of selected reference patterns which are selected from the first through the N-th reference patterns according to the grammar and are arranged along a second time axis. It is possible to understand without loss of generality that the second time axis is orthogonal to the first time axis.
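To make the role of the grammar concrete, the following is a minimal sketch, not the Watari apparatus, of how concatenations might be enumerated from a finite state grammar and then assembled from reference patterns along the second time axis. The grammar encoding (a state mapped to a list of `(word index n, next state)` transitions), the `max_words` bound, and the functions `concatenations` and `concatenate` are hypothetical names introduced only for illustration; reference patterns are assumed to be lists of feature vectors, one vector per frame.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical grammar encoding: state -> list of (word index n, next state).
Grammar = Dict[int, List[Tuple[int, int]]]
Vector = Sequence[float]


def concatenations(grammar: Grammar, initial: int, finals: Set[int],
                   max_words: int) -> List[Tuple[int, ...]]:
    """Enumerate word-index sequences accepted by the grammar (at most max_words words)."""
    results: List[Tuple[int, ...]] = []
    stack: List[Tuple[int, Tuple[int, ...]]] = [(initial, ())]
    while stack:
        state, words = stack.pop()
        # A non-empty word sequence that ends in a final state is a valid concatenation.
        if words and state in finals:
            results.append(words)
        if len(words) < max_words:
            for n, nxt in grammar.get(state, []):
                stack.append((nxt, words + (n,)))
    return sorted(results)


def concatenate(refs: Dict[int, List[Vector]], words: Tuple[int, ...]) -> List[Vector]:
    """Join the selected reference patterns end to end along the second time axis."""
    out: List[Vector] = []
    for n in words:
        out.extend(refs[n])
    return out


# Usage: words 1 or 2 followed by word 3.
g = {0: [(1, 1), (2, 1)], 1: [(3, 2)]}
print(concatenations(g, initial=0, finals={2}, max_words=3))  # [(1, 3), (2, 3)]
```

Explicit enumeration is used here only for readability; with a large grammar the number of concatenations grows quickly, which is one reason practical devices fold the grammar into the DP recursion instead of listing every sequence.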
The matching means is for pattern matching the input pattern with the concatenations in slant parallelogrammic blocks to provide dissimilarity measures between the input pattern and the respective concatenations. Each block has a predetermined slope relative to the first time axis, and has a width and a height which are parallel to the first and the second time axes, respectively, and which are equal to a selected number of the frames and to the reference pattern length of each selected reference pattern of each concatenation, respectively.
The deciding means is for deciding a minimum of the dissimilarity measures to recognize the input pattern as one of the concatenations that is pattern matched to the input pattern to provide the minimum of the dissimilarity measures.
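The matching and deciding steps can be illustrated with a minimal sketch. For simplicity it computes each dissimilarity measure with a plain full-grid DP (dynamic time warping) recursion rather than the slant-blockwise computation of the Watari apparatus; the city-block local distance, the three allowed predecessor steps, and the names `dp_dissimilarity` and `decide` are all assumptions made for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    # Assumed local measure: city-block distance between two feature vectors.
    return sum(abs(x - y) for x, y in zip(a, b))


def dp_dissimilarity(inp: List[Vector], ref: List[Vector]) -> float:
    """Accumulated distance of the best warping path from (0, 0) to (I-1, J-1).

    i indexes input frames (first time axis); j indexes frames of the
    concatenated reference pattern (second time axis). Assumed predecessors
    of (i, j): (i-1, j), (i, j-1) and (i-1, j-1).
    """
    I, J = len(inp), len(ref)
    g = [[math.inf] * J for _ in range(I)]
    for i in range(I):
        for j in range(J):
            d = distance(inp[i], ref[j])
            if i == 0 and j == 0:
                g[i][j] = d
                continue
            best = min(
                g[i - 1][j] if i > 0 else math.inf,
                g[i][j - 1] if j > 0 else math.inf,
                g[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
            )
            g[i][j] = d + best
    return g[I - 1][J - 1]


def decide(inp: List[Vector], candidates: Dict[str, List[Vector]]) -> Tuple[str, float]:
    """Return the concatenation label giving the minimum dissimilarity measure."""
    scores = {label: dp_dissimilarity(inp, pat) for label, pat in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


inp = [[0.0], [1.0], [2.0]]
print(decide(inp, {"A": [[0.0], [1.0], [2.0]], "B": [[5.0], [5.0]]}))  # ('A', 0.0)
```

The full-grid form visits every (i, j) cell of each concatenation; the slant-blockwise algorithm described above reorganizes the same kind of recursion into blocks so that the memories are accessed block by block.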
More specifically, the selected number should not be greater than the quotient obtained by dividing a minimum of the first through the N-th reference pattern lengths by the predetermined slope. The widths of the blocks are therefore restricted to a narrow value by the minimum reference pattern length.
On the other hand, the pattern matching is carried out by accessing various memories a number of times which is inversely proportional to the widths of the blocks. In other words, the processing time of the apparatus is inversely proportional to the block width. If even one of the reference pattern lengths is considerably short, the blocks become narrow and the processing becomes slow. To restore the speed, high-speed memory elements must be used as the memories, which makes the apparatus bulky and expensive.
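The effect of a single short reference pattern can be shown with a small arithmetic sketch. It assumes that the block width bound is the minimum reference pattern length divided by the predetermined slope, rounded down to whole frames, and that the number of memory-access passes is proportional to the number of blocks needed to cover the input pattern; the function names and the example lengths are hypothetical.

```python
import math
from typing import Sequence


def max_block_width(ref_lengths: Sequence[int], slope: float) -> int:
    """Largest whole number of frames not exceeding min(ref_lengths) / slope (assumed bound)."""
    width = math.floor(min(ref_lengths) / slope)
    if width < 1:
        raise ValueError("shortest reference pattern is too short for this slope")
    return width


def block_count(input_length: int, width: int) -> int:
    """Number of slant blocks of the given width needed to cover input_length frames."""
    return math.ceil(input_length / width)


# Hypothetical reference lengths (in frames) and a slope of 2.
w = max_block_width([40, 35, 30], slope=2.0)        # 15 frames
print(w, block_count(300, w))                       # 15 20
# Adding one short word of 8 frames narrows every block.
w_short = max_block_width([40, 35, 30, 8], slope=2.0)  # 4 frames
print(w_short, block_count(300, w_short))          # 4 75
```

Under these assumptions, adding one 8-frame reference word raises the number of blocks for a 300-frame input from 20 to 75, so the memories must be accessed almost four times as often.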