1. Field of the Invention
The present invention relates to distance calculating equipment for use in pattern recognition apparatus and, more particularly, to distance calculating equipment which achieves a low erroneous recognition rate in the recognition of patterns such as voice and characters (letters and digits), even under a variety of conditions (or fluctuations).
2. Description of the Prior Art
In pattern recognition apparatus, the similarity between a pattern to be recognized, i.e. an input pattern, and each of a plurality of known reference patterns is calculated in terms of a distance between feature vectors, each of which is composed of feature parameters intrinsic to the respective pattern, and the reference pattern calculated to have the shortest distance is taken to be the pattern to which the input pattern corresponds. In a word voice recognition apparatus, for example, a voice pattern is expressed as a time sequence of feature vectors (e.g., the spectrum information which is obtained by a spectrum analysis using a filter bank). Likewise, a sequence of feature vectors is prepared in advance as a reference pattern for each of a plurality of known words, and the word whose reference pattern is most similar to the input pattern is selected from those plural reference patterns by the distance calculation. At this time, it is necessary to correct the time fluctuations of the input pattern so as to fit the input pattern to the reference pattern by time expansion or compression. This correction of the time fluctuations is conducted by the so-called "time-normalized matching method". The time-normalized matching method and a voice recognition apparatus are disclosed in detail in U.S. Pat. No. 3,816,722, assigned to the same assignee as the present application. As is well known in the art, the voice recognition apparatus is constructed of a voice analyzing unit, a reference pattern memory unit and a pattern matching unit. In the voice analyzing unit, spectrum analysis is performed by means of a filter bank, for example, and simultaneously a voice section is detected, so that a sequence of n-dimensional feature vectors, each composed of n feature parameters, is obtained, one vector for each frame.
On the other hand, a user speaks in advance the words to be used, and the sequences of feature vectors of the same kind, which are given from the voice analyzing unit, are stored as reference patterns in the reference pattern memory unit. The voice to be recognized is converted by the voice analyzing unit into a sequence of feature vectors, which is then fed as an input pattern to the pattern matching unit. In this pattern matching unit, the input pattern is matched with each of the reference patterns of the various words stored in advance, and the word class of the reference pattern which is found to match best is used as the recognition result.
The pattern matching is performed, as is detailed in the above-specified U.S. Patent, by the time-normalized matching method, in which a non-linear expansion or compression is conducted along the time axis of the reference pattern. The distance between the feature vectors of the input pattern and those of the reference pattern is calculated frame by frame, and the sum of these distances is used as a measure of similarity. The expansion or compression which minimizes the sum can be determined by a dynamic programming method. The similarity is thus calculated on the basis of the distance between the feature vectors.
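The time-normalized matching described above can be illustrated by the following minimal Python sketch. It is not the method of the above-specified U.S. Patent: the function names, the squared-difference frame distance, the simple three-way step pattern and the absence of any slope constraint are all assumptions made for illustration only.

```python
import math

def frame_distance(a, b):
    """Squared Euclid distance between two feature vectors (an assumed choice)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def time_normalized_distance(input_seq, ref_seq):
    """Smallest summed frame distance over monotonic time warpings,
    found by dynamic programming (hypothetical helper)."""
    n, m = len(input_seq), len(ref_seq)
    # cost[i][j]: best accumulated distance when the first i input frames
    # are aligned with the first j reference frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(input_seq[i - 1], ref_seq[j - 1])
            # Assumed step pattern: repeat a reference frame, repeat an
            # input frame, or advance both sequences together.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(input_seq, references):
    """Return the word whose reference pattern gives the smallest distance."""
    return min(references,
               key=lambda word: time_normalized_distance(input_seq, references[word]))

# Hypothetical one-dimensional feature sequences for two words.
references = {
    "a": [[0.0], [1.0], [2.0]],
    "b": [[5.0], [5.0], [5.0]],
}
# A slower utterance of "a": the same shape stretched along the time axis.
utterance = [[0.0], [0.0], [1.0], [2.0], [2.0]]
result = recognize(utterance, references)
```

In this sketch the stretched utterance is matched to reference "a" with zero accumulated distance, because the warping absorbs the repeated frames.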
There are several ways of mathematically expressing the distance between feature vector A=(a_1, a_2, ..., a_n) and feature vector B=(b_1, b_2, ..., b_n). They include a Chebychev distance which is expressed by equation (1), a Euclid distance which is expressed by equations (2) and (3), and a correlated distance which is expressed by equation (4): ##EQU1##
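Because equations (1) to (4) are not reproduced in this text, the following Python sketch relies on textbook definitions and should not be read as the original equations. All function names are hypothetical. Two candidates are given for equation (1), since it cannot be determined from the text whether the sum or the maximum of the absolute component differences is meant; the squared and square-root forms are assumed for equations (2) and (3); and one minus the normalized correlation is an assumed form for the correlated distance.

```python
import math

def sum_abs_distance(a, b):
    """Sum of absolute component differences (one candidate for equation (1))."""
    return sum(abs(x - y) for x, y in zip(a, b))

def max_abs_distance(a, b):
    """Largest absolute component difference (the other candidate for equation (1))."""
    return max(abs(x - y) for x, y in zip(a, b))

def squared_euclid_distance(a, b):
    """Sum of squared component differences (assumed form)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclid_distance(a, b):
    """Square root of the sum of squared differences (assumed form)."""
    return math.sqrt(squared_euclid_distance(a, b))

def correlation_distance(a, b):
    """One minus the normalized correlation of A and B (assumed form).
    Both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Hypothetical three-dimensional feature vectors.
A = [1.0, 2.0, 3.0]
B = [2.0, 4.0, 3.0]
distances = {
    "sum_abs": sum_abs_distance(A, B),
    "max_abs": max_abs_distance(A, B),
    "squared_euclid": squared_euclid_distance(A, B),
    "euclid": euclid_distance(A, B),
    "correlation": correlation_distance(A, B),
}
```

Each function returns zero (to within rounding, in the case of the correlation form) when A and B are identical, and a larger value as the two vectors become less alike.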
Since a voice has minute fluctuations from one utterance to the next, on the other hand, erroneous recognitions increase if the reference pattern is determined in advance as a single fixed pattern. Two methods have been proposed to raise the recognition rate even where fluctuations are present. According to one method, a plurality of reference patterns covering the aforementioned fluctuations are prepared for each word. According to the other method, an average pattern is determined from the plural reference patterns and is used as a reference pattern representative of them. Reference should be made to, for example, "Speech Recognition by Machine: A Review", PROCEEDINGS OF THE IEEE, VOL. 64, No. 4, APRIL 1976, pp. 501 to 531, written by D. RAJ REDDY, or "Practical Applications of Voice Input to Machine", PROCEEDINGS OF THE IEEE, VOL. 64, No. 4, APRIL 1976, pp. 487 to 501, written by THOMAS B. MARTIN.
A disadvantage of the former method is that the storage capacity of the memory has to be increased and the number of distance calculations is increased because the number of the reference patterns to be prepared is increased.
On the other hand, a disadvantage of the latter method is that the recognition rate is lowered, although the number of the reference patterns is not increased. Generally, the pattern distributions of similar words lie close to each other, and the distributions differ in extent, some being wider and some narrower. As a result, if a single average pattern is used as the reference pattern for each word, the wide distribution of one word and the narrow distribution of a similar word may come close to each other. When the pattern to be recognized belongs to the wider distribution and is located in the vicinity of its ends, the distance between that pattern and the average pattern of the narrower distribution becomes smaller than the distance between that pattern and the average pattern of the wider distribution, so that an erroneous recognition results.
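The trade-off between the two methods can be illustrated with a small one-dimensional Python sketch. The sample values, word labels and function names are hypothetical and chosen only to reproduce the situation described; real feature vectors are n-dimensional.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def nearest_average(x, classes):
    """Latter method: one average reference pattern per word."""
    return min(classes, key=lambda word: abs(x - mean(classes[word])))

def nearest_sample(x, classes):
    """Former method: every stored pattern of every word is compared."""
    return min(classes, key=lambda word: min(abs(x - s) for s in classes[word]))

# Hypothetical utterances of two similar words along one feature axis.
classes = {
    "wide": [0.0, 2.0, 5.0, 8.0, 10.0],   # widely spread, average 5.0
    "narrow": [11.0, 12.0, 13.0],         # tightly clustered, average 12.0
}

# An input near the upper end of the wide distribution.
x = 9.5
by_average = nearest_average(x, classes)  # distances 4.5 vs 2.5
by_samples = nearest_sample(x, classes)   # distances 0.5 vs 1.5
```

Here the input lies inside the range of the wide word, yet the average-pattern comparison assigns it to the narrow word because 2.5 is smaller than 4.5. Comparing against every stored pattern recovers the correct word, at the cost of storing eight patterns and performing eight distance calculations instead of two.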
The discussion thus far applies not only to the recognition of voices but also to the recognition of letters, digits and so on. In the case of letter recognition, the aforementioned fluctuations correspond to the positional deviations of the characters or to the differences in the characters that are peculiar to each writer.