1. Field of the Invention
This invention generally relates to a method and system for comparing an unknown pattern with a plurality of known patterns to determine the identity of the unknown pattern, and, in particular, to a pattern recognition method and system. More specifically, the present invention relates to a method and system for recognizing a pattern, such as a voice pattern, and is particularly useful for voice and character recognition.
2. Description of the Prior Art
One of the most common pattern comparison methods is pattern matching. In the pattern matching method, the degree of similarity between an unknown input pattern and each of a plurality of registered, known reference patterns is determined, and the input pattern is then identified as the reference pattern having the highest degree of similarity. In actual use, however, surrounding noise may be mixed into the input pattern. For example, in voice recognition, when sporadic background noise occurs during the recognition process, or when the sound of the mouth opening and closing is introduced into the voice during pronunciation, the input voice is compared with the reference patterns while the noise is still present, so that a proper degree of similarity cannot be determined. Thus, if the system is highly sensitive to sound, noise tends to be added to the input voice; on the other hand, if the sensitivity is lowered, the chance of picking up noise is certainly reduced, but part of the voice may then fail to be sampled. For example, for the word "stop", whose final sound is a consonant pronounced by itself, the last sound /p/ often fails to be detected.
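The basic pattern matching procedure described above, in which the unknown pattern is scored against every reference pattern and the best-scoring reference is selected, can be sketched as follows. This is only a minimal illustration and not the method of the invention: patterns are assumed to be sequences of feature vectors, the similarity is assumed to be the negated mean Euclidean frame distance under a simple linear time alignment, and the names `similarity` and `identify` are hypothetical.

```python
import math
from typing import Dict, Sequence

Frame = Sequence[float]      # one feature vector (assumed representation)
Pattern = Sequence[Frame]    # a pattern is a time sequence of feature vectors


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(unknown: Pattern, reference: Pattern) -> float:
    """Hypothetical similarity: negated mean frame distance under linear alignment.

    Frame i of the unknown pattern is compared with the proportionally
    corresponding frame of the reference pattern, so a higher value
    (closer to 0) means the patterns are more similar.
    """
    n, m = len(unknown), len(reference)
    total = 0.0
    for i, frame in enumerate(unknown):
        j = (i * m) // n  # map input frame index linearly onto the reference
        total += frame_distance(frame, reference[j])
    return -total / n


def identify(unknown: Pattern, references: Dict[str, Pattern]) -> str:
    """Return the label of the reference pattern with the highest similarity."""
    return max(references, key=lambda label: similarity(unknown, references[label]))


if __name__ == "__main__":
    refs = {
        "go": [[1.0, 0.0], [1.0, 0.0]],
        "stop": [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0]],
    }
    print(identify([[1.0, 0.0], [1.0, 0.0]], refs))  # prints "go"
```

A noisy or truncated input simply lowers every similarity score, and if it lowers the correct one enough, `identify` returns the wrong label, which is the failure mode discussed in this section.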
FIGS. 7a and 7b illustrate the case where the spoken word "stop" is to be recognized by pattern matching. FIG. 7a illustrates a reference pattern, and FIG. 7b illustrates an input pattern to be identified. The reference pattern of FIG. 7a accurately represents the entire word "stop"; however, the input pattern of FIG. 7b lacks the last sound /p/. As a result, when these two patterns are compared by pattern matching, the end /p/ of the reference pattern of FIG. 7a is made to correspond to the end /o/ of the input pattern of FIG. 7b. The degree of similarity between the two patterns therefore decreases, leading to an erroneous recognition result. Such an erroneous correspondence may be prevented by using a dynamic matching scheme with a free end point; however, this scheme inherently requires a large amount of calculation, and because a portion of a pattern may be lost, or noise may be added, at its head end as well as at its tail end, both ends must be handled, which increases the amount of calculation still further.
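The difference between matching with fixed end points and matching with a free end point can be illustrated with a small dynamic programming sketch. Purely for illustration, each frame is reduced to a single hypothetical number and the local distance is the absolute difference; the function names are hypothetical and the step pattern is the simplest one, not necessarily the one a practical system would use. With the end points fixed, the last input frame is forced onto the last reference frame, whereas the free-end variant lets the alignment stop at any reference frame.

```python
from typing import List, Sequence

INF = float("inf")


def dtw_cost_matrix(inp: Sequence[float], ref: Sequence[float]) -> List[List[float]]:
    """Accumulated distance D[i][j] for aligning inp[:i+1] with ref[:j+1].

    Allowed predecessors of (i, j) are (i-1, j), (i, j-1) and (i-1, j-1);
    the local distance is the absolute difference of the two frame values.
    """
    n, m = len(inp), len(ref)
    D = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(inp[i] - ref[j])
            if i == 0 and j == 0:
                D[i][j] = d  # both patterns start together
                continue
            best = min(
                D[i - 1][j] if i > 0 else INF,
                D[i][j - 1] if j > 0 else INF,
                D[i - 1][j - 1] if i > 0 and j > 0 else INF,
            )
            D[i][j] = d + best
    return D


def dtw_fixed_end(inp: Sequence[float], ref: Sequence[float]) -> float:
    """Both ends fixed: the last input frame must align with the last reference frame."""
    return dtw_cost_matrix(inp, ref)[-1][-1]


def dtw_free_end(inp: Sequence[float], ref: Sequence[float]) -> float:
    """Free end point on the reference axis: the path may finish at any reference frame."""
    return min(dtw_cost_matrix(inp, ref)[-1])


if __name__ == "__main__":
    ref = [1.0, 2.0, 5.0, 9.0]  # hypothetical values standing for /s/ /t/ /o/ /p/
    inp = [1.0, 2.0, 5.0]       # the same word with the final /p/ lost
    print(dtw_fixed_end(inp, ref))  # 4.0: /o/ is forced against /p/
    print(dtw_free_end(inp, ref))   # 0.0: the path stops at /o/
```

This sketch frees only the tail end of the reference. Also freeing the head end, or allowing for noise added to the input, would enlarge the search further, which corresponds to the additional calculation noted above.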
As described above, in the pattern matching method, the degree of similarity is determined by comparing an unknown input pattern with each of a plurality of reference patterns formed in advance, and the input pattern is identified as the reference pattern having the highest degree of similarity. The accuracy with which a pattern can be extracted is therefore extremely important in the pattern matching scheme, and this importance is not limited to voice recognition. In particular, to prevent surrounding noise from being introduced when a voice is extracted or when a voice interval is determined, the threshold level must be set properly so that small noise sounds are not picked up.
One typical method of detecting a voice interval is illustrated in FIG. 8, in which the energy level of a voice is used to separate the voice from background noise. In this method, a voice energy threshold level A for cutting off noise is determined before a voice is input, and a voice interval L is determined as the time period from a point in time t₁, when the voice energy level exceeds the threshold level A, to a point in time t₂, when the voice energy level falls below the threshold level A. This is the basic idea of voice interval detection, and various improvements have been made to separate a voice of interest from noise. The feature quantity is not limited to any specific one; any desired feature may be used, most typically the power spectrum, LPC, or the cepstrum. Taking the power spectrum as an example, it can be obtained by applying an input voice to a band-pass filter bank, and the method of analysis can be varied freely by selecting the characteristics of the band-pass filter bank. A voice interval detecting unit and a feature quantity converting unit may be arranged in either order with respect to the flow of the voice signal supplied from a voice input unit. With such a method, if a noisy consonant having small energy, such as /f/, is present at the beginning or end of a voice, it is very difficult to pick up. FIG. 9a illustrates a reference pattern for /family/, and FIG. 9b illustrates an input pattern for the same spoken word. As shown, the /f/ sound at the beginning of the word is often lost because of its low energy. As a result, proper matching cannot be carried out, the degree of similarity is lowered, and an erroneous recognition results.
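The energy-threshold detection of FIG. 8 can be sketched as below, assuming the voice energy has already been computed for each analysis frame. The frame energies and the function name `detect_voice_interval` are hypothetical, and the interval is returned as frame indices (t1, t2), where t2 is the first frame whose energy falls below A.

```python
from typing import Optional, Sequence, Tuple


def detect_voice_interval(
    energy: Sequence[float], threshold: float
) -> Optional[Tuple[int, int]]:
    """Return (t1, t2) for the first voice interval, or None if no frame exceeds A.

    t1 is the first frame whose energy exceeds the threshold A; t2 is the
    first later frame whose energy falls below A. If the energy never falls
    below A again, t2 is the number of frames (the end of the signal).
    """
    t1 = None
    for t, e in enumerate(energy):
        if t1 is None and e > threshold:
            t1 = t  # voice onset: energy has exceeded A
        elif t1 is not None and e < threshold:
            return (t1, t)  # voice offset: energy has fallen below A
    if t1 is None:
        return None
    return (t1, len(energy))


if __name__ == "__main__":
    # Hypothetical per-frame energies for "family":
    # silence, silence, /f/, /a/, /m/, /i/, /l/, /y/, silence, silence
    energy = [0.1, 0.1, 0.4, 3.0, 2.0, 2.5, 1.5, 2.0, 0.1, 0.1]
    print(detect_voice_interval(energy, 1.0))  # (3, 8): the /f/ frame is lost
    print(detect_voice_interval(energy, 0.3))  # (2, 8): /f/ is kept
```

With A = 1.0 the interval begins at the vowel, so the low-energy /f/ frame at index 2 is dropped, mirroring FIG. 9b. Lowering A to 0.3 recovers it, but any background noise above 0.3 would then also be taken as part of the voice, which is the trade-off described above.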