This invention relates to a pattern matching apparatus adapted to compare patterns, which are expressed as sequences of feature vectors, such as speech patterns.
Pattern matching methods are widely used in pattern recognition systems in order to recognize an unknown pattern. For such systems, pattern matching methods such as word level matching and phoneme or syllable level matching have been proposed. With both of these methods, a time normalization matching method utilizing dynamic programming to deal with variations in speech speed (a so-called DP matching method) and the so-called two-step DP matching method, proposed to prevent segmentation errors in continuously spoken word patterns and phoneme patterns, have been widely used. These methods are described in detail in U.S. Pat. No. 3,816,722 and U.S. Pat. No. 4,049,913.
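For illustration only, the idea of time normalization by dynamic programming can be sketched as follows. This is a minimal sketch of a generic symmetric DP recurrence with a Euclidean frame distance; it is an assumption made for this example and is not taken from the patents cited above, which may use different local path constraints, weights, and normalization. The function name dtw_distance is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Accumulated distance between two sequences of feature vectors
    after dynamic-programming time alignment (generic sketch)."""
    n, m = len(a), len(b)
    # g[i][j] holds the minimum accumulated distance for aligning
    # the first i frames of a with the first j frames of b.
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # frame-to-frame distance
            # Extend the best of the three predecessor alignments.
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[n][m]


# A pattern and a slower utterance of the same pattern (first frame
# held twice as long) align with zero accumulated distance.
fast = [(0.0,), (1.0,)]
slow = [(0.0,), (0.0,), (1.0,)]
print(dtw_distance(fast, slow))
```

Because the alignment path may repeat a frame of either sequence, a difference in speaking speed alone does not increase the accumulated distance in this sketch.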
The number of reference patterns to be prepared in advance for continuous speech recognition carried out with phoneme or syllable level matching may be far smaller than that required for word level matching. This allows a reduction not only in the user's reference pattern-registering load but also in the required memory capacity and in the required amount of computation. However, the recognition rate of a matching method that uses the phoneme level is low due to the influence of coarticulation.
In order to deal with such problems, it is useful to set, as a recognition unit, a consonant-vowel (hereinafter referred to as "CV") combination and a vowel-consonant-vowel (hereinafter referred to as "VCV") combination, which contain not only the ordinary phoneme level information but also information on the phoneme transition.
The following methods can generally be thought of as methods of preparing reference patterns.
(a) CV patterns are prepared. In this method, the number of reference patterns may be small. However, since reference patterns describing the V-to-C transition are not prepared, the recognition rate may decrease. Furthermore, it is difficult to determine a segmentation point because of the difficulty in finding a starting point of the consonant in continuous speech.
(b) VCV patterns are prepared. In this method, the reference patterns include both the V-to-C transition and the C-to-V transition, so that a high recognition rate can be expected. Moreover, since the boundary between matching sections lies in a vowel, and a vowel portion can be easily found, the matching sections (segmentation points) can be easily determined. However, since there are a very large number of types of VCV patterns, the user's reference pattern-registering load, the memory capacity and the processing requirements increase.
(c) CV patterns and VC patterns are prepared. In this method, a recognition rate as high as that in method (b) can be expected. Moreover, the number of types of reference patterns may be smaller than that used in method (b). However, it is not easy to cut out a section corresponding to VC.
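The difference in the number of reference patterns among the three methods above can be illustrated with a rough count. The sketch below assumes that every consonant may combine with every vowel, so the figures are upper bounds; the phoneme inventory of 20 consonants and 5 vowels is a hypothetical value chosen only for this example, and the function name pattern_counts is likewise hypothetical.

```python
def pattern_counts(num_consonants, num_vowels):
    """Upper-bound number of reference patterns for each method,
    assuming every consonant-vowel combination can occur."""
    cv = num_consonants * num_vowels                 # CV patterns only
    vc = num_vowels * num_consonants                 # VC patterns only
    vcv = num_vowels * num_consonants * num_vowels   # VCV patterns
    return {
        "cv_only": cv,
        "vcv": vcv,
        "cv_plus_vc": cv + vc,
    }


# Hypothetical inventory: 20 consonants and 5 vowels.
print(pattern_counts(20, 5))
```

Under these assumptions the VCV count grows with the square of the number of vowels, whereas the combined CV and VC count is only twice the CV count, which is consistent with the comparison made above.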