1. Field of the Invention
This invention relates to a method and an apparatus for voice recognition, and to a method of extracting voice features used therein. More specifically, it relates to a method and an apparatus that achieve a high recognition rate by utilizing a vector field pattern.
2. Description of the Prior Art
Voice recognition generally involves a system in which a standard voice pattern, obtained by extracting the characteristics of a word to be recognized, is prepared for each word. A characteristic pattern extracted in the same way from the voice inputted as the object of recognition is matched against the plurality of standard patterns to find the most similar standard pattern, and the word corresponding to that standard pattern is determined to have been inputted. In the past, the aforementioned characteristic pattern has been the time-space pattern of a scalar field itself, represented with a time axis as the abscissa and a space axis as the ordinate. Such time-space patterns of the scalar field include the cepstrum, which employs quefrency as the space axis, the PARCOR coefficient, the LSP coefficient and the vocal tract area function. The spectrum, which employs frequency as the space axis, is typical.
One problem to be solved in the field of voice recognition is coping with a large number of speakers or with a non-specified speaker, for which a number of standard patterns have been prepared for each word to improve the recognition rate. In addition, DP matching, which can absorb variation along the time axis, has been developed to cope with the case where the speaking rate of the same speaker differs.
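The matching scheme described above can be illustrated with a minimal sketch in Python. It is not taken from any cited apparatus: the function names `dp_matching_distance` and `recognize`, the Euclidean frame cost, and the symmetric step pattern are all assumptions chosen for illustration. Each word is given a list of standard patterns, and the word with the smallest DP matching distance to the input is selected.

```python
from math import inf


def dp_matching_distance(a, b):
    """Accumulated DP matching distance between two frame sequences.

    a, b: lists of feature vectors (lists of floats of equal dimension).
    The Euclidean frame cost and the three-way step pattern are
    illustrative assumptions, not a prescribed configuration.
    """
    def frame_cost(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

    n, m = len(a), len(b)
    # d[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_cost(a[i - 1], b[j - 1])
            # Stretch a, stretch b, or advance both by one frame.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(input_pattern, standard_patterns):
    """Return the word whose closest standard pattern best matches the input.

    standard_patterns: dict mapping each word to a list of standard
    patterns (hypothetical layout, several patterns per word).
    """
    def best_distance(word):
        return min(dp_matching_distance(input_pattern, p)
                   for p in standard_patterns[word])

    return min(standard_patterns, key=best_distance)
```

Because the alignment may repeat frames, an utterance spoken at half speed can still match its standard pattern with zero accumulated cost, which is the sense in which DP matching absorbs variation along the time axis.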
In the conventional apparatus employing the time-space pattern of the scalar field itself as the characteristic, the recognition rate was not always satisfactory when a large vocabulary or a non-specified speaker had to be recognized. Even though a number of standard patterns are prepared for one word or DP matching is used as described above, a real solution could not be achieved. Thus, a voice recognition system for a large vocabulary or a non-specified speaker is yet to be realized. Therefore, one of the present inventors has proposed, in Japanese Patent Application Laid-Open No. 60-59394, and in "Comparison Studies on the Effectiveness Between the Vector Field of Spectrum and Spectrum for Speech Recognition" in The Transaction of the Institute of Electronics and Communication Engineers of Japan (D) vol. J69-D No. 11, P1704 (1986), to obtain a spectral vector field pattern by spatial differentiation of the spectrum, which is a scalar field forming a time-space pattern of time and frequency, and to use that pattern as the feature of voice.
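As a rough illustration of obtaining a vector field from a scalar time-frequency pattern, the following Python sketch differentiates a spectrum numerically and returns the magnitude and orientation of the resulting vector at each interior point. The function name `spectral_vector_field`, the central-difference approximation, the unit grid spacing, and the orientation convention (angle measured from the time axis) are assumptions made here for illustration and are not taken from the cited references.

```python
from math import atan2, hypot


def spectral_vector_field(f):
    """Approximate the gradient field of a time-frequency pattern.

    f[t][x]: spectrum value at time frame t and frequency channel x
    (needs at least 3 frames and 3 channels).
    Returns (magnitude, orientation) grids covering interior points only,
    so each grid is 2 rows and 2 columns smaller than f.
    Central differences with unit spacing are an assumption.
    """
    frames, channels = len(f), len(f[0])
    magnitude = [[0.0] * (channels - 2) for _ in range(frames - 2)]
    orientation = [[0.0] * (channels - 2) for _ in range(frames - 2)]
    for t in range(1, frames - 1):
        for x in range(1, channels - 1):
            df_dt = (f[t + 1][x] - f[t - 1][x]) / 2.0  # change along time
            df_dx = (f[t][x + 1] - f[t][x - 1]) / 2.0  # change along frequency
            magnitude[t - 1][x - 1] = hypot(df_dt, df_dx)
            # Angle of the vector, measured from the time axis (assumed).
            orientation[t - 1][x - 1] = atan2(df_dx, df_dt)
    return magnitude, orientation
```

For a pattern that rises linearly in both time and frequency, every interior point receives the same magnitude and orientation, which shows that the vector field describes the local slope of the spectrum rather than its absolute level.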
In the past, research using the partial derivatives of the spectrum at each time-space point as the feature was performed by T. B. Martin, and disclosed in "Practical Applications of Voice Input to Machines" Proc. IEEE, 64-4 (1976). However, T. B. Martin calculated ∂f(t,x)/∂t and ∂f(t,x)/∂x from the time-space pattern f(t,x), thereby constituting a function which recognizes 32 different types of vocal sounds with respect to each frame, and used the result, expressed in 32 values, in linear matching in word units. This differs from the aforementioned apparatus, in which the spectral vector field is produced from the spectral scalar field.