This invention relates to a method of speech recognition using short-time self-correlation functions as feature parameters, and more particularly to a method of speech recognition featuring preliminary selection.
It is customary in the art of speech recognition that speech-like sounds or words, once detected and subjected to feature extraction, are compared with reference patterns of a large number of words registered in advance and are identified through the DP matching method or other methods. An attempt to make all of the large number of words the object of recognition and to recognize them with high accuracy demands a long processing time, and special-purpose high-speed hardware is necessary to shorten the recognition time. Alternatively, a simple and time-saving recognition method is employed, with the number of words sought to be recognized being limited. Each of these prior art approaches is still defective: the use of special-purpose hardware makes a speech recognition system expensive; the use of a simple recognition method leads to a decline in recognition rate; and limiting the words sought to be recognized limits the scope of applications of the speech recognition system.
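For illustration only, the comparison of an unknown word against registered reference patterns by DP matching can be sketched as follows. This is a minimal sketch of generic dynamic-programming (time-warping) matching, not the specific procedure of any prior art system; the function names `dp_matching_distance` and `recognize`, the city-block frame distance, and the symmetric step pattern are all assumptions introduced here.

```python
def dp_matching_distance(a, b):
    """Accumulated DP matching distance between two sequences of feature vectors.

    a, b: lists of equal-dimension feature vectors (lists of floats).
    The city-block frame distance and the three-way step pattern are
    assumptions made for this sketch.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum(abs(x - y) for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(unknown, references):
    """Return the registered word whose reference pattern is nearest.

    references: dict mapping a word label to its reference pattern.
    Every registered word is matched, which is why the cost grows with
    the size of the vocabulary.
    """
    return min(references,
               key=lambda word: dp_matching_distance(unknown, references[word]))
```

Because each call to `dp_matching_distance` fills a table whose size is the product of the two sequence lengths, matching against every registered word becomes expensive as the vocabulary grows, which is the cost the passage above describes.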
It is also well known that a so-called preliminary selection, or pre-verification, is carried out to limit the number of object words for recognition prior to execution of a recognition step using the DP matching method or the like.
Several ways of preliminary selection are well known. For instance, a method has been reported which uses feature parameters characteristic of the lengths of words and of the spectra at the beginning and ending of words. This method, however, is not suited for simple type speech recognition systems because it involves complex operations and requires that those feature parameters be set up for preliminary selection.
Another method of preliminary selection has also been reported, in which approximately 10 samples are extracted at intervals from a time series of feature vectors to set up pattern vectors of about 50 dimensions, and the number of object words is limited to 20% through verification by linear sampling. This method is not suitable for simple type speech recognition systems either.
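The sampling-based preliminary selection described above can be pictured with the following minimal sketch. It assumes 5-dimensional feature vectors so that 10 samples yield a 50-dimensional pattern vector, equally spaced sampling, and a squared Euclidean distance for the verification step; none of these details is stated in the reported method, and the names `sample_pattern_vector` and `preselect` are hypothetical.

```python
def sample_pattern_vector(frames, num_samples=10):
    """Build a fixed-length pattern vector from a time series of feature vectors.

    frames: non-empty list of feature vectors (lists of floats).
    Frames are picked at equal spacing (an assumption) and concatenated,
    so 10 samples of 5-dimensional frames give a 50-dimensional vector.
    """
    if not frames:
        raise ValueError("frames must not be empty")
    n = len(frames)
    pattern = []
    for k in range(num_samples):
        # Integer index running from the first frame to the last frame.
        idx = k * (n - 1) // (num_samples - 1) if num_samples > 1 else 0
        pattern.extend(frames[idx])
    return pattern


def preselect(unknown_pattern, reference_patterns, keep_fraction=0.2):
    """Keep the nearest fraction of registered words as candidates.

    reference_patterns: dict mapping a word label to its pattern vector.
    The squared Euclidean distance is an assumption of this sketch.
    """
    def distance(word):
        ref = reference_patterns[word]
        return sum((x - y) ** 2 for x, y in zip(unknown_pattern, ref))

    keep = max(1, int(len(reference_patterns) * keep_fraction))
    return sorted(reference_patterns, key=distance)[:keep]
```

Only the words returned by `preselect` would then be passed to the more expensive recognition step, such as DP matching.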