1. Field of the Invention
This invention relates to a speech recognition apparatus that extracts parameters from input speech in order to recognize the speech.
2. Description of the Related Art
In general speech recognition, the dominant method extracts low-order parameters (energy components) and high-order parameters (frequency components) from the input speech and recognizes the speech by use of both sets of extracted parameters. In this method, the low-order parameter analysis and the high-order parameter analysis are performed simultaneously on the input speech in real time. However, since the high-order parameter analysis must compute a DFT spectrum, filter bank outputs, LPC parameters or the like every 8 milliseconds in real time, special high-speed signal processing hardware such as a DSP (digital signal processor) is required.
For example, in the conventional acoustic recognition system disclosed in Acoustic Institution, March, 1982, "RECOGNITION OF TELEPHONE SPEECH BY HYBRID STRUCTURE MATCHING METHOD" ASADA et al., the low-order parameter analysis and the high-order parameter analysis are first performed in parallel on the input speech (the output of a speech input section). The start and end points of the word boundaries of the input speech are then detected from the result of the low-order parameter analysis, and sample frames are determined, according to a predetermined number of frames, within the range of the input speech bounded by the detected start and end points. After this, a speech feature spectrum of fixed order is extracted, at the sample frame numbers, from the previously derived result of the high-order parameter analysis for that range of the input speech, and is matched with standard patterns registered in a word dictionary to derive similarities therebetween. The result of recognition is output according to the similarities.
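The conventional flow described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the cited system: the 64-sample frame (8 milliseconds at an assumed 8 kHz sampling rate), the energy threshold, the plain DFT magnitudes used as high-order parameters, the cosine similarity, and all function names are hypothetical choices made for the example.

```python
import cmath
import math

FRAME_LEN = 64  # assumed: 64 samples per frame, i.e. 8 ms at an assumed 8 kHz rate


def split_frames(signal, frame_len=FRAME_LEN):
    """Cut the signal into non-overlapping frames, dropping any partial tail."""
    n = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n)]


def frame_energy(frame):
    """Low-order parameter: mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def frame_spectrum(frame, n_bins=8):
    """High-order parameter: magnitudes of DFT bins 1..n_bins (assumed feature)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(1, n_bins + 1)
    ]


def detect_endpoints(energies, threshold):
    """Return (start, end) frame indices of the first and last frame above threshold."""
    active = [i for i, e in enumerate(energies) if e > threshold]
    return (active[0], active[-1]) if active else None


def sample_frame_indices(start, end, n_samples):
    """Pick a predetermined number of frame indices evenly spaced over [start, end]."""
    if n_samples == 1:
        return [start]
    step = (end - start) / (n_samples - 1)
    return [start + round(i * step) for i in range(n_samples)]


def cosine_similarity(a, b):
    """Similarity between a feature vector and a standard pattern (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recognize_conventional(signal, dictionary, threshold=0.01, n_samples=4):
    """Conventional flow: both analyses run on every frame before any selection."""
    frames = split_frames(signal)
    energies = [frame_energy(f) for f in frames]    # low-order analysis, all frames
    spectra = [frame_spectrum(f) for f in frames]   # high-order analysis, all frames
    endpoints = detect_endpoints(energies, threshold)
    if endpoints is None:
        return None
    chosen = sample_frame_indices(endpoints[0], endpoints[1], n_samples)
    feature = [v for i in chosen for v in spectra[i]]  # only the sampled frames are used
    return max(dictionary, key=lambda word: cosine_similarity(feature, dictionary[word]))
```

Note that `spectra` is computed for every frame, although only the entries at the `chosen` indices contribute to the feature vector that is matched against the dictionary.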
In the above conventional speech recognition method, the high-order parameter analysis is first performed on all of the input speech, and the word feature spectrum is then extracted by use of only the high-order parameters that correspond to the selected sample frames. That is, since only several frames out of the result of the high-order parameter analysis for the entire input speech are used, a large portion of the high order parameters calculated at such cost is never used. In other words, calculations unnecessary for the speech recognition are performed.
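The proportion of discarded work can be estimated with simple arithmetic. The following is a rough sketch only: the function name, the utterance duration, and the number of sample frames are hypothetical values chosen for illustration, and only the 8 millisecond frame period comes from the description above.

```python
def wasted_fraction(duration_ms, frame_period_ms, frames_used):
    """Fraction of analyzed frames whose high-order parameters are never used."""
    total_frames = duration_ms // frame_period_ms
    if total_frames == 0:
        return 0.0
    if frames_used > total_frames:
        raise ValueError("cannot use more frames than were analyzed")
    return (total_frames - frames_used) / total_frames


# Assumed example: a 1 second input analyzed every 8 ms gives 125 frames;
# if 16 sample frames are kept, 109 of the 125 analyses are discarded.
print(wasted_fraction(1000, 8, 16))
```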
As described above, the conventional speech recognition method requires special hardware for performing the high-order parameter analysis on the entire input speech, and a large portion of the high order parameters so calculated is never used.