1. Field of the Invention
The present invention relates to a speech recognition apparatus, and in particular, to a speech recognition apparatus for controlling various units, or entering data into those units, by voice. More specifically, it relates to a speech recognition apparatus which recognizes a voiced sound utilizing its frequency spectrum as one of the features of the voice.
2. Description of the Prior Art
As conventional speech recognition apparatus, the "Word Speech Recognition Apparatus" invented by Ken Nishimura in Japan and disclosed in Japanese Patent Laying-Open Gazette No. 119198/1981, and the "Microprocessor Implementation of an LPC-Based Isolated Word Recognizer" by John G. Ackenhusen of Bell Laboratories, have been known. FIG. 1 is a schematic block diagram of a conventional speech recognition apparatus. In FIG. 1, a voice input portion 1 includes a microphone, an amplifier and a low-pass filter (not shown) for converting a voice into an electric signal and inputting the same. The output from the voice input portion 1 is fed to a feature extracting portion 2 as well as to a beginning and terminating end detection circuit 6. The feature extracting portion 2 analyzes the inputted aural signal to extract feature parameters of the voice, which in turn are fed to a recognition processing portion 5. The beginning and terminating end detection circuit 6 is adapted to detect the beginning and terminating ends of a word speech. The result of detection by the beginning and terminating end detection circuit 6 is fed to the recognition processing portion 5, which comprises a microprocessor, a microcomputer or the like and performs recognition processing of the voice. The recognition processing portion 5 is connected to an input pattern memory 3 and a registration pattern memory 4.
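The text does not say how the beginning and terminating end detection circuit 6 locates a word. A common approach is short-time energy thresholding, and the following minimal sketch assumes that method. The function names, the 20 ms frame length (160 samples at 8 kHz), the threshold and the `min_run` parameter are all hypothetical illustrations, not details of the prior-art circuit.

```python
import math

def tone(freq, n, fs, amp=1.0):
    """Synthetic test signal: n samples of a sine wave (stand-in for a voiced sound)."""
    return [amp * math.sin(2 * math.pi * freq * i / fs) for i in range(n)]

def frame_energies(samples, frame_len):
    """Mean-square energy of consecutive, non-overlapping frames."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def _first_run_start(flags, min_run):
    """Index where the first run of at least min_run consecutive True flags starts."""
    run = 0
    for i, flag in enumerate(flags):
        run = run + 1 if flag else 0
        if run == min_run:
            return i - min_run + 1
    return None

def detect_endpoints(samples, frame_len, threshold, min_run=3):
    """Return (beginning_frame, terminating_frame) of the word, or None if no speech.

    A frame counts as speech when its energy exceeds `threshold`; requiring
    `min_run` consecutive speech frames rejects short clicks at either end.
    """
    flags = [e > threshold for e in frame_energies(samples, frame_len)]
    begin = _first_run_start(flags, min_run)
    if begin is None:
        return None
    # Scan the reversed flag sequence to find the terminating end the same way.
    end = len(flags) - 1 - _first_run_start(flags[::-1], min_run)
    return begin, end

if __name__ == "__main__":
    fs = 8000
    signal = [0.0] * 800 + tone(500, 1600, fs) + [0.0] * 800  # silence, "word", silence
    print(detect_endpoints(signal, frame_len=160, threshold=0.01))  # (5, 14)
```

Here the synthetic "word" occupies frames 5 through 14 of a 20-frame recording, so those are the frame indices reported as the beginning and terminating ends.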
Such a conventional speech recognition apparatus divides the voice waveform into frames of predetermined duration and extracts the frequency spectrum of each frame as a feature parameter. In a registration mode, the recognition processing portion 5 writes the feature parameters of an extracted registration word, or of a standard voice, into the registration pattern memory 4. Thus, the registration pattern memory 4 stores in advance the feature parameters of the voices of a plurality of words. In a speech recognition mode, the recognition processing portion 5 writes the feature parameters of the extracted word speech into the input pattern memory 3, and sequentially evaluates the similarity between the feature parameters stored in the input pattern memory 3 and those of each of the plurality of words stored in the registration pattern memory 4, thereby recognizing the word speech based on the results of the evaluation.
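The registration and recognition modes can be sketched in software as follows. The text does not specify the similarity measure, so this sketch assumes a Euclidean distance between per-frame spectra combined with dynamic time warping (DTW), a technique widely used in isolated-word recognizers; a smaller distance stands for a higher similarity. `WordRecognizer`, `register`, `recognize` and the toy two-channel patterns are hypothetical names and data.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two per-frame feature vectors (spectra)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    """Dynamic-time-warping distance between two sequences of feature vectors.

    Lets a word spoken faster or slower than its registered pattern still align.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)  # normalise so long and short words compare fairly

class WordRecognizer:
    """Hypothetical software analogue of recognition processing portion 5."""

    def __init__(self):
        self.registration_patterns = {}  # stands in for registration pattern memory 4

    def register(self, word, pattern):
        """Registration mode: store the feature parameters of a word."""
        self.registration_patterns[word] = pattern

    def recognize(self, input_pattern):
        """Recognition mode: compare the input pattern (memory 3) with every registered word."""
        best_word, best_dist = None, float("inf")
        for word, reference in self.registration_patterns.items():
            dist = dtw_distance(input_pattern, reference)
            if dist < best_dist:  # smaller distance means higher similarity
                best_word, best_dist = word, dist
        return best_word, best_dist

if __name__ == "__main__":
    rec = WordRecognizer()
    rec.register("yes", [[1, 0], [1, 0], [0, 1]])
    rec.register("no", [[0, 1], [0, 1], [1, 0]])
    # A slower utterance of "yes" (one extra frame) still aligns exactly.
    print(rec.recognize([[1, 0], [1, 0], [1, 0], [0, 1]]))  # ('yes', 0.0)
```

The time warping is what allows an input pattern with a different number of frames from the registered pattern to be compared at all; a plain frame-by-frame comparison would require equal lengths.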
FIG. 2 is a circuit diagram showing the feature extracting portion 2 of FIG. 1 in further detail. In FIG. 2, the aural signal from the voice input portion 1 is fed to bandpass filters 201-1, 201-2, . . . , 201-N, which are adapted to pass specific frequency components of the aural signal waveform. Outputs from the bandpass filters 201-1 to 201-N are fed to smoothing circuits 202-1 to 202-N, respectively, the outputs of which are in turn fed to an analog multiplexer 203. The analog multiplexer 203 passes the outputs from the respective smoothing circuits 202-1 to 202-N in a time-sharing manner. The output from the analog multiplexer 203 is fed to an A-D (analog-to-digital) conversion circuit 204, which converts it into digital data and outputs the result.
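The signal chain of FIG. 2 can be imitated in software as a rough sketch. The text does not give the filter order or the smoothing design, so the following assumes a second-order band-pass section per channel (the "constant 0 dB peak gain" biquad from the widely used Audio EQ Cookbook), smoothing by full-wave rectification followed by a one-pole low-pass with a hypothetical 30 Hz cutoff, a multiplexer that reads every channel once at the end of each frame, and an 8-bit A-D converter. All function names and parameter values are illustrative assumptions.

```python
import math

def bandpass(samples, f0, bw, fs):
    """Second-order band-pass filter (biquad, constant 0 dB peak gain form)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * (f0 / bw))  # Q = f0 / bandwidth
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0        # b1 is zero for this filter type
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def smooth(samples, fs, cutoff=30.0):
    """Smoothing circuit: full-wave rectification, then a one-pole low-pass."""
    k = 1 - math.exp(-2 * math.pi * cutoff / fs)
    s, out = 0.0, []
    for x in samples:
        s += k * (abs(x) - s)
        out.append(s)
    return out

def quantize(value, full_scale=1.0, bits=8):
    """A-D conversion: clip to [0, full_scale] and map to an integer code."""
    levels = (1 << bits) - 1
    v = min(max(value / full_scale, 0.0), 1.0)
    return round(v * levels)

def filter_bank_features(samples, fs, channels, frame_len, bits=8):
    """Return one list of N quantized channel values per frame."""
    smoothed = [smooth(bandpass(samples, f0, bw, fs), fs) for f0, bw in channels]
    frames = []
    for end in range(frame_len - 1, len(samples), frame_len):
        # Multiplexer: visit channels 1..N in turn; each value goes through the A-D converter.
        frames.append([quantize(ch[end], bits=bits) for ch in smoothed])
    return frames

if __name__ == "__main__":
    fs = 8000
    channels = [(300, 150), (1050, 175), (2050, 225)]  # a few (fo, Bo) pairs, in Hz
    sig = [math.sin(2 * math.pi * 1050 * i / fs) for i in range(1600)]
    print(filter_bank_features(sig, fs, channels, frame_len=160)[-1])
```

For a 1050 Hz test tone, the channel centred at 1050 Hz produces by far the largest value in each frame, which is the kind of large/small pattern the recognizer works from.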
FIG. 3 is an illustration showing the frequency characteristics of the bandpass filters 201-1 to 201-N shown in FIG. 2. As seen from FIG. 3, the N bandpass filters 201-1 to 201-N are set so that together they extract substantially all frequency components of the voice waveform uniformly. In this case, the features of the voice are expressed by the pattern of relative magnitudes (large and small) of the N frequency-component values extracted by the N filters. The number N is generally chosen between 8 and 16, and relatively satisfactory voice feature parameters can be obtained when no noise is mixed into the voice waveform. Under such noise-free conditions, the recognition performance of the conventional speech recognition device has been sufficiently satisfactory. Typical center frequencies and bandwidths at the -3 dB points for the filters of FIG. 3 are given in Table 1 as follows:
TABLE 1
CENTRAL FREQUENCY AND BAND WIDTH OF FILTER
________________________________________
CHANNEL   CENTRAL FREQUENCY   BAND WIDTH
NUMBER    fo (Hz)             Bo (Hz)
________________________________________
 1         300                150
 2         450                150
 3         600                150
 4         750                150
 5         900                150
 6        1050                175
 7        1250                200
 8        1450                200
 9        1650                200
10        1850                200
11        2050                225
12        2300                250
13        2550                250
14        2800                250
15        3050                250
16        3300                325
________________________________________
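As a worked check on Table 1, approximate -3 dB band edges and the quality factor Q = fo/Bo can be computed for each channel. This sketch assumes each passband is arithmetically symmetric about fo (edges at fo ± Bo/2); for a real band-pass filter the edges are closer to geometrically symmetric, so the figures are approximations. The helper names are hypothetical.

```python
# Table 1 values, channels 1..16: centre frequency fo and -3 dB bandwidth Bo, in Hz.
CENTER_HZ = [300, 450, 600, 750, 900, 1050, 1250, 1450,
             1650, 1850, 2050, 2300, 2550, 2800, 3050, 3300]
BANDWIDTH_HZ = [150, 150, 150, 150, 150, 175, 200, 200,
                200, 200, 225, 250, 250, 250, 250, 325]

def band_edges(fo, bo):
    """Approximate -3 dB edges, assuming a passband symmetric about fo."""
    return fo - bo / 2, fo + bo / 2

def quality_factor(fo, bo):
    """Q of a band-pass filter: centre frequency divided by bandwidth."""
    return fo / bo

def table_rows():
    """(channel, fo, Bo, lower edge, upper edge, Q) for every channel."""
    rows = []
    for ch, (fo, bo) in enumerate(zip(CENTER_HZ, BANDWIDTH_HZ), start=1):
        lo, hi = band_edges(fo, bo)
        rows.append((ch, fo, bo, lo, hi, round(quality_factor(fo, bo), 2)))
    return rows

def adjacent_gaps():
    """Gap (positive) or overlap (negative) in Hz between neighbouring bands."""
    edges = [band_edges(fo, bo) for fo, bo in zip(CENTER_HZ, BANDWIDTH_HZ)]
    return [edges[i + 1][0] - edges[i][1] for i in range(len(edges) - 1)]

if __name__ == "__main__":
    for row in table_rows():
        print("%2d  fo=%4d  Bo=%3d  %6.1f-%6.1f Hz  Q=%.2f" % row)
    print("largest gap:", max(adjacent_gaps()), "Hz")
```

Under this approximation the 16 bands span roughly 225 Hz to 3462.5 Hz; neighbouring edges either meet or differ by at most 12.5 Hz of gap (the largest overlap, 37.5 Hz, is between channels 15 and 16), which is consistent with the filters covering the voice band substantially uniformly. Q rises from 2.0 in channel 1 to about 10.15 in channel 16.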
However, when the voice to be recognized is mixed with noise, such as factory noise or other voices, the frequency components of the noise pass through the bandpass filters 201-1 to 201-N together with those of the subject voice and alter the values of the feature parameters. When the extraction accuracy of the feature parameters is evaluated in terms of spectrum distortion, the feature parameters extracted by the conventional recognition apparatus prove to be significantly affected by the spectrum distortion that the noise introduces into the input waveform. Thus, the recognition performance of the conventional recognition apparatus is remarkably degraded when it is used in noisy surroundings.
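The effect described above can be illustrated numerically. The text does not define its spectrum-distortion measure, so this sketch assumes the RMS log-spectral distance (in dB) between clean and noisy filter-bank energy vectors, and assumes that uncorrelated noise adds, on average, in power to the voice energy in each channel. The four-channel "voice" spectrum and the noise levels are hypothetical values chosen only for illustration.

```python
import math

def add_noise(channel_energies, noise_energies):
    """Each filter passes voice and noise together; uncorrelated powers add on average."""
    return [v + n for v, n in zip(channel_energies, noise_energies)]

def per_channel_shift_db(clean, noisy):
    """Change in each channel's energy, in dB, caused by the noise."""
    return [10 * math.log10(n / c) for c, n in zip(clean, noisy)]

def spectral_distortion_db(clean, noisy):
    """RMS log-spectral distance (dB) between two filter-bank energy vectors."""
    shifts = per_channel_shift_db(clean, noisy)
    return math.sqrt(sum(s * s for s in shifts) / len(shifts))

if __name__ == "__main__":
    voice = [1.0, 0.5, 0.1, 0.02]  # hypothetical channel energies, strong to weak
    for level in (0.0, 0.05, 0.2):
        noisy = add_noise(voice, [level] * len(voice))
        shifts = ["%.2f" % s for s in per_channel_shift_db(voice, noisy)]
        print(level, shifts, "%.2f dB" % spectral_distortion_db(voice, noisy))
```

Because the same noise power is added to every channel, the channels in which the voice is weak change the most in dB, so the large/small pattern of the feature parameters is flattened and the distortion grows with the noise level.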