In a sound recognition apparatus of the prior art, two approaches are used to increase the similarity between a user's input speech and the standard speech pattern of the same category in a recognition dictionary: elements of various kinds of pattern transformations are removed from the input speech, and many kinds of pattern transformations are included in the standard pattern of the recognition dictionary.
As for the input speech, various methods are used to remove the pattern transformations caused by noise. In a first method (the noise subtraction method), the frequency parameter of the noise is estimated and the noise element is eliminated from the frequency parameter of the noisy speech. In a second method, the frequency characteristics of the lines are approximated by a second-order (quadratic) curve in order to normalize differences in line frequency characteristics, and the frequency characteristics of the input speech are corrected accordingly. In a third method, for telephone speech recognition, the frequency characteristics are corrected using a filter in order to normalize the distortion of the telephone line, and the spectrum is flattened to eliminate that distortion.
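The first method above can be illustrated with a minimal sketch. It assumes the frequency parameter is a list of per-band power values for each frame, and that some frames are known to contain only noise. The function names `estimate_noise_spectrum` and `subtract_noise`, and the `floor` parameter, are hypothetical and are not taken from the prior art described here.

```python
def estimate_noise_spectrum(noise_frames):
    """Average the per-band power over frames assumed to contain only noise."""
    n_bands = len(noise_frames[0])
    n_frames = len(noise_frames)
    return [sum(frame[b] for frame in noise_frames) / n_frames
            for b in range(n_bands)]


def subtract_noise(frame, noise_spectrum, floor=0.0):
    """Subtract the estimated noise power from each band of a noisy frame.

    Results are clipped at `floor` (an assumed parameter) so that no band
    ends up with negative power.
    """
    return [max(p - n, floor) for p, n in zip(frame, noise_spectrum)]


# Two noise-only frames with two frequency bands each (illustrative values).
noise = estimate_noise_spectrum([[1.0, 2.0], [3.0, 2.0]])
cleaned = subtract_noise([5.0, 1.0], noise)
```

The clipping step is a common practical choice rather than something the description specifies; without it, bands where the noise estimate exceeds the observed power would become negative.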
On the other hand, as for the recognition dictionary, various methods have been considered. In a first method, to deal with contaminating noise, a speech pattern contaminated by a noise pattern is artificially generated, and the recognition dictionary is created using that speech pattern. In a second method, a plurality of recognition dictionaries are created, one for each of several different signal-to-noise ratios (S/N). The S/N of the input speech is estimated, and the recognition dictionary whose S/N is closest to the S/N of the input speech is used. In a third method, an HMM (Hidden Markov Model) is used as the recognition dictionary. The HMM parameter of the noise-contaminated speech is synthesized from the HMM parameter of the clean speech and the HMM parameter of the noise.
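The second dictionary method can be sketched as follows. This is a minimal illustration, assuming the speech power and noise power of the input have already been measured separately and that each dictionary is keyed by the S/N (in dB) at which it was created. The names `estimate_snr_db` and `select_dictionary`, and the dictionary labels, are hypothetical.

```python
import math


def estimate_snr_db(speech_power, noise_power):
    """Return the signal-to-noise ratio in decibels from two power values."""
    return 10.0 * math.log10(speech_power / noise_power)


def select_dictionary(dictionaries, input_snr_db):
    """Pick the dictionary whose training S/N is closest to the input S/N.

    `dictionaries` maps a training S/N in dB to a dictionary object.
    """
    closest = min(dictionaries, key=lambda snr: abs(snr - input_snr_db))
    return dictionaries[closest]


# Three dictionaries created at 0, 10 and 20 dB (illustrative labels).
dictionaries = {0: "dict_0dB", 10: "dict_10dB", 20: "dict_20dB"}
chosen = select_dictionary(dictionaries, estimate_snr_db(20.0, 1.0))
```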
As for correction of the frequency characteristics of the microphone for the input speech, one method is disclosed in Japanese Patent Disclosure (Kokai) PH7-84594. In this method, a first microphone for inputting the speech to be recognized and a second microphone for gathering the speech data used to create the recognition dictionary are prepared. The user's voice is inputted through the first microphone and the second microphone at the same time. The coefficients of an adaptive filter are estimated from the respective speech data so that the characteristic of the first microphone becomes equal to the characteristic of the second microphone. During actual recognition, the input speech is corrected by the estimated filter.
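The adaptive-filter estimation described above can be sketched as follows. The description does not say which adaptation algorithm the cited disclosure uses; this sketch assumes a normalized LMS (NLMS) update on a short FIR filter, and the names `estimate_filter`, `apply_filter`, `n_taps`, `step`, and `eps` are all hypothetical.

```python
def apply_filter(taps, signal):
    """Filter `signal` with FIR coefficients `taps` (zero initial state)."""
    return [sum(taps[k] * signal[n - k]
                for k in range(len(taps)) if n - k >= 0)
            for n in range(len(signal))]


def estimate_filter(first_mic, second_mic, n_taps=4, step=0.5, eps=1e-8):
    """Estimate FIR taps so that the filtered first-microphone signal
    approximates the simultaneously recorded second-microphone signal.

    Uses a normalized LMS update (an assumption, not the cited method).
    """
    taps = [0.0] * n_taps
    for n in range(len(first_mic)):
        # Most recent n_taps samples, newest first, zero-padded at the start.
        x = [first_mic[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        predicted = sum(t * xk for t, xk in zip(taps, x))
        error = second_mic[n] - predicted
        energy = sum(xk * xk for xk in x) + eps
        taps = [t + step * error * xk / energy for t, xk in zip(taps, x)]
    return taps
```

In recognition mode, under this sketch, only the first microphone would be needed: its signal would be passed through `apply_filter` with the taps estimated earlier. The estimation step itself still requires both microphones to record the same utterance.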
As for correction of the microphone characteristics and circuit characteristics for the input speech, the frequency characteristics are measured beforehand using a test signal such as white noise. During actual recognition, the frequency parameter of the input speech is modified using the correction data for those frequency characteristics. Alternatively, the frequency characteristics are estimated from the actual input speech by an approximation method, and the input speech is corrected using the estimated characteristics.
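The precomputed correction can be sketched as follows. Because white noise has a flat power spectrum, the per-band power measured through the microphone and circuit reflects their frequency characteristics. This minimal sketch assumes the correction data is a per-band gain that flattens the measured response to its mean level; the names `correction_gains` and `correct_frame`, and the choice of the mean as the target level, are hypothetical.

```python
def correction_gains(measured_response, eps=1e-12):
    """Compute per-band gains from the response to a white-noise test signal.

    Each gain scales its band to the mean level of the measured response,
    so that applying the gains to the response itself yields a flat spectrum.
    """
    mean_level = sum(measured_response) / len(measured_response)
    return [mean_level / max(r, eps) for r in measured_response]


def correct_frame(frame, gains):
    """Apply the precomputed per-band gains to one frame of input speech."""
    return [p * g for p, g in zip(frame, gains)]


# Response measured once, beforehand, for a specific microphone and circuit.
gains = correction_gains([1.0, 2.0, 4.0])
corrected = correct_frame([0.5, 1.0, 2.0], gains)
```

The gains are tied to the microphone and circuit that were measured, so they do not carry over if a different microphone is used in recognition mode.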
However, in the case where the correction data is calculated beforehand, the microphone used for inputting the user's voice in recognition mode must be determined in advance. Therefore, if the user selects a desired microphone according to a particular use environment, the frequency characteristics will not be suitably corrected. In the case where the frequency characteristics are estimated from the input speech, the characteristics are not estimated exactly, and the input speech is corrected only approximately. Even after the input speech is corrected, its characteristics still differ from those of the microphone used for creating the recognition dictionary, and high recognition accuracy is not obtained.
Furthermore, in the method disclosed in PH7-84594 mentioned above, the second microphone used to create the recognition dictionary must be used in order to correct the input speech. However, in general, the second microphone used to gather the speech data for creating the recognition dictionary is expensive or differs for each task. Therefore, in order to suitably correct the frequency characteristics of the microphone, the user's expense increases.