1. Field of the Invention
The present invention relates to a speech recognition apparatus and method. More specifically, it relates to a method of modifying the feature parameters of input speech for speech recognition.
2. Description of the Related Art
Speech recognition apparatuses that recognize input speech are well known. A typical speech recognition apparatus is mounted in a car or installed in a telephone, so it must operate in a wide variety of environments and is required to perform well in all of them.
In such environments, speech produced by a speaker is distorted on its way to the input device of the speech recognition apparatus, because of the varying characteristics of the sound field or fluctuations in telephone line characteristics.
On the other hand, the transfer function reflected in the registered patterns of the speech recognition dictionary is appropriate for the ideal environment in which the dictionary is made, but not for a real environment. The accuracy of speech recognition therefore degrades because of the mismatch between the transfer function of the real environment and that of the ideal environment.
It is therefore an object of the present invention to provide a speech recognition method and apparatus that can recognize speech in a real environment without significant degradation of recognition accuracy.
The above object can be achieved by a first method of modifying a feature parameter for speech recognition according to one aspect of the present invention. This method comprises: a process of extracting a feature parameter from input speech in a real environment; a process of reading, from a first memory device, a first speech transfer characteristic corresponding to the environment in which a reference pattern for speech recognition is generated; a process of reading, from a second memory device, a second speech transfer characteristic corresponding to the real environment; and a process of modifying the extracted feature parameter according to the first speech transfer characteristic and the second speech transfer characteristic, thereby converting the extracted feature parameter, which corresponds to the real environment, into a modified feature parameter corresponding to the environment in which the reference pattern is generated.
According to this method of modifying a feature parameter for speech recognition, speech spoken by a speaker is input into a speech recognition apparatus, and a feature parameter is extracted. The extracted feature parameter, which includes distortion resulting from the speech transfer characteristic of the real environment, is modified according to a first speech transfer characteristic, corresponding to the environment in which the reference pattern for speech recognition is generated, and a second speech transfer characteristic, corresponding to the real environment. This yields a modified feature parameter that corresponds to the environment in which the reference pattern is generated. Therefore, distortion of the feature parameter caused by the real environment can be removed, and the resulting modified feature parameter contributes to improved recognition accuracy.
According to another aspect of the present invention, in the above-stated first method, the feature parameter may be expressed in a frequency domain, the first speech transfer characteristic and the second speech transfer characteristic may be a first transfer function and a second transfer function in the frequency domain respectively, and the modifying process may modify the extracted feature parameter according to the formula $F \cdot C_1 / C_2$, where $F$ is the extracted feature parameter, $C_1$ is the first transfer function, and $C_2$ is the second transfer function.
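The formula $F \cdot C_1 / C_2$ can be motivated under a simple multiplicative channel model. This model is an assumption made here for illustration and is not stated in the text: suppose $S$ is the spectrum of the speech as spoken, the reference patterns were built from speech observed through $C_1$, and the real environment applies $C_2$, so that the extracted parameter is $F = S \cdot C_2$. Then, at every frequency where $C_2 \neq 0$:

$$
F \cdot \frac{C_1}{C_2} = S \cdot C_2 \cdot \frac{C_1}{C_2} = S \cdot C_1 ,
$$

which is the spectrum the same speech would have had in the environment where the reference pattern was generated.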
In this aspect, the feature parameter to be modified is expressed in the frequency domain, and the first and second speech transfer characteristics are expressed as transfer functions in the frequency domain. The extracted feature parameter is modified by one multiplication and one division using these two transfer functions. Because a transfer characteristic that acts by convolution in the time domain acts by multiplication in the frequency domain, the influence of the real environment can be removed, and processing time can be reduced compared with processing a feature parameter expressed in the time domain.
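A minimal NumPy sketch of the frequency-domain modification follows. It assumes magnitude spectra stored as arrays with one value per frequency bin. The function name `modify_frequency` and the small `eps` floor on $C_2$ are illustrative choices, not part of the described method, and the usage lines simulate an assumed multiplicative channel, `F = S * C2`, with random hypothetical spectra.

```python
import numpy as np

def modify_frequency(F, C1, C2, eps=1e-10):
    """Frequency-domain modification F * C1 / C2.

    F  : magnitude spectrum, shape (n_bins,) or (n_frames, n_bins)
    C1 : transfer function of the reference-pattern environment, shape (n_bins,)
    C2 : transfer function of the real environment, shape (n_bins,)
    eps: floor on C2 to avoid division by zero (illustrative assumption)
    """
    F = np.asarray(F, dtype=float)
    C1 = np.asarray(C1, dtype=float)
    C2 = np.asarray(C2, dtype=float)
    # Broadcasting applies the same per-bin correction to every frame.
    return F * C1 / np.maximum(C2, eps)

# Usage with hypothetical spectra: observed F = S * C2 should map to S * C1.
rng = np.random.default_rng(0)
S = rng.uniform(0.5, 2.0, 257)    # clean speech spectrum (hypothetical)
C1 = rng.uniform(0.8, 1.2, 257)   # reference-environment response (hypothetical)
C2 = rng.uniform(0.3, 1.5, 257)   # real-environment response (hypothetical)
F = S * C2
print(np.allclose(modify_frequency(F, C1, C2), S * C1))  # True
```

The correction costs one element-wise multiply and divide per bin and frame, which is the simplicity the aspect above relies on.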
According to a further aspect of the present invention, in the above-stated first method, the feature parameter may be expressed in a cepstrum domain, the first speech transfer characteristic and the second speech transfer characteristic may be a first transfer function and a second transfer function in the cepstrum domain respectively, and the modifying process may modify the extracted feature parameter according to the formula $F + C_1 - C_2$, where $F$ is the extracted feature parameter, $C_1$ is the first transfer function, and $C_2$ is the second transfer function.
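The cepstral formula $F + C_1 - C_2$ can be related to the frequency-domain case if the cepstrum is taken as the inverse Fourier transform of the log magnitude spectrum, $\mathcal{C}\{X\} = \mathcal{F}^{-1}\{\log|X|\}$ (a standard definition, assumed here). Let $H_1$ and $H_2$ be the frequency-domain responses whose cepstra are $C_1 = \mathcal{C}\{H_1\}$ and $C_2 = \mathcal{C}\{H_2\}$, and assume, for illustration only, that the observed spectrum is the speech spectrum $S$ multiplied by $H_2$. The logarithm turns the product into a sum and the inverse transform is linear, so:

$$
F = \mathcal{C}\{S \cdot H_2\} = \mathcal{C}\{S\} + C_2
\quad\Longrightarrow\quad
F + C_1 - C_2 = \mathcal{C}\{S\} + C_1 = \mathcal{C}\{S \cdot H_1\} ,
$$

that is, the modified parameter is the cepstrum the speech would have had in the reference environment.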
In this aspect, the feature parameter to be modified is expressed in the cepstrum domain, and the first and second speech transfer characteristics are expressed as transfer functions in the cepstrum domain. The extracted feature parameter is modified by one addition and one subtraction using these two transfer functions. Because the cepstrum is derived from the logarithm of the spectrum, the multiplicative distortion of the frequency domain becomes additive; therefore, the influence of the real environment can be removed, and processing time can be reduced compared with processing a feature parameter expressed in the time domain or the frequency domain.
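The cepstral modification can be sketched in the same way. This sketch assumes the plain real cepstrum, computed with `numpy.fft.irfft` from a one-sided magnitude spectrum and truncated to a few low-order coefficients; practical front ends often use mel-frequency cepstra instead, for which the additive relation holds only approximately. The names `real_cepstrum` and `modify_cepstrum`, the choice of 13 coefficients, and the spectra `S`, `H1`, `H2` are hypothetical.

```python
import numpy as np

def real_cepstrum(half_spectrum, n_ceps=13, eps=1e-10):
    """Low-order real cepstrum of a one-sided magnitude spectrum (e.g. |rfft(x)|)."""
    log_mag = np.log(np.maximum(half_spectrum, eps))
    # irfft treats log_mag as the non-negative-frequency half of a Hermitian spectrum.
    return np.fft.irfft(log_mag)[:n_ceps]

def modify_cepstrum(F, C1, C2):
    """Cepstrum-domain modification F + C1 - C2 (applies per frame via broadcasting)."""
    return np.asarray(F, dtype=float) + np.asarray(C1, dtype=float) - np.asarray(C2, dtype=float)

# Usage: check that the cepstral correction matches the spectral one.
rng = np.random.default_rng(1)
S, H1, H2 = (rng.uniform(0.2, 2.0, 257) for _ in range(3))  # hypothetical spectra
F = real_cepstrum(S * H2)                 # cepstrum observed in the real environment
C1, C2 = real_cepstrum(H1), real_cepstrum(H2)
print(np.allclose(modify_cepstrum(F, C1, C2), real_cepstrum(S * H1)))  # True
```

In a front end built this way, $C_1$ and $C_2$ would be computed once and read from the two memory devices, so the per-frame cost is one vector addition and one subtraction.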
The above object can also be achieved by a second method, a method of speech recognition, according to a further aspect of the present invention. This method comprises: a process of extracting a feature parameter from input speech in a real environment; a process of reading, from a first memory device, a first speech transfer characteristic corresponding to the environment in which a reference pattern for speech recognition is generated; a process of reading, from a second memory device, a second speech transfer characteristic corresponding to the real environment; a process of modifying the extracted feature parameter according to the first speech transfer characteristic and the second speech transfer characteristic, thereby converting the extracted feature parameter, which corresponds to the real environment, into a modified feature parameter corresponding to the environment in which the reference pattern is generated; a process of calculating an output probability using the modified feature parameter and the reference pattern; and a process of recognizing the input speech using the calculated output probability.
According to this method of speech recognition, speech spoken by a speaker is input into a speech recognition apparatus, and a feature parameter is extracted. The extracted feature parameter, which includes distortion resulting from the speech transfer characteristic of the real environment, is modified according to a first speech transfer characteristic, corresponding to the environment in which the reference pattern is generated, and a second speech transfer characteristic, corresponding to the real environment. This yields a modified feature parameter that corresponds to the environment in which the reference pattern is generated. The input speech is then recognized on the basis of a calculation using the modified feature parameter and the reference pattern. Therefore, distortion of the extracted feature parameter caused by the real environment can be removed, and the accuracy of speech recognition can be improved.
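The calculating and recognizing processes can be illustrated with a deliberately simplified stand-in for the reference patterns: each hypothetical word label is represented by a single diagonal-covariance Gaussian over cepstral vectors, and the output probability is taken as the log-likelihood summed over frames. Practical recognizers typically use hidden Markov models with many states; the function names, labels, and numbers below are assumptions for illustration, and the cepstral-domain rule $F + C_1 - C_2$ is used for the modification step.

```python
import numpy as np

def log_output_probability(frames, mean, var):
    """Sum over frames of log N(x; mean, diag(var))."""
    frames = np.atleast_2d(frames)
    diff = frames - mean
    per_frame = -0.5 * (np.sum(np.log(2.0 * np.pi * var)) + np.sum(diff ** 2 / var, axis=1))
    return float(np.sum(per_frame))

def recognize(frames, C1, C2, patterns):
    """Modify cepstral frames with F + C1 - C2, score every pattern, return the best label."""
    modified = np.atleast_2d(frames) + C1 - C2
    scores = {label: log_output_probability(modified, mean, var)
              for label, (mean, var) in patterns.items()}
    return max(scores, key=scores.get)

# Hypothetical two-word dictionary built in the reference environment.
patterns = {"a": (np.zeros(2), np.ones(2)), "b": (np.full(2, 2.0), np.ones(2))}
C1 = np.zeros(2)                  # reference-environment cepstral offset (hypothetical)
C2 = np.full(2, 2.0)              # real-environment cepstral offset (hypothetical)
frames = np.zeros((3, 2)) + C2    # word "a" as observed through the real channel

print(recognize(frames, C1, C2, patterns))                     # a
print(recognize(frames, np.zeros(2), np.zeros(2), patterns))   # b (no compensation)
```

Without the modification, the channel offset moves the observed frames onto the wrong pattern; with it, the frames line up with the pattern generated in the reference environment.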
According to a further aspect of the present invention, in the above-stated second method, the feature parameter may be expressed in a frequency domain, the first speech transfer characteristic and the second speech transfer characteristic may be a first transfer function and a second transfer function in the frequency domain respectively, and the modifying process may modify the extracted feature parameter according to the formula $F \cdot C_1 / C_2$, where $F$ is the extracted feature parameter, $C_1$ is the first transfer function, and $C_2$ is the second transfer function.
In this aspect, the feature parameter to be modified and the first and second speech transfer characteristics are all expressed in the frequency domain, so the extracted feature parameter is modified by one multiplication and one division using the two transfer functions. The output probability is then calculated and the input speech is recognized. Therefore, distortion caused by the real environment can be removed, and recognition accuracy can be improved with simple, fast processing.
According to a further aspect of the present invention, in the above-stated second method, the feature parameter may be expressed in a cepstrum domain, the first speech transfer characteristic and the second speech transfer characteristic may be a first transfer function and a second transfer function in the cepstrum domain respectively, and the modifying process may modify the extracted feature parameter according to the formula $F + C_1 - C_2$, where $F$ is the extracted feature parameter, $C_1$ is the first transfer function, and $C_2$ is the second transfer function.
In this aspect, the feature parameter to be modified and the first and second speech transfer characteristics are all expressed in the cepstrum domain, so the extracted feature parameter is modified by one addition and one subtraction using the two transfer functions. The output probability is then calculated and the input speech is recognized. Therefore, distortion caused by the real environment can be removed, and recognition accuracy can be further improved with simple, fast processing.
The above object can also be achieved by a speech recognition apparatus according to the present invention. This apparatus comprises: an extracting device for extracting a feature parameter from input speech in a real environment; a first memory device for storing a first speech transfer characteristic corresponding to the environment in which a reference pattern for speech recognition is generated; a second memory device for storing a second speech transfer characteristic corresponding to the real environment; a modifying device for modifying the extracted feature parameter according to the first speech transfer characteristic and the second speech transfer characteristic, thereby converting the extracted feature parameter, which corresponds to the real environment, into a modified feature parameter corresponding to the environment in which the reference pattern is generated; a calculating device for calculating an output probability using the modified feature parameter and the reference pattern; and a recognizing device for recognizing the input speech using the calculated output probability.
With this apparatus, distortion of the feature parameter caused by the real environment can be removed, and the recognition accuracy of the apparatus can be improved in various environments.
According to a further aspect of the present invention, in the above-stated apparatus, the feature parameter may be expressed in a frequency domain, the first speech transfer characteristic and the second speech transfer characteristic may be a first transfer function and a second transfer function in the frequency domain respectively, and the modifying device may modify the extracted feature parameter according to the formula $F \cdot C_1 / C_2$, where $F$ is the extracted feature parameter, $C_1$ is the first transfer function, and $C_2$ is the second transfer function.
In the apparatus according to this aspect, distortion caused by the real environment can be removed, so the recognition accuracy of the apparatus can be improved in various environments with simple, fast processing.
According to a further aspect of the present invention, in the above-stated apparatus, the feature parameter may be expressed in a cepstrum domain, the first speech transfer characteristic and the second speech transfer characteristic may be a first transfer function and a second transfer function in the cepstrum domain respectively, and the modifying device may modify the extracted feature parameter according to the formula $F + C_1 - C_2$, where $F$ is the extracted feature parameter, $C_1$ is the first transfer function, and $C_2$ is the second transfer function.
In the apparatus according to this aspect, distortion caused by the real environment can be removed, so the recognition accuracy of the apparatus can be further improved in various environments with simple, fast processing.
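The apparatus can be sketched as a small class whose fields mirror the devices listed above, with the modifying device supplied as either the frequency-domain rule $F \cdot C_1 / C_2$ or the cepstrum-domain rule $F + C_1 - C_2$. This is an illustrative arrangement, not the claimed structure: the class and field names, the identity extractor, and the single-Gaussian reference patterns are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

import numpy as np

Array = np.ndarray

def modify_frequency(F: Array, C1: Array, C2: Array) -> Array:
    return F * C1 / C2          # frequency-domain aspect

def modify_cepstrum(F: Array, C1: Array, C2: Array) -> Array:
    return F + C1 - C2          # cepstrum-domain aspect

@dataclass
class SpeechRecognitionApparatus:
    extractor: Callable[[Array], Array]               # extracting device
    first_memory: Array                               # stores C1 (reference environment)
    second_memory: Array                              # stores C2 (real environment)
    modifier: Callable[[Array, Array, Array], Array]  # modifying device
    patterns: Dict[str, Tuple[Array, Array]]          # reference patterns: label -> (mean, var)

    def output_probability(self, feats: Array, mean: Array, var: Array) -> float:
        # Calculating device: diagonal-Gaussian log-likelihood summed over frames.
        diff = feats - mean
        ll = -0.5 * (np.sum(np.log(2.0 * np.pi * var)) + np.sum(diff ** 2 / var, axis=1))
        return float(np.sum(ll))

    def recognize(self, speech: Array) -> str:
        # Recognizing device: label with the highest output probability.
        feats = np.atleast_2d(self.extractor(speech))
        modified = self.modifier(feats, self.first_memory, self.second_memory)
        scores = {label: self.output_probability(modified, m, v)
                  for label, (m, v) in self.patterns.items()}
        return max(scores, key=scores.get)

# Usage with hypothetical spectral features (frequency-domain aspect).
app = SpeechRecognitionApparatus(
    extractor=lambda x: x,                 # assume the input is already feature frames
    first_memory=np.ones(2),
    second_memory=np.full(2, 2.0),
    modifier=modify_frequency,
    patterns={"a": (np.ones(2), np.full(2, 0.1)), "b": (np.full(2, 2.0), np.full(2, 0.1))},
)
print(app.recognize(np.full((3, 2), 2.0)))  # a
```

Swapping `modifier` selects between the two aspects without changing the rest of the pipeline, provided the reference patterns are stored in the same domain as the modified features.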