1. Field of the Invention
The present invention relates to a sound signal processing method for emphasizing a target speech signal of an input sound signal and outputting an emphasized speech signal, and an apparatus for the same.
2. Description of the Related Art
When speech recognition technology is used in an actual environment, ambient noise has a large influence on the speech recognition rate. In a car, there are many noises, such as engine sound, wind noise, the sound of oncoming and passing cars, and the sound of a car audio device. These noises are mixed with the voice of a speaker and input to a speech recognition system, thereby greatly decreasing the recognition rate. The use of a microphone array is considered as a method for solving such a noise problem. The microphone array subjects the input sound signals from a plurality of microphones to signal processing to emphasize a target speech signal, which is the voice of the speaker, and outputs the emphasized speech signal.
There is well known an adaptive microphone array that suppresses noise by automatically steering a null, at which the receiving sensitivity of the microphone array is low, toward the arrival direction of the noise. The adaptive microphone array is generally designed under a condition (constraint condition) that a signal arriving from the target sound direction is not suppressed. As a result, it is possible to suppress noise arriving from the side of the microphone array without suppressing the target speech signal arriving from the front direction thereof.
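The constrained design described above can be illustrated by a minimal narrowband sketch. The following is not taken from any cited reference; it assumes a minimum-variance distortionless-response (MVDR) beamformer as one common example of such a constrained adaptive array, and the array geometry, frequency, noise direction, and power levels are all hypothetical values chosen only for illustration.

```python
import numpy as np

def steering_vector(angle_deg, n_mics=4, spacing_m=0.05, freq_hz=1000.0, c=343.0):
    """Far-field response of a uniform linear array (0 degrees = front direction).

    All default values are illustrative assumptions.
    """
    positions = np.arange(n_mics) * spacing_m
    delays = positions * np.sin(np.deg2rad(angle_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def mvdr_weights(covariance, look_vector):
    """Minimize output power subject to unit gain in the look direction."""
    r_inv_a = np.linalg.solve(covariance, look_vector)
    return r_inv_a / (look_vector.conj() @ r_inv_a)

target = steering_vector(0.0)   # target speech from the front
noise = steering_vector(60.0)   # hypothetical noise arriving from the side

# Covariance of one strong directional noise plus weak uncorrelated sensor noise
covariance = 100.0 * np.outer(noise, noise.conj()) + 0.01 * np.eye(len(noise))

weights = mvdr_weights(covariance, target)
target_gain = abs(weights.conj() @ target)  # held at 1 by the constraint
noise_gain = abs(weights.conj() @ noise)    # small: a null forms toward the noise
```

In this idealized model the gain toward the front direction stays at unity while the gain toward the noise direction becomes very small, which corresponds to the null being steered toward the noise.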
However, in an actual environment there is the problem of so-called reverberation: the voice of the speaker in front of the microphone array is reflected by obstacles surrounding the speaker, such as walls, and voice components arriving from various directions enter the microphones. Reverberation is not considered in the conventional adaptive microphone array. As a result, when the adaptive microphone array is used under reverberation, a phenomenon referred to as “target signal cancellation” occurs, in which the target speech signal that should be emphasized is improperly suppressed.
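Target signal cancellation can be reproduced in a simple narrowband model. The sketch below is an assumption-laden illustration, not a description of any particular prior-art system: it uses an MVDR beamformer constrained only in the front direction, and models reverberation as a single, perfectly coherent reflection from a hypothetical direction of 45 degrees with half the amplitude of the direct sound.

```python
import numpy as np

def steering_vector(angle_deg, n_mics=4, spacing_m=0.05, freq_hz=1000.0, c=343.0):
    """Far-field response of a uniform linear array (illustrative defaults)."""
    positions = np.arange(n_mics) * spacing_m
    delays = positions * np.sin(np.deg2rad(angle_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def mvdr_weights(covariance, look_vector):
    """Minimize output power subject to unit gain in the look direction."""
    r_inv_a = np.linalg.solve(covariance, look_vector)
    return r_inv_a / (look_vector.conj() @ r_inv_a)

look = steering_vector(0.0)  # direction protected by the constraint

# Hypothetical reverberant target: direct sound plus one coherent reflection
received = look + 0.5 * steering_vector(45.0)

# Only the reverberant target and weak sensor noise are present
covariance = 100.0 * np.outer(received, received.conj()) + 0.01 * np.eye(len(look))

# Gain applied to the target as it actually arrives at the microphones
adaptive_gain = abs(mvdr_weights(covariance, look).conj() @ received)
fixed_gain = abs((look / len(look)).conj() @ received)  # plain delay-and-sum
```

Because the constraint protects only the direct path, the adaptive weights are free to use the reflected component to cancel the target, so `adaptive_gain` becomes far smaller than `fixed_gain` in this model even though no noise source is present.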
A method has been proposed that makes it possible to avoid the problem of target signal cancellation if the influence of the reverberation is known, that is, if the transfer function from the sound source to each microphone is known. For example, J. L. Flanagan, A. C. Surendran and E. E. Jan, “Spatially Selective Sound Capture for Speech and Audio Processing”, Speech Communication, 13, pp. 207-222, 1993 provides a method for filtering an input sound signal from a microphone with a matched filter given by a transfer function expressed in the form of an impulse response. A. V. Oppenheim and R. W. Schafer, “Digital Signal Processing”, Prentice Hall, pp. 519-524, 1975 provides a method for reducing reverberation by converting an input sound signal into a cepstrum and suppressing the higher-order cepstrum.
The method of J. L. Flanagan et al. requires the impulse response to be known beforehand, so the impulse response must be measured in the environment in which the system is actually used. Because many factors in a car influence the transfer functions, such as passengers, loads, and the opening and closing of windows, it is difficult to implement a method that requires such an impulse response to be known beforehand.
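The dependence of matched filtering on a known impulse response can be seen in the following minimal sketch. It is a generic illustration and is not asserted to be the procedure of the cited reference; the function name `matched_filter_array` and the two impulse responses (a direct path followed by two reflections) are hypothetical.

```python
import numpy as np

def matched_filter_array(mic_signals, impulse_responses):
    """Filter each microphone signal with its time-reversed impulse response and sum."""
    return sum(np.convolve(x, h[::-1]) for x, h in zip(mic_signals, impulse_responses))

# Hypothetical source-to-microphone impulse responses (assumed known in advance)
impulse_responses = [
    np.array([1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25, 0.0]),
    np.array([0.0, 1.0, 0.0, 0.0, 0.6, 0.0, 0.0, 0.3]),
]

source = np.zeros(16)
source[0] = 1.0  # unit impulse used as the test source

# Signals observed at the microphones in this simulated environment
mic_signals = [np.convolve(source, h) for h in impulse_responses]

output = matched_filter_array(mic_signals, impulse_responses)
peak_index = int(np.argmax(np.abs(output)))
peak_value = float(output[peak_index])
```

The direct and reflected components add coherently at a single output sample, whose value equals the summed energy of the impulse responses. This holds only while the filters match the actual impulse responses; if the acoustic paths change, the filters must be measured again.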
On the other hand, the method of A. V. Oppenheim et al. utilizes the tendency of a reverberation component to appear in the higher-order terms of the cepstrum. However, because the direct wave and the reverberation component are not perfectly separated in the cepstrum, the extent to which the reverberation component harmful to the adaptive microphone array can be removed depends on the situation of the system.
The interior of a car is so small that the reflected components are concentrated in a short time range. The direct sound and the reflected sounds are therefore mixed, and the spectrum changes greatly. Consequently, the method using the cepstrum cannot sufficiently separate the direct wave from the reverberation component, so it is difficult to avoid target signal cancellation due to the influence of the reverberation.
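The limitation of the cepstrum-based approach for short reflections can be illustrated as follows. This is a simplified sketch under stated assumptions, not the procedure of the cited reference: the direct sound is a synthetic decaying sequence, the reflection is a single circularly delayed copy, only the log magnitude spectrum is considered, and the cutoff of 50 samples and both delays are hypothetical values.

```python
import numpy as np

def log_magnitude(signal):
    """Log magnitude spectrum of a signal."""
    return np.log(np.abs(np.fft.fft(signal)))

def liftered_log_magnitude(signal, cutoff):
    """Zero all cepstral terms at or above the cutoff, then return to the spectrum."""
    cepstrum = np.fft.ifft(log_magnitude(signal)).real
    n = len(cepstrum)
    quefrency = np.minimum(np.arange(n), n - np.arange(n))  # symmetric index
    low_part = np.where(quefrency < cutoff, cepstrum, 0.0)
    return np.fft.fft(low_part).real

def residual_after_liftering(echo_delay, cutoff=50, n_samples=1024, echo_gain=0.6):
    """Largest deviation from the direct-sound spectrum after liftering."""
    direct = 0.9 ** np.arange(n_samples)                         # synthetic direct sound
    mixed = direct + echo_gain * np.roll(direct, echo_delay)     # add one reflection
    return np.max(np.abs(liftered_log_magnitude(mixed, cutoff) - log_magnitude(direct)))

late_residual = residual_after_liftering(echo_delay=100)  # reflection above the cutoff
early_residual = residual_after_liftering(echo_delay=3)   # short reflection, as in a car
```

In this model a reflection delayed beyond the cutoff is removed almost completely, whereas a reflection delayed by only a few samples falls in the same low-order region of the cepstrum as the direct sound and is left essentially untouched.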
The conventional art described above thus has the problem that, in the small space inside a car, it cannot sufficiently remove the reverberation component that leads to target signal cancellation in the microphone array.