In-vehicle voice intelligibility enhancement systems are known. Such a system makes voice output from a speaker (for example, navigation guidance voice, or voice reading out news or mail) clearly audible even in a noisy environment. For example, an in-vehicle navigation device outputs voice, such as course guidance, from a speaker into the passenger compartment. When noise such as engine sound or road noise is large, for example because the vehicle is being driven, the voice output from the speaker becomes difficult to hear due to the masking effect. Thus, when the noise is large relative to the voice output from the speaker, loudness compensation is performed on the voice output, for example by increasing the gain of the entire voice band, so that the voice remains clearly audible even in a noisy environment.
FIG. 8 is a block diagram of a known voice intelligibility enhancement system (see, for example, Japanese Unexamined Patent Application Publication No. 11-166835). Referring to FIG. 8, an identification filter 1 simulates the guidance voice signal as it appears at the position where a microphone 2 is disposed, and a subtracter 3 then subtracts the simulated signal from the output of the microphone 2 to extract a noise signal. A loudness compensation gain calculation unit 4 calculates a gain Gopt on the basis of the guidance voice signal and the noise signal and inputs Gopt to a route guidance (RG) compensation unit 5.
In this case, the identification process in the identification filter 1 is performed using an adaptive filter 7. An adaptive algorithm unit 8 in the adaptive filter 7 may be implemented using various types of adaptive algorithms. A typical adaptive algorithm is the Least Mean Squares (LMS) algorithm. Filter coefficients may be updated using, for example, the Fast-LMS algorithm (the LMS algorithm in the frequency domain).
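The identification and subtraction described above can be illustrated with a minimal sketch. It uses the plain time-domain LMS update rather than the Fast-LMS variant named in the text, and the function name `lms_identify`, the three-tap `true_path`, the step size `mu`, and the synthetic signals are all assumptions chosen for illustration, not values from the known system.

```python
import random

def lms_identify(x, d, num_taps, mu):
    """Time-domain LMS: adapt FIR weights w so the filtered x tracks d.

    Returns (w, e), where e[t] = d[t] - y[t] is the residual after
    subtracting the simulated signal from the desired signal.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps                 # recent input samples, newest first
    e = []
    for xt, dt in zip(x, d):
        buf = [xt] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # simulated voice at the microphone
        err = dt - y                                  # subtracter output (noise estimate)
        w = [wi + mu * err * bi for wi, bi in zip(w, buf)]
        e.append(err)
    return w, e

random.seed(0)
true_path = [0.6, 0.3, -0.1]               # hypothetical speaker-to-microphone response
voice = [random.uniform(-1.0, 1.0) for _ in range(5000)]
noise = [0.05 * random.uniform(-1.0, 1.0) for _ in range(5000)]

# Microphone signal = voice shaped by the acoustic path, plus cabin noise.
mic = []
for t in range(len(voice)):
    s = sum(h * voice[t - k] for k, h in enumerate(true_path) if t - k >= 0)
    mic.append(s + noise[t])

weights, residual = lms_identify(voice, mic, num_taps=3, mu=0.05)
```

Once the weights have converged toward `true_path`, the residual approximates the noise component of the microphone signal, which is the role the subtracter output plays in FIG. 8.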
The aforementioned known voice intelligibility enhancement system has many problems. A first problem is that, when an estimation error (deviation from the ideal state: α) occurs in the power of the voice signal, the error in the estimated noise power calculated by the subtraction has the opposite sign to the error α in the estimated power of the voice signal, as shown in the following equation:

$$\text{Estimated power of voice signal: } \hat{P}_S \approx \sum \bigl(s(t)+\alpha\bigr)^2, \qquad \text{Estimated power of noise: } \hat{P}_N \approx \sum \bigl(n(t)-\alpha\bigr)^2 \tag{1}$$

Because the two errors act in opposite directions, the error range becomes large and the gain cannot be correctly determined.
Specifically, when an estimation error (deviation from the ideal state: α) occurs in the voice signal, an error of −α occurs in the estimation of the noise. As a result, a gain value calculated from these power values deviates noticeably from the ideal value and degrades the compensation. For example, when the estimated powers of the noise and the voice are both 70 dBA, the ideal compensation gain value is 5.9 dB. In this case, when the estimated value of the voice has an error of about 5 dB (resulting in 65 dBA), the estimated value of the noise increases to 75 dBA accordingly, so that the gain value increases to 9.9 dB, an error of 4.0 dB. If the estimated value of the noise instead stayed at 70 dBA, the gain value would be 7.6 dB, an error of only 1.7 dB. Thus, the error of the gain would be small if the voice estimation error did not propagate to the noise estimate.
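The sign relationship can be checked with a toy calculation. The sketch below assumes the microphone signal is simply voice plus noise and that the noise estimate is obtained by subtracting the voice estimate from it; the function name `estimate_powers` and the sample values are hypothetical and are not measured data.

```python
def estimate_powers(voice, noise, alpha):
    """Return (voice_power_estimate, noise_power_estimate).

    The voice estimate is off by alpha per sample; the noise estimate is
    obtained by subtracting the voice estimate from the microphone signal.
    """
    mic = [s + n for s, n in zip(voice, noise)]
    voice_est = [s + alpha for s in voice]                # s(t) + alpha
    noise_est = [m - v for m, v in zip(mic, voice_est)]   # n(t) - alpha
    p_s = sum(v * v for v in voice_est)
    p_n = sum(n * n for n in noise_est)
    return p_s, p_n

voice = [1.0, 1.0, 1.0, 1.0]   # toy samples, chosen for easy arithmetic
noise = [1.0, 1.0, 1.0, 1.0]

ideal = estimate_powers(voice, noise, alpha=0.0)     # (4.0, 4.0)
skewed = estimate_powers(voice, noise, alpha=-0.5)   # (1.0, 9.0)
```

With the voice underestimated, the noise is overestimated by the same amount per sample, so the ratio of the two power estimates moves further from its ideal value than either estimate does on its own.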
A second problem is that the known voice intelligibility enhancement system requires an expensive digital signal processor (DSP) because the amount of calculation is too large. Even when the adaptive algorithm unit 8 shown in FIG. 8 is implemented using the Fast-LMS algorithm, in which the amount of calculation is said to be relatively small, a fast Fourier transform (FFT) needs to be performed twice for each unit signal block length, and the filter coefficients need to be updated. Moreover, since this processing includes calculation of complex numbers, multiplication needs to be performed a little under 19000 times in the DSP, and thus a computational power of about 20 MIPS is necessary. Here, assuming a window length of N = 1024 when an FFT is performed, the number of multiplications is $N \log_2 N + (N/2) \times 17 = 18944$. This calculation occupies about eighty percent of the entire amount of processing in the DSP necessary to implement the known voice intelligibility enhancement system. Thus, a problem arises in that an expensive DSP is necessary to implement the known voice intelligibility enhancement system shown in FIG. 8, and the computational power of the DSP cannot be sufficiently allocated to the other processing.
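The multiplication count quoted above can be reproduced directly. The sketch below simply evaluates the stated expression N log2 N + N/2 × 17; the function name `fast_lms_multiplications` is hypothetical, and the factor 17 is taken as given from the text without assuming how it breaks down.

```python
def fast_lms_multiplications(n):
    """Multiplications per block, per the expression N*log2(N) + (N/2)*17.

    n is the FFT window length and must be a power of two.
    """
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("window length must be a power of two")
    log2_n = n.bit_length() - 1        # exact log2 for a power of two
    return n * log2_n + (n // 2) * 17

count = fast_lms_multiplications(1024)
print(count)  # 18944
```

For N = 1024 the two terms are 10240 and 8704, which sum to the "a little under 19000" multiplications cited for the known system.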
Accordingly, it is an object of the present invention to enable correct estimation of noise power and, in particular, to prevent an error in the estimation of voice power, when one occurs, from affecting the estimation of noise power.
It is another object of the present invention to reduce the number of calculations performed in a voice intelligibility enhancement system.