A dereverberation system suppresses reverberation superposed on a desired voice signal. Reverberation is generated by the convolution of an original signal with the impulse response from a signal source to an observation point. In general, therefore, dereverberation is achieved by convolving the signal on which reverberation is superposed with the inverse property of that impulse response. Since neither the impulse response nor its inverse property is known, however, the problem is how to determine it. One method for dereverberation that determines the inverse property of the impulse response and convolves it with a signal containing reverberation is disclosed in Non-patent Document 1.
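The relationship between reverberation and its inverse property can be illustrated with a minimal Python sketch. The two-tap impulse response `h` and the 16-tap truncated inverse `g` are hypothetical values chosen only for illustration; an actual room impulse response is far longer and is not known in advance, which is precisely the problem stated above.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

# Hypothetical impulse response: a direct path plus one reflection.
h = [1.0, 0.5]

# Truncated inverse of h: 1 / (1 + 0.5 z^-1) = sum over k of (-0.5)^k z^-k.
g = [(-0.5) ** k for k in range(16)]

source = [1.0, 0.0, -2.0, 3.0]

# Reverberation: the source convolved with the impulse response.
reverberant = convolve(source, h)

# Dereverberation: the reverberant signal convolved with the inverse.
restored = convolve(reverberant, g)[:len(source)]
```

Here `restored` matches `source` over the first four samples because the truncation error of `g` only appears 16 samples later.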
In the disclosed method, input voice containing reverberation is first subjected to linear prediction analysis to remove correlation between adjacent samples. Next, the decorrelated signal is passed through a filter whose coefficients are updated using a least mean squares (LMS) algorithm or the like so that the kurtosis of the filter output is maximized. The filter coefficients thus obtained are employed as the inverse property of the impulse response and are convolved with the input voice containing reverberation, thereby performing dereverberation. While this method was originally applied to input signals observed at a plurality of different spatial positions, Non-patent Document 2 discloses its application to a single input signal.
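The linear prediction step can be sketched as follows. This is a minimal illustration that assumes the autocorrelation method with the Levinson-Durbin recursion and a prediction order of 2; the text does not state which LP method or order is used, and the test signal is hypothetical.

```python
def autocorr(x, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite sequence."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[0..order-1] with x[n] ~ sum a[k] * x[n-1-k]."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]

def lp_residual(x, a):
    """Prediction error: each sample minus its prediction from past samples."""
    out = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        out.append(x[n] - pred)
    return out

# Hypothetical strongly correlated signal: each sample is 0.9 times the last.
x = [0.9 ** n for n in range(200)]

coeffs = levinson_durbin(autocorr(x, 2), 2)
residual = lp_residual(x, coeffs)
```

For this signal the predictor is close to `[0.9, 0.0]`, and the residual collapses to a single impulse at the first sample, which shows the correlation between adjacent samples being removed.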
To compensate for the deterioration in performance caused by not using signals from a plurality of spatially different positions, dereverberation is conducted in a two-stage configuration including first and second stages. At the first stage, the inverse property of the impulse response is determined by the method disclosed in Non-patent Document 1, and it is convolved with the input voice containing reverberation to thereby suppress earlier reflection. Subsequently, at the second stage, later reflection is suppressed in a configuration similar to noise suppression. Specifically, the later reflection component contained in the output of the first stage is estimated and subtracted from that output to suppress the later reflection component. A block diagram of the method disclosed in Non-patent Document 2 is shown in FIG. 20. The dereverberation method disclosed in Non-patent Document 2 will be described hereinbelow with reference to FIG. 20.
A signal containing reverberation, i.e., degraded voice, supplied to an input terminal 1 is passed to a linear prediction (LP) analyzing section 3 to remove correlation between adjacent samples. The resulting linear prediction error is transferred to an inverse filter 4, which convolves it with its filter coefficients, and the result is supplied to a coefficient updating section 5. The coefficient updating section 5 determines coefficient updating components using an LMS algorithm or the like so that the kurtosis of the output of the inverse filter 4 is maximized. The coefficient updating components are fed back to the inverse filter 4 and used to update its coefficients. By repeating such coefficient updates, the property of the inverse filter 4 ultimately becomes equal to the inverse property of the impulse response from the signal source to the observation point. Meanwhile, the property of the inverse filter 4 is successively copied to an inverse filter 2, which convolves it with the degraded voice supplied to the input terminal 1. The result of this convolution is the output of the aforementioned first stage. The coefficient update for the inverse filter 4 may be achieved using, in addition to the LMS algorithm, a normalized LMS (NLMS) algorithm, an LS algorithm, an affine projection algorithm, etc. Moreover, the inverse filter 4 and the coefficient updating section 5 may be configured using a frequency domain algorithm or a sub-band algorithm as disclosed in Non-patent Document 3.
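The interaction between the inverse filter 4 and the coefficient updating section 5 can be sketched as a gradient ascent on kurtosis. The sketch below is an assumption-laden illustration, not the method of Non-patent Document 1: it uses a batch gradient over a whole block instead of a sample-by-sample LMS update, and the input `x`, the two-tap filter, and the step size `mu` are hypothetical.

```python
def fir(x, g):
    """Causal FIR filtering; the output has the same length as x."""
    return [sum(g[k] * x[n - k] for k in range(len(g)) if n - k >= 0)
            for n in range(len(x))]

def kurtosis(y):
    """Excess kurtosis E{y^4} / E{y^2}^2 - 3 of a zero-mean sequence."""
    m2 = sum(v ** 2 for v in y) / len(y)
    m4 = sum(v ** 4 for v in y) / len(y)
    return m4 / (m2 * m2) - 3.0

def kurtosis_gradient(x, g):
    """Gradient of the output kurtosis with respect to each filter tap."""
    y = fir(x, g)
    count = len(y)
    m2 = sum(v ** 2 for v in y) / count
    m4 = sum(v ** 4 for v in y) / count
    grad = []
    for k in range(len(g)):
        # Correlate y^3 and y with the input delayed by k samples.
        c3 = sum(y[n] ** 3 * x[n - k] for n in range(k, count)) / count
        c1 = sum(y[n] * x[n - k] for n in range(k, count)) / count
        grad.append(4.0 * (m2 * c3 - m4 * c1) / m2 ** 3)
    return grad

def ascent_step(x, g, mu):
    """One coefficient update in the direction that increases kurtosis."""
    return [gk + mu * dk for gk, dk in zip(g, kurtosis_gradient(x, g))]

# Hypothetical LP residual: an impulse followed by one reflection.
x = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
g0 = [1.0, 0.0]                       # initial filter: pass-through
g1 = ascent_step(x, g0, mu=0.01)

before = kurtosis(fir(x, g0))
after = kurtosis(fir(x, g1))
```

In this example the second tap moves in the negative direction, toward cancelling the reflection, and `after` exceeds `before`.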
At the second stage, a frame dividing section 6 divides the signal supplied from the inverse filter 2 at the first stage into frames each having a specific number of samples, and transfers them to a window processing section 7. The window processing section 7 multiplies each frame by a window function, and transfers the result to a Fourier transform section 8. The window function has a property such that the frame edges are suppressed more than the frame center, to allow smooth transition to an adjacent frame. The Fourier transform section 8 applies a Fourier transform to the windowed signal to decompose it into a plurality of frequency components, and separates each component into an amplitude and a phase. It squares the amplitude to obtain a power, and supplies the power to a reverberation estimating section 111. The phase is supplied to an inverse Fourier transform section 15. The reverberation estimating section 111 uses a Rayleigh distribution function to estimate the current power of reverberation from the past power of the degraded voice. The estimated power of reverberation is subtracted from the power of the windowed signal at a subtractor 141 to thereby remove the later reflection component. The result of the subtraction is transferred to a selecting section 121.
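The analysis path of the second stage can be sketched as follows. Every concrete choice here is an assumption made for illustration: the periodic Hann window, the frame size of 4 samples with 50% overlap, the naive DFT, the normalisation of the Rayleigh-shaped weights, and the values of `length`, `scale` and `gamma`. The text states only that a Rayleigh distribution function is applied to past power.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window: zero at the frame edge, one at the frame centre."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def split_frames(signal, size, hop):
    """Divide a signal into overlapping frames of `size` samples."""
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, hop)]

def power_and_phase(frame, window):
    """Window one frame, apply a naive DFT, return per-bin power and phase."""
    n = len(frame)
    xw = [v * w for v, w in zip(frame, window)]
    spec = [sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    return [abs(c) ** 2 for c in spec], [cmath.phase(c) for c in spec]

def rayleigh_weights(length, scale):
    """Rayleigh-shaped weights for delays of 1..length frames, summing to 1."""
    raw = [(d / scale ** 2) * math.exp(-(d * d) / (2.0 * scale ** 2))
           for d in range(1, length + 1)]
    total = sum(raw)
    return [v / total for v in raw]

def estimate_reverb_power(bin_powers, weights, gamma):
    """Reverberation power of one bin per frame, predicted from past frames."""
    estimate = []
    for i in range(len(bin_powers)):
        past = sum(w * bin_powers[i - d]
                   for d, w in enumerate(weights, start=1) if i - d >= 0)
        estimate.append(gamma * past)
    return estimate

# Hypothetical first-stage output: a constant signal.
signal = [1.0] * 24
window = hann(4)
analysis = [power_and_phase(f, window) for f in split_frames(signal, 4, 2)]
bin0_power = [power[0] for power, _phase in analysis]   # bin 0 over time

weights = rayleigh_weights(length=4, scale=2.0)
reverb = estimate_reverb_power(bin0_power, weights, gamma=0.3)

# Subtraction of the estimate; in general this can go negative.
subtracted = [p - r for p, r in zip(bin0_power, reverb)]
```

Because the estimate depends only on earlier frames, the first frame has no reverberation estimate, and the subtraction settles once `length` past frames are available.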
Meanwhile, the power of the windowed signal is also supplied to a constant multiplier 20, where it is multiplied by a factor ε, and the product is supplied to the selecting section 121. The selecting section 121 selects the larger of the output of the subtractor 141 and the output of the constant multiplier 20, and transmits the selected value to a silent gap decay section 19. The operation of the selecting section 121 thus limits the minimum value of the subtraction result to the power of the windowed signal multiplied by ε, thereby preventing excessive dereverberation. The silent gap decay section 19 detects a silent segment between voiced segments, and forcibly decays the power to a predetermined small value. This operation prevents a silent gap from being filled with reverberation. The output of the silent gap decay section 19 is supplied to the inverse Fourier transform section 15. The inverse Fourier transform section 15 combines the square root of the power of the dereverberated voice supplied from the silent gap decay section 19 with the phase of the reverberated voice supplied from the Fourier transform section 8, performs an inverse Fourier transform thereon, and supplies the result to a frame synthesis section 17 as dereverberated voice signal samples. The frame synthesis section 17 synthesizes the output voice samples of the current frame by combining them with the dereverberated voice samples of an adjacent frame, and outputs the result to an output terminal 18.
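The selection, resynthesis and frame synthesis steps can be sketched as follows. The value of `eps`, the four-bin spectrum, and the overlap-add with a hop of half a frame are hypothetical choices for illustration. The silent gap decay section 19 is left out, because the text does not specify how a silent segment is detected.

```python
import cmath
import math

def select_floor(subtracted, power, eps):
    """Per bin, keep the larger of the subtraction result and eps * power."""
    return [max(s, eps * p) for s, p in zip(subtracted, power)]

def idft(spectrum):
    """Naive inverse DFT returning the real part of each time sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def resynthesize(power, phase):
    """Combine sqrt(power) with the given phase and return time samples."""
    spectrum = [math.sqrt(p) * cmath.exp(1j * ph) for p, ph in zip(power, phase)]
    return idft(spectrum)

def overlap_add(frames, hop):
    """Sum overlapping frames into one output signal."""
    size = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + size)
    for i, frame in enumerate(frames):
        for t, v in enumerate(frame):
            out[i * hop + t] += v
    return out

# The second bin would be negative after subtraction, so it is floored.
floored = select_floor([3.0, -0.5], [4.0, 1.0], eps=0.1)

# Hypothetical power and phase of one Hann-windowed frame of ones.
frame = resynthesize([4.0, 1.0, 0.0, 1.0], [0.0, math.pi, 0.0, math.pi])

# Three identical frames overlapped by half a frame.
output = overlap_add([frame] * 3, hop=2)
```

With these values `frame` is approximately `[0.0, 0.5, 1.0, 0.5]`, and the overlapped windows sum to 1 in the interior of `output`, which is the smooth transition between adjacent frames that the window function is chosen to allow.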
Non-patent Document 1: IEEE Proceedings of International Conference on Acoustics, Speech, and Signal Processing, pp. 3701-3704 (May 2001).
Non-patent Document 2: IEEE Proceedings of International Conference on Acoustics, Speech, and Signal Processing, pp. 1085-1088 (March 2005).
Non-patent Document 3: IEEE Signal Processing Magazine, pp. 15-36 (January 1992).