In the following description, a signal emitted from a sound source is referred to as an audio signal, and the signal obtained when that audio signal is produced in a reverberant room and collected by a plurality of sound collecting means (microphones, for example) is referred to as an observation signal. The observation signal is thus the audio signal with a reverberation signal superimposed on it. Because of this superimposed reverberation, it is difficult to extract characteristics of the original audio signal from the observation signal, and the collected sound is less clear. Dereverberation processing removes the superimposed reverberation signal from the observation signal, which makes the characteristics of the original audio signal easier to extract and restores the clarity of the sound. Dereverberation processing can be incorporated as a constituent technology into various audio signal processing systems to improve their overall performance. Such systems include:
(1) a speech recognition system that uses reverberation signal removal as preprocessing;
(2) a communication system, such as a teleconference system, that uses reverberation signal removal to improve sound clarity;
(3) a playback system that removes the reverberation signal from recorded speech to improve the clarity of the recorded sound;
(4) a hearing aid that removes the reverberation signal to improve listenability;
(5) a machine control interface and a human-machine interactive system that issue commands to a machine in response to a human voice;
(6) a post-production system that improves the sound quality of acoustic content containing reverberation signals recorded during production; and
(7) an acoustic effector that controls the acoustics of music content by removing or adding a reverberation signal.
FIG. 1 shows an exemplary functional configuration of a conventional dereverberation apparatus 100 (hereinafter referred to as related art 1). The dereverberation apparatus 100 comprises an estimating section 104, a removing section 106, and a sound source model storage section 108. The sound source model storage section 108 stores a sound source model, namely a finite state machine model of the short-time waveform of an audio signal containing no reverberation signal, in which the characteristic of the waveform in each state is represented by an autocorrelation function of the signal. In addition, an optimization function is defined in advance using the sound source model and the operation of applying a dereverberation filter to the observation signal in the time domain. This function represents the likelihood of the signal obtained by removing the reverberation signal from the observation signal (the ideal target signal). Its parameters are the dereverberation filter coefficients and the state time series of the sound source model, and it is designed to take a larger value when more appropriate filter coefficients or a more appropriate state time series are given.
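To make the notion of a state time series more concrete, the following Python sketch treats the sound source model as a small codebook of per-state autocorrelation functions and assigns each short frame the state whose autocorrelation is closest to the frame's own. This is a deliberately simplified illustration, not the model of related art 1: it ignores the state transitions of the finite state machine, uses non-overlapping frames, and replaces the likelihood with a squared distance between normalized autocorrelations. The function names and the codebook values are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r[0..max_lag] of one frame, normalized by r[0]."""
    r = [sum(frame[i] * frame[i + k] for i in range(len(frame) - k))
         for k in range(max_lag + 1)]
    # A silent frame (r[0] == 0) is returned unnormalized.
    return [v / r[0] for v in r] if r[0] > 0 else r


def state_time_series(signal, frame_len, codebook):
    """Assign each non-overlapping frame the index of the closest codebook entry.

    Each codebook entry is a hypothetical per-state autocorrelation function;
    the squared Euclidean distance stands in for the model's likelihood.
    """
    max_lag = len(codebook[0]) - 1
    states = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        r = autocorrelation(signal[start:start + frame_len], max_lag)
        dists = [sum((a - b) ** 2 for a, b in zip(r, c)) for c in codebook]
        states.append(dists.index(min(dists)))
    return states
```

For example, with the codebook `[[1.0, 0.9, 0.8], [1.0, -0.9, 0.8]]` (a smooth state and an alternating state), the signal `[1.0] * 8 + [1.0, -1.0] * 4` with `frame_len=8` is assigned the state sequence `[0, 1]`.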
In the following description, the input time-domain observation signals are denoted by $x_t^{(1)}, \ldots, x_t^{(q)}, \ldots, x_t^{(Q)}$. The subscript $t$ represents a discrete time index, and the superscript $q$ ($q = 1, \ldots, Q$) represents a sound collecting means index (a microphone index, for example). Throughout the following description, the microphone with index $q$ is referred to as the microphone for the $q$-th channel.
When the observation signals $x_t^{(q)}$ are input, the estimating section 104 estimates the dereverberation filter from them using the optimization function described above; more specifically, it determines the parameters that maximize the value of the optimization function. The removing section 106 convolves the observation signal with the estimated dereverberation filter to remove the reverberation signal and outputs the resulting signal, which is referred to as a target signal.
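The filtering step of the removing section 106 can be sketched as a multichannel FIR filter that filters each channel with its own coefficients and sums the results, $y_t = \sum_{q} \sum_{k} g^{(q)}_k x^{(q)}_{t-k}$. The per-channel filter structure, the symbols $y_t$ and $g^{(q)}_k$, and the name `apply_dereverberation_filter` are assumptions for illustration. The estimation performed by the estimating section 104 is not shown, so the coefficients are taken as given.

```python
def apply_dereverberation_filter(x, g):
    """Multichannel FIR filtering: y[t] = sum_q sum_k g[q][k] * x[q][t - k].

    x: list of Q observation signals of equal length T
    g: list of Q FIR filters (hypothetical coefficients, one per channel)
    The output is truncated to T samples.
    """
    T = len(x[0])
    y = [0.0] * T
    for xq, gq in zip(x, g):
        for t in range(T):
            for k, coef in enumerate(gq):
                if t - k >= 0:  # causal filter: ignore samples before t = 0
                    y[t] += coef * xq[t - k]
    return y
```

For a single channel whose observation is `[1.0, 0.5, 0.0, 0.0]` (a unit impulse followed by one echo of half amplitude), the truncated inverse filter `[1.0, -0.5, 0.25, -0.125]` returns `[1.0, 0.0, 0.0, 0.0]`, that is, the echo is removed within the four output samples.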
FIG. 2 shows an exemplary functional configuration of another conventional dereverberation apparatus 200 (hereinafter referred to as related art 2). The dereverberation apparatus 200 comprises a dividing section 202 that divides an observation signal into U frequency bands, a storage section 204u and a removing section 206u provided for each frequency band ($u = 0, \ldots, U-1$), and an integrating section 208.
The dividing section 202 divides the observation signal into subband signals for the U frequency bands. The resulting subband signals are time-domain signals, and down-sampling (decimation) may be performed during the division. In the following description, a subband signal is denoted by $x'^{(q)}_{n,u}$, where $n$ is the sample index after down-sampling and $u$ is the frequency band index ($u = 0, \ldots, U-1$). The description below considers the subband signal $x'^{(q)}_{n,u}$ in the $u$-th frequency band of the observation signal $x_t^{(q)}$ collected by the microphone for the $q$-th channel.
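Related art 2 as described here does not specify how the dividing section 202 is built. One common choice, assumed in the Python sketch below, is a uniform DFT analysis filter bank: every D samples (D being the down-sampling factor) a segment of U samples is multiplied by a window $w_k$ and transformed with a U-point DFT, giving $x'_{n,u} = \sum_{k=0}^{U-1} w_k\, x_{nD+k}\, e^{-j 2\pi u k / U}$. Each band $u$ then yields a complex-valued sequence over the down-sampled index $n$. The Hann default window and the function name are assumptions, and a direct DFT is used instead of an FFT to keep the sketch dependency-free.

```python
import cmath
import math


def divide_into_subbands(x, U, D, window=None):
    """Uniform DFT analysis filter bank with down-sampling factor D.

    Returns frames with frames[n][u] = sum_k w[k] * x[n*D + k] * exp(-2j*pi*u*k/U),
    i.e. the n-th down-sampled sample of the u-th subband signal.
    """
    if window is None:
        # Periodic Hann window of length U (an assumed default).
        window = [0.5 - 0.5 * math.cos(2 * math.pi * k / U) for k in range(U)]
    frames = []
    for start in range(0, len(x) - U + 1, D):
        seg = [window[k] * x[start + k] for k in range(U)]
        frames.append([sum(seg[k] * cmath.exp(-2j * math.pi * u * k / U)
                           for k in range(U))
                       for u in range(U)])
    return frames
```

With a constant input `[1.0] * 16`, `U=4`, `D=2` and a rectangular window `[1.0] * 4`, the function returns 7 frames whose band-0 value is 4 and whose other bands are numerically zero, since a constant signal has energy only at DC.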
As described above, a storage section 204u and a removing section 206u ($u = 0, \ldots, U-1$) are provided for each of the U frequency bands. The storage section 204u stores a dereverberation filter whose coefficients are determined in advance from a room transfer function, also determined in advance, from the sound source to each microphone. Consider the entire system obtained by applying, in order, the room transfer function, the subband division by the dividing section 202, the dereverberation by the removing section 206u, and the integration by the integrating section 208. The filter coefficients are chosen under the least square error criterion so that the input/output function of this entire system is as close as possible to a unit impulse function.
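The least-squares idea behind the stored filter can be illustrated with its core step: given a known impulse response $h$, find an FIR filter $g$ of length $L$ that minimizes $\| C g - e_d \|^2$, where $C$ is the convolution matrix of $h$ and $e_d$ is a unit impulse delayed by $d$ samples, by solving the normal equations $C^{\mathsf T} C g = C^{\mathsf T} e_d$. This Python sketch assumes a single channel, real-valued coefficients and a fullband response; the design described above instead targets the whole chain of room transfer function, subband division, removal and integration over all microphones, which this code does not model. The function names and the modeling delay parameter `delay` are hypothetical.

```python
def convolve(a, b):
    """Full linear convolution of two real sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def design_ls_equalizer(h, L, delay=0):
    """Least-squares filter g (length L) so that h * g approximates a delayed unit impulse."""
    rows = len(h) + L - 1
    # Convolution matrix: (C g)[i] = sum_j h[i - j] g[j] = (h * g)[i]
    C = [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(L)]
         for i in range(rows)]
    target = [1.0 if i == delay else 0.0 for i in range(rows)]
    # Normal equations: C^T C g = C^T target
    CtC = [[sum(C[i][a] * C[i][b] for i in range(rows)) for b in range(L)]
           for a in range(L)]
    Ctt = [sum(C[i][a] * target[i] for i in range(rows)) for a in range(L)]
    return solve(CtC, Ctt)
```

For `h = [1.0, 0.5]` and `L = 8`, `convolve(h, design_ls_equalizer(h, 8))` is within 0.01 of a unit impulse at every sample: the least-squares error can be no larger than that of the truncated exact inverse $g_k = (-0.5)^k$, whose only residual is $0.5 \cdot (-0.5)^7 \approx -0.0039$.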
The removing section 206u removes the reverberation signal from the subband signal by convolving the subband signal $x'^{(q)}_{n,u}$ with the dereverberation filter. The subband signal of each frequency band from which the reverberation signal has been removed is referred to as a frequency-specific target signal $\tilde{s}_{n,u}$. The integrating section 208 then integrates the frequency-specific target signals $\tilde{s}_{n,u}$ ($u = 0, \ldots, U-1$) to obtain the target signal $\tilde{s}_t$.
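Assuming the subband signals come from a uniform DFT analysis filter bank with a window $w_k$ of length U and hop D (an assumption; related art 2 as described here does not fix the filter bank), the removing sections 206u and the integrating section 208 could be sketched in Python as follows. Each band is filtered along the down-sampled time index $n$ and summed over channels, $\tilde{s}_{n,u} = \sum_q \sum_k g^{(q)}_{k,u}\, x'^{(q)}_{n-k,u}$; integration inverts the DFT of each frame, overlap-adds the frames with hop D, and divides by the overlap-added window, which reconstructs the input exactly (wherever the window sum is nonzero) when all filters are identity. All function names and the symbol $g^{(q)}_{k,u}$ are hypothetical.

```python
import cmath
import math


def remove_in_subband(subband_seqs, filters):
    """One band: s[n] = sum_q sum_k g_q[k] * x_q[n - k] on complex samples.

    subband_seqs: one sequence per channel q (all of length N)
    filters: one FIR filter per channel q (hypothetical coefficients)
    """
    N = len(subband_seqs[0])
    out = [0j] * N
    for seq, g in zip(subband_seqs, filters):
        for n in range(N):
            for k, coef in enumerate(g):
                if n - k >= 0:
                    out[n] += coef * seq[n - k]
    return out


def remove_all_bands(frames_per_channel, filters_per_band):
    """frames_per_channel[q][n][u] -> target frames[n][u];
    filters_per_band[u][q] is the filter for band u and channel q."""
    N = len(frames_per_channel[0])
    U = len(frames_per_channel[0][0])
    out = [[0j] * U for _ in range(N)]
    for u in range(U):
        seqs = [[frames[n][u] for n in range(N)] for frames in frames_per_channel]
        s_u = remove_in_subband(seqs, filters_per_band[u])
        for n in range(N):
            out[n][u] = s_u[n]
    return out


def integrate_subbands(frames, U, D, window):
    """Overlap-add synthesis for a DFT filter bank with hop D.

    The inverse DFT of frame n recovers window[k] * x[n*D + k]; frames are
    overlap-added and divided by the overlap-added window. Only the real
    part is kept, assuming a real-valued output signal.
    """
    length = (len(frames) - 1) * D + U
    y = [0.0] * length
    wsum = [0.0] * length
    for n, frame in enumerate(frames):
        for k in range(U):
            val = sum(frame[u] * cmath.exp(2j * math.pi * u * k / U)
                      for u in range(U)) / U
            y[n * D + k] += val.real
            wsum[n * D + k] += window[k]
    return [yi / ws if ws > 1e-12 else 0.0 for yi, ws in zip(y, wsum)]
```

With one channel, identity filters `[[1.0]]` in all four bands, and seven frames each equal to `[4, 0, 0, 0]` (the rectangular-window analysis of a constant signal with `U=4`, `D=2`), the chain `integrate_subbands(remove_all_bands(...), 4, 2, [1.0] * 4)` returns 16 samples equal to 1.0.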
Details of the dereverberation apparatuses 100 and 200 are described in Non-Patent Literatures 1, 2 and 3.
Non-Patent Literature 1: T. Nakatani, B. H. Juang, T. Hikichi, T. Yoshioka, K. Kinoshita, M. Delcroix, and M. Miyoshi, “Study on speech dereverberation with autocorrelation codebook,” Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-2007), vol. I, pp. 193-196, April 2007.
Non-Patent Literature 2: T. Nakatani, B. H. Juang, T. Yoshioka, K. Kinoshita, and M. Miyoshi, “Importance of energy and spectral features in Gaussian source model for speech dereverberation,” Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA-2007), 2007.
Non-Patent Literature 3: N. D. Gaubitch, M. R. P. Thomas, and P. A. Naylor, “Subband Method for Multichannel Least Squares Equalization of Room Transfer Functions,” Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA-2007), pp. 14-17, 2007.