Signal enhancement technologies enhance a source signal contained in an observed signal, in which additive distortion and multiplicative distortion are superimposed on the source signal, by reducing the additive distortion or the multiplicative distortion. First, a general signal enhancement technology for a speech signal will be described. In this case, the additive distortion corresponds to noise in a room, while the multiplicative distortion corresponds to reverberation.
FIG. 1 is a block diagram showing the general structure of a signal enhancement device.
First, a time-domain waveform signal of observed sound is obtained by using a sensor such as a microphone, by loading it from an audio file, or by other means. Then, it is sampled, quantized, and input to a subband decomposition unit. The subband decomposition unit divides the time-domain observed signal into narrow-band signals of different frequency bands. In other words, the time-domain observed signal is converted to a time-frequency-domain observed signal. The set of observed signals divided into the frequency bands will be hereafter referred to as a complex spectrogram of the observed signal. The subband decomposition unit realizes this process by using conventional technologies, such as a short time Fourier transform or a polyphase filter bank. There is also a source signal enhancement method that directly uses the time-domain observed signal without dividing the signal into frequency bands. This specification assumes the time-frequency domain if the domain of a signal is not explicitly indicated.
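The subband decomposition described above can be illustrated with a minimal short time Fourier transform sketch. The function name `stft`, the frame length of 256 samples, the hop of 128 samples, and the Hann window are assumptions chosen for illustration and are not taken from the specification.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Split a 1-D real signal into a complex spectrogram (frames x bins)."""
    # Periodic Hann window (an illustrative choice, not mandated by the text)
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop : t * hop + frame_len] * window
                       for t in range(n_frames)])
    # One row per time frame, one column per frequency band
    return np.fft.rfft(frames, axis=1)

# Hypothetical observed signal: a 1 kHz tone sampled at 16 kHz for one second
fs = 16000
n = np.arange(fs)
y = np.sin(2 * np.pi * 1000 * n / fs)
Y = stft(y)  # complex spectrogram of the observed signal
```

Each row of `Y` holds the narrow-band components of one time frame. With these settings the bands are spaced 62.5 Hz apart, so the tone falls in band 16.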
A parameter estimation unit then estimates some parameters characterizing the observed signal from the complex spectrogram of the observed signal. The parameters may be parameters of an all pole model characterizing power spectra of a source signal or noise, regression coefficients of an autoregressive model characterizing a room transfer system, and so on.
A source signal estimation unit calculates an estimate of the complex spectrogram of the source signal by using the complex spectrogram of the observed signal and the estimated parameter values. Then, a subband synthesis unit generates an estimate of the time-domain source signal based on the estimated complex spectrogram of the source signal. The processing performed by the subband synthesis unit is chosen to match the processing performed by the subband decomposition unit. If the subband decomposition unit executes a short time Fourier transform, the subband synthesis unit performs an overlap add technique. If the subband decomposition unit executes polyphase filter bank analysis, the subband synthesis unit performs polyphase filter bank synthesis. If the subband decomposition unit is omitted, the subband synthesis unit is also omitted.
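The pairing of a short time Fourier transform with an overlap add technique can be sketched as follows. The helper names `analysis` and `overlap_add`, the window, and the frame settings are illustrative assumptions. The spectrogram is passed through unmodified here, so the sketch only shows that the synthesis step inverts the decomposition step.

```python
import numpy as np

def analysis(x, frame_len, hop, window):
    """Short time Fourier transform: windowed frames -> complex spectrogram."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop : t * hop + frame_len] * window
                       for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def overlap_add(S, frame_len, hop, window):
    """Overlap add synthesis: complex spectrogram -> time-domain signal."""
    frames = np.fft.irfft(S, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    norm = np.zeros_like(out)
    for t, frame in enumerate(frames):
        out[t * hop : t * hop + frame_len] += frame
        norm[t * hop : t * hop + frame_len] += window
    # Divide by the summed window so overlapping frames add up to unit gain
    return out / np.maximum(norm, 1e-8)

frame_len, hop = 256, 128
window = np.hanning(frame_len + 1)[:-1]   # periodic Hann, assumed for illustration
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)             # hypothetical time-domain signal
S = analysis(x, frame_len, hop, window)
x_rec = overlap_add(S, frame_len, hop, window)
```

In a signal enhancement device, `S` would be replaced by the estimated complex spectrogram of the source signal before synthesis.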
The conventional speech signal enhancement technologies can be divided roughly into two categories: One is designed for an environment where a source signal and noise are present (refer to non-patent literature 1, for example); the other is designed for an environment where a source signal and reverberation are present (refer to non-patent literature 2, for example). The former reduces noise contained in an observed signal in which the noise is imposed on the source signal. The latter reduces reverberation contained in an observed signal in which the reverberation is imposed on the source signal. Next, the speech signal enhancement technologies proposed in non-patent literature 1 and 2 will be described. Symbols such as ^ and ˜ used in the text given below should be typed above a letter but are typed immediately after the letter because of the limitations of text notation.
<Noise Reduction Technology in Non-Patent Literature 1>
Non-patent literature 1 describes a noise reduction technology for reducing noise contained in an observed signal in which the noise is imposed on a source signal. The processing performed by each unit disclosed in non-patent literature 1 will be described below.
The subband decomposition unit in non-patent literature 1 divides the observed signal into narrow-band signals of different frequency bands using a short time Fourier transform. The parameter estimation unit in non-patent literature 1 estimates source parameters sΘ of an all pole model of the source signal and noise parameters dΘ of a noise model, where these parameters are chosen as the parameters characterizing the observed signal in which the noise is superimposed onto the source signal.
In the example described in non-patent literature 1, true values dΘ˜ of the noise parameters are calculated by using the observed signal in a time segment where the source signal is supposed to be absent (step S101). Initial values sΘ^(0) of the source parameter estimates are specified (step S102). An index i indicating an iteration count is set to 0 (step S103).
Both the source parameter estimates sΘ^(i) and the true values dΘ˜ of the noise parameters are then used to calculate a posterior distribution p(S|Y, sΘ^(i), dΘ˜) of a complex spectrogram S of the source signal conditioned on the source parameter estimates sΘ^(i), the true values dΘ˜ of the noise parameters, and the complex spectrogram Y of the observed signal (step S104). Then, the conditional posterior distribution p(S|Y, sΘ^(i), dΘ˜) is used to update the source parameter estimates from sΘ^(i) to sΘ^(i+1) (step S105). Until a predetermined termination condition is satisfied (step S106), steps S104 and S105 are iteratively performed while incrementing the i value by 1 in each iteration (step S107). The source parameter estimates sΘ^(i+1) obtained when the predetermined termination condition is satisfied are output as final estimates sΘ^ of the source parameters (step S108).
The source signal estimation unit then obtains an estimate of the complex spectrogram of the source signal by using the parameters dΘ˜ and sΘ^ estimated by the parameter estimation unit and a Wiener filter. The subband synthesis unit converts the estimate of the complex spectrogram to the estimate of the time-domain source signal by using an overlap add technique.
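The iteration over steps S102 to S108 and the subsequent Wiener filtering can be sketched for a single frame as follows. This is a simplified illustration and not the exact procedure of non-patent literature 1. It assumes Gaussian signals, white noise of known variance `noise_var`, an even frame length, a fixed iteration count as the termination condition, and an all pole fit obtained from the Yule-Walker equations. All function names are hypothetical.

```python
import numpy as np

def fit_all_pole(psd, order):
    """Fit all pole parameters to a one-sided power spectrum (Yule-Walker)."""
    r = np.fft.irfft(psd)                         # autocorrelation sequence
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)              # small ridge for numerical safety
    a = np.linalg.solve(R, -r[1:order + 1])       # A(z) = 1 + sum_k a[k] z^-(k+1)
    sigma2 = r[0] + a @ r[1:order + 1]            # prediction-error variance
    return a, sigma2

def all_pole_psd(a, sigma2, n_fft):
    """Power spectrum sigma2 / |A|^2 of the all pole model."""
    A = np.fft.rfft(np.concatenate(([1.0], a)), n_fft)
    return sigma2 / np.abs(A) ** 2

def iterative_wiener(y, noise_var, order=10, n_iter=3):
    n = len(y)
    Y = np.fft.rfft(y)
    # S102: initial source parameters, here fitted to the observed frame (assumption)
    a, sigma2 = fit_all_pole(np.abs(Y) ** 2 / n, order)
    for _ in range(n_iter):                       # S106: fixed count as termination
        Ps = all_pole_psd(a, sigma2, n)
        gain = Ps / (Ps + noise_var)
        mean = gain * Y                           # S104: posterior mean of S
        var = gain * noise_var                    # S104: posterior variance of S
        # S105: refit the source parameters to the expected source power
        a, sigma2 = fit_all_pole(np.abs(mean) ** 2 / n + var, order)
    # Final Wiener filter built from the converged parameters
    Ps = all_pole_psd(a, sigma2, n)
    gain = Ps / (Ps + noise_var)
    return np.fft.irfft(gain * Y, n), gain

# Hypothetical frame: an autoregressive source plus unit-variance white noise
rng = np.random.default_rng(0)
n = 512
e = rng.standard_normal(n)
s = np.zeros(n)
for t in range(2, n):
    s[t] = 1.6 * s[t - 1] - 0.8 * s[t - 2] + e[t]
y = s + rng.standard_normal(n)
s_hat, gain = iterative_wiener(y, noise_var=1.0, order=4)
```

The Wiener gain lies between 0 and 1 in every frequency band, so bands in which the modeled source power is small relative to the noise power are attenuated most strongly.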
<Reverberation Reduction Technology in Non-Patent Literature 2>
Non-patent literature 2 describes a reverberation reduction technology for reducing reverberation contained in an observed signal in which the reverberation is imposed on the source signal. The processing performed by each unit disclosed in non-patent literature 2 will be described below.
In the reverberation reduction technology disclosed in non-patent literature 2, subband decomposition is not performed. The parameter estimation unit and the source signal estimation unit in non-patent literature 2 process the time-domain observed signal directly. The parameter estimation unit estimates source parameters sΘ and reverberation parameters gΘ, where these parameters are chosen as the parameters characterizing the observed signal, in which the reverberation is imposed on the source signal. The reverberation parameters in non-patent literature 2 are regression coefficients of a linear filter for calculating the reverberation imposed on the source signal. The linear filter is applied to the time-domain observed signal in which only the reverberation is superimposed onto the source signal.
In the example described in non-patent literature 2, initial values gΘ^(0) of the reverberation parameter estimates are specified (step S111). An index i indicating an iteration count is set to 0 (step S112).
By using the reverberation parameter estimates gΘ^(i), the source parameter estimates are updated to sΘ^(i+1) (step S113). Then, by using the updated source parameter estimates sΘ^(i+1), the reverberation parameter estimates are updated to gΘ^(i+1) (step S114). Until a predetermined termination condition is satisfied (step S115), steps S113 and S114 are iteratively performed while incrementing the i value by 1 in each iteration (step S116). The source parameter estimates sΘ^(i+1) obtained when the predetermined termination condition is satisfied are taken as the final estimates sΘ^ of the source parameters, and the reverberation parameter estimates gΘ^(i+1) are output as the final estimates gΘ^ of the reverberation parameters (step S117).
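The alternating structure of steps S111 to S117 can be sketched as follows. This is a deliberately simplified stand-in and not the estimator of non-patent literature 2. It assumes that the source parameters are short-time variances computed per block, that the reverberation parameters are regression coefficients applied to past observed samples, that the reverberation parameters are updated by weighted least squares, and that a fixed iteration count serves as the termination condition. All names are hypothetical.

```python
import numpy as np

def delayed_matrix(x, n_taps):
    """Column k holds x delayed by k+1 samples (zeros before the start)."""
    X = np.zeros((len(x), n_taps))
    for k in range(n_taps):
        X[k + 1:, k] = x[:len(x) - k - 1]
    return X

def estimate_reverb_params(x, n_taps=2, block=200, n_iter=5):
    X = delayed_matrix(x, n_taps)
    g = np.zeros(n_taps)                      # S111: initial reverberation parameters
    n_blocks = int(np.ceil(len(x) / block))
    for _ in range(n_iter):                   # S115: fixed count as termination
        s = x - X @ g                         # source estimate under the current g
        # S113: source parameters, here one variance per block (assumption)
        var = np.array([np.mean(s[b * block:(b + 1) * block] ** 2)
                        for b in range(n_blocks)])
        w = 1.0 / np.maximum(np.repeat(var, block)[:len(x)], 1e-8)
        # S114: reverberation parameters by variance-weighted least squares
        Xw = X * w[:, None]
        g = np.linalg.solve(X.T @ Xw, Xw.T @ x)
    return g, var                             # S117: final estimates

# Hypothetical data: a source with block-wise varying level, reverberated by a
# two-tap regression on past observed samples
rng = np.random.default_rng(0)
n = 20000
level = np.where((np.arange(n) // 200) % 2 == 0, 1.0, 0.3)
s = rng.standard_normal(n) * level
x = np.zeros(n)
for t in range(n):
    x[t] = s[t]
    if t >= 1:
        x[t] += 0.5 * x[t - 1]
    if t >= 2:
        x[t] -= 0.3 * x[t - 2]
g_hat, var_hat = estimate_reverb_params(x)
```

On this synthetic data the estimated coefficients `g_hat` should lie close to the coefficients 0.5 and -0.3 used to generate `x`.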
Then, the source signal estimation unit generates a linear filter from the final estimates gΘ^ of the reverberation parameters calculated by the parameter estimation unit, estimates the reverberation contained in the observed signal by convolving the observed signal with the linear filter, and subtracts the estimated reverberation from the observed signal. In this way, the source signal estimation unit calculates and outputs a dereverberated signal.
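The convolution and subtraction performed by the source signal estimation unit can be sketched as follows. The function name `dereverberate`, the one-sample prediction delay, and the two coefficients are assumptions made for illustration. The test signal is synthetic and is generated so that the reverberation is exactly a regression on past observed samples.

```python
import numpy as np

def dereverberate(x, g):
    """Subtract the reverberation predicted from past observed samples."""
    # Linear filter with a one-sample delay: taps [0, g[0], g[1], ...]
    h = np.concatenate(([0.0], g))
    reverb = np.convolve(x, h)[:len(x)]       # estimated reverberation
    return x - reverb                         # dereverberated signal

# Hypothetical example: reverberate a source with known coefficients, then
# remove the reverberation using the same coefficients
rng = np.random.default_rng(1)
g_true = np.array([0.5, -0.3])
s = rng.standard_normal(1000)
x = np.zeros_like(s)
for t in range(len(s)):
    x[t] = s[t]
    for k, gk in enumerate(g_true):
        if t - k - 1 >= 0:
            x[t] += gk * x[t - k - 1]
s_hat = dereverberate(x, g_true)
```

Because the coefficients passed to `dereverberate` equal those used to generate `x`, the output matches the source. With estimated coefficients, the residual reverberation depends on the estimation error.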
Non-patent literature 1: Lim, J. S. and Oppenheim, A. V., “All pole modeling of degraded speech,” IEEE Trans. Acoust., Speech, Signal Process., Vol. 26, No. 3, pp. 197-210 (1978).
Non-patent literature 2: Yoshioka, T., Hikichi, T. and Miyoshi, M., “Dereverberation by Using Time-Variant Nature of Speech Production System,” EURASIP J. Advances in Signal Process., Vol. 2007 (2007), Article ID 65698, 15 pages, doi:10.1155/2007/65698.