During voice recognition or video production, sound is picked up by a microphone and converted into acoustic signals. The acoustic signals output from the microphone include not only voice signals representing the voice of a user but also noise signals representing the background sound (noise). Noise suppression technology is conventionally known as a technology for suppressing the noise signals in acoustic signals (input signals) that contain a mix of voice signals and noise signals.
Examples of the conventional noise suppression technology include the spectral subtraction method and the Wiener filtering method. In the spectral subtraction method, the average spectrum of non-voice sections is taken as the noise estimation value, and the value obtained by subtracting the noise estimation value from the spectrum of the input signals is set as the post-noise-suppression spectrum. In the Wiener filtering method, a noise suppression coefficient to be used in suppressing the noise signals in the input signals is derived from the ratio of the post-noise-suppression spectrum to the spectrum of the input signals, and noise suppression signals are obtained by multiplying the input signals by the noise suppression coefficient.
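The two conventional methods described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not a definitive implementation: each frame is assumed to be a list of per-bin magnitude values, the voice/non-voice flags are assumed to be given, negative results of the subtraction are floored at zero, and the ratio is used directly as the noise suppression coefficient. All function names are hypothetical.

```python
from typing import List, Sequence


def estimate_noise(frames: Sequence[Sequence[float]],
                   is_voice: Sequence[bool]) -> List[float]:
    """Noise estimation value: average spectrum of the non-voice frames."""
    noise_frames = [f for f, v in zip(frames, is_voice) if not v]
    if not noise_frames:
        raise ValueError("at least one non-voice frame is required")
    n_bins = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames)
            for k in range(n_bins)]


def spectral_subtraction(frame: Sequence[float],
                         noise: Sequence[float]) -> List[float]:
    """Post-noise-suppression spectrum: input minus noise estimate.

    Flooring at zero is an assumption made here to avoid negative magnitudes.
    """
    return [max(x - n, 0.0) for x, n in zip(frame, noise)]


def suppression_coefficients(frame: Sequence[float],
                             noise: Sequence[float]) -> List[float]:
    """Per-bin ratio of the post-noise-suppression spectrum to the input."""
    clean = spectral_subtraction(frame, noise)
    return [c / x if x > 0.0 else 0.0 for c, x in zip(clean, frame)]


def suppress(frame: Sequence[float], noise: Sequence[float]) -> List[float]:
    """Noise suppression signal: input multiplied by the coefficients."""
    gains = suppression_coefficients(frame, noise)
    return [g * x for g, x in zip(gains, frame)]


# Illustrative values only: two non-voice frames followed by one voice frame.
frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 10.0]]
is_voice = [False, False, True]
noise = estimate_noise(frames, is_voice)
output = suppress(frames[2], noise)
```

In this sketch, a noise estimate that exceeds the input in some bin drives the coefficient for that bin to zero, while an estimate that is too small leaves residual noise in the output.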
However, in the conventional noise suppression technology, if there is a large error between the actual noise included in the input signals and the noise estimation value, or if the noise suppression coefficients vary widely, the noise component is sometimes suppressed excessively and sometimes not suppressed sufficiently. That is, with the conventional noise suppression technology, the output sound is sometimes degraded by the generation of musical noise or by sounding unnatural.