Due to the ubiquitous presence of noise in natural environments, real-world sound recordings typically contain noise from various sources. To improve the quality of such recordings, a range of methods for reducing their noise level has been developed. Often, in such methods, a time-domain noise suppression filter is computed from a desired frequency response H(ω), and the time-domain noise suppression filter is then applied to the sound recording.
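By way of illustration only, the step of computing a time-domain filter from a desired frequency response and applying it to a recording can be sketched as follows. The sketch assumes that H(ω) is sampled at N uniformly spaced frequencies and is conjugate-symmetric, so that the filter coefficients are real. The function names `impulse_response` and `apply_filter` are hypothetical and do not come from any of the cited documents.

```python
import cmath


def impulse_response(freq_response):
    """Inverse DFT of a sampled frequency response H[k], k = 0..N-1.

    Assumption: H is conjugate-symmetric, so only the real part is kept.
    """
    n = len(freq_response)
    coeffs = []
    for t in range(n):
        acc = sum(
            freq_response[k] * cmath.exp(2j * cmath.pi * k * t / n)
            for k in range(n)
        )
        coeffs.append((acc / n).real)
    return coeffs


def apply_filter(coeffs, samples):
    """Apply the filter by direct linear convolution with the samples."""
    out = [0.0] * (len(samples) + len(coeffs) - 1)
    for i, x in enumerate(samples):
        for j, h in enumerate(coeffs):
            out[i + j] += x * h
    return out
```

A flat response, for example `[0.5, 0.5, 0.5, 0.5]`, yields a filter that simply scales the recording by 0.5. Practical systems typically use a fast Fourier transform and window the coefficients, which this sketch omits.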
In an ideal noise suppression filter, the desired acoustic signal should pass through the filter undistorted, while noise should be completely attenuated. These properties cannot be simultaneously fulfilled in a real filter (except in the special case when there is no desired signal or no noise, or when the desired signal and noise are spectrally separated). Hence, in determining a desired frequency response H(ω) of a filter, a trade-off between distorting the desired signal and distorting the noise has to be made for frequencies at which both the desired signal and noise are present.
The desired frequency response H(ω) can be estimated by means of various methods, such as spectral subtraction. In “Low-distortion spectral subtraction for speech enhancement”, Peter Händel, Conference Proceedings of Eurospeech, pp. 1549-1553, ISSN 1018-4074, 1995, different aspects of spectral subtraction methods for suppressing noise are discussed. In U.S. Pat. No. 5,706,395, spectral subtraction is discussed and a method of defining the level to which noise should be attenuated is disclosed. In U.S. Pat. No. 5,706,395, the desired frequency response H(ω) is clamped so that the attenuation cannot go below a minimum value, wherein the minimum value may, according to U.S. Pat. No. 5,706,395, depend on the signal-to-noise ratio of the noisy speech signal to be filtered. The clamping of the desired frequency response of U.S. Pat. No. 5,706,395 prevents a noise suppression filter from fluctuating around very small values, thus avoiding a noise distortion commonly referred to as musical noise.
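As a rough illustration of clamping, the sketch below computes a gain per frequency bin using a common power spectral subtraction form, H(ω)² = 1 − Pn(ω)/Px(ω), and then limits the gain from below. This formula and the fixed floor `min_gain` are assumptions made for the example; they are not taken from U.S. Pat. No. 5,706,395, where the minimum value may depend on the signal-to-noise ratio. The function name `clamped_gain` is hypothetical.

```python
def clamped_gain(noisy_power, noise_power, min_gain=0.1):
    """Spectral subtraction gain per frequency bin, clamped from below.

    noisy_power: power of the noisy signal in each bin (Px).
    noise_power: estimated noise power in each bin (Pn).
    min_gain:    assumed fixed floor on the gain (hypothetical value).
    """
    gains = []
    for p_x, p_n in zip(noisy_power, noise_power):
        if p_x <= 0.0:
            # No signal energy in this bin: fall back to the floor.
            gains.append(min_gain)
            continue
        # Power subtraction; negative results are set to zero before the root.
        gain = max(1.0 - p_n / p_x, 0.0) ** 0.5
        # Clamp so the gain never drops below the floor.
        gains.append(max(gain, min_gain))
    return gains
```

In bins where the noise estimate is close to or above the noisy power, the unclamped gain would hover near zero and change erratically from frame to frame; with the floor in place it stays at `min_gain` in those bins.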
In many spectral subtraction methods, the desired frequency response is calculated as a function of the signal-to-noise ratio (SNR). Since the SNR of a noisy acoustic signal at a particular frequency varies with time, the desired frequency response H(ω) is generally updated over time; often, it is updated for each frame of data. An effect of this is that noise which is at a constant level in the noisy speech signal is often attenuated to a level that varies noticeably with time, resulting in fluctuations of the residual noise. This undesirable effect is commonly referred to as noise pumping, and can be heard as a shadow voice.
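The effect can be illustrated with a simplified single-bin sketch. It assumes a power-domain gain of the form H² = SNR/(1 + SNR), which is one common choice and not necessarily the one used in the cited documents, and it assumes that the speech and noise powers are known exactly. The function name `residual_noise_levels` is hypothetical.

```python
def residual_noise_levels(speech_power_per_frame, noise_power):
    """Residual noise power per frame for a constant noise level.

    The gain is recomputed for each frame from that frame's SNR, so the
    same constant noise is attenuated by a different amount in each frame.
    """
    levels = []
    for p_s in speech_power_per_frame:
        snr = p_s / noise_power
        gain_sq = snr / (1.0 + snr)  # assumed gain rule, power domain
        levels.append(gain_sq * noise_power)
    return levels
```

For speech powers of 9.0, 1.0 and 0.0 in three consecutive frames and a constant noise power of 1.0, the residual noise power is 0.9, 0.5 and 0.0 respectively: the noise is constant at the input but follows the speech level at the output.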