The following account of the prior art relates to one of the fields of use of the present application, namely hearing aids.
Many state-of-the-art hearing aids are equipped with a single-channel noise reduction (SC-NR) algorithm. In some modern hearing aids, the signal is represented internally as a time-frequency representation (which for multi-microphone hearing aids could be an output of a beamformer or directionality algorithm). An SC-NR algorithm applies a gain value to each time-frequency unit to reduce the noise level in the signal. In the present application, the term ‘gain’ is used in a general sense to include amplification (gain >1) as well as attenuation (gain <1), as the case may be. In a noise reduction algorithm, however, the term ‘gain’ typically refers to ‘attenuation’. Specifically, an SC-NR algorithm estimates the signal-to-noise ratio (SNR) for each time-frequency unit and applies a gain value to that unit based on this SNR estimate. Finally, the noise-reduced (and possibly amplified and compressed) time-domain signal is reconstructed by passing the time-frequency representation of the noise-reduced signal through a synthesis filter bank.
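The per-unit gain computation described above can be illustrated with a minimal sketch. It is not taken from any of the cited documents: the Wiener gain rule, the simple SNR estimate (noisy power divided by noise power, minus one), and the names `wiener_gains`, `noisy_power` and `noise_power` are all assumptions chosen for illustration, and the noise power per band is assumed to be known.

```python
def wiener_gains(noisy_power, noise_power):
    """Return one gain per time-frequency unit.

    noisy_power: list of frames, each a list of noisy power values per band.
    noise_power: list of noise power estimates per band (assumed known here).
    """
    gains = []
    for frame in noisy_power:
        row = []
        for y2, n2 in zip(frame, noise_power):
            # Estimated SNR: noisy power over noise power, minus one, floored at zero.
            snr = max(y2 / n2 - 1.0, 0.0)
            # Wiener gain rule (an assumption; the text does not name a gain rule).
            row.append(snr / (1.0 + snr))
        gains.append(row)
    return gains


# Two frames, three frequency bands, unit noise power in every band.
noisy = [[4.0, 1.0, 0.5], [10.0, 2.0, 1.0]]
noise = [1.0, 1.0, 1.0]
gains = wiener_gains(noisy, noise)
```

In this sketch every gain lies between 0 and 1, i.e. the algorithm only attenuates, which matches the remark that ‘gain’ in a noise reduction algorithm typically means ‘attenuation’.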
When applying the gain to the time-frequency units, the SC-NR algorithm invariably introduces artifacts, because it bases its decisions on SNR estimates rather than on the true SNR values, which are not observable since only the noisy signal is available. Some of these artifacts, known as “musical noise”, are perceptually particularly annoying. It is well known that the amount of “musical noise” can be reduced by limiting the maximum attenuation that the SC-NR algorithm is allowed to apply (cf. e.g. EP 2 463 856 A1), in other words by applying a ‘less aggressive’ noise reduction algorithm. The following tradeoff exists: 1) a larger maximum attenuation implies better noise reduction but a higher risk of introducing musical artifacts, whereas 2) a smaller maximum attenuation reduces the risk of musical artifacts but makes the noise reduction less effective. An ideal maximum attenuation therefore exists for a given situation. However, it depends on the input signal type, the general SNR, the frequency, etc. The ideal maximum attenuation is thus not fixed across time, but must be adapted to changing situations (as reflected in the input signal).
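Limiting the maximum attenuation amounts to placing a lower bound (a gain floor) on the gain applied to each time-frequency unit. The following is a minimal sketch under stated assumptions: the gain is treated as an amplitude gain (hence the factor 20 in the dB conversion), and the function name `limit_attenuation` and its parameter `max_attenuation_db` are hypothetical.

```python
def limit_attenuation(gain, max_attenuation_db):
    """Clamp a linear gain so that it never attenuates by more than the limit."""
    # Convert the maximum attenuation in dB to a linear amplitude gain floor
    # (amplitude-domain gain is an assumption of this sketch).
    floor = 10.0 ** (-max_attenuation_db / 20.0)
    return max(gain, floor)


# A unit judged to be pure noise (gain 0) under two different limits.
aggressive = limit_attenuation(0.0, 20.0)  # floor of about 0.1
cautious = limit_attenuation(0.0, 6.0)     # floor of about 0.5
```

A larger `max_attenuation_db` gives a lower floor and thus stronger noise reduction, while a smaller value keeps more of the residual noise, which corresponds to the two sides of the tradeoff described above.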
Recently, objective measures have been presented for estimating the amount of musical noise in a given noise-reduced signal, based on the noise-reduced signal itself and on the original noisy signal, the latter being the input to the SC-NR system (cf. e.g. [Uemura et al.; 2012], [Yu & Fingerscheidt; 2012] and [Uemura et al.; 2009]). More specifically, [Uemura et al.; 2009] proposes comparing characteristics of the noisy unprocessed signal with characteristics of the noise-reduced signal to determine to what extent musical noise is present in the noise-reduced signal. The change in signal kurtosis (more precisely, the ratio of the kurtosis values of the two signals) is found to be a robust predictor of musical noise. Based on this measure, EP 2 144 233 A2 proposes adjusting the parameters of the noise reduction algorithm (e.g., the maximum attenuation) to reduce the amount of musical noise (at the price of reduced noise reduction).
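As an illustration of a kurtosis-ratio measure of this kind, the sketch below computes the ordinary sample kurtosis (fourth central moment divided by the squared variance) of a set of values before and after processing and forms their ratio. This is an assumption made for illustration: the cited works may define kurtosis on a different quantity or with a different estimator, and the function names and example values are hypothetical.

```python
def kurtosis(values):
    """Sample kurtosis: fourth central moment over squared variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


def kurtosis_ratio(noisy_values, processed_values):
    """Kurtosis after processing divided by kurtosis before processing."""
    return kurtosis(processed_values) / kurtosis(noisy_values)


# Made-up values: a fairly even distribution before processing and a
# single isolated peak after processing.
noisy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
processed = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8.0]
ratio = kurtosis_ratio(noisy, processed)
```

In this toy example the ratio exceeds 1, reflecting that the processed values are dominated by an isolated peak. A control scheme of the kind described above could, for instance, reduce the maximum attenuation when such a ratio grows too large.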
EP 2 144 233 A2 describes a noise suppression estimation device that calculates a noise index value, which varies according to kurtosis of a frequency distribution of magnitude of a sound signal before or after suppression of the noise component, the noise index value indicating a degree of occurrence of musical noise after suppression of the noise component in a frequency domain. A schematic block diagram reflecting such control of a noise reduction algorithm is shown in FIG. 1.
WO2008115445A1 deals with speech enhancement based on a psycho-acoustic model capable of preserving the fidelity of speech while sufficiently suppressing noise including the processing artifact known as “musical noise”.
WO2009043066A1 deals with a method for enhancing wide-band speech audio signals in the presence of background noise, specifically to low-latency single-channel noise reduction using sub-band processing based on masking properties of the human auditory system. WO0152242A1 deals with a multi-band spectral subtraction scheme comprising a multi-band filter architecture, noise and signal power detection, and gain function for noise reduction. WO9502288A1 deals with properties of human audio perception used to perform spectral and time masking to reduce perceived loudness of noise added to speech signals.