The present invention relates to the field of processing audio signals, more specifically to an approach for estimating noise in an audio signal, for example in an audio signal to be encoded or in an audio signal that has been decoded. Embodiments describe a method for estimating noise in an audio signal, a noise estimator, an audio encoder, an audio decoder and a system for transmitting audio signals.
In the field of processing audio signals, for example for encoding audio signals or for processing decoded audio signals, there are situations where it is desired to estimate the noise. For example, PCT/EP2013/077525 (published as WO 2014/096279 A1) and PCT/EP2013/077527 (published as WO 2014/096280 A1), incorporated herein by reference, describe using a noise estimator, for example a minimum statistics noise estimator, to estimate the spectrum of the background noise in the frequency domain. The signal that is fed into the algorithm has been transformed blockwise into the frequency domain, for example by a Fast Fourier Transform (FFT) or any other suitable filterbank. The framing is usually identical to the framing of the codec, i.e., the transforms already existing in the codec can be reused, for example the FFT used for the preprocessing in an EVS (Enhanced Voice Services) encoder. For the purpose of the noise estimation, the power spectrum of the FFT is computed. The spectrum is grouped into psychoacoustically motivated bands, and the power spectral bins within a band are accumulated to form an energy value per band. This approach yields a set of energy values, one per band, of a kind that is also often used for psychoacoustic processing of the audio signal. Each band has its own noise estimation algorithm, i.e., in each frame the energy value of that frame is processed using the noise estimation algorithm, which analyzes the signal over time and gives an estimated noise level for each band at any given frame.
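The grouping of power spectral bins into per-band energy values described above can be illustrated by the following minimal sketch. It is not taken from the referenced applications or from any codec: the function names `power_spectrum` and `band_energies`, the 16-sample frame and the band edges are hypothetical and chosen only for illustration, and a naive discrete Fourier transform stands in for the FFT that an actual codec would provide.

```python
import cmath
import math

def power_spectrum(frame):
    """Power spectrum |X[k]|^2 for k = 0..N/2, using a naive DFT.

    A real codec would reuse its existing FFT; the DFT here only keeps
    the sketch self-contained.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(acc.real ** 2 + acc.imag ** 2)
    return spectrum

def band_energies(power, band_edges):
    """Accumulate power bins into one energy value per band.

    Band b covers bins band_edges[b] .. band_edges[b + 1] - 1.
    """
    return [sum(power[lo:hi]) for lo, hi in zip(band_edges, band_edges[1:])]

# Hypothetical input: one 16-sample frame holding a cosine centred on bin 2.
frame = [math.cos(2 * math.pi * 2 * i / 16) for i in range(16)]

# Assumed band layout (bins 0, 1..3 and 4..8); not a psychoacoustic scale.
edges = [0, 1, 4, 9]

energies = band_energies(power_spectrum(frame), edges)
print(energies)  # the energy concentrates in the second band
```

In such a scheme, the list `energies` would be recomputed for every frame, and each entry would be passed to the noise estimation algorithm of the corresponding band.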
The sample resolution used for high quality speech and audio signals may be 16 bits, i.e., the signal has a signal-to-noise ratio (SNR) of about 96 dB. Computing the power spectrum means transforming the signal into the frequency domain and calculating the squared magnitude of each frequency bin. Because squaring doubles the number of bits, this necessitates a dynamic range of 32 bits. Summing several power spectrum bins into bands necessitates additional headroom for the dynamic range because the energy distribution within the band is not known in advance. As a result, a dynamic range of more than 32 bits, typically around 40 bits, needs to be supported to run the noise estimator on a processor.
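The bit-width reasoning above can be checked with a short worked example. The sketch below makes simplifying assumptions that are not stated in the text: each bin magnitude is taken to fit in 16 bits, any gain introduced by the transform is ignored, and the band width of 256 bins is chosen only because it reproduces the figure of around 40 bits. The names `bits_for` and `worst_case_band_bits` are hypothetical.

```python
import math

def bits_for(value):
    """Number of bits needed to hold a non-negative integer."""
    return max(1, value.bit_length())

# Assumption: a bin magnitude fits in 16 bits (transform gain ignored).
MAX_MAGNITUDE = 2 ** 16 - 1

# Dynamic range of a 16 bit quantity in dB (about 96 dB).
range_db = 20 * math.log10(2 ** 16)

# Squaring one bin doubles the required width.
max_power = MAX_MAGNITUDE ** 2
bits_power = bits_for(max_power)  # 32

def worst_case_band_bits(bins_in_band):
    """Bits needed if every bin in the band carries the maximum power."""
    return bits_for(bins_in_band * max_power)

# Assumed band width of 256 bins: 8 extra bits of headroom.
print(bits_power, worst_case_band_bits(256))  # 32 40
```

Under these assumptions, the headroom grows by one bit each time the number of accumulated bins doubles, which is why wide bands push the requirement well beyond 32 bits.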
In devices that process audio signals and are powered by an energy storage unit such as a battery, for example portable devices like mobile phones, power-efficient processing of the audio signals is essential for battery lifetime. In accordance with known approaches, the processing of audio signals is performed by fixed point processors which typically support processing of data in a 16 or 32 bit fixed point format. The lowest complexity is achieved by processing 16 bit data, while processing 32 bit data already necessitates some overhead. Processing data with a dynamic range of 40 bits necessitates splitting the data into two parts, namely a mantissa and an exponent, both of which must be handled whenever the data is modified, which in turn results in even higher computational complexity and even higher storage demands.
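The overhead of the mantissa and exponent representation mentioned above can be made concrete with the following sketch. It is only one possible illustration and is not taken from any known implementation: the 32 bit mantissa width, the restriction to non-negative values and the function names `split` and `add` are assumptions.

```python
MANT_BITS = 32  # assumed mantissa width of the fixed point word

def split(value):
    """Split a non-negative integer into (mantissa, exponent).

    The mantissa fits in MANT_BITS bits and value is approximately
    mantissa << exponent (low-order bits are discarded).
    """
    exponent = max(0, value.bit_length() - MANT_BITS)
    return value >> exponent, exponent

def add(a, b):
    """Add two (mantissa, exponent) pairs.

    Even a plain addition needs three steps: align the exponents,
    add the mantissas, and renormalize if the mantissa overflowed.
    """
    (ma, ea), (mb, eb) = a, b
    e = max(ea, eb)
    m = (ma >> (e - ea)) + (mb >> (e - eb))
    if m.bit_length() > MANT_BITS:
        m >>= 1
        e += 1
    return m, e

# Two 40 bit energy values (hypothetical) and their sum.
x = split(2 ** 39)
y = split(2 ** 39)
m, e = add(x, y)
print(m << e == 2 ** 40)  # True
```

Every value occupies two storage locations, and every arithmetic operation has to handle both of them, which illustrates the additional complexity and storage demand compared with plain 16 or 32 bit data.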
Starting from the known technology discussed above, it is an object of the present invention to provide an approach for estimating the noise in an audio signal efficiently using a fixed point processor, avoiding unnecessary computational overhead.