1. Field of the Invention
The present invention relates generally to audio coding techniques. More particularly, the present invention relates to noise detection for audio encoding.
2. Description of the Related Art
This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the claims in this application and is not admitted to be prior art by inclusion in this section.
Generally, in an audio encoding system, an incoming time domain audio signal is compressed such that the bitrate needed to represent the signal is significantly reduced. Ideally, the bitrate of the encoded signal fits the constraints of the transmission channel or minimizes the size of the encoded file. Techniques for fitting bitrate to channel constraints are used in real-time communication and streaming services. Techniques for minimizing file size are used when audio content is stored locally or downloaded at high audio quality.
Audio encoders aim to minimize perceptual distortion at a given bitrate while minimizing the encoded file size. Nevertheless, the lower the bitrate, the more challenging it is for the encoder to achieve these goals. Whether the goal is to fit a channel constraint or to minimize file size, advanced encoding models and techniques are applied to maximize the end user experience. Typically, it is the encoding performance with the worst-case signals (signals that are difficult to encode) that ultimately defines the overall performance of any encoding system. Other important factors in defining the overall performance of an encoding system are the encoding speed and the resources needed to reach a given bitrate or audio quality level. For commercial use, and especially for mobile use, encoding speed and memory requirements play a significant role.
In an attempt to achieve even lower bitrates without increasing the perceptual distortion, new audio coding methods are being explored. Some conventional audio coding methods involve efficient coding of noise and noise-like signal segments. In such techniques, perceptual audio encoders encode the input signal in the frequency domain, as human auditory properties can be best described in the frequency domain. Spectral samples are typically quantized on a frequency band basis. The quantizer shapes the quantization noise by either increasing or decreasing the corresponding quantizer step size until the noise is just below the auditory masking threshold. On one hand, the introduced perceptual distortion is inaudible to the human ear but, on the other hand, keeping the noise below the masking threshold limits the lowest possible bitrate. It is well known that coding of high frequencies uses a significant number of bits, but from a perceptual point of view, it is the low frequencies that are more important.
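The step-size adjustment described above can be sketched as follows. This is a minimal illustration only, assuming a uniform quantizer, a per-band masking threshold expressed as a noise energy, and a multiplicative step change; the function names, the `factor` value, and the search order are hypothetical and are not taken from any particular encoder.

```python
def quantize_band(samples, step):
    """Uniformly quantize spectral samples and return the reconstructed values."""
    return [round(x / step) * step for x in samples]


def noise_energy(samples, step):
    """Energy of the quantization error for one frequency band."""
    recon = quantize_band(samples, step)
    return sum((x - y) ** 2 for x, y in zip(samples, recon))


def fit_step_size(samples, masking_threshold, step=1.0,
                  factor=2 ** 0.25, max_iter=64):
    """Adjust the step until the noise sits just below the masking threshold.

    Assumption: the step is changed multiplicatively by `factor`.
    """
    # Decrease the step while the noise is still audible (at or above threshold).
    for _ in range(max_iter):
        if noise_energy(samples, step) < masking_threshold:
            break
        step /= factor
    # Increase the step as long as the next larger step keeps the noise masked.
    for _ in range(max_iter):
        if noise_energy(samples, step * factor) >= masking_threshold:
            break
        step *= factor
    return step
```

A larger step needs fewer bits per sample, so the search stops at the coarsest step found that still keeps the noise under the threshold. That stopping point is what bounds the bitrate from below.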
Where a certain frequency band contains only white noise, the spectral samples within the band are still coded (at a high bitrate) even though, from an auditory point of view, an exact representation of the spectral samples is not needed. It would be much more efficient to code the frequency band with a coding scheme optimized for noise or noise-like signal segments, leaving more bits for the other frequency bands or, alternatively, lowering the lowest possible bitrate boundary.
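To make the potential saving concrete, the sketch below shows one possible noise-optimized scheme, assumed here purely for illustration: the encoder sends a single energy value for the band, and the decoder synthesizes random samples with that energy. The function names and the uniform random source are hypothetical choices.

```python
import math
import random


def encode_noise_band(samples):
    """Represent a noise-like band by a single parameter: its total energy."""
    return sum(x * x for x in samples)


def decode_noise_band(energy, width, rng):
    """Synthesize `width` random spectral samples scaled to the given energy."""
    noise = [rng.uniform(-1.0, 1.0) for _ in range(width)]
    raw = sum(x * x for x in noise)
    if raw == 0.0:
        # Degenerate draw: nothing to scale, return silence.
        return [0.0] * width
    gain = math.sqrt(energy / raw)
    return [gain * x for x in noise]
```

The decoded samples differ from the originals, but the band energy is preserved. For a band that really is noise-like, that is the property the paragraph above suggests is perceptually sufficient.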
One example of an audio coding system is the advanced audio coding (AAC) system. AAC is a lossy data compression scheme intended for audio streams. AAC was designed to replace MP3 and was first standardized as an extension of the MPEG-2 international standard, in ISO/IEC 13818-7. It was further improved in MPEG-4, MPEG-4 Version 2 and MPEG-4 Version 3, ISO/IEC 14496-3.
AAC includes signaling methods for compact representation of noise and noise-like signal segments. However, AAC does not have a way to detect such signal segments. It is up to the implementer of the AAC encoder to decide how noise or noise-like signal segments should be detected or whether to detect such segments at all. Uncontrolled and false noise detection can actually result in severe quality degradation instead of quality improvement.
Attempts have been made to estimate and detect noise for perceptual audio coders, such as AAC coders. For example, a method using a predictor in the frequency domain on a frequency band basis is presented in: “Estimation of perceptual entropy using noise masking criteria,” Johnston, J. D.; Acoustics, Speech, and Signal Processing, 1988. ICASSP-88., 1988 International Conference on, 11-14 Apr. 1988; Pages: 2524-2527 vol. 5. Johnston describes calculating a tonality measure from the power spectrum, which is then used as a threshold to differentiate noise-like and tone-like signal segments. A method that uses a predictor in the time domain and noise detection in the frequency domain is described in “Improving audio codecs by noise substitution,” Schulz, Donald; Journal of the Audio Engineering Society, Vol. 44, No. 7/8, July/August 1996; Pages: 593-598. In this method, a predicted version of the input signal is first determined, and noise detection is then made in the frequency domain by comparing the original and predicted signals on a frequency band basis.
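As a rough illustration of thresholding a tonality measure computed from the power spectrum, the sketch below uses spectral flatness (the ratio of the geometric mean to the arithmetic mean of the band power). Spectral flatness is an assumed stand-in chosen for brevity; it is not claimed to be the measure used in either cited reference, and the `threshold` value of 0.5 is hypothetical.

```python
import math


def spectral_flatness(power):
    """Geometric mean over arithmetic mean of a band's power spectrum.

    Values near 1 indicate a flat, noise-like band; values near 0
    indicate a peaky, tone-like band.
    """
    eps = 1e-12  # guards against log(0) and division by zero
    n = len(power)
    log_mean = sum(math.log(p + eps) for p in power) / n
    arith_mean = sum(power) / n + eps
    return math.exp(log_mean) / arith_mean


def is_noise_like(power, threshold=0.5):
    """Classify a band as noise-like when its flatness exceeds the threshold."""
    return spectral_flatness(power) > threshold
```

A band with equal power in every bin is classified as noise-like, while a band dominated by a single strong bin is classified as tone-like. A practical detector would have to choose its threshold carefully, since a false noise decision can degrade quality.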
There is a need for noise detection techniques to be applied in various types of audio coding schemes. Further, there is a need for efficient estimation methods for detecting noise and noise-like signal segments. Even further, there is a need to reduce the bitrate of AAC encoded streams, which reduces the demand for bandwidth.