To improve the quality of coded tonal signals, especially at low bit-rates, modern audio transform coders employ very long transforms and/or long-term prediction or pre-/post-filtering. A long transform, however, implies a long algorithmic delay, which is undesirable for low-delay communication scenarios. Hence, predictors with very low delay based on the instantaneous fundamental pitch have gained popularity recently. The IETF (Internet Engineering Task Force) Opus codec utilizes pitch-adaptive pre- and post-filtering in its frequency-domain CELT (Constrained-Energy Lapped Transform) coding path [J. M. Valin, K. Vos, and T. Terriberry, “Definition of the Opus audio codec,” 2012, IETF RFC 6716. http://tools.ietf.org/html/rfc6716], and the 3GPP (3rd Generation Partnership Project) EVS (Enhanced Voice Services) codec provides a long-term harmonic post-filter for perceptual improvement of transform-decoded signals [3GPP TS 26.443, “Codec for Enhanced Voice Services (EVS),” Release 12, December 2014.]. Both of these approaches operate in the time domain on the fully decoded signal waveform, making it difficult and/or computationally expensive to apply them frequency-selectively (both schemes offer only a simple low-pass filter as a means of frequency selectivity). A welcome alternative to time-domain long-term prediction (LTP) or pre-/post-filtering (PPF) is thus provided by frequency-domain prediction (FDP) as supported in MPEG-2 AAC [ISO/IEC 13818-7, “Information technology—Part 7: Advanced Audio Coding (AAC),” 2006.]. This method, although facilitating frequency selectivity, has its own disadvantages, as described hereafter.
The FDP method introduced above has two drawbacks compared with the other tools. First, the FDP method involves high computational complexity. In detail, linear predictive coding of at least order two (i.e. from the channel transform bins of the last two frames) is applied to hundreds of spectral bins per frame and channel in the worst case, in which prediction is active in all scale factor bands [ISO/IEC 13818-7, “Information technology—Part 7: Advanced Audio Coding (AAC),” 2006.]. Second, the FDP method yields only a limited overall prediction gain. More precisely, the efficiency of the prediction is limited because noisy components lying between the predictable harmonic, tonal spectral parts are subjected to the prediction as well, which introduces errors since these noisy parts are typically not predictable.
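The limited prediction gain can be illustrated with a small numerical sketch. It is not the predictor specified in the AAC standard: it assumes a plain order-two normalized LMS (NLMS) predictor per bin, a tonal bin whose value across frames follows a sampled cosine with a hypothetical phase advance `theta`, and a noisy bin modeled as white Gaussian noise. The function name `nlms_prediction_gain`, the step size `mu`, and all signal parameters are chosen for illustration only.

```python
import math
import random


def nlms_prediction_gain(trajectory, mu=0.5, eps=1e-9):
    """Predict each value of one bin from its values in the last two frames.

    Uses an order-2 normalized LMS predictor (an assumption of this sketch)
    and returns the prediction gain: input energy / residual energy.
    """
    w = [0.0, 0.0]      # predictor coefficients, adapted frame by frame
    hist = [0.0, 0.0]   # values of the same bin in the last two frames
    signal_energy = 0.0
    residual_energy = 0.0
    for x in trajectory:
        prediction = w[0] * hist[0] + w[1] * hist[1]
        err = x - prediction
        signal_energy += x * x
        residual_energy += err * err
        # NLMS update: step along the history vector, scaled by its energy.
        norm = hist[0] ** 2 + hist[1] ** 2 + eps
        w[0] += mu * err * hist[0] / norm
        w[1] += mu * err * hist[1] / norm
        hist = [x, hist[0]]
    return signal_energy / residual_energy


frames = 400
theta = 0.9  # hypothetical phase advance per frame of the tonal bin (radians)
tonal = [math.cos(theta * m + 0.3) for m in range(frames)]
rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = [rng.gauss(0.0, 1.0) for _ in range(frames)]

gain_tonal = nlms_prediction_gain(tonal)
gain_noisy = nlms_prediction_gain(noisy)
print(f"tonal bin prediction gain: {10 * math.log10(gain_tonal):.1f} dB")
print(f"noisy bin prediction gain: {10 * math.log10(gain_noisy):.1f} dB")
```

Under these assumptions the tonal bin is predicted well, whereas the residual of the noisy bin carries more energy than the bin itself, so including such bins in the prediction lowers the overall gain.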
The high complexity is due to the backward adaptivity of the predictors. This means that the prediction coefficients for each bin have to be calculated based on previously transmitted bins. Because the encoder and the decoder each perform this calculation independently, numerical inaccuracies between them can cause the prediction coefficients to diverge, which leads to reconstruction errors. To overcome this problem, bit-exact identical adaptation has to be guaranteed. Furthermore, even if groups of predictors are disabled in certain frames, the adaptation still has to be performed in order to keep the prediction coefficients up to date.
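The synchronization requirement can be sketched as follows. The example assumes an order-two NLMS predictor as a simplified stand-in for the predictor of the standard, a hypothetical uniform quantizer with step size `STEP`, and a decoder whose coefficient arithmetic is rounded to single precision to model a numerical mismatch. All names (`BackwardAdaptivePredictor`, `encode`, `decode`, `to_float32`) are illustrative.

```python
import math
import struct

STEP = 0.05  # hypothetical uniform quantizer step size


def to_float32(value):
    """Round a Python float (double precision) to single precision."""
    return struct.unpack("f", struct.pack("f", value))[0]


class BackwardAdaptivePredictor:
    """Order-2 NLMS predictor that adapts only from reconstructed values."""

    def __init__(self, rounding=float, mu=0.5, eps=1e-9):
        self.rounding = rounding  # models the arithmetic precision in use
        self.mu = mu
        self.eps = eps
        self.w = [0.0, 0.0]
        self.hist = [0.0, 0.0]

    def predict(self):
        return self.rounding(self.w[0] * self.hist[0] + self.w[1] * self.hist[1])

    def update(self, reconstructed, prediction):
        err = reconstructed - prediction
        norm = self.hist[0] ** 2 + self.hist[1] ** 2 + self.eps
        self.w = [self.rounding(w + self.mu * err * h / norm)
                  for w, h in zip(self.w, self.hist)]
        self.hist = [reconstructed, self.hist[0]]


def encode(trajectory):
    """Quantize prediction residuals; return indices and local reconstruction."""
    predictor = BackwardAdaptivePredictor()
    indices, reconstruction = [], []
    for x in trajectory:
        p = predictor.predict()
        q = round((x - p) / STEP)
        y = p + q * STEP          # what a matching decoder will reconstruct
        indices.append(q)
        reconstruction.append(y)
        predictor.update(y, p)    # adapt from transmitted information only
    return indices, reconstruction


def decode(indices, rounding=float):
    """Rebuild the bin values; no predictor coefficients are transmitted."""
    predictor = BackwardAdaptivePredictor(rounding)
    output = []
    for q in indices:
        p = predictor.predict()
        y = p + q * STEP
        output.append(y)
        predictor.update(y, p)
    return output


trajectory = [math.cos(0.9 * m + 0.3) for m in range(200)]
indices, encoder_view = encode(trajectory)
matched = decode(indices)
mismatched = decode(indices, rounding=to_float32)
drift = max(abs(a - b) for a, b in zip(mismatched, encoder_view))
print("identical arithmetic, exact match:", matched == encoder_view)
print("single-precision decoder, maximum deviation:", drift)
```

With identical arithmetic on both sides the decoder reproduces the encoder's reconstruction exactly. Once the decoder rounds differently, its coefficients and therefore its output no longer agree with the encoder, which is why bit-exact adaptation is required.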