The present invention relates to audio coding and, in particular, to switched audio coding, where, for different portions of an audio signal, the encoded signal is generated using different encoding algorithms.
Switched audio coders which determine different encoding algorithms for different portions of the audio signal are known. Generally, switched audio coders provide for switching between two different modes, i.e. algorithms, such as ACELP (Algebraic Code Excited Linear Prediction) and TCX (Transform Coded Excitation).
The LPD mode of MPEG USAC (MPEG Unified Speech Audio Coding) is based on the two different modes ACELP and TCX. ACELP provides better quality for speech-like and transient-like signals. TCX provides better quality for music-like and noise-like signals. The encoder decides which mode to use on a frame-by-frame basis. The decision made by the encoder is critical for the codec quality. A single wrong decision can produce a strong artifact, particularly at low bitrates.
The most straightforward approach for deciding which mode to use is a closed-loop mode selection, i.e. to perform a complete encoding/decoding with both modes, then compute a selection criterion (e.g. segmental SNR) for both modes based on the audio signal and the coded/decoded audio signals, and finally choose a mode based on the selection criterion. This approach generally produces a stable and robust decision. However, it also entails significant complexity, because both modes have to be run for each frame.
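The closed-loop selection described above can be illustrated with a minimal sketch. It is not taken from any standard: the function names, the segment length of 64 samples and the two stand-in "codecs" (simple quantizers with different step sizes) are assumptions chosen only to make the example self-contained. A real encoder would run the complete ACELP and TCX encode/decode chains in their place.

```python
import math

def segmental_snr(original, decoded, segment_len=64):
    """Mean of the per-segment SNR values in dB (segment_len is an assumed value)."""
    eps = 1e-12  # avoids log of zero and division by zero
    snrs = []
    for start in range(0, len(original) - segment_len + 1, segment_len):
        ref = original[start:start + segment_len]
        dec = decoded[start:start + segment_len]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, dec))
        snrs.append(10.0 * math.log10((signal + eps) / (noise + eps)))
    return sum(snrs) / len(snrs) if snrs else 0.0

def closed_loop_select(frame, codecs):
    """Run every candidate encode/decode and keep the mode with the best score.

    codecs maps a mode name to a function returning the decoded frame.
    """
    scores = {mode: segmental_snr(frame, codec(frame)) for mode, codec in codecs.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical stand-ins for the two coding modes: uniform quantizers.
def coarse_codec(frame):
    return [round(x * 4) / 4 for x in frame]

def fine_codec(frame):
    return [round(x * 64) / 64 for x in frame]

frame = [math.sin(0.1 * n) for n in range(256)]
best_mode, scores = closed_loop_select(frame, {"coarse": coarse_codec, "fine": fine_codec})
```

Note that both candidate codecs are executed for every frame, which is the source of the complexity mentioned above.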
To reduce the complexity, an alternative approach is open-loop mode selection. Open-loop selection does not perform a complete encoding/decoding with both modes but instead chooses one mode using a selection criterion that can be computed with low complexity. The worst-case complexity is then reduced by the complexity of the least complex mode (usually TCX), minus the complexity needed to compute the selection criterion. The savings in complexity are usually significant, which makes this kind of approach attractive when the codec worst-case complexity is constrained.
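The complexity argument can be made concrete with a short calculation. In this sketch, the function name and the complexity figures are purely illustrative assumptions (arbitrary units, not measured values), and the selection criterion of the closed-loop case is taken to have negligible cost.

```python
def worst_case_complexity(c_acelp, c_tcx, c_select):
    """Return (closed_loop, open_loop, savings) worst-case complexity.

    Closed-loop runs both modes for every frame. Open-loop runs the
    selection criterion plus, in the worst case, the more complex mode.
    """
    closed_loop = c_acelp + c_tcx
    open_loop = max(c_acelp, c_tcx) + c_select
    return closed_loop, open_loop, closed_loop - open_loop

# Illustrative figures only: ACELP = 10, TCX = 6, selection criterion = 1.
closed_loop, open_loop, savings = worst_case_complexity(10.0, 6.0, 1.0)
```

With these assumed figures the savings equal the complexity of the least complex mode minus the cost of the selection criterion, i.e. 6 - 1 = 5 units.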
The AMR-WB+ standard (defined in the International Standard 3GPP TS 26.290 V6.1.0 2004-12) includes an open-loop mode selection, used to decide between all combinations of ACELP/TCX20/TCX40/TCX80 in an 80 ms frame. It is described in Section 5.2.4 of 3GPP TS 26.290. It is also described in the conference paper “Low Complex Audio Encoding for Mobile, Multimedia, VTC 2006, Makinen et al.” and U.S. Pat. No. 7,747,430 B2 and U.S. Pat. No. 7,739,120 B2 going back to the author of this conference paper.
U.S. Pat. No. 7,747,430 B2 discloses an open-loop mode selection based on an analysis of long term prediction parameters. U.S. Pat. No. 7,739,120 B2 discloses an open-loop mode selection based on signal characteristics indicating the type of audio content in respective sections of an audio signal, wherein, if such a selection is not viable, the selection is further based on a statistical evaluation carried out for respectively neighboring sections.
The open-loop mode selection of AMR-WB+ can be described in two main steps. In the first main step, several features are calculated on the audio signal, such as standard deviation of energy levels, low-frequency/high-frequency energy relation, total energy, ISP (immittance spectral pair) distance, pitch lags and gains, spectral tilt. These features are then used to make a choice between ACELP and TCX, using a simple threshold-based classifier. If TCX is selected in the first main step, then the second main step decides between the possible combinations of TCX20/TCX40/TCX80 in a closed-loop manner.
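The general idea of a threshold-based classifier operating on signal features can be sketched as follows. This is not the AMR-WB+ classifier: only one of the listed features (the standard deviation of energy levels) is used, and the function names, the number of sub-frames, the threshold of 3 dB and the direction of the decision are hypothetical choices made for illustration.

```python
import math
import statistics

def energy_db(samples):
    """Mean energy of a block of samples in dB."""
    return 10.0 * math.log10(sum(x * x for x in samples) / len(samples) + 1e-12)

def energy_std_db(frame, n_sub=4):
    """Standard deviation of the sub-frame energy levels (n_sub is assumed)."""
    sub_len = len(frame) // n_sub
    levels = [energy_db(frame[i * sub_len:(i + 1) * sub_len]) for i in range(n_sub)]
    return statistics.pstdev(levels)

def open_loop_select(frame, std_threshold_db=3.0):
    """Hypothetical rule: strongly varying energy is treated as transient-like."""
    return "ACELP" if energy_std_db(frame) > std_threshold_db else "TCX"

# A stationary tone and a frame with a sudden onset in its second half.
steady = [math.sin(0.3 * n) for n in range(256)]
onset = [0.001 * math.sin(0.3 * n) for n in range(128)] + \
        [math.sin(0.3 * n) for n in range(128, 256)]
```

In this sketch the stationary tone is assigned to TCX and the frame with the onset to ACELP, without either mode having been run.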
WO 2012/110448 A1 discloses an approach for deciding between two encoding algorithms having different characteristics based on a transient detection result and a quality result of an audio signal. In addition, applying a hysteresis is disclosed, wherein the hysteresis relies on the selections made in the past, i.e. for the earlier portions of the audio signal.
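A hysteresis that relies on past selections can be sketched generically as follows. This is not the method of WO 2012/110448 A1: the per-frame scores, the margin and the function name are assumptions used only to show how the previously selected mode can bias the current decision.

```python
def select_with_hysteresis(scores, margin=1.0, initial="TCX"):
    """Select a mode per frame, switching only on a clear advantage.

    scores is a list of (acelp_score, tcx_score) pairs, where a higher score
    means the mode is expected to perform better. The mode of the previous
    frame is kept unless the other mode leads by more than margin.
    """
    mode = initial
    decisions = []
    for acelp_score, tcx_score in scores:
        candidate = "ACELP" if acelp_score > tcx_score else "TCX"
        if candidate != mode and abs(acelp_score - tcx_score) > margin:
            mode = candidate
        decisions.append(mode)
    return decisions

# Hypothetical per-frame scores for five consecutive frames.
frame_scores = [(5.0, 6.0), (6.5, 6.0), (8.0, 6.0), (5.8, 6.0), (3.0, 6.0)]
decisions = select_with_hysteresis(frame_scores)
# decisions == ["TCX", "TCX", "ACELP", "ACELP", "TCX"]
```

The small advantages in the second and fourth frames do not trigger a switch, which suppresses rapid toggling between the two modes.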
In the conference paper “Low Complex Audio Encoding for Mobile, Multimedia, VTC 2006, Makinen et al.”, the closed-loop and open-loop mode selections of AMR-WB+ are compared. Subjective listening tests indicate that the open-loop mode selection performs significantly worse than the closed-loop mode selection. However, it is also shown that the open-loop mode selection reduces the worst-case complexity by 40%.