Digital encoding of audio signals has become increasingly important and is an essential part of many communication and distribution systems. For example, communication of speech and background audio in a cellular communication system is based on encoding the audio at the source and communicating the encoded audio data to the destination, where it is decoded to recreate the source signal.
In general, there is a trade-off between the data rate (or file size) of an encoded signal and the quality that can be provided. In order to adapt the operation of an audio codec to the desired application, coding standards have been developed that provide different quality levels and data rates. In particular, coding standards have been proposed which encode audio in a base layer comprising encoded audio data corresponding to a low quality. Such a base layer may be supplemented by one or more enhancement layers that provide audio data which can be used together with the base layer audio data to generate an audio signal with improved audio quality. For example, when encoding the audio signal to generate the base layer, a residual signal representing the difference between the audio signal and the audio data of the base layer can be generated (typically by decoding the audio data of the base layer and subtracting the result from the input audio signal). This residual signal may then be further encoded to provide audio data for an enhancement layer. The process can be repeated to provide further enhancement layers.
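The residual-based layering described above can be illustrated with a minimal Python sketch. It is not taken from any standard: a uniform scalar quantizer (the hypothetical `quantize` function) stands in for each layer's encoder and decoder, and the step sizes are assumed values chosen only for illustration.

```python
def quantize(samples, step):
    """Hypothetical stand-in for one layer's encode + decode: uniform quantization."""
    return [round(s / step) * step for s in samples]


def encode_layers(signal, steps):
    """Return the decoded contribution of each layer; the first entry is the base layer."""
    layers = []
    residual = list(signal)
    for step in steps:
        decoded = quantize(residual, step)   # encode and locally decode this layer
        layers.append(decoded)
        # The next layer encodes whatever this layer failed to represent.
        residual = [r - d for r, d in zip(residual, decoded)]
    return layers


def reconstruct(layers, count):
    """Sum the first `count` layers sample by sample."""
    return [sum(values) for values in zip(*layers[:count])]


def max_error(signal, approx):
    """Largest absolute sample error between the signal and an approximation."""
    return max(abs(s - a) for s, a in zip(signal, approx))


signal = [0.3, -1.26, 2.71, 0.05]
steps = [1.0, 0.25, 0.0625]          # assumed: coarse base layer, finer enhancement layers
layers = encode_layers(signal, steps)
```

In this sketch, decoding the first `k` layers leaves an error of at most half of the `k`-th step size, so each additional enhancement layer tightens the bound on the reconstruction error.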
An example of a layered audio encoding standard is the Embedded variable Bit Rate (EV-VBR) codec standardized as ITU-T Recommendation G.718 by the International Telecommunication Union, Telecommunication Standardization Sector, ITU-T.
G.718 is an embedded scalable speech and audio codec which provides high quality wideband (50 Hz to 7 kHz) speech at a range of bit rates. The codec is particularly suitable for Voice over Internet Protocol (VoIP) and includes functionality making it robust to frame erasures.
The ITU-T Recommendation G.718 codec uses a structure with a discrete layering for mono wideband, stereo wideband, superwideband mono and superwideband stereo layers. Currently the G.718 codec comprises five layers which are referred to as Layer 1 (the core or base layer) through to Layer 5 (the highest enhancement or extension layer) with combined bit rates of 8, 12, 16, 24, and 32 kbit/s. The lower two layers are based on ACELP (Algebraic Code Excited Linear Prediction) technology, with Layer 1 specifically employing a variation of the 3GPP2 VMR-WB (Variable-Rate Multimode Wideband) speech coding standard comprising several coding modes optimized for different input signals. The coding error from Layer 1 is encoded in Layer 2, which consists of a modified adaptive codebook and an additional algebraic codebook. The error from Layer 2 is further coded for the higher layers in the transform domain using the Modified Discrete Cosine Transform (MDCT). In order to improve the frame erasure concealment, as well as convergence and recovery after erased frames, a few supplementary concealment/recovery parameters are also determined and transmitted in Layer 3.
Layered audio coding provides increased flexibility and allows codecs to be modified to generate additional data for enhancement layers while still providing compatibility with legacy equipment. Furthermore, the layers facilitate the adaptation of the audio data to the specific conditions experienced. For example, when distributing audio data in a communication system, a network element may strip one or more enhancement layers in order to suit a data link with insufficient capacity to carry the whole audio data stream. In a cellular communication system, for instance, the audio data may be transmitted over the air interface to a User Equipment (UE). During low load intervals, all data layers may be transmitted to the UE. However, during peak loading only a reduced communication resource may be available for the communication, and accordingly the base station may strip one or more layers in order to enable communication using a reduced resource allocation. As a specific example, during low loading, a 32 kbit/s downlink channel may be allocated to the audio communication whereas only 16 kbit/s may be allocated at high loading. In the former case, all five layers may be communicated, and in the latter case only Layers 1, 2 and 3, which have a combined bit rate of 16 kbit/s, will be communicated.
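The layer stripping described above can be sketched as follows. This is a minimal illustration only, assuming the combined G.718 bit rates of 8, 12, 16, 24 and 32 kbit/s given earlier; the function names `layers_for_rate` and `strip_layers` are hypothetical, and a frame is represented simply as a list holding one entry per layer.

```python
# Combined bit rate in kbit/s when Layers 1..n are sent (values from the text above).
CUMULATIVE_KBPS = [8, 12, 16, 24, 32]


def layers_for_rate(available_kbps, cumulative=CUMULATIVE_KBPS):
    """Return how many layers, counted from Layer 1, fit within the available rate."""
    count = 0
    for rate in cumulative:
        if rate > available_kbps:
            break                      # this layer and all higher ones do not fit
        count += 1
    return count


def strip_layers(frame_layers, available_kbps):
    """Keep only the lowest layers that fit; higher enhancement layers are dropped."""
    return frame_layers[:layers_for_rate(available_kbps)]


frame = ["L1", "L2", "L3", "L4", "L5"]   # hypothetical per-layer payloads of one frame
low_load = strip_layers(frame, 32)        # all five layers are kept
high_load = strip_layers(frame, 16)       # only Layers 1, 2 and 3 are kept
```

Because the layers are embedded, stripping always removes layers from the top down: a higher layer is of no use to the decoder without the layers beneath it. An allocation below 8 kbit/s leaves no decodable layer in this sketch.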
However, although such an approach may work well in many scenarios, it also has associated disadvantages. Specifically, it tends to result in an inflexible and suboptimal resource usage and/or a reduced perceived audio quality. Indeed, when the air interface resource availability is restricted, the perceived quality is continuously degraded.
Hence, an improved approach would be advantageous, in particular one allowing increased flexibility, reduced resource consumption, increased audio quality, facilitated implementation and/or improved performance.