Audio signals, such as speech or music, are encoded, for example, to enable their efficient transmission or storage.
Audio encoders and decoders (also known as codecs) are used to represent audio-based signals, such as music and ambient sounds (which in speech coding terms can be called background noise). These types of coders typically do not utilise a speech model for the coding process; rather, they use processes for representing all types of audio signals, including speech. Speech encoders and decoders (codecs) can be considered to be audio codecs which are optimised for speech signals, and can operate at either a fixed or variable bit rate.
An audio codec can also be configured to operate with varying bit rates. At lower bit rates, such an audio codec may be optimised to work with speech signals at a coding rate equivalent to a pure speech codec. At higher bit rates, the audio codec may code any signal, including music, background noise and speech, with higher quality and performance. A variable-rate audio codec can also implement an embedded scalable coding structure and bitstream, where additional bits (a specific number of bits is often referred to as a layer) improve on the coding at lower rates, and where the bitstream of a higher rate may be truncated to obtain the bitstream of a lower rate coding. Such an audio codec may utilise a codec designed purely for speech signals as the core layer or lowest bit rate coding.
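The embedded layering idea can be illustrated with a toy two-layer quantizer for a single sample. This is only a sketch under stated assumptions: the function names, the step size and the number of enhancement levels are hypothetical, and a real codec layers far more elaborate tools. The core index alone gives a coarse reconstruction, and the enhancement index, if it survives truncation, refines it.

```python
import math


def encode_layers(sample: float, core_step: float = 0.5, enh_levels: int = 4):
    """Return (core_index, enh_index) for one sample (hypothetical toy scheme)."""
    # Core layer: coarse uniform quantization of the sample.
    core_index = math.floor(sample / core_step + 0.5)
    # Enhancement layer: quantize what the core layer left over.
    residual = sample - core_index * core_step
    enh_step = core_step / enh_levels
    enh_index = math.floor((residual + core_step / 2) / enh_step)
    enh_index = min(max(enh_index, 0), enh_levels - 1)
    return core_index, enh_index


def decode_layers(core_index: int, enh_index=None,
                  core_step: float = 0.5, enh_levels: int = 4) -> float:
    """Reconstruct from the core layer, refined by the enhancement layer if present."""
    value = core_index * core_step
    if enh_index is not None:
        enh_step = core_step / enh_levels
        value += -core_step / 2 + (enh_index + 0.5) * enh_step
    return value


sample = 0.8
core_index, enh_index = encode_layers(sample)
core_only = decode_layers(core_index)        # bitstream truncated to the core layer
full = decode_layers(core_index, enh_index)  # both layers received
print(core_only, full)
```

Dropping the enhancement index corresponds to truncating the higher-rate bitstream: the decoder still produces a valid, lower-quality output.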
An audio codec is designed to maintain a high (perceptual) quality while improving the compression ratio. Thus, instead of waveform-matching coding, it is common to employ various parametric schemes to lower the bit rate. For multichannel audio, such as stereo signals, it is common to spend a larger share of the available bit rate on a mono channel representation and to encode the stereo or multichannel information using a parametric approach which requires relatively few bits.
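As a rough illustration of the parametric approach, the sketch below forms a mono downmix of one stereo frame and derives a single inter-channel level difference in decibels. The choice of parameter, the simple averaging downmix and the function name are assumptions made for illustration; the text does not specify which stereo parameters are used or how the downmix is formed.

```python
import math


def downmix_and_level_difference(left, right, eps=1e-12):
    """Return (mono_frame, level_difference_db) for one stereo frame.

    Hypothetical helper: averages the channels and compares their energies.
    """
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    energy_left = sum(l * l for l in left)
    energy_right = sum(r * r for r in right)
    # eps guards against a division by zero or log of zero on silent frames.
    level_db = 10.0 * math.log10((energy_left + eps) / (energy_right + eps))
    return mono, level_db


left = [1.0, -1.0, 1.0, -1.0]
right = [0.5, -0.5, 0.5, -0.5]
mono, level_db = downmix_and_level_difference(left, right)
print(mono, round(level_db, 2))
```

The mono frame would take most of the bit budget, while the one level value per frame (or per frequency band in a finer-grained scheme) costs only a few bits.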
Current speech and audio standardization efforts at the 3rd Generation Partnership Project (3GPP) aim to increase the quality of the encoded signal through improvements in coding efficiency, bandwidth, and number of channels. A stereo/binaural extension is being prepared for the Enhanced Voice Services (EVS) speech and audio codec candidate. The coding efficiency of this proposal is important, especially at lower codec bitrates, because an extension with a large bitrate would lose its benefit if the total bitrate equalled or exceeded the bitrate of a dual mode.
The proposed stereo/binaural extension is composed of encoded stereo parameters. Increasing the coding efficiency for these parameters means reducing the bitrate of the extension and using the ‘saved’ bits for better encoding of the mono downmix. This is particularly useful at low bit rates where the quality of the encoded downmix is more sensitive to the bitrate.
Addressing the coding efficiency of the stereo parameters can yield a significant saving of bits. Efficient coding of stereo parameters has involved quantization of the values (levels), followed by entropy encoding to further reduce the bitrate. A previously proposed method for encoding the stereo parameters, disclosed in EP2856776, uses an adaptive version of Golomb-Rice coding.
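The quantize-then-entropy-code chain can be sketched as follows. This is a plain, fixed-parameter Golomb-Rice coder, not the adaptive variant of EP2856776, whose details are not given here. The uniform quantizer, the step size, the signed-to-unsigned mapping and all function names are assumptions chosen for illustration.

```python
import math


def quantize(values, step):
    """Uniform quantization to signed integer indices (assumed quantizer)."""
    return [math.floor(v / step + 0.5) for v in values]


def to_unsigned(n):
    """Map signed indices to non-negative ones: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def to_signed(u):
    """Inverse of to_unsigned."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(value, k):
    """Golomb-Rice codeword: unary quotient, a 0 terminator, then k remainder bits."""
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient = value >> k
    remainder = value & ((1 << k) - 1)
    tail = format(remainder, "0{}b".format(k)) if k > 0 else ""
    return "1" * quotient + "0" + tail


def rice_decode(bits, k, count):
    """Decode `count` Golomb-Rice codewords from a bit string."""
    values, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bits[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1  # skip the terminating 0
        remainder = int(bits[pos:pos + k], 2) if k > 0 else 0
        pos += k
        values.append((quotient << k) | remainder)
    return values


levels = [0.0, 1.4, -2.6, 3.1]  # made-up parameter values
indices = quantize(levels, step=1.0)
bitstream = "".join(rice_encode(to_unsigned(i), k=1) for i in indices)
decoded = [to_signed(u) for u in rice_decode(bitstream, k=1, count=len(indices))]
print(indices, bitstream, decoded)
```

Small indices get short codewords, so the scheme saves bits when the quantized parameters cluster near zero. An adaptive coder would additionally vary the parameter `k` to track the statistics of the data.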