Audio signals, such as speech or music, are encoded, for example, to enable efficient transmission or storage. A high compression ratio allows more data to be stored within the same storage capacity, or allows the signal to be transmitted more efficiently through a communication channel, which in turn lets the service support more simultaneous users. On the other hand, a high compression ratio may lead to perceived degradation of the compressed audio. The target of audio coding is thus, in general, either to maximize the audio quality at a given compression ratio or to maintain a given audio quality with as high a compression ratio as possible.
Audio encoders and decoders are used to represent audio-based signals, such as music and ambient sounds (which in speech coding terms can be called background noise). These types of coders typically do not utilise a speech model for the coding process; rather, they use processes for representing all types of audio signals, including speech.
Speech encoders and decoders (codecs) are usually optimised for speech signals, and can operate at either a fixed or variable bit rate.
An audio codec can also be configured to operate with varying bit rates. At lower bit rates, such an audio codec may work with speech signals at a coding rate equivalent to a pure speech codec. At higher bit rates, the audio codec may code any signal including music, background noise and speech, with higher quality and performance.
In some audio codecs the input signal is divided into a limited number of bands. Furthermore, some codecs use the correlation between the low and high frequency bands or regions of an audio signal to improve coding efficiency.
Because the higher frequency bands of the spectrum are typically quite similar to the lower frequency bands, some codecs encode only the lower frequency bands and reproduce the upper frequency bands as a scaled copy of a lower frequency band. Thus, with only a small amount of additional control information, considerable savings can be achieved in the total bit rate of the codec.
For example, if we divide a full-band (20-kHz bandwidth) audio signal equally into two frequency regions, it is often the case that the higher band is quite similar to the lower band. Since the higher frequencies are generally less perceptually sensitive to coding errors (introduced by the compression) than the low-frequency part of the signal, a lower bit rate (and a higher compression ratio) can be used for the high-frequency content than for the corresponding low-frequency content. In addition, the high-frequency coding can be at least partially based on the low-frequency coding. This gives rise to so-called bandwidth extension methods, which are commonly employed in modern, low-rate audio coding.
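The idea of reproducing the upper band as a scaled copy of the lower band can be illustrated with a minimal sketch. It is not taken from any particular codec: the function names, the use of a single energy-matching gain as the only control information, and the representation of each band as a plain list of spectral coefficients are all assumptions made for illustration.

```python
import math


def band_energy(coeffs):
    """Sum of squared spectral coefficients in one band."""
    return sum(c * c for c in coeffs)


def encode_high_band_gain(low_band, high_band):
    """Hypothetical encoder step: compute the one gain value that makes
    a copy of the low band match the energy of the original high band.
    Only this gain would be sent as control information."""
    e_low = band_energy(low_band)
    if e_low == 0.0:
        return 0.0
    return math.sqrt(band_energy(high_band) / e_low)


def decode_high_band(low_band, gain):
    """Hypothetical decoder step: rebuild the high band as a scaled
    copy of the already decoded low band."""
    return [gain * c for c in low_band]


# Toy spectrum in which the high band is exactly half the low band.
low = [1.0, -2.0, 0.5, 0.25]
high = [0.5, -1.0, 0.25, 0.125]

gain = encode_high_band_gain(low, high)
rebuilt = decode_high_band(low, gain)
print(gain, rebuilt)
```

In this toy case the high band is an exact scaled copy, so one gain value reconstructs it perfectly. Real signals are only approximately similar across bands, which is why practical methods send somewhat more control information than a single gain.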
New speech and audio coding for next-generation telecommunication systems is in development or planning and is currently referred to as the EVS (Enhanced Voice Service) codec for EPS (Evolved Packet System) or LTE (Long Term Evolution) telecommunication systems. The EVS codec is envisioned to provide several different levels of quality (including considerations such as bit rate, audio bandwidth, algorithmic delay, number of channels, interoperability with existing standards, etc.). Of particular interest is low bit rate super-wideband (SWB, 14-kHz bandwidth) coding that is interoperable with the current 3GPP wideband (WB, 7-kHz bandwidth) standard AMR-WB (Adaptive Multi-Rate Wide Band) codec. Potential operating points are expected to include SWB speech at about 16 kbps implementing interoperability with AMR-WB 12.65 kbps, as well as SWB speech at 12.65 kbps based on a WB core codec possibly operating at about 10-11 kbps. Such bit rate targets indicate a need for a very low bit rate SWB extension of WB speech and audio codecs. This SWB extension should significantly improve the user experience (i.e. provide high quality) while having low complexity and low delay.
It is understood that a low estimate for the required bit rate of the SWB extension will be about 1.0-1.6 kbps. For example, a total bit rate of 12.65 kbps based on an 11 kbps WB core coding suggests that the highest possible bit rate for the SWB part would be 1.65 kbps. However, this required extension bit rate may be decreased, perhaps to as low as 1.0 kbps.
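The budget arithmetic above can be checked with a short sketch. The total and core rates are the figures quoted in the text; the 20 ms frame length used to convert a rate into bits per frame is an assumption made for illustration, as are the function names.

```python
def extension_budget_bps(total_bps, core_bps):
    """Bit rate left for the SWB extension once the WB core is paid for."""
    return total_bps - core_bps


def bits_per_frame(rate_bps, frame_ms):
    """Whole bits available in one frame at the given rate.
    frame_ms is an assumed frame length, not a value from the text."""
    return rate_bps * frame_ms // 1000


FRAME_MS = 20  # assumption for illustration only

budget = extension_budget_bps(12650, 11000)
print(budget)                            # 1650 bps, i.e. 1.65 kbps
print(bits_per_frame(budget, FRAME_MS))  # 33 bits per frame
print(bits_per_frame(1000, FRAME_MS))    # 20 bits per frame at 1.0 kbps
```

Under the assumed frame length, the whole extension has only a few tens of bits per frame to work with, which makes clear how little room there is for side information.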
Conventional SWB extension methods based on the technology described by Tammi et al., “Scalable Superwideband Extension for Wideband Coding,” ICASSP 2009, Taipei, Taiwan, 2009, operate at around 2.0 kbps and can spend about 50% of that rate, or around 1.0 kbps, transmitting the subband indices. Reaching 1.5 kbps or even 1.0 kbps while still providing suitable performance is therefore problematic.
One approach to reducing the number of bits spent on transmitting index values is not to transmit an optimal index at all for one or more of the subbands, but instead to use a fixed point (a fixed, predetermined index) for the subband replication step.
Although the fixed-index solution reduces the number of bits sent, it is problematic and produces poor-quality audio signals: it can introduce unwanted periodicity in the highest frequencies, which is heard as “chirping” sounds that are clearly not part of the original signal and can be very annoying.
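The trade-off between a transmitted index and a fixed index can be sketched as follows. This is a generic illustration and not the method of Tammi et al.: the matching criterion (squared cross-correlation normalised by segment energy), the exhaustive search over every offset, and all names are assumptions.

```python
def best_match_index(low_band, target):
    """Search every start offset in low_band and return the one whose
    segment, after optimal scaling, is closest to the target subband."""
    n = len(target)
    best_i, best_score = 0, -1.0
    for i in range(len(low_band) - n + 1):
        seg = low_band[i:i + n]
        cross = sum(a * b for a, b in zip(seg, target))
        energy = sum(a * a for a in seg)
        # Maximising cross^2 / energy minimises the error left after
        # applying the best possible gain to this segment.
        score = cross * cross / energy if energy > 0.0 else 0.0
        if score > best_score:
            best_i, best_score = i, score
    return best_i


def index_bits(num_candidates):
    """Bits needed to signal one of num_candidates offsets."""
    return (num_candidates - 1).bit_length()


low = [0.1, 0.9, -0.4, 0.3, 0.7, -0.2, 0.05, 0.6]
target = [0.6, 1.4, -0.4]  # equals 2 * low[3:6]

searched = best_match_index(low, target)
candidates = len(low) - len(target) + 1
print(searched, index_bits(candidates))  # searched index and its cost in bits
print(index_bits(1))                     # a fixed index costs 0 bits
```

A searched index costs bits for every subband in every frame, whereas a fixed index costs nothing but always copies from the same place in the low band, regardless of how well that segment matches the subband being replaced.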