Audio coding systems are well known from the state of the art. They are used in particular for transmitting or storing audio signals.
An audio coding system which is employed for transmission of audio signals comprises an encoder at a transmitting end and a decoder at a receiving end. The transmitting end and the receiving end can be for instance mobile terminals. An audio signal that is to be transmitted is provided to the encoder. The encoder is responsible for adapting the incoming audio data rate to a bitrate level at which the bandwidth conditions in the transmission channel are not violated. Ideally, the encoder discards only irrelevant information from the audio signal in this encoding process. The encoded audio signal is then transmitted by the transmitting end of the audio coding system and received at the receiving end of the audio coding system. The decoder at the receiving end reverses the encoding process to obtain a decoded audio signal with little or no audible degradation.
If the audio coding system is employed for archiving audio data, the encoded audio data provided by the encoder is stored in some storage unit, and the decoder decodes audio data retrieved from this storage unit, for instance for presentation by some media player. In this alternative, it is the objective that the encoder achieves a bitrate which is as low as possible, in order to save storage space.
Depending on the allowed bitrate, different encoding schemes can be applied to an audio signal.
In most cases, a lower frequency band and a higher frequency band of an audio signal correlate with each other. Audio codec bandwidth extension algorithms therefore typically first split the bandwidth of the audio signal to be encoded into two frequency bands. The lower frequency band is then processed independently by a so-called core codec, while the higher frequency band is processed using knowledge about the coding parameters and signals from the lower frequency band. Using parameters from the low frequency band coding in the high frequency band coding significantly reduces the bit rate required for the high band encoding.
FIG. 1 presents a typical split-band encoding and decoding system. The system comprises an audio encoder 10 and an audio decoder 20. The audio encoder 10 includes a two-band analysis filterbank 11, a low band encoder 12 and a high band encoder 13. The audio decoder 20 includes a low band decoder 21, a high band decoder 22 and a two-band synthesis filterbank 23. The low band encoder 12 and decoder 21 can be, for example, the Adaptive Multi-Rate Wideband (AMR-WB) standard encoder and decoder, while the high band encoder 13 and decoder 22 may comprise an independent coding algorithm, a bandwidth extension algorithm or a combination of both. By way of example, the presented system is assumed to use the extended AMR-WB (AMR-WB+) codec as a split-band coding algorithm.
An input audio signal 1 is first processed by the two-band analysis filterbank 11, in which the audio frequency band is split into a lower frequency band and a higher frequency band. For illustration, FIG. 2 presents an example of a frequency response of a two-band filterbank for the case of AMR-WB+. A 12 kHz audio band is divided into a 0 kHz to 6.4 kHz band L and a 6.4 kHz to 12 kHz band H. In the two-band analysis filterbank 11, the resulting frequency bands are moreover critically down-sampled. That is, the low frequency band is down-sampled to 12.8 kHz and the high frequency band is re-sampled to 11.2 kHz.
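The sampling rates stated above are consistent with critical sampling, in which each band is sampled at twice its bandwidth. The following is a minimal sketch of that arithmetic only; the function name `critical_rate_khz` is hypothetical and is not part of any codec interface.

```python
def critical_rate_khz(f_low_khz, f_high_khz):
    """Return the critical sampling rate of a band in kHz (twice its bandwidth)."""
    return 2.0 * (f_high_khz - f_low_khz)

# Band edges taken from the AMR-WB+ example in the text.
low_rate = critical_rate_khz(0.0, 6.4)    # band L, 0 kHz to 6.4 kHz
high_rate = critical_rate_khz(6.4, 12.0)  # band H, 6.4 kHz to 12 kHz

print(round(low_rate, 1), round(high_rate, 1))
```

The 6.4 kHz wide band L thus needs 12.8 kHz, and the 5.6 kHz wide band H needs 11.2 kHz.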
The low frequency band and the high frequency band are then encoded independently of each other by the low band encoder 12 and the high band encoder 13, respectively.
To this end, the low band encoder 12 comprises full source signal encoding algorithms. These include an algebraic code excited linear prediction (ACELP) type of algorithm and a transform based algorithm. The algorithm actually employed is selected based on the signal characteristics of the respective input audio signal. The ACELP algorithm is typically selected for encoding speech signals and transients, while the transform based algorithm is typically selected for encoding music and tone-like signals to better handle the frequency resolution.
In an AMR-WB+ codec, the high band encoder 13 uses linear prediction coding (LPC) to model the spectral envelope of the high frequency band signal. The high frequency band can then be described by means of LPC synthesis filter coefficients, which define the spectral characteristics of the synthesized signal, and gain factors for an excitation signal, which control the amplitude of the synthesized high frequency band audio signal. The high band excitation signal is copied from the low band encoder 12. Only the LPC coefficients and the gain factors are provided for transmission.
The outputs of the low band encoder 12 and of the high band encoder 13 are multiplexed into a single bit stream 2.
The multiplexed bit stream 2 is transmitted for example through a communication channel to the audio decoder 20, in which the low frequency band and the high frequency band are decoded separately.
In the low band decoder 21, the processing in the low band encoder 12 is reversed for synthesizing the low frequency band audio signal.
In the high band decoder 22, an excitation signal is generated by re-sampling a low frequency band excitation provided by the low band decoder 21 to the sampling rate used in the high frequency band. That is, the low frequency band excitation signal is reused for decoding the high frequency band by transposing it to the high frequency band. Alternatively, a random excitation signal could be generated for the reconstruction of the high frequency band signal. The excitation signal is scaled with the gain factors, and the high frequency band signal is then reconstructed by filtering the scaled excitation signal through the high band LPC model defined by the LPC coefficients.
In the two-band synthesis filterbank 23, the decoded low frequency band signals and the decoded high frequency band signals are up-sampled to the original sampling frequency and combined into a synthesized output audio signal 3.
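The reconstruction step described above can be sketched as a gain scaling followed by an all-pole synthesis filter. This is a toy illustration under stated assumptions, not the AMR-WB+ implementation: the function name `lpc_synthesis` is hypothetical, a single gain is applied to the whole excitation, the coefficient convention A(z) = 1 + a[0]·z^-1 + a[1]·z^-2 + ... is assumed, and the re-sampling of the excitation is omitted.

```python
def lpc_synthesis(excitation, lpc_coeffs, gain):
    """Filter a gain-scaled excitation through an all-pole filter 1/A(z).

    Assumed convention: A(z) = 1 + a[0]*z^-1 + a[1]*z^-2 + ...
    """
    out = []
    for n, e in enumerate(excitation):
        y = gain * e  # scale the excitation sample with the gain factor
        for k, a in enumerate(lpc_coeffs, start=1):
            if n - k >= 0:
                y -= a * out[n - k]  # feedback from earlier output samples
        out.append(y)
    return out

# Toy input: unit impulse excitation, first-order LPC model, gain of 2.
high_band = lpc_synthesis([1.0, 0.0, 0.0, 0.0], [-0.5], 2.0)
print(high_band)
```

With these toy values the output is the decaying impulse response of the first-order model, scaled by the gain.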
The input audio signal 1 which is to be encoded can be a mono audio signal or a multichannel audio signal containing at least a first and a second channel signal. An example of a multichannel audio signal is a stereo audio signal, which is composed of a left channel signal and a right channel signal.
For a stereo operation of an AMR-WB+ codec, the input audio signal is likewise split into a low frequency band signal and a high frequency band signal in the two-band analysis filterbank 11. The low band encoder 12 generates a mono signal by combining the left channel signals and the right channel signals in the low frequency band. The mono signal is encoded as described above. In addition, the low band encoder 12 uses a parametric coding for encoding the differences of the left and right channel signals to the mono signal. The high band encoder 13 encodes the left channel and the right channel separately by determining separate LPC coefficients and gain factors for each channel.
In case the input audio signal 1 is a multichannel audio signal, but the device which is to present the synthesized audio signal 3 does not support a multichannel audio output, the incoming multichannel bit stream 2 has to be converted by the audio decoder 20 into a mono audio signal. At the low frequency band, the conversion of the multichannel signal to a mono signal is straightforward, since the low band decoder 21 can simply omit the stereo parameters in the received bit stream and decode only the mono part. But for the high frequency band, more processing is required, as no separate mono signal part of the high frequency band is available in the bit stream.
Conventionally, the stereo bit stream for the high frequency band is decoded separately for left and right channel signals, and the mono signal is then created by combining the left and right channel signals in a down-mixing process. This approach is illustrated in FIG. 3.
FIG. 3 schematically presents details of the high band decoder 22 of FIG. 1 for a mono audio signal output. To this end, the high band decoder 22 comprises a left channel processing portion 30 and a right channel processing portion 33. The left channel processing portion 30 includes a mixer 31, which is connected to an LPC synthesis filter 32. The right channel processing portion 33 likewise includes a mixer 34, which is connected to an LPC synthesis filter 35. The outputs of both LPC synthesis filters 32, 35 are connected to a further mixer 36.
A low frequency band excitation signal which is provided by the low band decoder 21 is fed to both mixers 31 and 34. The mixer 31 applies the gain factors for the left channel to the low frequency band excitation signal. The left channel high band signal is then reconstructed by the LPC synthesis filter 32 by filtering the scaled excitation signal through a high band LPC model defined by the LPC coefficients for the left channel. The mixer 34 applies the gain factors for the right channel to the low frequency band excitation signal. The right channel high band signal is then reconstructed by the LPC synthesis filter 35 by filtering the scaled excitation signal through a high band LPC model defined by the LPC coefficients for the right channel.
The reconstructed left channel high frequency band signal and the reconstructed right channel high frequency band signal are then converted by the mixer 36 into a mono high frequency band signal by computing their average in the time domain.
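The time-domain averaging performed by the mixer 36 can be sketched as follows. The function name `downmix_average` and the sample values are hypothetical and serve only to illustrate the operation on two already reconstructed channel signals.

```python
def downmix_average(left, right):
    """Average two equally long channel signals sample by sample."""
    if len(left) != len(right):
        raise ValueError("channel signals must have the same length")
    return [0.5 * (l + r) for l, r in zip(left, right)]

# Toy high band signals for the left and right channel.
left_high = [0.5, -0.25, 1.0]
right_high = [0.5, 0.75, -1.0]

mono_high = downmix_average(left_high, right_high)
print(mono_high)
```

Note that both channel signals must be fully synthesized before this step can run.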
This is, in principle, a simple and working approach. However, it requires a separate synthesizing of multiple channels, even though, in the end, only a single channel signal is needed.
Furthermore, if the multichannel audio input signal 1 is unbalanced in such a way that most of the energy of the multichannel audio signal lies on one of the channels, a direct mixing of the channels by computing their average will result in an attenuation in the combined signal. In an extreme case, one of the channels is completely silent, which leads to a combined signal whose amplitude is half of that of the original active input channel, that is, an attenuation of about 6 dB.
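The attenuation in the extreme case can be checked numerically. The sketch below uses hypothetical sample values and assumes that energy is measured as the sum of squared samples; the text itself does not define the measure.

```python
import math

def energy(signal):
    """Sum of squared samples (assumed energy measure)."""
    return sum(s * s for s in signal)

# One active channel and one completely silent channel.
active = [1.0, -1.0, 0.5, -0.5]
silent = [0.0] * len(active)

# Direct down-mix by averaging in the time domain.
mono = [0.5 * (a + b) for a, b in zip(active, silent)]

amplitude_ratio = max(abs(m) for m in mono) / max(abs(a) for a in active)
energy_ratio = energy(mono) / energy(active)
energy_ratio_db = 10.0 * math.log10(energy_ratio)

print(amplitude_ratio, energy_ratio, round(energy_ratio_db, 2))
```

Every averaged sample is half of the corresponding active sample, so under this measure the energy of the mono signal is one quarter of that of the active channel, which corresponds to roughly 6 dB of attenuation.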