Bandwidth extension techniques aim to improve the perceived quality of an audio codec by widening its effective output bandwidth. Instead of coding the full bandwidth with the underlying core coder, codecs using a bandwidth extension technique spend fewer bits on the perceptually less important higher frequency (HF) range. Thus, more bits are available to the core coder, which can process the more important lower frequency (LF) range at a higher precision. For that reason, bandwidth extension techniques are commonly used in codecs that need to achieve adequate perceptual quality at low bit rates.
In general, two basic bandwidth extension approaches need to be distinguished: blind bandwidth extension and guided bandwidth extension. In a blind bandwidth extension, no additional side information is transmitted. Thus, the HF-content to be inserted on the decoder side is generated using only information derived from the decoded LF-signal of the core coder. Since no costly side information needs to be transmitted, blind bandwidth extension techniques are well suited for codecs operating at the lowest bit rates or for backward-compatible post-processing procedures. On the other hand, the lack of controllability only allows for a relatively small effective extension of bandwidth using a blind bandwidth extension (e.g. 6.4-7.0 kHz in [1]). In contrast to the blind approach, in a guided bandwidth extension the HF-content is reconstructed using parameters that are extracted at the encoder side and transmitted to the decoder as side information in the bitstream. Hence, a guided bandwidth extension enables better control of the HF-reconstruction, making broader effective bandwidths possible. Due to the additional bit consumption, guided bandwidth extension techniques are commonly used for codecs operating at higher bit rates than systems incorporating a blind bandwidth extension.
More specifically, there are different methodologies for realizing a bandwidth extension:
In speech coding, source-filter model-based bandwidth extension methods are usually used, which are closely related to their underlying core coders, for example in G.722.2 (AMR-WB) [1]. In AMR-WB, the 6.4 kHz output bandwidth of the ACELP (algebraic code-excited linear prediction) core coder is extended to 7.0 kHz by injecting white noise into the excitation domain. Subsequently, the extended excitation is shaped by a filter derived from the core coder's linear prediction (LP) filter. Depending on the bit rate, the gain for scaling the inserted noise is either estimated using only core coder information or extracted in the encoder and transmitted. This bandwidth extension method is heavily dependent on its underlying coding scheme, as it uses the synthesis mechanisms of that scheme and thus additionally has to be performed in the same domain.
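The principle of shaping injected noise with an LP-derived filter can be illustrated by the following minimal sketch. It is not the AMR-WB procedure itself: the function names, the uniform noise source, the fixed seed, and the use of the unmodified all-pole filter 1/A(z) are assumptions made for illustration only, and the band limitation to the extended range and the addition to the decoded LF-signal are omitted.

```python
import random


def lp_synthesis(excitation, lp_coeffs):
    """Filter an excitation through the all-pole filter 1/A(z).

    A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, with lp_coeffs = [a_1, ..., a_p].
    """
    out = []
    for n, x in enumerate(excitation):
        acc = x
        # Subtract the weighted past output samples (recursive part).
        for k, a in enumerate(lp_coeffs, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


def extend_with_noise(lp_coeffs, num_samples, gain, seed=0):
    """Generate a noise excitation, scale it, and shape it with the LP filter.

    Hypothetical helper: 'gain' stands for the noise scaling factor that is
    either estimated at the decoder or transmitted, depending on the bit rate.
    """
    rng = random.Random(seed)
    # Uniformly distributed, uncorrelated samples serve as white noise here.
    noise = [gain * rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    return lp_synthesis(noise, lp_coeffs)
```

For example, `extend_with_noise([-0.5], 160, 0.1)` yields 160 samples of noise whose spectral shape follows the assumed first-order LP filter.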
A well-known core coder independent bandwidth extension technique in audio coding is spectral band replication (SBR) [2]. In contrast to the previous example, spectral band replication can be applied independently of its underlying core coder. As a first step, the input signal is split into an LF- and an HF-part on the encoder side, for example by using a quadrature mirror filter (QMF) analysis filter bank. The LF-signal is fed to the core coder while the HF-part is processed by spectral band replication. For this purpose, parameters describing the time-frequency envelope of the HF-signal as well as the tonality/noisiness of the HF-signal relative to the LF-signal are extracted and transmitted. After decoding, the signal is transformed using the same type of analysis filter bank as used in the encoder. To reconstruct the HF-content, the decoded signal is copied, mirrored or transposed portion-wise to the HF-range, post-processed to match the tonality/noisiness of the original, and shaped temporally as well as spectrally according to the transmitted parameters. Subsequently, the time domain output signal is generated by a corresponding synthesis filter bank.
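The copy-up and envelope shaping steps described above can be sketched as follows. This is a strongly simplified illustration under stated assumptions, not the SBR tool of [2]: it operates on a single frame of real-valued spectral coefficients instead of QMF subband samples, it omits the tonality/noisiness adjustment, and all function names and the band layout are hypothetical.

```python
import math


def band_energies(spectrum, band_edges):
    """Mean energy per band; band i covers band_edges[i]..band_edges[i+1]-1."""
    return [
        sum(v * v for v in spectrum[lo:hi]) / (hi - lo)
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ]


def encode_envelope(hf_original, band_edges):
    """Encoder side: extract the spectral envelope sent as side information."""
    return band_energies(hf_original, band_edges)


def reconstruct_hf(lf_decoded, target_energies, band_edges):
    """Decoder side: copy the LF-part up and scale it to the envelope."""
    hf_len = band_edges[-1]
    # Copy-up: repeat the decoded LF coefficients until the HF-range is filled.
    patch = [lf_decoded[i % len(lf_decoded)] for i in range(hf_len)]
    out = list(patch)
    bands = zip(band_edges[:-1], band_edges[1:])
    for (lo, hi), target in zip(bands, target_energies):
        current = sum(v * v for v in patch[lo:hi]) / (hi - lo)
        # Gain that maps the patched band energy onto the transmitted one.
        gain = math.sqrt(target / current) if current > 0.0 else 0.0
        for i in range(lo, hi):
            out[i] = patch[i] * gain
    return out
```

In this sketch only the per-band energies are transmitted, so the reconstructed HF-part matches the original in its coarse spectral envelope but not in its fine structure, which is taken from the decoded LF-signal.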
In contrast to the previously noted (semi-)parametric methods, there are also multi-layer approaches that use multiple bit-rate-selective layers for bandwidth extension. This principle is closely related to scalable coding schemes. Such techniques are often used for extending existing coding systems in an interoperable manner. In [3], a super wideband (SWB) bandwidth extension for G.711.1 and G.722 is presented, which processes the additional bandwidth (8.0-14.4 kHz) with a modified discrete cosine transform (MDCT) based coding scheme independent of the core coder. This approach enables exact reconstruction of HF-parts, but at the expense of a high additional bit consumption.
Although the above-mentioned bandwidth extension approaches are widespread in present speech and audio coding systems, each of them has specific shortcomings or disadvantages.