In communications, audio codecs are adopted to compress audio signals at low bitrates while keeping subjective quality within an acceptable range, so that network resources are utilized more efficiently. Accordingly, there is a need to increase the compression efficiency to overcome the bitrate constraints when encoding an audio signal.
Bandwidth extension (BWE) is a widely used technique for efficiently compressing wideband (WB) or super-wideband (SWB) audio signals at a low bitrate. In encoding, BWE parametrically represents the high frequency band signal utilizing the decoded low frequency band signal. That is, BWE searches the low frequency band signal of the audio signal for a portion similar to a subband of the high frequency band signal, encodes parameters which identify the similar portion, and transmits the parameters, so that the high frequency band signal can be resynthesized utilizing the low frequency band signal at a signal-receiving side. By utilizing a similar portion of the low frequency band signal instead of directly encoding the high frequency band signal, it is possible to reduce the amount of parameter information to be transmitted, thus increasing the compression efficiency.
One of the audio/speech codecs which utilize BWE functionality is G.718-SWB, whose target applications are VoIP devices, video-conference equipment, teleconference equipment and mobile phones.
The configuration of G.718-SWB [1] is illustrated in FIGS. 1 and 2 (see, e.g., Non-Patent Literature (hereinafter, referred to as “NPL”) 1).
At an encoding apparatus side illustrated in FIG. 1, the audio signal (hereinafter, referred to as input signal) sampled at 32 kHz is first down-sampled to 16 kHz (101). The down-sampled signal is encoded by the G.718 core encoding section (102). The SWB bandwidth extension is performed in the MDCT domain. The 32 kHz input signal is transformed to the MDCT domain (103) and processed through a tonality estimation section (104). Based on the estimated tonality of the input signal (105), a generic mode (106) or a sinusoidal mode (108) is used for encoding the first layer of SWB. Higher SWB layers are encoded using additional sinusoids (107 and 109).
The generic mode is used when the input frame signal is not considered to be tonal. In the generic mode, the MDCT coefficients (spectrum) of the WB signal encoded by the G.718 core encoding section are utilized to encode the SWB MDCT coefficients (spectrum). The SWB frequency band (7 to 14 kHz) is split into several subbands, and for each subband, the most correlated portion is searched for in the encoded and normalized WB MDCT coefficients. Then, a gain of the most correlated portion is calculated as a scale factor such that the amplitude level of the SWB subband is reproduced, to obtain a parametric representation of the high frequency component of the SWB signal.
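The per-subband search and gain calculation described above can be sketched as follows. This is only an illustrative sketch and not the G.718-SWB procedure itself: the function names (`normalized_correlation`, `encode_subband`, `decode_subband`) are hypothetical, the similarity measure is assumed to be a normalized cross-correlation, and the gain is assumed to be an energy-matching scale factor. The actual subband layout, search range and quantization of the codec are not reproduced here.

```python
import math


def normalized_correlation(a, b):
    """Normalized cross-correlation of two equal-length coefficient lists."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def encode_subband(low_band, high_subband):
    """Return (lag, gain) describing one high-band subband.

    lag  : start index of the most correlated portion in low_band
    gain : scale factor matching the energy of that portion to the
           energy of high_subband (assumed energy-matching gain)
    """
    width = len(high_subband)
    best_lag, best_corr = 0, -float("inf")
    # Exhaustive search over every candidate position in the low band.
    for lag in range(len(low_band) - width + 1):
        corr = normalized_correlation(low_band[lag:lag + width], high_subband)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    segment = low_band[best_lag:best_lag + width]
    seg_energy = sum(x * x for x in segment)
    high_energy = sum(y * y for y in high_subband)
    gain = math.sqrt(high_energy / seg_energy) if seg_energy > 0.0 else 0.0
    return best_lag, gain


def decode_subband(low_band, lag, gain, width):
    """Resynthesize the subband by copying and scaling the low-band portion."""
    return [gain * x for x in low_band[lag:lag + width]]
```

In this sketch only the pair (lag, gain) needs to be transmitted for each subband, which reflects why the parametric representation requires fewer bits than encoding the high-band coefficients directly.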
The sinusoidal mode encoding is used in frames that are classified as tonal. In the sinusoidal mode, the SWB signal is generated by adding a finite set of sinusoidal components to the SWB spectrum.
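As a rough illustration of the sinusoidal mode, the sketch below treats each sinusoidal component as a single MDCT coefficient given by a position and a signed amplitude. That model, the peak-picking rule and the function names (`select_sinusoids`, `add_sinusoids`) are assumptions made for this example; the way G.718-SWB actually selects and quantizes sinusoid positions, amplitudes and signs is not reproduced here.

```python
def select_sinusoids(spectrum, count):
    """Pick the `count` largest-magnitude coefficients as sinusoidal components.

    Returns a list of (position, amplitude) pairs sorted by position.
    Assumed selection rule for illustration only.
    """
    order = sorted(range(len(spectrum)),
                   key=lambda k: abs(spectrum[k]),
                   reverse=True)
    return sorted((k, spectrum[k]) for k in order[:count])


def add_sinusoids(spectrum, components):
    """Add a finite set of sinusoidal components to a spectrum.

    The input spectrum is left unchanged; a new list is returned.
    """
    out = list(spectrum)
    for position, amplitude in components:
        out[position] += amplitude
    return out
```

For example, a decoder-side spectrum initialized to zeros could be passed to `add_sinusoids` together with the transmitted components to obtain the first SWB layer for a tonal frame, and further components could be added in the same way for higher layers.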
At a decoding apparatus side illustrated in FIG. 2, the G.718 core codec decodes the WB signal at a 16 kHz sampling rate (201). The WB signal is post-processed (202), and then up-sampled (203) to a 32 kHz sampling rate. The SWB frequency components are reconstructed by SWB bandwidth extension. The SWB bandwidth extension is mainly performed in the MDCT domain. A generic mode (204) and a sinusoidal mode (205) are used for decoding the first layer of the SWB. Higher SWB layers are decoded using an additional sinusoidal mode (206 and 207). The reconstructed SWB MDCT coefficients are transformed to the time domain (208), followed by post-processing (209), and then added to the WB signal decoded by the G.718 core decoding section to reconstruct the SWB output signal in the time domain.