Patent Literature (hereinafter, referred to as “PTL”) 1 discloses a technique that enables efficient encoding of speech signals or music signals in a super-wide band (SWB) (typically, the 0.05 to 14 kHz band). This technique has been standardized by ITU-T (see, for example, NPL 1 and NPL 2). In this technique, a low band part (for example, a band of up to 7 kHz) of an input signal such as a speech signal or a music signal is encoded by a core coding section, while a high band part (for example, a band higher than 7 kHz) is encoded by an extension band coding section.
In general, the core coding section uses CELP (code excited linear prediction) coding. Meanwhile, the extension band coding section performs encoding in the frequency domain using information encoded by the core coding section. More specifically, the narrowband signal in the low band part (not higher than 7 kHz) encoded by the core coding section is decoded, and the decoded narrowband signal is transformed into MDCT (modified discrete cosine transform) coefficients. The extension band coding section uses the resulting spectrum (decoded low band spectrum) to encode the high band part (a band higher than 7 kHz; hereinafter referred to as “extension band”).
At the time of encoding for the extension band, the decoded low band spectrum generated by the core coding section is first normalized using a spectrum power envelope (hereinafter referred to as “envelope”). More specifically, the low band part including the decoded low band spectrum is divided into a plurality of sub-bands, and energy (sub-band energy) is calculated for each sub-band. Next, the sub-band energy is smoothed in order to reduce energy fluctuations in the frequency domain. The spectrum included in each sub-band is then normalized using the smoothed sub-band energy. The extension band coding section searches the spectrum obtained in this way (normalized spectrum) and the extension band spectrum of the input signal for bands that are highly correlated with each other, and encodes information indicating the highly-correlated bands as a lag. The extension band coding section also copies the highly-correlated band in the low band part to the extension band, so that this band serves as the spectrum fine structure (frequency-based fine structure) of the extension band. Then, the extension band coding section calculates a gain between the spectrum fine structure and the extension band spectrum and encodes the gain.
As a result of the above processing being performed, an extension band spectrum is generated from a low band spectrum.
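The processing described above can be illustrated by the following minimal sketch. It is not the standardized procedure of PTL 1 or of NPL 1 and NPL 2. The fixed sub-band width, the three-tap moving average used for smoothing, the normalized cross-correlation used as the search criterion, the energy-ratio gain, and all function names are assumptions introduced only for illustration, and quantization of the lag and the gain is omitted.

```python
import math


def subband_energies(spectrum, band_size):
    """Mean energy of each sub-band (fixed-width sub-bands are an assumption)."""
    return [sum(c * c for c in spectrum[i:i + band_size]) / band_size
            for i in range(0, len(spectrum), band_size)]


def smooth_energies(energies):
    """Three-tap moving average over neighbouring sub-bands (assumed smoother)."""
    last = len(energies) - 1
    return [(energies[max(i - 1, 0)] + energies[i] + energies[min(i + 1, last)]) / 3.0
            for i in range(len(energies))]


def normalize_low_band(low_spectrum, band_size=8, eps=1e-12):
    """Divide every coefficient by the smoothed RMS of its sub-band."""
    smoothed = smooth_energies(subband_energies(low_spectrum, band_size))
    return [c / (math.sqrt(smoothed[i // band_size]) + eps)
            for i, c in enumerate(low_spectrum)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_lag(normalized_low, target_high):
    """Offset of the low band segment best correlated with the target band."""
    width = len(target_high)
    best_lag, best_score = 0, -math.inf
    for lag in range(len(normalized_low) - width + 1):
        segment = normalized_low[lag:lag + width]
        energy = dot(segment, segment)
        if energy == 0.0:
            continue  # an all-zero segment cannot be scored
        score = dot(segment, target_high) / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def encode_extension_band(low_spectrum, target_high, band_size=8):
    """Return (lag, gain, reconstructed extension band spectrum)."""
    normalized = normalize_low_band(low_spectrum, band_size)
    lag = search_lag(normalized, target_high)
    # The copied segment serves as the spectrum fine structure.
    fine = normalized[lag:lag + len(target_high)]
    # Gain that matches the energy of the fine structure to the target.
    gain = math.sqrt(dot(target_high, target_high) / (dot(fine, fine) + 1e-12))
    return lag, gain, [gain * c for c in fine]
```

In this sketch, only the lag and the gain would need to be transmitted for the extension band; a decoder holding the same decoded low band spectrum could repeat the normalization and copy the segment indicated by the lag.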
The reason for normalizing the low band spectrum when an extension band spectrum is generated from a low band spectrum in an input signal is as follows. In general, a low band spectrum has a very large energy bias, whereas a high band (extension band) spectrum has a small energy bias. In other words, high peaks are less likely to appear locally in the high band part than in the low band part, and thus copying a strongly peaked signal to the high band part (extension band) may result in sound quality deterioration. Therefore, a coding apparatus normalizes the low band spectrum, because encoding can be performed more efficiently when the correlation between the low band spectrum and the extension band spectrum is calculated after the energy bias in the low band spectrum has been removed, that is, after the low band spectrum has been flattened (normalized).
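The effect of this flattening can be illustrated numerically by the following sketch. The crest factor (peak magnitude divided by RMS) is used here as an assumed measure of how strongly peaked a spectrum is; it is not a measure taken from PTL 1. The coefficient values are hypothetical, and smoothing of the sub-band energy is omitted for brevity.

```python
import math


def crest_factor(spectrum):
    """Peak magnitude divided by RMS; a larger value means a more peaked spectrum."""
    rms = math.sqrt(sum(c * c for c in spectrum) / len(spectrum))
    return max(abs(c) for c in spectrum) / rms


def flatten(spectrum, band_size):
    """Divide each sub-band by its own RMS (unsmoothed, for brevity)."""
    flattened = []
    for start in range(0, len(spectrum), band_size):
        band = spectrum[start:start + band_size]
        rms = math.sqrt(sum(c * c for c in band) / len(band)) or 1.0
        flattened.extend(c / rms for c in band)
    return flattened


# Hypothetical low band: one strong sub-band followed by two weak ones.
low_band = [8.0, -9.0, 7.5, -8.5, 0.5, -0.4, 0.6, -0.5, 0.3, -0.2, 0.25, -0.3]
flat = flatten(low_band, band_size=4)

# The flattened spectrum has a smaller crest factor than the original one.
print(crest_factor(low_band) > crest_factor(flat))
```

After flattening, every sub-band has the same energy, so a segment copied to the extension band no longer carries the large local peaks of the low band part.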
NPL 3 discloses a related technique in which transform coding is used in a core coding section. In this related technique, an MPEG (Moving Picture Experts Group) AAC (Advanced Audio Coding) method is used in the core coding section. In addition, extension band coding is performed using an SBR (spectral band replication) method, which is different from the extension band coding method described above.