CELP (Code Excited Linear Prediction) is known as a method for high-quality compression of speech at a low bit rate. However, although CELP can encode a speech signal with high efficiency, it suffers from a loss of sound quality for music signals. To solve this problem, TCX (Transform Coded eXcitation) has been proposed (for example, in Non-Patent Literature (hereinafter referred to as “NPL”) 1). TCX converts an LPC residual signal, generated by an LPC (Linear Prediction Coefficient) inverse filter, into the frequency domain and encodes it there. Because TCX directly quantizes the transform coefficients obtained in the frequency domain, it can represent the spectrum in detail and achieve high sound quality for music signals. For this reason, frequency-domain encoding such as TCX has become the most popular approach for encoding music signals. Hereinafter, the signal to be encoded in the frequency domain is referred to as the target signal.
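As a rough illustration of LPC inverse filtering followed by conversion to the frequency domain, the Python sketch below (standard library only) estimates LPC coefficients by the autocorrelation method with the Levinson-Durbin recursion, passes a frame through the inverse filter A(z) to obtain the LPC residual, and transforms that residual. The function names, the prediction order, and the use of an unnormalized DCT-II as the transform are assumptions made for illustration; they are not taken from NPL 1, and practical codecs add windowing and other refinements not shown here.

```python
import math


def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag (autocorrelation method)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] such that x_hat[n] = sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted frame: stop the recursion
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k  # prediction error power after stage i
    return a[1:]


def lpc_inverse_filter(x, a):
    """LPC residual e[n] = x[n] - sum_k a[k] * x[n - k], zero initial filter state."""
    p = len(a)
    return [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
            for n in range(len(x))]


def dct_ii(e):
    """Unnormalized DCT-II: E[k] = sum_n e[n] * cos(pi / N * (n + 0.5) * k)."""
    N = len(e)
    return [sum(e[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(N)]


def residual_spectrum(frame, order=16):
    """Hypothetical helper: LPC analysis, inverse filtering, then transform."""
    a = levinson_durbin(autocorrelation(frame, order), order)
    return a, dct_ii(lpc_inverse_filter(frame, a))
```

For a strongly predictable frame such as x[n] = 0.9^n, a first-order analysis recovers a coefficient close to 0.9 and the residual carries far less energy than the input, which is what makes the residual spectrum cheaper to quantize.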
NPL 1 discusses the encoding of a wideband signal by TCX. An input signal is fed into an LPC inverse filter to obtain an LPC residual signal; long-term correlation components are removed from this residual, and the result is fed into a weighted synthesis filter. The output of the weighted synthesis filter is converted to the frequency domain to obtain an LPC residual spectrum signal, which is then encoded in the frequency domain. For music signals, because temporal correlation tends to be high in the high-frequency band, a method is adopted that encodes the spectral difference from the previous frame all at once by vector quantization.
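The differential vector quantization mentioned above can be sketched as follows: the encoder forms the difference between the current spectrum and the previous frame's spectrum, searches a codebook for the nearest code vector in the squared-error sense, and transmits only its index; the decoder adds the selected code vector back to its own copy of the previous spectrum. The codebook, the vector length, and the function names are hypothetical, and the long-term prediction and weighted synthesis filtering that precede this stage in NPL 1 are not shown.

```python
import math


def nearest_codevector(target, codebook):
    """Return the index of the code vector closest to target (squared error)."""
    best_idx, best_err = -1, math.inf
    for idx, code in enumerate(codebook):
        err = sum((t - c) ** 2 for t, c in zip(target, code))
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx


def encode_spectral_difference(current, prev_decoded, codebook):
    """Quantize current - prev_decoded; only the returned index is transmitted.

    prev_decoded should be the previous frame as the decoder reconstructed it,
    so that encoder and decoder predict from the same reference.
    """
    diff = [c - p for c, p in zip(current, prev_decoded)]
    return nearest_codevector(diff, codebook)


def decode_spectral_difference(index, prev_decoded, codebook):
    """Rebuild the current spectrum by adding the chosen code vector back."""
    return [p + d for p, d in zip(prev_decoded, codebook[index])]


# Hypothetical 3-entry codebook for 4-coefficient spectra.
CODEBOOK = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]
prev = [1.0, 2.0, 3.0, 4.0]
idx = encode_spectral_difference([2.0, 2.0, 2.0, 4.0], prev, CODEBOOK)
print(idx, decode_spectral_difference(idx, prev, CODEBOOK))
```

Here the difference [1.0, 0.0, -1.0, 0.0] matches the second code vector exactly, so index 1 is sent and the decoder reconstructs [2.0, 2.0, 2.0, 4.0]. Keeping the reference as the decoded, not the original, previous frame avoids encoder/decoder drift.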
Also, Patent Literature (hereinafter referred to as “PTL”) 1 proposes a method, based on a combination of ACELP and TCX, that applies low-frequency emphasis to, and encodes, an LPC residual spectrum signal obtained in the same manner as in NPL 1. The target vector is split into subbands of eight samples each, and the spectral shape and gain are encoded per subband. Although many bits are allocated to the gain of the subband having the largest energy, overall sound quality is improved by ensuring that the subbands in the low band below that largest-energy subband are not allocated too few bits. The spectral shape is encoded by lattice vector quantization.
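The subband split described for PTL 1 can be illustrated with a gain-shape decomposition: the target vector is cut into subbands of eight samples, each subband's gain is taken as its RMS value, and its shape is the subband divided by that gain. Using RMS as the gain measure and the function names are assumptions for this sketch; the lattice vector quantization of the shapes and the quantization of the gains are not shown.

```python
import math

SUBBAND_WIDTH = 8  # eight samples per subband, as described for PTL 1


def split_gain_shape(target, width=SUBBAND_WIDTH):
    """Split target into subbands and return (gains, shapes).

    gain  = RMS value of the subband (an assumed gain definition)
    shape = subband divided by its gain, so each nonzero shape has unit RMS
    """
    if len(target) % width != 0:
        raise ValueError("target length must be a multiple of the subband width")
    gains, shapes = [], []
    for start in range(0, len(target), width):
        band = target[start:start + width]
        gain = math.sqrt(sum(v * v for v in band) / width)
        shape = [v / gain for v in band] if gain > 0.0 else [0.0] * width
        gains.append(gain)
        shapes.append(shape)
    return gains, shapes


def merge_gain_shape(gains, shapes):
    """Inverse of split_gain_shape: scale each shape by its gain and concatenate."""
    return [g * s for g, shape in zip(gains, shapes) for s in shape]
```

Separating gain from shape lets the gains drive bit allocation while the normalized shapes, all on a common scale, can share one shape quantizer.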
In NPL 1, the correlation between the target signal and the previous frame is used to reduce the amount of data, and bits are allocated in order of decreasing amplitude. In PTL 1, subbands are defined every eight samples, and a large number of bits are allocated to subbands with large energy, while care is taken that the low band in particular receives a sufficient number of bits.
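Energy-driven allocation with low-band protection can be sketched with a simple greedy rule. In this hypothetical version, each subband lying below (at lower frequency than) the highest-energy subband first receives a minimum number of bits, and every remaining bit goes to the subband whose energy divided by 4^b is currently largest, where b is the number of bits it already holds. The factor 4 treats each bit as lowering that subband's noise power by about 6 dB, a common rule of thumb. This rule, the parameter min_low_bits, and the function name are assumptions, not the allocation actually specified in NPL 1 or PTL 1.

```python
import heapq


def allocate_bits(energies, total_bits, min_low_bits=1):
    """Greedy per-subband bit allocation with a guarded low band (hypothetical).

    energies[i] is the energy of subband i, ordered from low to high frequency.
    Ties are broken toward the lower-frequency subband.
    """
    n = len(energies)
    bits = [0] * n
    peak = max(range(n), key=lambda i: energies[i])  # highest-energy subband
    # Low-band guard: subbands below the peak first get up to min_low_bits each.
    for i in range(peak):
        bits[i] += min(min_low_bits, total_bits - sum(bits))
    # Max-heap (via negated keys) on energy / 4**bits: each bit is treated as
    # cutting that subband's quantization noise power by a factor of 4 (~6 dB).
    heap = [(-energies[i] / 4 ** bits[i], i) for i in range(n)]
    heapq.heapify(heap)
    for _ in range(total_bits - sum(bits)):
        _, i = heapq.heappop(heap)
        bits[i] += 1
        heapq.heappush(heap, (-energies[i] / 4 ** bits[i], i))
    return bits
```

For example, with energies [1.0, 16.0, 4.0, 0.25] and 6 bits, the lowest subband receives its guaranteed bit, most of the rest go to the high-energy subband, and the result is [1, 3, 2, 0]. With energies [0.0, 0.0, 100.0] and only 2 bits, the guard alone gives [1, 1, 0], whereas a purely energy-ordered allocation (min_low_bits=0) would give [0, 0, 2].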