Speech communication over packet networks, such as VoIP (Voice over IP), requires a speech encoding scheme that is robust against frame loss. This is because packets on a transmission path are sometimes lost due to congestion or the like in packet communication typified by Internet communication.
As a method for increasing robustness against frame loss, there is an approach of minimizing the influence of the frame loss by, even when one portion of transmission information is lost, carrying out decoding processing from another portion of the transmission information (see Patent Document 1, for example). Patent Document 1 discloses a method of packing encoding information of a core layer and encoding information of enhancement layers into separate packets using scalable encoding and transmitting the packets. One application of packet communication is multicast communication (one-to-many communication) using a network in which thick lines (broadband lines) and thin lines (lines having a low transmission rate) are mixed. Scalable encoding is also effective when communication between multiple points is performed on such a non-uniform network: when the encoding information has a layer structure corresponding to each network, there is no need to transmit separately encoded information for each network.
For example, as a bandwidth-scalable encoding technique which is based on a CELP scheme that enables highly efficient encoding of speech signals and has scalability in the signal bandwidth (in the frequency axis direction), there is a technique disclosed in Patent Document 2. Patent Document 2 describes an example of the CELP scheme for expressing spectral envelope information of speech signals using an LSP (Line Spectrum Pair) parameter. Here, a quantized LSP parameter (narrowband-encoded LSP) obtained by an encoding section (in a core layer) for narrowband speech is converted into an LSP parameter for wideband speech encoding using equation (1) below, and the converted LSP parameter is used at an encoding section (in an enhancement layer) for wideband speech, and thereby a band-scalable LSP encoding method is realized.

$$f_w(i) = \begin{cases} 0.5 \times f_n(i) & (i = 0, \ldots, P_n - 1) \\ 0.0 & (i = P_n, \ldots, P_w - 1) \end{cases} \qquad (1)$$

In the equation, f_w(i) is the LSP parameter of ith order in the wideband signal, f_n(i) is the LSP parameter of ith order in the narrowband signal, P_n is the LSP analysis order of the narrowband signal, and P_w is the LSP analysis order of the wideband signal.
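The conversion in equation (1) can be illustrated with a minimal Python sketch. The function name `narrowband_to_wideband_lsp` and the sample values are hypothetical and are not taken from Patent Document 2; the sketch only assumes that the narrowband LSP is given as a sequence of Pn values and that the wideband order Pw is passed in explicitly.

```python
from typing import List, Sequence


def narrowband_to_wideband_lsp(fn: Sequence[float], pw: int) -> List[float]:
    """Apply equation (1): halve the Pn narrowband LSPs, then zero-fill up to Pw."""
    pn = len(fn)  # narrowband LSP analysis order Pn
    if pw < pn:
        raise ValueError("wideband order Pw must be at least narrowband order Pn")
    # Orders 0 .. Pn-1: fw(i) = 0.5 * fn(i)
    low = [0.5 * f for f in fn]
    # Orders Pn .. Pw-1: fw(i) = 0.0
    high = [0.0] * (pw - pn)
    return low + high


# Illustrative (made-up) narrowband LSP values with Pn = 4 and Pw = 8.
fn_example = [0.25, 0.5, 1.0, 1.5]
fw_example = narrowband_to_wideband_lsp(fn_example, pw=8)
print(fw_example)
```

With Pw set to twice Pn, as in the example, the lower half of the result carries the scaled narrowband values and the upper half is zero, which is all the information equation (1) supplies about the high-order side.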
In Patent Document 2, a case is described as an example where the sampling frequency of the narrowband signal is 8 kHz, the sampling frequency of the wideband signal is 16 kHz, and the wideband LSP analysis order is twice the narrowband LSP analysis order. The conversion from a narrowband LSP to a wideband LSP can therefore be performed using the simple relation expressed in equation (1). However, the position of the LSP parameter of Pn order on the low-order side of the wideband LSP is determined with respect to the entire wideband signal including the LSP parameter of (Pw−Pn) order on the high-order side, and therefore the position does not necessarily correspond to the LSP parameter of Pn order of the narrowband LSP. Therefore, high conversion efficiency (which can also be referred to as predictive accuracy when we consider the wideband LSP to be predicted from the narrowband LSP) cannot be obtained in the conversion expressed in equation (1). The encoding performance of a wideband LSP encoding apparatus designed based on equation (1) therefore leaves room for improvement.
Non-patent Document 1, for example, describes a method of calculating an optimum conversion coefficient β(i) for each order as shown in equation (2) below using an algorithm for optimizing the conversion coefficient, instead of setting 0.5 for the conversion coefficient by which the narrowband LSP parameter of the ith order of equation (1) is multiplied.

$$f_{w\_n}(i) = \alpha(i) \times L(i) + \beta(i) \times f_{n\_n}(i) \qquad (2)$$

In the equation, f_{w_n}(i) is the wideband quantized LSP parameter of the ith order in the nth frame, α(i)×L(i) is the element of the ith order of the vector in which the prediction error signal is quantized (α(i) is the weighting coefficient of the ith order), L(i) is the LSP prediction residual vector, β(i) is the weighting coefficient for the predicted wideband LSP, and f_{n_n}(i) is the narrowband LSP parameter in the nth frame. By optimizing the conversion coefficient in this way, it is possible to realize higher encoding performance with an LSP encoding apparatus which has the same configuration as the one described in Patent Document 2.
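As a rough illustration of equation (2), the following Python sketch evaluates the per-order weighted sum for one frame. The function name `reconstruct_wideband_lsp` and all numeric values are hypothetical. The sketch also assumes that the four inputs have the same length, since the text above does not state how orders at or above Pn are handled, and it does not attempt the coefficient optimization algorithm of Non-patent Document 1.

```python
from typing import List, Sequence


def reconstruct_wideband_lsp(
    alpha: Sequence[float],     # per-order weights alpha(i) for the residual
    residual: Sequence[float],  # quantized LSP prediction residual L(i)
    beta: Sequence[float],      # per-order conversion coefficients beta(i)
    fn_frame: Sequence[float],  # narrowband LSP parameters of the current frame
) -> List[float]:
    """Evaluate equation (2) order by order for a single frame."""
    lengths = {len(alpha), len(residual), len(beta), len(fn_frame)}
    if len(lengths) != 1:
        # Assumption of this sketch: all inputs cover the same set of orders.
        raise ValueError("all inputs must have the same length")
    return [
        a * l + b * f
        for a, l, b, f in zip(alpha, residual, beta, fn_frame)
    ]


# Illustrative (made-up) values for two orders.
fw_frame = reconstruct_wideband_lsp(
    alpha=[1.0, 0.5],
    residual=[0.0, 0.25],
    beta=[0.5, 0.5],
    fn_frame=[0.5, 1.0],
)
print(fw_frame)
```

In the first order of this example the residual term is zero and β(i) is 0.5, so the result coincides with the fixed conversion of equation (1); choosing β(i) per order is what distinguishes equation (2).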
According to Non-patent Document 2, for example, the analysis order of the LSP parameter is appropriately about 8th to 10th for a narrowband speech signal in the frequency range of 3 to 4 kHz, and is appropriately about 12th to 16th for a wideband speech signal in the frequency range of 5 to 8 kHz.

Patent Document 1: Japanese Patent Application Laid-Open No. 2003-241799
Patent Document 2: Japanese Patent No. 3134817
Non-patent Document 1: K. Koishida et al., “Enhancing MPEG-4 CELP by jointly optimized inter/intra-frame LSP predictors,” IEEE Speech Coding Workshop 2000, Proceeding, pp. 90-92, 2000.
Non-patent Document 2: S. Saito and K. Nakata, Foundations of Speech Information Processing, Ohmsha, 30 Nov. 1981, p. 91.