Speech and audio coding algorithms have a wide variety of applications in communication, multimedia and storage systems. The development of the coding algorithms is driven by the need to save transmission and storage capacity while maintaining the high quality of the synthesized signal. The complexity of the coder is limited by the processing power of the application platform. In some applications, e.g. voice storage, the encoder may be highly complex, while the decoder should be as simple as possible.
In a typical speech coder, the input speech signal is processed in segments, which are called frames. Usually the frame length is 10–30 ms, and a look-ahead segment of 5–15 ms of the subsequent frame is also available. The frame may further be divided into a number of subframes. For every frame, the encoder determines a parametric representation of the input signal. The parameters are quantized, and transmitted through a communication channel or stored in a storage medium in a digital form. At the receiving end, the decoder constructs a synthesized signal based on the received parameters.
Most current speech coders include a linear prediction (LP) filter, for which an excitation signal is generated. The LP filter typically has an all-pole structure, as given by the following equation:
$$\frac{1}{A(z)} = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2} + \ldots + a_p z^{-p}}, \qquad (1)$$
where $A(z)$ is an inverse filter with unquantized LP coefficients $a_1, a_2, \ldots, a_p$ and $p$ is the predictor order, which is usually 8–12.
For each speech frame, the encoder determines the LP coefficients using, for example, the Levinson-Durbin algorithm (see “AMR Speech Codec; Transcoding functions” 3G TS 26.090 v3.1.0 (1999-12)). For quantization, the coefficients are converted to the line spectral frequency (LSF) representation or to a similar representation, such as line spectral pair (LSP), immittance spectral frequency (ISF) or immittance spectral pair (ISP), in which a stable filter is represented by an ordered vector. These representations are employed because they have good quantization properties. For intermediate subframes, the coefficients are linearly interpolated using the LSF representation.
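The Levinson-Durbin recursion mentioned above can be sketched as follows. This is a minimal illustration, not the routine of the cited AMR specification: the function name `levinson_durbin` and the input convention (a list of autocorrelation values `r[0..order]`) are assumptions made for the example, and the sign convention follows Equation 1, i.e. A(z) = 1 + a1 z^-1 + ... + ap z^-p.

```python
def levinson_durbin(r, order):
    """Solve for LP coefficients from autocorrelation values r[0..order].

    Returns (a, err) where a = [1, a1, ..., ap] are the coefficients of
    A(z) = 1 + a1*z^-1 + ... + ap*z^-p and err is the final prediction
    error energy. Hypothetical helper written for illustration only.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Correlation of the current predictor with the next lag.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k  # error energy shrinks at every stage
    return a, err


# Autocorrelation of a first-order process with correlation 0.5:
# the second coefficient should vanish.
coeffs, energy = levinson_durbin([1.0, 0.5, 0.25], 2)
```

A practical encoder would additionally window the signal and condition the autocorrelation values before the recursion; those steps are omitted here.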
In order to define the LSFs, the inverse LP filter polynomial $A(z)$ is used to construct two polynomials, which for an even predictor order $p$ factor as
$$P(z) = A(z) + z^{-(p+1)}A(z^{-1}) = (1 + z^{-1})\prod_{i=1,3,\ldots,p-1}\left(1 - 2z^{-1}\cos\omega_i + z^{-2}\right) \qquad (2)$$
and
$$Q(z) = A(z) - z^{-(p+1)}A(z^{-1}) = (1 - z^{-1})\prod_{i=2,4,\ldots,p}\left(1 - 2z^{-1}\cos\omega_i + z^{-2}\right). \qquad (3)$$
The angular positions $\omega_i$ of the roots of the polynomials $P(z)$ and $Q(z)$ are called LSF coefficients. The polynomials $P(z)$ and $Q(z)$ have the following properties: 1) all zeros (roots) of the polynomials are on the unit circle, at $e^{j\omega_i}$ with $i = 1, 2, \ldots, p$, apart from the fixed zeros at $z = -1$ and $z = 1$; 2) the zeros of $P(z)$ and $Q(z)$ are interlaced with each other. More specifically, the following relationship is always satisfied:
$$0 = \omega_0 < \omega_1 < \omega_2 < \ldots < \omega_{p-1} < \omega_p < \omega_{p+1} = \pi \qquad (4)$$
This ascending ordering guarantees filter stability, which is often required in speech coding applications. Note that the first and last parameters are always 0 and π, respectively, so only p values have to be transmitted.
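The construction of P(z) and Q(z) and the ordering property can be illustrated with a small sketch. It assumes A(z) = 1 + a1 z^-1 + ... + ap z^-p with an even order p, and it locates the roots by scanning the unit circle for sign changes and refining them by bisection. The names `lsf_from_lp` and `is_ordered`, the grid size and the iteration count are all choices made for this example; production coders typically use a Chebyshev polynomial search instead.

```python
import math


def lsf_from_lp(a, grid=1024, iters=60):
    """Return the LSFs (radians, ascending) of A(z) given a = [1, a1, ..., ap]."""
    p = len(a) - 1
    ext = list(a) + [0.0]              # A(z) padded to degree p + 1
    rev = [0.0] + list(reversed(a))    # z^-(p+1) * A(1/z)
    P = [x + y for x, y in zip(ext, rev)]   # symmetric polynomial
    Q = [x - y for x, y in zip(ext, rev)]   # antisymmetric polynomial
    half = (p + 1) / 2.0

    # On the unit circle both polynomials reduce to real functions of w
    # after removing a common linear-phase factor.
    def p_val(w):
        return sum(c * math.cos(w * (half - n)) for n, c in enumerate(P))

    def q_val(w):
        return sum(c * math.sin(w * (half - n)) for n, c in enumerate(Q))

    def zeros(f):
        found = []
        # Interior grid points only: the fixed zeros at 0 and pi are skipped.
        pts = [math.pi * i / grid for i in range(1, grid)]
        for lo, hi in zip(pts, pts[1:]):
            flo = f(lo)
            if flo * f(hi) < 0.0:
                for _ in range(iters):  # bisection refinement
                    mid = 0.5 * (lo + hi)
                    fmid = f(mid)
                    if flo * fmid <= 0.0:
                        hi = mid
                    else:
                        lo, flo = mid, fmid
                found.append(0.5 * (lo + hi))
        return found

    return sorted(zeros(p_val) + zeros(q_val))


def is_ordered(lsf):
    """Check the relationship 0 < w1 < w2 < ... < wp < pi."""
    bounds = [0.0] + list(lsf) + [math.pi]
    return all(x < y for x, y in zip(bounds, bounds[1:]))


# Second-order example with poles of radius 0.8 (a stable filter).
lsf = lsf_from_lp([1.0, -0.9, 0.64])
stable = is_ordered(lsf)
```

For this second-order example the two roots can be checked by hand: the root of P(z) satisfies cos ω = 0.63 and the root of Q(z) satisfies cos ω = 0.27, so the smaller LSF comes from P(z) and the larger from Q(z).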
Because speech coders need an efficient representation for storing the LSF information, the LSFs are quantized using vector quantization (VQ), often together with prediction (see FIG. 1). Usually, the predicted values are estimated based on the previously decoded output values (auto-regressive (AR) predictor) or the previously quantized values (moving average (MA) predictor):
$$pLSF_k = mLSF + \sum_{j=1}^{m} A_j\left(qLSF_{k-j} - mLSF\right) + \sum_{i=1}^{n} B_i\, CB_{k-i}, \qquad (5)$$
where $A_j$ and $B_i$ are the predictor matrices, and $m$ and $n$ are the orders of the predictors. $pLSF_k$, $qLSF_k$ and $CB_k$ are, respectively, the predicted LSF, quantized LSF and codebook vector for the frame $k$. $mLSF$ is the mean LSF vector.
After the predicted value is calculated, the quantized LSF value can be obtained:
$$qLSF_k = pLSF_k + CB_k, \qquad (6)$$
where $CB_k$ is the optimal codebook entry for the frame $k$.
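Equations 5 and 6 can be sketched in a few lines. The sketch assumes diagonal predictor matrices, so that each matrix is stored as a vector of per-coefficient factors; the function names, the argument layout (most recent frame first) and the numeric values are all made up for illustration.

```python
def predict_lsf(mean, prev_q, prev_cb, ar_coefs, ma_coefs):
    """Equation 5 with diagonal predictor matrices (an assumption).

    prev_q[j-1]  : quantized LSF vector of frame k-j
    prev_cb[i-1] : codebook vector of frame k-i
    ar_coefs, ma_coefs : one coefficient vector per predictor tap
    """
    pred = list(mean)
    for coef, q in zip(ar_coefs, prev_q):          # AR part
        for d in range(len(pred)):
            pred[d] += coef[d] * (q[d] - mean[d])
    for coef, cb in zip(ma_coefs, prev_cb):        # MA part
        for d in range(len(pred)):
            pred[d] += coef[d] * cb[d]
    return pred


def quantized_lsf(pred, cb):
    """Equation 6: predicted vector plus the selected codebook vector."""
    return [p + c for p, c in zip(pred, cb)]


# First-order AR prediction only (m = 1, n = 0), three LSFs in radians.
mean = [0.3, 0.6, 0.9]
pred = predict_lsf(mean, [[0.35, 0.58, 0.95]], [], [[0.5, 0.5, 0.5]], [])
q = quantized_lsf(pred, [0.01, -0.02, 0.0])
```

Nothing in this computation keeps the elements of the quantized vector in ascending order, which is why a separate stability check is needed when prediction is used.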
In practice, when using predictive quantization or constrained VQ, the stability of the resulting $qLSF_k$ has to be checked before conversion to LP coefficients. Only in the case of direct VQ (non-predictive, single stage, unsplit) can the codebook be designed so that the resulting quantized vector is always in order.
In prior art solutions, the filter stability is guaranteed by ordering the LSF vector after the quantization and codebook selection.
While searching for the best codebook vector, often all vectors are tried out (full search) and some perceptually important goodness measure is calculated for every instance. The block diagram of a commonly used search procedure is shown in FIG. 1a. 
Optimally, selection is based on the spectral distortion SD as follows:
$$SD = \frac{1}{\pi}\int_{0}^{\pi}\left[\log S(\omega) - \log \hat{S}(\omega)\right]^2 d\omega, \qquad (7)$$
where $\hat{S}(\omega)$ and $S(\omega)$ are the spectra of the speech frame with and without quantization, respectively. This is computationally very intensive, and thus simpler methods are used instead.
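To give a sense of the cost, Equation 7 can be approximated numerically as sketched below. The sketch makes several assumptions that are not stated in the text: the spectrum is taken as the LP envelope 1/|A(e^{jω})|², the logarithm is base 10, and the integral is replaced by a midpoint sum over `n_points` frequencies. The function names are hypothetical.

```python
import cmath
import math


def lp_power_spectrum(a, w):
    """LP envelope 1 / |A(e^{jw})|^2 for a = [1, a1, ..., ap]."""
    resp = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(a))
    return 1.0 / abs(resp) ** 2


def spectral_distortion(a_ref, a_quant, n_points=512):
    """Midpoint-rule approximation of Equation 7 (base-10 log assumed)."""
    step = math.pi / n_points
    total = 0.0
    for i in range(n_points):
        w = (i + 0.5) * step
        diff = (math.log10(lp_power_spectrum(a_ref, w))
                - math.log10(lp_power_spectrum(a_quant, w)))
        total += diff * diff * step
    return total / math.pi


sd = spectral_distortion([1.0, -0.9, 0.64], [1.0, -0.85, 0.60])
```

Each candidate requires a conversion from LSFs to LP coefficients followed by hundreds of spectrum evaluations, and this would have to be repeated for every codebook entry, which is why a full search based on this measure is impractical.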
A commonly used method is to weight the LSF error ($rLSF_k^i$) with a weight ($W_k$). For example, the following weighting is used (see “AMR Speech Codec; Transcoding functions” 3G TS 26.090 v3.1.0 (1999-12)):
$$W_k = \begin{cases} 3.347 - \dfrac{1.547}{450}\, d_k & \text{for } d_k < 450\ \text{Hz}, \\[2ex] 1.8 - \dfrac{0.8}{1050}\,(d_k - 450) & \text{otherwise}, \end{cases} \qquad (8)$$
where $d_k = LSF_{k+1} - LSF_{k-1}$ with $LSF_0 = 0$ Hz and $LSF_{11} = 4000$ Hz.
Basically, this distortion measurement depends on the distances between the LSF frequencies. The closer the LSFs are to each other, the more weighting they get. Perceptually, this means that formant regions are quantized more precisely.
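The weighting can be sketched as below. The second branch is written with (d − 450), so that the two branches meet at 450 Hz and the weight keeps decreasing as the neighbouring LSFs move apart, in line with the behaviour described above. The function names and the example LSF values are illustrative assumptions; the 4000 Hz upper bound corresponds to the 8 kHz sampling rate implied by LSF11 = 4000 Hz.

```python
def weight_from_distance(d_hz):
    """Piecewise-linear weight as a function of the LSF distance in Hz."""
    if d_hz < 450.0:
        return 3.347 - (1.547 / 450.0) * d_hz
    return 1.8 - (0.8 / 1050.0) * (d_hz - 450.0)


def lsf_weights(lsf_hz, upper_hz=4000.0):
    """Weights W_k for an LSF vector in Hz, using d_k = LSF[k+1] - LSF[k-1]."""
    ext = [0.0] + list(lsf_hz) + [upper_hz]   # add LSF_0 and LSF_{p+1}
    return [weight_from_distance(ext[k + 1] - ext[k - 1])
            for k in range(1, len(ext) - 1)]


# Hypothetical 10th-order LSF vector; the low, closely spaced LSFs
# receive larger weights than the widely spaced ones.
weights = lsf_weights([300, 450, 900, 1400, 1800, 2200, 2600, 3000, 3300, 3600])
```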
Based on the distortion value, the codebook vector giving the lowest value is selected, and its index is taken as the best codebook index. Normally, the criterion is
$$\min_i\{SD_i\} = \sum_{k=1}^{p}\left(LSF_k - pLSF_k - CB_k^i\right)^2 W_k^2. \qquad (9)$$
As can be seen in FIG. 1a, the difference between a target LSF coefficient $LSF_k$ and the respective predicted LSF coefficient $pLSF_k$ is first determined in a summing device 12, and the difference is further adjusted by the respective residual codebook vector $CB_k^i$ of the $i$th codebook entry in another summing device 14. Because $qLSF_k^i = pLSF_k + CB_k^i$ (Equation 6), Equation 9 can be reduced to
$$\min_i\{SD_i\} = \sum_{k=1}^{p}\left(LSF_k - qLSF_k^i\right)^2 W_k^2, \qquad (10)$$
and, with the LSF error $rLSF_k^i = LSF_k - qLSF_k^i$, further reduced to
$$\min_i\{SD_i\} = \sum_{k=1}^{p}\left(rLSF_k^i\right)^2 W_k^2. \qquad (11)$$
The reduction steps, as shown in Equations 10 and 11, can be visualized more easily in an encoder, as shown in FIG. 1b. As shown in FIG. 1b, a summing device 16 is used to compute the quantized LSF coefficients. Subsequently, the LSF error is computed by the summing device 18 from the quantized LSF coefficients and the target LSF coefficients.
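The full search described above can be sketched as follows. The function name `full_search` and the example vectors are made up for illustration; the loop follows the order of FIG. 1b, first forming the quantized value and then the weighted error.

```python
def full_search(target, predicted, codebook, weights):
    """Return (best_index, best_distortion) over all codebook entries.

    For every entry i the quantized value q = pLSF + CB^i is formed,
    the error r = LSF - q is weighted, and the squared weighted errors
    are summed over the vector.
    """
    best_index, best_dist = -1, float("inf")
    for i, cb in enumerate(codebook):
        dist = 0.0
        for lsf, pred, c, w in zip(target, predicted, cb, weights):
            q = pred + c          # quantized LSF coefficient
            r = lsf - q           # LSF error
            dist += (r * w) ** 2  # weighted squared error
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist


target = [0.30, 0.50, 0.70]
predicted = [0.28, 0.55, 0.66]
codebook = [
    [0.00, 0.00, 0.00],
    [0.02, -0.05, 0.04],   # exactly cancels the prediction error
    [0.10, 0.10, 0.10],
]
index, dist = full_search(target, predicted, codebook, [1.0, 1.0, 1.0])
```

Note that the search never inspects the ordering of the quantized values; it only measures their distance from the target.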
Prior art solutions do not necessarily find the optimal codebook index if the quantized LSF coefficients $qLSF_k^i$ are not in ascending order with respect to $k$. FIGS. 2a–2e illustrate such a problem. For simplicity, only the first three LSF coefficients are shown ($k = 1, 2, 3$). However, this simplified demonstration adequately represents the rather usual first split in the case of split VQ. The target LSF vector is marked with $LSF_1 \ldots LSF_3$, and the predicted values, based on the LSF of the previous frames, are also shown ($pLSF_1 \ldots pLSF_3$). As shown in FIG. 2a, while some predicted values are greater than the respective target vectors, some are smaller. The first codebook entry in the vector quantizer residual codebook might look like the codebook vectors shown in FIG. 2b. With $qLSF^1_{1\text{-}3} = pLSF_{1\text{-}3} + CB^1_{1\text{-}3}$, the quantized LSF coefficients are calculated and shown in FIG. 2c. For simplicity, no weight is used, or $W_k = 1$, and the spectral distortion is directly proportional to the squared or absolute distance between the target and the quantization value (the quantized LSF coefficient). The distance between the target and the quantization value is $rLSF_k^i$. The total distortion for the first split is thus
$$SD^1 = \sum_{k=1}^{3} SD_k^1. \qquad (12)$$
The second codebook entry (not shown) could yield the quantized LSF vector ($qLSF^2_{1\text{-}3}$) and the spectral distortion ($SD^2_{1\text{-}3}$), as shown in FIG. 2d. When FIG. 2d is compared to FIG. 2c, the resulting qLSF vectors are quite different, but the total distortions are almost the same, or ($SD^1 \approx SD^2$). With the first two codebook entries, the resulting quantized LSF vectors are in order.
In order to show the problem associated with the prior art quantization method, it is assumed that the quantized LSF coefficients ($qLSF^3_{1\text{-}3}$) and the corresponding spectral distortions ($SD^3_{1\text{-}3}$) resulting from the third codebook entry (not shown) are distributed as shown in FIG. 2e. The total distortion $SD^3 = \sum_{k=1}^{3} SD_k^3$, according to the spectral distortion shown in FIG. 2e, is a very large value. This means that, according to the prior art method, the best codebook index from this first split is the one giving the smaller of $SD^1$ and $SD^2$. However, this selected “best” codebook index, as will be illustrated later in FIG. 4a, does not yield the optimal code vector. This is because the quantized LSF vector resulting from the third codebook entry is out of order.
Generally, speech coders require that the linear prediction (LP) filter used therein be stable. A prior art codebook search routine, such as that illustrated in FIG. 1a, might cause the resulting quantized LSF vectors to be out of order, and the corresponding filter to become unstable. In the prior art, the vector is stabilized by sorting the LSF vectors after quantization. However, the obtained code vector may not be optimal.
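The effect can be reproduced with a small numeric sketch. All values below are invented to mimic the situation of FIGS. 2a–2e (three coefficients, no weighting); they are not taken from the figures. The sketch compares the distortion seen by the search, computed on unsorted candidates, with the distortion each candidate would have after the stabilizing sort.

```python
def distortion(target, quantized):
    """Unweighted squared error (W_k = 1), as in the simplified example."""
    return sum((t - q) ** 2 for t, q in zip(target, quantized))


target = [0.30, 0.50, 0.70]
predicted = [0.28, 0.55, 0.66]
codebook = [
    [0.04, -0.02, 0.01],    # entry 1: in order, moderate error
    [-0.01, -0.08, 0.07],   # entry 2: in order, similar error
    [0.23, -0.24, 0.03],    # entry 3: first two values come out swapped
]

candidates = [[p + c for p, c in zip(predicted, cb)] for cb in codebook]

# Distortion as evaluated during the search, before any sorting.
search_sd = [distortion(target, q) for q in candidates]
prior_art_best = min(range(len(codebook)), key=lambda i: search_sd[i])

# Distortion each candidate would have once its vector is sorted.
sorted_sd = [distortion(target, sorted(q)) for q in candidates]
best_after_sort = min(range(len(codebook)), key=lambda i: sorted_sd[i])
```

With these numbers the search selects the first entry, because the third entry looks very poor while it is out of order. Once sorted, however, the third entry lies closer to the target than either of the other two, so sorting only after the selection has been made can miss the better code vector.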
It should be noted that spectral (pair) parameter vectors, such as line spectral pair (LSP) vectors, immittance spectral frequency (ISF) vectors and immittance spectral pair (ISP) vectors, that represent the linear predictive coefficients must also be ordered to be stable.
It is advantageous and desirable to provide a method and system for spectral parameter (or representation) quantization, wherein the obtained code vector is optimized.