Low-rate coding applications, such as digital speech, typically employ techniques such as Linear Predictive Coding (LPC) to model the spectra of short-term speech signals. Coding systems employing an LPC technique provide prediction residual signals for corrections to characteristics of a short-term model. One such coding system is a speech coding system known as Code Excited Linear Prediction (CELP), which produces high-quality synthesized speech at low bit rates, that is, at bit rates of 4.8 to 9.6 kilobits per second (kbps). This class of speech coding, also known as vector-excited linear prediction or stochastic coding, is used in numerous speech communications and speech synthesis applications. CELP is also particularly applicable to digital speech encryption and digital radiotelephone communication systems wherein speech quality, data rate, size, and cost are significant issues.
A CELP speech coder that implements an LPC coding technique typically employs long-term (“pitch”) and short-term (“formant”) predictors that model the characteristics of an input speech signal and that are incorporated in a set of time-varying linear filters. An excitation signal, or codevector, for the filters is chosen from a codebook of stored codevectors. For each frame of speech, the speech coder applies the codevector to the filters to generate a reconstructed speech signal, and compares the original input speech signal to the reconstructed signal to create an error signal. The error signal is then weighted by passing the error signal through a weighting filter having a response based on human auditory perception. An optimum excitation signal is then determined by selecting one or more codevectors that produce a weighted error signal with a minimum energy for the current frame.
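The analysis-by-synthesis selection described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not an actual coder search: the perceptual weighting filter is omitted (treated as identity), the LP synthesis filter starts from zero state, A(z) is assumed to have the form 1 + a1·z^−1 + a2·z^−2 + …, and the helper names `lp_synthesis` and `search_codebook` are hypothetical.

```python
def lp_synthesis(excitation, a):
    """Filter `excitation` through 1/A(z), with A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ...

    Zero initial filter state is assumed (hypothetical simplification).
    """
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                y -= ak * out[n - k]
        out.append(y)
    return out


def search_codebook(target, codebook, a):
    """Return (index, gain, error_energy) for the codevector whose synthesized
    output, scaled by its optimal gain, best matches `target` in the
    squared-error sense. Perceptual weighting is omitted (assumption)."""
    best = None
    for index, codevector in enumerate(codebook):
        synth = lp_synthesis(codevector, a)
        energy = sum(v * v for v in synth)
        if energy == 0.0:
            continue  # an all-zero output cannot be scaled to match anything
        corr = sum(t * v for t, v in zip(target, synth))
        gain = corr / energy  # least-squares optimal gain for this codevector
        error = sum((t - gain * v) ** 2 for t, v in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


if __name__ == "__main__":
    a = [-0.5]  # A(z) = 1 - 0.5 z^-1
    codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, -1]]
    target = [2.0 * v for v in lp_synthesis(codebook[1], a)]
    print(search_codebook(target, codebook, a))  # (1, 2.0, 0.0)
```

Computing the gain in closed form for each candidate mirrors the "minimum weighted error energy" criterion; a practical coder would also subtract the filter's zero-input response from the target and apply W(z) to both signals.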
For example, FIG. 1 is a block diagram of a CELP coder 100 of the prior art. In CELP coder 100, an input signal s(n) is applied to a linear predictive (LP) analyzer 101, where linear predictive coding is used to estimate a short-term spectral envelope. The resulting spectral coefficients (or linear prediction (LP) coefficients) are denoted by the transfer function A(z). The spectral coefficients are applied to an LP quantizer 102 that quantizes the spectral coefficients to produce quantized spectral coefficients Aq that are suitable for use in a multiplexer 109. The quantized spectral coefficients Aq are then conveyed to multiplexer 109, and the multiplexer produces a coded bitstream based on the quantized spectral coefficients and a set of excitation vector-related parameters L, β, I, and γ, that are determined by a squared error minimization/parameter quantization block 108. As a result, for each block of speech, a corresponding set of excitation vector-related parameters is produced that includes long-term predictor (LTP) parameters L and β, and fixed codebook index I and scale factor γ.
The quantized spectral parameters are also conveyed locally to an LP synthesis filter 105 that has a corresponding transfer function 1/Aq(z). LP synthesis filter 105 also receives a combined excitation signal ex(n) and produces an estimate of the input signal ŝ(n) based on the quantized spectral coefficients Aq and the combined excitation signal ex(n). Combined excitation signal ex(n) is produced as follows. A fixed codebook (FCB) codevector, or excitation vector, c̃I is selected from a fixed codebook (FCB) 103 based on a fixed codebook index parameter I. The FCB codevector c̃I is then weighted based on the gain parameter γ, and the weighted fixed codebook codevector is conveyed to a long-term predictor (LTP) filter 104. LTP filter 104 has a corresponding transfer function 1/(1 − βz^(−L)), wherein β and L are excitation vector-related parameters that are conveyed to the filter by squared error minimization/parameter quantization block 108. LTP filter 104 filters the weighted fixed codebook codevector received from FCB 103 to produce the combined excitation signal ex(n) and conveys the excitation signal to LP synthesis filter 105.
LP synthesis filter 105 conveys the input signal estimate ŝ(n) to a combiner 106. Combiner 106 also receives input signal s(n) and subtracts the estimate of the input signal ŝ(n) from the input signal s(n). The difference between input signal s(n) and input signal estimate ŝ(n) is applied to a perceptual error weighting filter 107, which filter produces a perceptually weighted error signal e(n) based on the difference between ŝ(n) and s(n) and a weighting function W(z). Perceptually weighted error signal e(n) is then conveyed to squared error minimization/parameter quantization block 108. Squared error minimization/parameter quantization block 108 uses the error signal e(n) to determine an optimal set of excitation vector-related parameters L, β, I, and γ that produce the best estimate ŝ(n) of the input signal s(n). The quantized LP coefficients and the optimal set of parameters L, β, I, and γ are then conveyed over a communication channel to a receiving communication device, where a speech synthesizer uses the LP coefficients and excitation vector-related parameters to reconstruct the input speech signal s(n).
In a CELP coder such as coder 100, a synthesis function for generating the combined excitation signal ex(n) is given by the following generalized difference equation:

$$ex(n) = \gamma\,\tilde{c}_I(n) + \beta\,ex(n-L), \quad n = 0, \ldots, N-1 \tag{1}$$

where ex(n) is a synthetic combined excitation signal for a subframe, c̃I(n) is a codevector, or excitation vector, selected from a codebook, such as FCB 103, I is an index parameter, or codeword, specifying the selected codevector, γ is the gain for scaling the codevector, ex(n−L) is a synthetic combined excitation signal delayed by L samples relative to the n-th sample of the current subframe (for voiced speech, L is typically related to the pitch period), β is a long-term predictor (LTP) gain factor, and N is the number of samples in the subframe. When n−L<0, ex(n−L) contains the history of past synthetic excitation, constructed as shown in equation (1). That is, for n−L<0, the expression ex(n−L) corresponds to an excitation sample constructed prior to the current subframe, which excitation sample has been delayed and scaled pursuant to an LTP filter transfer function 1/(1 − βz^(−L)).
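Equation (1) can be evaluated sample by sample, which makes the L<N case visible: once n ≥ L, the recursion reads back samples produced earlier in the same subframe. A minimal sketch, assuming an integer delay L (fractional delays would need an interpolating filter) and a hypothetical function name `ltp_excitation`:

```python
def ltp_excitation(history, c_tilde, gamma, beta, L):
    """Build one subframe of combined excitation by equation (1):
    ex(n) = gamma * c_tilde(n) + beta * ex(n - L), n = 0..N-1.

    `history` holds past excitation ex(n), n < 0, most recent sample last,
    and must contain at least L samples. L is assumed to be an integer.
    """
    if L < 1 or len(history) < L:
        raise ValueError("need an integer L >= 1 and at least L past samples")
    buf = list(history)
    start = len(buf)  # position of ex(0) in buf
    for n, c in enumerate(c_tilde):
        # buf[start + n - L] is ex(n - L): taken from history when n < L,
        # otherwise a sample already produced in this subframe (L < N case)
        buf.append(gamma * c + beta * buf[start + n - L])
    return buf[start:]


if __name__ == "__main__":
    print(ltp_excitation([1.0, 2.0], [1, 1, 1, 1], gamma=1.0, beta=0.5, L=2))
    # [1.5, 2.0, 1.75, 2.0]
```

In the example, L = 2 < N = 4, so the last two outputs depend on ex(0) and ex(1) of the same subframe, which is what makes β enter the problem nonlinearly in that case.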
The task of a typical CELP speech coder such as coder 100 is to select the parameters specifying the synthetic excitation, that is, the parameters L, β, I, and γ in coder 100, given ex(n) for n<0 and the determined coefficients of short-term Linear Predictor (LP) filter 105, so that when the synthetic excitation sequence ex(n) for n = 0, …, N−1 is filtered through LP filter 105 to yield the synthesized speech signal ŝ(n), the synthesized speech signal most closely approximates, according to the distortion criterion employed, the input speech signal s(n) to be coded at a subframe.
For values of L greater than or equal to N, that is, L ≥ N, equation (1) is implemented exactly. In such a case, synthetic excitation for the subframe can be equivalently defined as

$$ex(n) = \beta\,c_0(n) + \gamma\,c_1(n), \quad n = 0, \ldots, N-1, \tag{2}$$

where

$$c_0(n) = ex(n-L), \quad n = 0, \ldots, N-1, \tag{3}$$

$$c_1(n) = \tilde{c}_I(n), \quad n = 0, \ldots, N-1, \tag{4}$$

and where c0(n) is an LTP vector selected for the subframe and c1(n) is a selected codevector for the subframe. Since L ≥ N, c0(n) and c1(n), once chosen, are explicitly independent of β and γ in the formulation of equation (2). Moreover, c0(n) is only a function of ex(n) for n<0, which keeps the solution for β a linear problem. Likewise, because L ≥ N, c1(n) is not affected by long-term predictor (LTP) filter 104 at the current subframe. These facts simplify the selection of parameters (L, β, I, γ) by squared error minimization/parameter quantization block 108 of speech coder 100. The range of L is chosen to cover the expected range of pitch over a wide variety of speakers, and at an 8 kHz sampling frequency the lower bound of the range is typically set to around 20 samples, corresponding to a pitch frequency of 400 Hz. In order to achieve good coding efficiency, it is advantageous to use N > Lmin, where Lmin is the lower bound on the delay range. Typically the coder's excitation parameters are transmitted at a subframe rate, which is inversely proportional to subframe length N. That is, the longer the subframe length N, the less frequently it is necessary to quantize and transmit the coder's subframe parameters.
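A quick numerical check of the claim that equations (1) and (2)–(4) coincide when L ≥ N. This is an illustrative sketch: the function names are hypothetical, L is assumed to be an integer, and the same past-excitation buffer feeds both forms.

```python
def excitation_eq1(history, c1, beta, gamma, L):
    """Recursive form, equation (1); `history` is past excitation, most recent last."""
    buf = list(history)
    start = len(buf)
    for n, c in enumerate(c1):
        buf.append(gamma * c + beta * buf[start + n - L])
    return buf[start:]


def excitation_eq2(history, c1, beta, gamma, L):
    """Vector form, equations (2)-(4); only valid as a substitute for (1) when L >= N."""
    N = len(c1)
    if L < N:
        raise ValueError("equation (2) equals equation (1) only for L >= N")
    start = len(history)
    c0 = [history[start + n - L] for n in range(N)]  # equation (3): past samples only
    return [beta * a + gamma * b for a, b in zip(c0, c1)]


if __name__ == "__main__":
    hist = [float(k) for k in range(10)]
    c1 = [1.0, -1.0, 2.0, 0.5]
    for L in (4, 5, 9):
        print(L, excitation_eq1(hist, c1, 0.7, 1.3, L) == excitation_eq2(hist, c1, 0.7, 1.3, L))
```

Because c0(n) in the vector form only indexes samples with n − L < 0, it is fixed before β and γ are chosen, which is why the gain search stays linear in this regime.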
For values of L less than N, that is, L<N, equation (2) ceases to be equivalent to equation (1). In order to retain the advantages of using the form of equation (2) when L<N, one idea, proposed in U.S. Pat. No. 4,910,781, entitled “Code Excited Linear Predictive Vocoder Using Virtual Searching,” is to modify the definition of c0(n) as follows:
$$ex(n) = \beta\,c_0(n) + \gamma\,c_1(n), \quad n = 0, \ldots, N-1, \tag{5}$$

where

$$c_0(n) = \begin{cases} ex(n-L), & n = 0, \ldots, \min(L,N)-1, \\ c_0(n-L), & n = L, \ldots, N-1, \end{cases} \tag{6}$$

$$c_1(n) = \tilde{c}_I(n), \quad n = 0, \ldots, N-1. \tag{7}$$

In equation (6), c0(n) contains a vector fetched from a “virtual codebook,” typically an adaptive codebook (ACB), where L<N is allowed.
The definition of c1(n) given in equation (4) is retained in equation (7), which means that, when L<N, c̃I(n) is exempted from being filtered by an LTP filter. This is another departure from direct implementation of equation (1). Thus, equation (5) retains the simplified implementation provided by equation (2) while also permitting L<N. It does so by departing from an exact implementation of equation (1) when L<N.
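The virtual-codebook construction of equations (5)–(7) can be sketched as follows: c0(n) copies past excitation for the first min(L, N) samples and then repeats itself with period L, while c1(n) is used unfiltered. The function names are hypothetical and L is assumed to be an integer.

```python
def virtual_codevector(history, L, N):
    """Equation (6): c0(n) = ex(n - L) for n < min(L, N), c0(n - L) otherwise.
    `history` holds past excitation (most recent last), at least L samples."""
    if L < 1 or len(history) < L:
        raise ValueError("need an integer L >= 1 and at least L past samples")
    start = len(history)
    c0 = []
    for n in range(N):
        if n < min(L, N):
            c0.append(history[start + n - L])  # ex(n - L), past excitation only
        else:
            c0.append(c0[n - L])  # repeat the vector built so far with period L
    return c0


def excitation_eq5(c0, c1, beta, gamma):
    """Equation (5): ex(n) = beta*c0(n) + gamma*c1(n); c1 is the unfiltered
    fixed codebook vector of equation (7)."""
    return [beta * a + gamma * b for a, b in zip(c0, c1)]


if __name__ == "__main__":
    c0 = virtual_codevector([1.0, 2.0, 3.0], L=2, N=4)
    print(c0)                                       # [2.0, 3.0, 2.0, 3.0]
    print(excitation_eq5(c0, [0.0] * 4, 0.5, 0.0))  # [1.0, 1.5, 1.0, 1.5]
```

The second print shows the departure from equation (1): with the same past excitation [1, 2, 3], L = 2, N = 4, β = 0.5 and γ = 0, the exact recursion of equation (1) gives [1.0, 1.5, 0.5, 0.75], because it applies β again to the second pitch period, whereas equation (5) scales every sample of c0(n) by β only once.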
For example, FIG. 2 is a block diagram of another CELP coder 200 of the prior art that implements equations (5)–(7). Similar to CELP coder 100, in CELP coder 200, quantized spectral coefficients Aq are produced by an LP Analyzer 101 and an LP quantizer 102, which quantized spectral coefficients are conveyed to a multiplexer 109 that produces a coded bitstream based on the quantized spectral coefficients and a set of excitation vector-related parameters L, β, I, and γ, that are determined by a squared error minimization/parameter quantization block 108. The quantized spectral coefficients Aq are also conveyed locally to an LP synthesis filter 105 that has a corresponding transfer function 1/Aq(z). LP synthesis filter 105 also receives a combined excitation signal ex(n) and produces an estimate of the input signal ŝ(n) based on the quantized spectral coefficients Aq and the combined excitation signal ex(n).
CELP coder 200 differs from CELP coder 100 in the techniques used to produce combined excitation signal ex(n). In CELP coder 200, a first excitation vector c0(n) is selected from a virtual codebook 201 based on the excitation vector-related parameter L. Virtual codebook 201 typically is an adaptive codebook (ACB), in which event the first excitation vector is an adaptive codebook (ACB) codevector. The virtual codebook codevector c0(n) is then weighted based on the gain parameter β, and the weighted virtual codebook codevector is conveyed to a first combiner 203. A fixed codebook (FCB) codevector, or excitation vector, c̃I(n) is selected from a fixed codebook (FCB) 202 based on the excitation vector-related parameter I. FCB codevector c̃I(n) (or equivalently c1(n), per equation (7)) is then weighted based on the gain parameter γ and is also conveyed to first combiner 203. First combiner 203 then produces the combined excitation signal ex(n) by combining the weighted version of virtual codebook codevector c0(n) with the weighted version of FCB codevector c1(n).
LP synthesis filter 105 conveys the input signal estimate ŝ(n) to a second combiner 106. Second combiner 106 also receives input signal s(n) and subtracts the input signal estimate ŝ(n) from the input signal s(n). The difference between input signal s(n) and input signal estimate ŝ(n) is applied to a perceptual error weighting filter 107, which filter produces a perceptually weighted error signal e(n) based on the difference between ŝ(n) and s(n) and a weighting function W(z). Perceptually weighted error signal e(n) is then conveyed to a squared error minimization/parameter quantization block 108. Squared error minimization/parameter quantization block 108 uses the error signal e(n) to determine an optimal set of excitation vector-related parameters L, β, I, and γ that produce the best estimate ŝ(n) of the input signal s(n). Similar to coder 100, coder 200 conveys the quantized spectral coefficients and the selected set of parameters L, β, I, and γ over a communication channel to a receiving communication device, where a speech synthesizer uses the LP coefficients and excitation vector-related parameters to reconstruct the coded version of input speech signal s(n).
In a paper entitled “Design of a psi-celp coder for mobile communications,” by Mano, K.; Moriya, T.; Miki, S.; and Ohmuro, H., Proceedings of the IEEE Workshop on Speech Coding for Telecommunications, pp. 21–22, Oct. 13–15, 1993, the “virtual codebook” concept proposed in U.S. Pat. No. 4,910,781 was extended to also modify the definition of the fixed codebook codevector when L<N, that is,
$$ex(n) = \beta\,c_0(n) + \gamma\,c_1(n), \quad n = 0, \ldots, N-1, \tag{8}$$

where

$$c_0(n) = \begin{cases} ex(n-L), & n = 0, \ldots, \min(L,N)-1, \\ c_0(n-L), & n = L, \ldots, N-1, \end{cases} \tag{9}$$

$$c_1(n) = \begin{cases} \tilde{c}_I(n), & n = 0, \ldots, \min(L,N)-1, \\ c_1(n-L), & n = L, \ldots, N-1. \end{cases} \tag{10}$$

It is apparent in equations (8), (9), and (10) that when L<N, c1(n) is periodic in L over N samples.
Another technique for approximating equation (1) when L<N is proposed in the paper “A toll quality 8 kb/s speech codec for the personal communications system (PCS),” by Salami, R., Laflamme, C., Adoul, J.-P., and Massaloux, D., published in IEEE Transactions on Vehicular Technology, Volume 43, Issue 3, Parts 1–2, August 1994, pages 808–816 (hereinafter referred to as “Salami et al.”). The idea proposed by Salami et al. is to apply a zero-state long-term filter (a “pitch sharpening filter”) to generate the excitation codevector c1(n), where
$$ex(n) = \beta\,c_0(n) + \gamma\,c_1(n), \quad n = 0, \ldots, N-1, \tag{11}$$

$$c_0(n) = \begin{cases} ex(n-L), & n = 0, \ldots, \min(L,N)-1, \\ c_0(n-L), & n = L, \ldots, N-1, \end{cases} \tag{12}$$

$$c_1(n) = \begin{cases} \tilde{c}_I(n), & n = 0, \ldots, \min(\hat{L},N)-1, \\ \tilde{c}_I(n) + \hat{\beta}\,c_1(n-\hat{L}), & n = \hat{L}, \ldots, N-1. \end{cases} \tag{13}$$

Note that in equation (12) a “virtual codebook,” or ACB, is being used, and the long-term delay L̂ of the “pitch sharpening filter” and the delay L associated with the ACB are allowed to be different. For example, L may have a value represented with fractional-sample resolution (in which case an interpolating filter would be used to calculate fractionally delayed samples), while L̂ may be a function of L, for example L rounded or truncated to an integer value. Alternatively, L̂ may be set equal to L. In addition, in Salami et al., β̂ is a constant set to 0.8.
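The pitch sharpening of equation (13) is a zero-state long-term filter 1/(1 − β̂z^(−L̂)) run over the fixed codevector within one subframe. A minimal sketch, assuming L̂ is already an integer (for example, obtained from a fractional L by rounding or truncation) and using a hypothetical function name:

```python
def pitch_sharpen(c_tilde, L_hat, beta_hat):
    """Equation (13): zero-state filter 1/(1 - beta_hat z^-L_hat) applied to the
    fixed codebook vector within one subframe."""
    if L_hat < 1:
        raise ValueError("L_hat must be a positive integer")
    c1 = []
    for n, c in enumerate(c_tilde):
        if n < L_hat:
            c1.append(c)  # n < min(L_hat, N): zero state, no feedback yet
        else:
            c1.append(c + beta_hat * c1[n - L_hat])
    return c1


if __name__ == "__main__":
    # A single pulse is echoed every L_hat samples with decaying amplitude.
    print(pitch_sharpen([1.0, 0.0, 0.0, 0.0, 0.0], L_hat=2, beta_hat=0.8))
```

Unlike the periodic copy of equation (10), each echo here is added to the fixed codevector rather than replacing it, and it is scaled by successive powers of β̂, so pulses placed at n ≥ L̂ still contribute to c1(n).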
The presetting of β̂ to a constant value is a limiting feature of Salami et al. In order to provide an improved approximation of equation (1) when L<N, U.S. Pat. No. 5,664,055, entitled “CS-ACELP Speech Compression System with Adaptive Pitch Prediction Filter Gain Based on a Measure of Periodicity” (hereinafter referred to as the “'055 patent”), proposed making β̂ a time-varying function based on periodicity, for example where β̂ could be updated at a subframe rate. When β and γ are selected and quantized sequentially, the '055 patent proposed defining β̂ as

$$\hat{\beta} = \max\bigl(0.2,\ \min(0.8,\ \beta)\bigr). \tag{14}$$

That is, β̂ is initially set equal to β, but is then limited to be not less than 0.2 and not greater than 0.8. The approach set out in the '055 patent is the approach used in speech coder standards Telecommunications Industry Association/Electronic Industries Alliance Interim Standard 127 (TIA/EIA/IS-127) and Global System for Mobile communications (GSM) standard 06.60, which standards are hereby incorporated by reference herein in their entirety.
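Equation (14) is a simple clamp; the sketch below only makes the bounds concrete. The function name and the default-argument bounds are illustrative, and in the sequential case `beta` is assumed to be the current subframe's already-quantized LTP gain.

```python
def sharpening_gain(beta, lower=0.2, upper=0.8):
    """Equation (14): beta_hat = Max(0.2, Min(0.8, beta))."""
    return max(lower, min(upper, beta))


if __name__ == "__main__":
    # Strongly voiced, moderately voiced, and nearly unvoiced subframes
    print([sharpening_gain(b) for b in (0.95, 0.45, 0.0)])  # [0.8, 0.45, 0.2]
```

Because the clamp uses the current β, it is only causal when β is fixed before c1(n) is built, which is the sequential ordering the paragraph above assumes.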
Typically, the determination of optimal gain parameters β and γ is performed in a sequential manner. However, the sequential determination of optimal gain parameters β and γ is actually sub-optimal, because, once β is selected, its value remains fixed when optimization of γ is performed. If β and γ are not selected and quantized sequentially but instead are jointly selected and quantized, that is, are vector quantized as a (β,γ) pair, a problem arises because gain vector quantization is done after c0(n) and c1(n) have been selected, but c1(n) (equation (13)) is a function of β̂. As defined by equation (14), β̂ is dependent on the quantized value of β, which is not available until after the vector quantization of the gains β and γ is completed and the quantized (β,γ) gain vector is thus identified. To circumvent this problem, the '055 patent proposes using a modified definition for β̂ when vector quantization of the gains is employed, that is,

$$\hat{\beta} = \max\bigl(0.2,\ \min(0.8,\ \beta_{\text{previous}})\bigr). \tag{15}$$

βprevious in equation (15) represents the value of β used to define the excitation sequence ex(n) at the preceding subframe. Speech coders described in International Telecommunication Union (ITU) Recommendation G.729, “Coding of Speech at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP),” Geneva, 1996, and TIA/EIA/IS-641 employ this approach. While this approach solves the non-causality problem outlined above, it is less than optimal because βprevious will not always accurately model β at the current subframe, particularly when the degree of voicing at the current subframe is substantially different from the degree of voicing at the previous subframe, such as in a voiced-to-unvoiced or unvoiced-to-voiced transition region.
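The causal ordering that equation (15) permits can be sketched as follows: β̂ is computed from βprevious, c1(n) is built with equation (13), and only then are β and γ searched jointly from a gain codebook. This is a hypothetical illustration, not the procedure of the '055 patent, G.729, or IS-641: the gain codebook entries, the function names, and the simplification that the synthesis and weighting filters are identity (so the filtered vectors equal c0(n) and c1(n)) are all assumptions.

```python
def pitch_sharpen(c_tilde, L_hat, beta_hat):
    """Equation (13) with zero initial state; L_hat is assumed to be an integer."""
    c1 = []
    for n, c in enumerate(c_tilde):
        c1.append(c if n < L_hat else c + beta_hat * c1[n - L_hat])
    return c1


def joint_gain_vq(target, y0, y1, gain_codebook):
    """Pick the (beta, gamma) pair minimizing ||target - beta*y0 - gamma*y1||^2."""
    if not gain_codebook:
        raise ValueError("gain codebook is empty")
    best_index, best_error = None, float("inf")
    for index, (beta, gamma) in enumerate(gain_codebook):
        error = sum((t - beta * a - gamma * b) ** 2 for t, a, b in zip(target, y0, y1))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


def quantize_subframe(target, c0, c_tilde, L_hat, beta_previous, gain_codebook):
    """beta_hat depends only on the previous subframe's beta (equation (15)),
    so c1 can be built before the (beta, gamma) pair is vector quantized."""
    beta_hat = max(0.2, min(0.8, beta_previous))  # equation (15)
    c1 = pitch_sharpen(c_tilde, L_hat, beta_hat)
    # Assumption: identity synthesis/weighting filter, so y0 = c0 and y1 = c1.
    index, error = joint_gain_vq(target, c0, c1, gain_codebook)
    beta_q = gain_codebook[index][0]  # stored as beta_previous for the next subframe
    return index, error, beta_q


if __name__ == "__main__":
    c0 = [1.0, 0.0, 1.0, 0.0]
    c_tilde = [0.0, 1.0, 0.0, 0.0]
    c1 = pitch_sharpen(c_tilde, 2, 0.5)  # beta_hat = 0.5 from beta_previous = 0.5
    target = [0.6 * a + 1.5 * b for a, b in zip(c0, c1)]
    codebook = [(0.3, 1.0), (0.6, 1.5), (0.9, 0.5)]
    print(quantize_subframe(target, c0, c_tilde, 2, 0.5, codebook))
```

The sketch makes the weakness described above concrete: if the current subframe's voicing differs sharply from the previous one, β̂ is derived from a stale βprevious, and the c1(n) used during the joint search no longer matches the β that the search selects.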
Therefore, a need exists for an improved method of quantizing the gain parameters in a CELP-type speech coder, wherein the gain parameters are jointly optimized based on the current subframe.