1. Field
The present disclosure relates, in general, to signal compression systems and, more particularly, to Code Excited Linear Prediction (CELP)-type speech coding systems.
2. Introduction
Compression of digital speech and audio signals is well known. Compression is generally required to transmit signals efficiently over a communications channel or to store them compactly on a digital media device, such as a solid-state memory device or computer hard disk. Although many compression techniques exist, one method that has remained very popular for digital speech coding is known as Code Excited Linear Prediction (CELP), which is one of a family of "analysis-by-synthesis" coding algorithms. Analysis-by-synthesis generally refers to a coding process by which multiple parameters of a digital model are used to synthesize a set of candidate signals that are compared to an input signal and analyzed for distortion. The set of parameters that yields the lowest distortion is then either transmitted or stored, and eventually used to reconstruct an estimate of the original input signal. CELP is a particular analysis-by-synthesis method that uses one or more codebooks, where each codebook essentially includes sets of code-vectors that are retrieved from the codebook in response to a codebook index.
For example, FIG. 6 is a block diagram of a CELP encoder 600 of the prior art. In CELP encoder 600, an input signal s(n), such as a speech signal, is applied to a Linear Predictive Coding (LPC) analysis block 601, where linear predictive coding is used to estimate a short-term spectral envelope. The resulting spectral parameters are denoted by the transfer function A(z). The spectral parameters are applied to an LPC Quantization block 602 that quantizes the spectral parameters to produce quantized spectral parameters Aq that are suitable for use in a multiplexer 608. The quantized spectral parameters Aq are then conveyed to multiplexer 608, and the multiplexer 608 produces a coded bitstream based on the quantized spectral parameters and a set of codebook-related parameters, τ, β, k, and γ, that are determined by a squared error minimization/parameter quantization block 607.
The quantized spectral, or Linear Predictive, parameters are also conveyed locally to an LPC synthesis filter 605 that has a corresponding transfer function 1/Aq(z). LPC synthesis filter 605 also receives a combined excitation signal u(n) from a first combiner 610 and produces an estimate of the input signal s(n) based on the quantized spectral parameters Aq and the combined excitation signal u(n). Combined excitation signal u(n) is produced as follows. An adaptive codebook code-vector cτ is selected from an adaptive codebook (ACB) 603 based on an index parameter τ and the combined excitation signal from the previous subframe u(n-L). The adaptive codebook code-vector cτ is then weighted based on a gain parameter β 630 and the weighted adaptive codebook code-vector is conveyed to first combiner 610. A fixed codebook code-vector ck is selected from a fixed codebook (FCB) 604 based on an index parameter k. The fixed codebook code-vector ck is then weighted based on a gain parameter γ 640 and is also conveyed to first combiner 610. First combiner 610 then produces combined excitation signal u(n) by combining the weighted version of adaptive codebook code-vector cτ with the weighted version of fixed codebook code-vector ck.
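The excitation path of encoder 600 (adaptive codebook 603, gains β and γ, first combiner 610, and LPC synthesis filter 605) can be sketched in a few lines of Python. This is a minimal illustration, not the prior-art implementation: it assumes the sign convention Aq(z) = 1 + a1 z^-1 + ... + ap z^-p, an integer lag τ, and periodic repetition of the lag-τ segment when τ is shorter than the subframe. The helper names and all numeric values are hypothetical.

```python
from typing import List, Optional, Sequence


def acb_vector(past_u: Sequence[float], tau: int, length: int) -> List[float]:
    """Adaptive codebook code-vector c_tau(n) = u(n - tau) read from the past excitation.

    Assumes an integer lag 1 <= tau <= len(past_u). When tau < length, the
    last tau samples are repeated periodically (an assumed, common convention).
    """
    if not 1 <= tau <= len(past_u):
        raise ValueError("lag must lie within the stored past excitation")
    segment = list(past_u[len(past_u) - tau:])
    return [segment[n % tau] for n in range(length)]


def combine_excitation(c_tau: Sequence[float], c_k: Sequence[float],
                       beta: float, gamma: float) -> List[float]:
    """First combiner 610: u(n) = beta * c_tau(n) + gamma * c_k(n)."""
    return [beta * a + gamma * b for a, b in zip(c_tau, c_k)]


def lpc_synthesis(u: Sequence[float], a_q: Sequence[float],
                  memory: Optional[Sequence[float]] = None) -> List[float]:
    """All-pole filter 1/Aq(z), assuming Aq(z) = 1 + a_1 z^-1 + ... + a_p z^-p.

    `memory` holds past outputs [s(-1), s(-2), ..., s(-p)]; zeros if omitted.
    """
    p = len(a_q)
    hist = list(memory) if memory is not None else [0.0] * p
    out: List[float] = []
    for x in u:
        y = x - sum(a_q[i] * hist[i] for i in range(p))
        out.append(y)
        hist = ([y] + hist)[:p]  # newest output first, keep p samples
    return out


past_u = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]  # hypothetical u(n - L) history
c_tau = acb_vector(past_u, tau=4, length=6)          # [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
c_k = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]                 # hypothetical single-pulse FCB vector
u = combine_excitation(c_tau, c_k, beta=0.5, gamma=2.0)
s_hat = lpc_synthesis(u, a_q=[-0.9])                 # hypothetical Aq(z) = 1 - 0.9 z^-1
```

Repeating the last τ samples when τ < L is one common way of extending the adaptive codebook for short pitch lags; fractional lags and interpolation, which practical coders also use, are omitted here.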
LPC synthesis filter 605 conveys the input signal estimate ŝ(n) to a second combiner 612. The second combiner 612 also receives input signal s(n) and subtracts the estimate of the input signal ŝ(n) from the input signal s(n). The difference between input signal s(n) and the input signal estimate ŝ(n) is applied to a perceptual error weighting filter 606, which filter produces a perceptually weighted error signal e(n) based on the difference between ŝ(n) and s(n) and a weighting function W(z). Perceptually weighted error signal e(n) is then conveyed to squared error minimization/parameter quantization block 607. Squared error minimization/parameter quantization block 607 uses the error signal e(n) to determine an optimal set of codebook-related parameters τ, β, k, and γ that produce the best estimate ŝ(n) of the input signal s(n).
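Taken literally, the closed loop of FIG. 6 is an exhaustive analysis-by-synthesis search: every combination of τ, β, k, and γ is synthesized, subtracted from s(n), weighted by W(z), and scored by its energy. The sketch below does exactly that over tiny hypothetical codebooks and gain tables. It assumes zero filter states, the Aq(z) = 1 + Σ ai z^-i convention, and a fixed toy weighting filter W(z) = (1 − 0.9z^-1)/(1 − 0.5z^-1); practical coders commonly derive W(z) from the LPC coefficients instead. All helper names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def lfilter(b: Sequence[float], a: Sequence[float], x: Sequence[float]) -> List[float]:
    """Direct-form filter B(z)/A(z), assuming a[0] == 1 and a zero initial state."""
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n >= i)
        acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n >= i)
        y.append(acc)
    return y


def acb_vector(past_u: Sequence[float], tau: int, length: int) -> List[float]:
    """c_tau(n) = u(n - tau), repeating the lag-tau segment when tau < length."""
    segment = list(past_u[len(past_u) - tau:])
    return [segment[n % tau] for n in range(length)]


def exhaustive_search(s, past_u, a_q, fcb, lags, beta_table, gamma_table, w_num, w_den):
    """Try every (tau, beta, k, gamma) and keep the lowest weighted error energy."""
    length = len(s)
    a_full = [1.0] + list(a_q)  # Aq(z) = 1 + a_1 z^-1 + ... (assumed convention)
    best: Optional[Tuple[float, int, float, int, float]] = None
    for tau, beta, k, gamma in product(lags, beta_table, range(len(fcb)), gamma_table):
        c_tau = acb_vector(past_u, tau, length)
        u = [beta * ct + gamma * ck for ct, ck in zip(c_tau, fcb[k])]  # combiner 610
        s_hat = lfilter([1.0], a_full, u)                              # 1/Aq(z), filter 605
        diff = [x - y for x, y in zip(s, s_hat)]                       # combiner 612
        e = lfilter(w_num, w_den, diff)                                # W(z), filter 606
        eps = sum(v * v for v in e)                                    # criterion of block 607
        if best is None or eps < best[0]:
            best = (eps, tau, beta, k, gamma)
    return best


past_u = [0.3, -0.7, 1.0, 0.2, -0.5, 0.9]
fcb = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
a_q = [-0.8]                             # hypothetical Aq(z) = 1 - 0.8 z^-1
w_num, w_den = [1.0, -0.9], [1.0, -0.5]  # hypothetical fixed W(z)
# Build an input that the coder can represent exactly: tau = 2, beta = 0.5, k = 1, gamma = 1.0.
u_true = [0.5 * ct + 1.0 * ck for ct, ck in zip(acb_vector(past_u, 2, 4), fcb[1])]
s = lfilter([1.0], [1.0] + a_q, u_true)
best = exhaustive_search(s, past_u, a_q, fcb, [2, 3], [0.25, 0.5, 1.0], [0.5, 1.0], w_num, w_den)
```

The number of candidates is the product of the four table sizes, so the cost grows multiplicatively with every added bit, which is the practical problem that the sequential search of encoder 800 addresses.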
FIG. 7 is a block diagram of a decoder 700 of the prior art that corresponds to the encoder 600. As one of ordinary skill in the art realizes, the coded bitstream produced by the encoder 600 is used by a demultiplexer 708 in the decoder 700 to decode the optimal set of codebook-related parameters τ, β 730, k, and γ 740. The decoder 700 uses a process that is identical to the synthesis process performed by encoder 600, using an adaptive codebook 703, a fixed codebook 704, signals u(n) and u(n−L), code-vectors cτ and ck, and an LPC synthesis filter 705 to generate output speech. Thus, if the coded bitstream produced by the encoder 600 is received by the decoder 700 without errors, the speech ŝ(n) output by the decoder 700 can be reconstructed as an exact duplicate of the input speech estimate ŝ(n) produced by the encoder 600.
While the CELP encoder 600 is conceptually useful, it is not a practical implementation of an encoder where it is desirable to keep computational complexity as low as possible. FIG. 8 is therefore a block diagram of an exemplary encoder 800 of the prior art that uses an equivalent, yet more practical, system compared with the encoding system illustrated by encoder 600. To better understand the relationship between the encoder 600 and the encoder 800, it is beneficial to look at the mathematical derivation of encoder 800 from encoder 600. For the convenience of the reader, the variables are given in terms of their z-transforms.
From FIG. 6, it can be seen that the perceptual error weighting filter 606 produces the weighted error signal e(n) based on a difference between the input signal and the estimated input signal, that is:

$$E(z) = W(z)\left(S(z) - \hat{S}(z)\right). \tag{1}$$
From this expression, the weighting function W(z) can be distributed and the input signal estimate ŝ(n) can be decomposed into the filtered sum of the weighted codebook code-vectors:
$$E(z) = W(z)S(z) - \frac{W(z)}{A_q(z)}\left(\beta C_\tau(z) + \gamma C_k(z)\right). \tag{2}$$
The term W(z)S(z) corresponds to a weighted version of the input signal. By defining the weighted input signal as Sw(z)=W(z)S(z), and by combining the LPC synthesis filter 605 and the perceptual error weighting filter 606 of the encoder 600 into a weighted synthesis filter with transfer function H(z)=W(z)/Aq(z), Equation 2 can be rewritten as follows:

$$E(z) = S_w(z) - H(z)\left(\beta C_\tau(z) + \gamma C_k(z)\right). \tag{3}$$
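The step from Equation 1 to Equation 3 relies only on linearity: weighting the difference s − ŝ gives the same result as weighting s and separately passing the excitation through H(z) = W(z)/Aq(z). The following sketch checks this numerically with zero-state filters. The filter coefficients, signals, and the `lfilter`/`weighted_synthesis` helpers are hypothetical illustrations, and Aq(z) = 1 + Σ ai z^-i is an assumed sign convention.

```python
from typing import List, Sequence


def lfilter(b: Sequence[float], a: Sequence[float], x: Sequence[float]) -> List[float]:
    """Direct-form filter B(z)/A(z), assuming a[0] == 1 and a zero initial state."""
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n >= i)
        acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n >= i)
        y.append(acc)
    return y


def weighted_synthesis(x, a_q, w_num, w_den):
    """H(z) = W(z)/Aq(z) with zero state: 1/Aq(z) followed by W(z)."""
    return lfilter(w_num, w_den, lfilter([1.0], [1.0] + list(a_q), x))


a_q = [-0.8, 0.2]                        # hypothetical Aq(z) = 1 - 0.8 z^-1 + 0.2 z^-2
w_num, w_den = [1.0, -0.9], [1.0, -0.5]  # hypothetical W(z) = (1 - 0.9 z^-1)/(1 - 0.5 z^-1)
s = [0.5, -0.2, 0.9, 0.1, -0.4, 0.3]
c_tau = [0.2, -0.5, 0.9, 0.2, -0.5, 0.9]
c_k = [0.0, 1.0, 0.0, 0.0, -1.0, 0.0]
beta, gamma = 0.7, 1.3

u = [beta * a + gamma * b for a, b in zip(c_tau, c_k)]
s_hat = lfilter([1.0], [1.0] + a_q, u)

# Equation 1: weight the difference between the input and its estimate.
e_eq1 = lfilter(w_num, w_den, [x - y for x, y in zip(s, s_hat)])
# Equation 3: weighted input minus the weighted-synthesis-filtered excitation.
s_w = lfilter(w_num, w_den, s)
e_eq3 = [x - y for x, y in zip(s_w, weighted_synthesis(u, a_q, w_num, w_den))]

max_diff = max(abs(x - y) for x, y in zip(e_eq1, e_eq3))
print(f"max |Eq.1 - Eq.3| = {max_diff:.2e}")  # differs only by rounding
```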
By using z-transform notation, filter states need not be explicitly defined. Now, proceeding in vector notation, where the vector length L is the length of the current speech input subframe, Equation 3 can be rewritten as follows by using the superposition principle:

$$e = s_w - H\left(\beta c_\tau + \gamma c_k\right) - h_{zir}, \tag{4}$$
where:
H is the L×L zero-state weighted synthesis convolution matrix formed from the impulse response h(n) of a weighted synthesis filter, such as synthesis filters 815 and 805, corresponding to a transfer function Hzs(z) or H(z), which matrix can be represented as:

$$H = \begin{bmatrix} h(0) & 0 & \cdots & 0 \\ h(1) & h(0) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h(L-1) & h(L-2) & \cdots & h(0) \end{bmatrix}, \tag{5}$$

hzir is the L×1 zero-input response of H(z) that is due to the state from a previous speech input subframe,
sw is the L×1 perceptually weighted input signal,
β is the scalar adaptive codebook (ACB) gain,
cτ is the L×1 ACB code-vector indicated by index τ,
γ is the scalar fixed codebook (FCB) gain, and
ck is the L×1 FCB code-vector indicated by index k.
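Equations 4 and 5 split the output of the running weighted synthesis filter into a zero-state part Hc, computed with the lower-triangular matrix of Equation 5, and a zero-input part hzir carried over from the previous subframe. The sketch below builds H from the impulse response and confirms that the two parts add up to the response of a filter that simply keeps running across the subframe boundary. It uses hypothetical filter coefficients, hypothetical excitation values, hypothetical helper names, and the assumed convention Aq(z) = 1 + Σ ai z^-i.

```python
from typing import List, Sequence


def lfilter(b: Sequence[float], a: Sequence[float], x: Sequence[float]) -> List[float]:
    """Direct-form filter B(z)/A(z), assuming a[0] == 1 and a zero initial state."""
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n >= i)
        acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n >= i)
        y.append(acc)
    return y


A_Q = [-0.8, 0.2]                        # hypothetical Aq(z) coefficients
W_NUM, W_DEN = [1.0, -0.9], [1.0, -0.5]  # hypothetical W(z)


def h_filter(x: Sequence[float]) -> List[float]:
    """Weighted synthesis filter H(z) = W(z)/Aq(z), run from a zero state."""
    return lfilter(W_NUM, W_DEN, lfilter([1.0], [1.0] + A_Q, x))


def conv_matrix(h: Sequence[float], length: int) -> List[List[float]]:
    """Equation 5: lower-triangular Toeplitz matrix with H[i][j] = h(i - j) for i >= j."""
    return [[h[i - j] if i >= j else 0.0 for j in range(length)] for i in range(length)]


def matvec(m: List[List[float]], v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


L = 5
past = [0.4, -1.0, 0.7]            # previous-subframe excitation (leaves filter state)
cur = [1.0, 0.5, -0.25, 0.0, 0.3]  # current-subframe excitation

h = h_filter([1.0] + [0.0] * (L - 1))          # impulse response h(0), ..., h(L-1)
H = conv_matrix(h, L)
zero_state = matvec(H, cur)                    # H c
h_zir = h_filter(past + [0.0] * L)[len(past):] # zero-input response due to past state
full = h_filter(past + cur)[len(past):]        # continuously running filter

superposition_error = max(abs(f - (z + r)) for f, z, r in zip(full, zero_state, h_zir))
```

Running the filter on the concatenated signal avoids handling filter memories explicitly; real encoders instead save the filter states at the end of each subframe.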
By distributing H and letting the input target vector be xw=sw−hzir, the following expression can be obtained:

$$e = x_w - \beta H c_\tau - \gamma H c_k. \tag{6}$$
Equation 6 represents the perceptually weighted error (or distortion) vector e(n) produced by a third combiner 808 of encoder 800 and coupled by the combiner 808 to a squared error minimization/parameter quantization block 807.
From the expression above, a formula can be derived for minimizing the energy of the perceptually weighted error, that is, ∥e∥², by squared error minimization/parameter quantization block 807. The squared error is given as:

$$\varepsilon = \|e\|^2 = \|x_w - \beta H c_\tau - \gamma H c_k\|^2. \tag{7}$$

Note that ∥e∥² may also be written as $\|e\|^2 = \sum_{n=0}^{L-1} e^2(n)$ or $\|e\|^2 = e^T e$, where e is presumed to be a column vector and $e^T$ is its transpose.
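Equations 6 and 7 amount to a vector subtraction followed by a sum of squares. A minimal sketch, with a hypothetical three-tap impulse response, hypothetical helper names, and hand-picked vectors chosen so the arithmetic is exact in floating point:

```python
from typing import List, Sequence


def conv_matrix(h: Sequence[float], length: int) -> List[List[float]]:
    """Lower-triangular Toeplitz matrix with H[i][j] = h(i - j) for i >= j."""
    return [[h[i - j] if i >= j else 0.0 for j in range(length)] for i in range(length)]


def matvec(m: List[List[float]], v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def weighted_error(xw, H, c_tau, c_k, beta, gamma) -> List[float]:
    """Equation 6: e = xw - beta * H c_tau - gamma * H c_k."""
    y_tau, y_k = matvec(H, c_tau), matvec(H, c_k)
    return [x - beta * a - gamma * b for x, a, b in zip(xw, y_tau, y_k)]


def error_energy(e: Sequence[float]) -> float:
    """Equation 7: eps = ||e||^2 = sum_n e(n)^2 = e^T e."""
    return sum(v * v for v in e)


H = conv_matrix([1.0, 0.5, 0.25], 3)  # hypothetical impulse response h(n)
e = weighted_error([1.0, 2.0, 0.5], H, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], beta=1.0, gamma=1.0)
print(e, error_energy(e))  # [0.0, 0.5, -0.25] 0.3125
```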
Due to complexity limitations, practical implementations of speech coding systems typically minimize the squared error in a sequential fashion. That is, the adaptive codebook (ACB) component is optimized first by assuming the fixed codebook (FCB) contribution is zero, and then the FCB component is optimized using the given (previously optimized) ACB component. The ACB/FCB gains, that is, codebook-related parameters β and γ, may or may not be re-optimized, that is, quantized, given the sequentially selected ACB/FCB code-vectors cτ and ck.
The theory for performing such a sequential optimization process is as follows. First, the squared error of Equation 7 is modified by setting γ=0 and then expanded to produce:

$$\varepsilon = \|x_w - \beta H c_\tau\|^2 = x_w^T x_w - 2\beta\, x_w^T H c_\tau + \beta^2 c_\tau^T H^T H c_\tau. \tag{8}$$
Minimization of the squared error is then determined by taking the partial derivative of ε with respect to β and setting the quantity to zero:
$$\frac{\partial \varepsilon}{\partial \beta} = -2\left(x_w^T H c_\tau - \beta\, c_\tau^T H^T H c_\tau\right) = 0. \tag{9}$$
This yields an optimal ACB gain:
$$\beta = \frac{x_w^T H c_\tau}{c_\tau^T H^T H c_\tau}. \tag{10}$$
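With γ = 0, Equation 10 gives the ACB gain in closed form, and substituting it into Equation 8 leaves xwᵀxw − (xwᵀHcτ)²/(cτᵀHᵀHcτ). The sketch below, with hypothetical vectors and `hc` standing for a precomputed Hcτ, computes β, evaluates Equation 8 at that β, and compares the result with the closed form. The helper names are hypothetical.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def optimal_acb_gain(xw, hc) -> float:
    """Equation 10: beta = xw^T H c_tau / (c_tau^T H^T H c_tau), with hc = H c_tau."""
    return dot(xw, hc) / dot(hc, hc)


def acb_error(xw, hc, beta: float) -> float:
    """Equation 8: eps = xw^T xw - 2 beta xw^T H c_tau + beta^2 c_tau^T H^T H c_tau."""
    return dot(xw, xw) - 2.0 * beta * dot(xw, hc) + beta ** 2 * dot(hc, hc)


xw = [1.0, 2.0, 3.0]
hc = [1.0, 1.0, 1.0]  # hypothetical filtered ACB vector H c_tau
beta = optimal_acb_gain(xw, hc)                            # 2.0
eps_min = acb_error(xw, hc, beta)                          # 2.0
eps_closed = dot(xw, xw) - dot(xw, hc) ** 2 / dot(hc, hc)  # 2.0
```

Any other gain, for example 2.1 or 1.9, gives a larger ε, as expected from the quadratic dependence of Equation 8 on β.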
Substituting the optimal ACB gain back into Equation 8 gives:
$$\tau^* = \arg\min_{\tau}\left\{x_w^T x_w - \frac{\left(x_w^T H c_\tau\right)^2}{c_\tau^T H^T H c_\tau}\right\}, \tag{11}$$

where τ* is the optimal ACB index parameter, that is, the ACB index parameter that minimizes the bracketed expression. Typically, τ is a parameter related to a range of expected values of the pitch lag (or fundamental frequency) of the input signal, and is constrained to a limited set of values that can be represented by a relatively small number of bits. Since the term $x_w^T x_w$ does not depend on τ, minimizing the bracketed expression is equivalent to maximizing the remaining ratio, so Equation 11 can be rewritten as follows:
$$\tau^* = \arg\max_{\tau}\left\{\frac{\left(x_w^T H c_\tau\right)^2}{c_\tau^T H^T H c_\tau}\right\}. \tag{12}$$
Now, by letting yτ equal the ACB code-vector cτ filtered by weighted synthesis filter 815, that is, yτ=Hcτ, Equation 12 can be simplified to:
$$\tau^* = \arg\max_{\tau}\left\{\frac{\left(x_w^T y_\tau\right)^2}{y_\tau^T y_\tau}\right\}, \tag{13}$$

and likewise, Equation 10 can be simplified to:
$$\beta = \frac{x_w^T y_\tau}{y_\tau^T y_\tau}. \tag{14}$$
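Equations 13 and 14 translate directly into a search loop: for each candidate lag, filter cτ through H to get yτ, score it by (xwᵀyτ)²/(yτᵀyτ), and keep the gain of the winner. This is a minimal sketch, assuming integer lags, periodic extension of lags shorter than the subframe, a zero-state convolution with a hypothetical truncated impulse response h(n), and hypothetical helper names. The target is built from a known lag so the expected answer is clear.

```python
from typing import List, Optional, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def acb_vector(past_u: Sequence[float], tau: int, length: int) -> List[float]:
    """c_tau(n) = u(n - tau), repeating the lag-tau segment when tau < length."""
    segment = list(past_u[len(past_u) - tau:])
    return [segment[n % tau] for n in range(length)]


def zero_state_filter(h: Sequence[float], c: Sequence[float]) -> List[float]:
    """y = H c: zero-state convolution with the weighted synthesis impulse response."""
    return [sum(h[n - j] * c[j] for j in range(n + 1)) for n in range(len(c))]


def acb_search(xw, h, past_u, lags) -> Tuple[Optional[int], float]:
    """Equations 13-14: pick tau maximizing (xw^T y)^2 / (y^T y); beta = xw^T y / (y^T y)."""
    best_tau, best_score, best_beta = None, float("-inf"), 0.0
    for tau in lags:
        y = zero_state_filter(h, acb_vector(past_u, tau, len(xw)))
        energy = dot(y, y)
        if energy <= 0.0:
            continue  # an all-zero candidate cannot reduce the error
        corr = dot(xw, y)
        score = corr * corr / energy
        if score > best_score:
            best_tau, best_score, best_beta = tau, score, corr / energy
    return best_tau, best_beta


h = [1.0, 0.6, 0.36, 0.216, 0.1296, 0.07776]          # hypothetical h(n), n = 0..5
past_u = [0.1, -0.4, 0.9, 0.3, -0.8, 0.5, 0.2, -0.6]  # hypothetical past excitation
xw = [0.8 * v for v in zero_state_filter(h, acb_vector(past_u, 5, 6))]
tau_star, beta = acb_search(xw, h, past_u, lags=range(3, 7))
```

By the Cauchy-Schwarz inequality the score can never exceed xwᵀxw, so a target that is an exact multiple of one yτ is recovered with that lag and its scale factor as the gain.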
Thus, Equations 13 and 14 are the two expressions needed to determine the optimal ACB index τ and ACB gain β in a sequential manner. These expressions can now be used to derive the optimal FCB index and gain expressions. First, from FIG. 8, it can be seen that a second combiner 806 produces a vector x2, where x2=xw−βHcτ. The vector xw (or xw(n)) is produced by a first combiner 804, which subtracts a filtered past synthetic excitation signal hzir(n) from the output sw(n) of a perceptual error weighting filter W(z) 802 applied to the input speech signal s(n); hzir(n) is obtained by filtering the past synthetic excitation signal u(n-L) with a weighted synthesis zero-input response filter Hzir(z) 801. The term βHcτ is a filtered and weighted version of ACB code-vector cτ, that is, ACB code-vector cτ filtered by zero-state weighted synthesis filter Hzs(z) 815 to generate y(n) and then weighted based on ACB gain parameter β 830. Substituting the expression x2=xw−βHcτ into Equation 7 yields:

$$\varepsilon = \|x_2 - \gamma H c_k\|^2, \tag{15}$$

where γHck is a filtered and weighted version of FCB code-vector ck, that is, FCB code-vector ck filtered by zero-state weighted synthesis filter Hzs(z) 805 and then weighted based on FCB gain parameter γ 840. Similar to the above derivation of the optimal ACB index parameter τ*, it is apparent that:
$$k^* = \arg\max_{k}\left\{\frac{\left(x_2^T H c_k\right)^2}{c_k^T H^T H c_k}\right\}, \tag{16}$$

where k* is the optimal FCB index parameter, that is, the FCB index parameter that maximizes the bracketed expression. By grouping terms that do not depend on k, that is, by letting $d_2^T = x_2^T H$ and $\Phi = H^T H$, Equation 16 can be simplified to:
$$k^* = \arg\max_{k}\left\{\frac{\left(d_2^T c_k\right)^2}{c_k^T \Phi c_k}\right\}, \tag{17}$$

in which the optimal FCB gain γ is given as:
$$\gamma = \frac{d_2^T c_k}{c_k^T \Phi c_k}. \tag{18}$$
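The FCB stage follows the same pattern after the target update x2 = xw − βyτ. The sketch below is a minimal illustration with a hypothetical single-pulse fixed codebook, a hypothetical impulse response, and hypothetical helper names: it forms d2 = Hᵀx2 and Φ = HᵀH once, then scores every candidate with Equation 17 and computes the gain of the winner with Equation 18.

```python
from typing import List, Optional, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def zero_state_filter(h: Sequence[float], c: Sequence[float]) -> List[float]:
    """y = H c: zero-state convolution with the weighted synthesis impulse response."""
    return [sum(h[n - j] * c[j] for j in range(n + 1)) for n in range(len(c))]


def fcb_search(xw, y_tau, beta, h, fcb) -> Tuple[Optional[int], float]:
    """Equations 15-18: update the target, then search the FCB using d2 and Phi."""
    L = len(xw)
    x2 = [x - beta * y for x, y in zip(xw, y_tau)]                       # x2 = xw - beta*H c_tau
    d2 = [sum(x2[n] * h[n - j] for n in range(j, L)) for j in range(L)]  # d2 = H^T x2
    phi = [[sum(h[n - i] * h[n - j] for n in range(max(i, j), L))       # Phi = H^T H
            for j in range(L)] for i in range(L)]
    best_k, best_score, best_gamma = None, float("-inf"), 0.0
    for k, c in enumerate(fcb):
        corr = dot(d2, c)                                                # d2^T c_k
        energy = sum(c[i] * phi[i][j] * c[j] for i in range(L) for j in range(L))
        if energy <= 0.0:
            continue
        score = corr * corr / energy                                     # Equation 17
        if score > best_score:
            best_k, best_score, best_gamma = k, score, corr / energy     # Equation 18
    return best_k, best_gamma


h = [1.0, 0.6, 0.36, 0.216, 0.1296, 0.07776]  # hypothetical h(n)
L = len(h)
fcb = [[1.0 if n == p else 0.0 for n in range(L)] for p in range(L)]  # hypothetical pulses
y_tau = zero_state_filter(h, [0.3, -0.8, 0.5, 0.2, -0.6, 0.3])     # hypothetical H c_tau
xw = [0.8 * a + 1.5 * b for a, b in zip(y_tau, zero_state_filter(h, fcb[2]))]
k_star, gamma = fcb_search(xw, y_tau, 0.8, h, fcb)
```

Precomputing d2 and Φ once per subframe means each candidate costs only a dot product and a quadratic form; for sparse pulse code-vectors these reduce to sums over the few nonzero positions, although this sketch evaluates the full quadratic form for clarity.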
The encoder 800 provides a method and apparatus for determining the optimal excitation vector-related parameters τ, β, k, and γ. Unfortunately, higher-bit-rate CELP coding typically requires higher computational complexity because of the larger number of codebook entries that require error evaluation in the closed-loop processing. Thus, there is an opportunity for generating a candidate code-vector in a manner that reduces the computational complexity of coding an information signal.