b 1. Field of the Invention
The present invention relates to speech processing. In particular, the present invention relates to an enhanced method for performing speech modeling and vector quantization in speech encoding applications.
2. Discussion of the Related Art
Linear Predictive Coding (LPC) techniques are widely used in speech encoding applications. In the prior art, to efficiently code LPC parameters into as few bits as possible, and to maintain a linear distortion performance over a wide range of values of LPC parameters, LPC parameters are sometimes represented in the frequency domain as line spectral frequencies (LPFs) using, for example, any of the methods disclosed in Chapter 4, entitled xe2x80x9cLPC PARAMETER QUANTIZATION USING LSFSxe2x80x9d, in Digital Speech Coding for Low Bit Rate Communication Systems by A. M. Kondoz, published by Wiley and Sons (1994) (xe2x80x9cDigital Speech Codingxe2x80x9d). The principle steps of one such method are illustrated by process 100 of FIG. 1. Under this method, at step 101, a set of coefficients is first estimated using linear prediction represented by a linear predictor model LP(n) of order l given by:       LP    ⁡          (      n      )        =            ∑              i        =        1            l        ⁢                  α        i            ⁢              s        ⁡                  (                      n            -            i                    )                    
where s(n) is value of the speech signal at time n, xcex1i is ith LPC coefficient such that the error e(n)=s(n)xe2x88x92LP(n) is minimized. In one instance, l is 10. Typically, in the encoding process, the LPC coefficients are extracted every update period, which can be a time period 20-30 milliseconds long.
Then, at step 102, from the xcex1i""s, two xc2xd-degree polynomials P(x) and Q(x) are constructed. Polynomials P(x) and Q(x) are given by the following:             P      ⁡              (        x        )              =                  ∑                  i          =          0                          l          /          2                    ⁢                        a          i                ⁢                  x          i                                Q      ⁡              (        x        )              =                  ∑                  i          =          0                          l          /          2                    ⁢                        b          i                ⁢                  x          i                    
The coefficients ai and bi are each a function of the LPC coefficients xcex1i. The l roots of polynomials P(x) and Q(x) are a set of values ki (1xe2x89xa7kixe2x89xa7xe2x88x921), in which the odd indices ki""s (i.e., i=1, 3, 5, . . . ) are roots of polynomial P(x) and the even indices ki""s (i.e., i=2, 4, 6. . . ) are roots of polynomial Q(x), and ordered such that ki greater than ki+1. and are typically grouped into xc2xd xe2x80x9cline spectral pairsxe2x80x9d (LSPs), each LSP consisting of a pair (ki, ki+1). FIG. 3 shows an example of a 5th order polynomial P (x) having roots k1, k3, k5, k7 and k9.
LSPs are, however, non-linear parameters, which are not suitable for efficient quantization. In particular, if linear quantization steps are used, requisite resolution may not attained over some range of values, and wasteful for unnecessary resolution over some other range of values. Thus, at step 103, the LSPs are transformed into the frequency domain by taking the arc-cosine (i.e., cosxe2x88x921 ki) of each root ki. The resulting values of the transformation are referred to as xe2x80x9cline spectral frequenciesxe2x80x9d (xe2x80x9cLSFsxe2x80x9d).
At step 104, the LSFs are then quantized. In one instance, the LSFs are xe2x80x9cvector quantizedxe2x80x9d by using the LSF values to search a xe2x80x9ccode bookxe2x80x9d for an index which represents the set of quantized LSF values. For example, the 2-vector (cosxe2x88x921 k1, cosxe2x88x921 k3) can be used to search a 2-dimensional table in the code book. If 6 bits are allocated to represent such a pair, the 2-dimensional table has 64 entry corresponding to 64 pairs of selected possible values for (cosxe2x88x921 k1, cosxe2x88x921 k3). In one implementation, the index of the entry (xi, xj) for which the mean squared error (xixe2x88x92cosxe2x88x921 k1)2+(xj xe2x88x92cosxe2x88x921 k3)2 is minimum is selected to represent the 2-vector (cosxe2x88x921k1, cosxe2x88x921 k3). Higher dimensional tables are possible for vector quantizing a larger number of LSF values. For example, at three bits per root, a 3-dimensional table searchable by a 3-vector (cosxe2x88x921 k1, cosxe2x88x921 k3, cosxe2x88x921 k5) has 9-bit indices, or 512 entries. Of course, for the same per-root bit allocation (e.g., 3 bits per root), the storage requirements grow exponentially with the number of dimensions. In communication or storage applications, for example, the indices are transmitted or saved. At a later time, speech is synthesized or reconstructed (e.g., at the receiver side, or when replaying from storage) using a process that is substantially the reverse of process 100 discussed above.
In the method described above, finding the l roots of polynomials P(x) and Q(x) at step 102 is typically performed using numerical methods (e.g., Newton""s method) which can be computationally intensive. In one method, each root ki is found by evaluating P(k) or Q(k) for the trial values k between xe2x88x921 and 1, at increments of 0.0005. Such a method requires substantial amount computation which is undesirable in real-time applications.
The present invention provides a linear predictive speech encoding method which combines the quantization step with the search of roots of line spectral pair (LSP) polynomials. In one embodiment, according to one embodiment of the present invention, an indexed table having as entries quantized line spectral pair (LSP) values is created from a table of quantized line spectral frequencies (LSFs). Under a method of the present invention, during each update period, a set of LPC coefficients is computed to derive LSP polynomials P(x) and Q(x). However, instead of finding the roots of the polynomials P(x) and Q(x), polynomials P(x) and Q(x) are evaluated using the quantized LSP values of the indexed table. The approximate roots of the polynomials P(x) and Q(x) are selected from the entry of the indexed table whose quantized LSP values give the least error when used to evaluate polynomials P(x) and Q(x). The index of the selected entry of the table can be used to representing the approximate roots in the speech encoding application.
In one embodiment, the method selects the approximate roots by selecting such quantized LSP values that provide a least mean squared error in evaluating polynomials P(x) and Q(x). Further, under one method of the present invention, a step is taken to ensure that each selected LSP value corresponds to a designated root of the polynomials P(x) and Q(x). In one instance, the ensuring step is achieved by examining the direction of change in value of polynomial P(x) when successively decreasing LSP values for x are substituted into polynomial P(x). In one implementation, each of said polynomials P(x) and Q(x) is 5th-order.
According to another aspect of the present invention, a code book used in conjunction with the present invention can be organized as a number of multi-dimensional tables each representing vectors of quantized LSP values corresponding to multiple roots of the LSP polynomials. In one embodiment, the entries of each table of LSP values are arranged in a decreasing order of proposed LSP values in a designated root of the LSP polynomials.
Under the present invention, during run time, complex operations for searching the roots of the polynomials are avoided. Further, because the code book is prepared from an LSF code book, the desirable linear distortion performance of quantization in the LSF domain is preserved.
The present invention is better understood upon consideration of the detailed description below and the accompanying drawings.