1. Field of the Invention
The present invention generally relates to speech coding at low bit rates, and more particularly, is directed to an improved technique for storing and searching the excitation code book of linear predictive speech coders.
2. Description of the Related Art
A goal of effective digital speech coding is to provide an acceptable quality of synthesized speech at low bit rates. The coding must also be fast enough to allow for real time implementation. These goals are achieved by methods based on the standard Linear Prediction (LP) technique. The characteristic features of these methods are described below.
The sampled and quantized speech signal is divided into frames, and an LP (linear prediction) filter is constructed for each frame by conventional techniques. For each frame, the best excitation is determined, that is, the excitation which, when applied to the input of the LP filter, produces a synthesized signal close to the original speech signal over the frame. The best excitation is typically found through a look-up in a code book. One of the most effective approaches of this type is the Code Excited Linear Prediction (CELP) method, which was disclosed in "Predictive Coding of Speech at Low Bit Rates", Atal, B.S., IEEE Transactions on Communications, vol. COM-30, No. 4, (April 1982), 600-614.
The CELP speech encoding method provides high quality digital speech compression at low bit rates at the cost of extremely high complexity of the excitation search procedure. FIG. 1 illustrates how CELP finds the best excitation for an LP filter, that is, the excitation for which the output of the filter closely approximates the input speech.
In each frame the input speech signal is processed to estimate the linear predictive filter $A(z)$ of a prescribed order. In order to find the excitation the frame is divided into several subframes (speech vectors) of length $L$. Each speech vector is perceptually predistorted by passing through the linear filter 100 with the transfer function $W(z) = A(z)/A(\gamma z)$ for some $\gamma$, where $0.8 < \gamma < 1$. The predistortion is known to be useful in improving the synthesized speech quality. The perceptually predistorted input speech vector $u$ is approximated by the response $b_j$ of the linear system comprising a decoder synthesis filter $1/A(\gamma z)$ (called a short-term predictor) 104, a linear filter 103 called a long term predictor, and a multiplier 105 by the gain $g_j$ which is excited by the code word $c_j$ taken from the initially stored code book 102. In the CELP analysis method the best excitation for each subframe is found by searching the code word $c_j$ and computing a gain factor $g_j$ which jointly minimize the squared norm $\|d_j\|^2$ of the error vector $d_j = u - b_j g_j$:

$$\|d_j\|^2 = (d_j, d_j) = d_{j1}^2 + \ldots + d_{jn}^2,$$

obtained from the output of subtracter 101. For this purpose an exhaustive search in a code book is performed to find the maximal value of the match function

$$M_j = \frac{(u, b_j)^2}{(b_j, b_j)}. \qquad (1)$$

The optimal gain value for code word $c_j$ is thereby computed as

$$g_j = \frac{(u, b_j)}{(b_j, b_j)}. \qquad (2)$$
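The link between equations (1) and (2) and the squared error norm can be made explicit with a short derivation. It is given here only as a sketch: the symbol $E_j(g)$ is introduced for illustration and does not appear in the original text, and $b_j$ is taken to be the response before the gain is applied. For a fixed code word, the squared error as a function of the gain $g$ is

$$E_j(g) = \|u - g\,b_j\|^2 = (u, u) - 2g\,(u, b_j) + g^2\,(b_j, b_j).$$

Setting $dE_j/dg = 0$ gives $g_j = (u, b_j)/(b_j, b_j)$, which is equation (2). Substituting this gain back yields

$$E_j(g_j) = (u, u) - \frac{(u, b_j)^2}{(b_j, b_j)} = (u, u) - M_j.$$

Since $(u, u)$ does not depend on $j$, minimizing the squared error over the code book is equivalent to maximizing the match function $M_j$ of equation (1).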
In the search process each word from the code book is filtered by the decoder synthesis filter, and the energy $(b_j, b_j)$ and correlation $(u, b_j)$ values appearing in equations (1) and (2) must be computed for it. Moreover, a large code book is used in order to achieve high speech quality. Therefore, the code book search in CELP is an extremely time consuming process.
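The exhaustive search described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure of any particular coder: the long term predictor is omitted, the filter is started from zero state for every code word, and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p is assumed. The function names `dot`, `synthesize` and `search_code_book` are hypothetical.

```python
def dot(x, y):
    """Inner product (x, y) of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def synthesize(code_word, lp_coeffs):
    """Zero-state response of the all-pole filter 1/A(z).

    Assumes A(z) = 1 + a_1 z^-1 + ... + a_p z^-p with lp_coeffs = [a_1..a_p],
    so that y[n] = x[n] - sum_k a_k * y[n - k].
    """
    out = []
    for n, x in enumerate(code_word):
        feedback = sum(a * out[n - k]
                       for k, a in enumerate(lp_coeffs, start=1)
                       if n - k >= 0)
        out.append(x - feedback)
    return out


def search_code_book(u, code_book, lp_coeffs):
    """Return (index, gain, match) maximizing M_j = (u,b_j)^2 / (b_j,b_j)."""
    best_index, best_gain, best_match = -1, 0.0, -1.0
    for j, code_word in enumerate(code_book):
        b = synthesize(code_word, lp_coeffs)   # one filtering pass per word
        energy = dot(b, b)                     # (b_j, b_j)
        if energy == 0.0:
            continue                           # skip an all-zero response
        corr = dot(u, b)                       # (u, b_j)
        match = corr * corr / energy           # equation (1)
        if match > best_match:
            best_index, best_gain, best_match = j, corr / energy, match
    return best_index, best_gain, best_match
```

Each code word costs one filtering pass plus two inner products of length L, which is why the total work grows in proportion to the code book size.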
For the CELP method there exist various techniques for reducing computational complexity. Such techniques were reported in the following references:
Davidson, G., and Gersho, A., "Complexity Reduction Methods for Vector Excitation Coding", IEEE-IECEJ-ASJ International Conference on Acoustics, Speech and Signal Processing, vol. 4, (April 7-11, 1986), pp. 3055-3058;
P. Kroon, B. Atal, "On Improving the Performance of Pitch Predictors in Speech Coding Systems", Abstracts of the IEEE Workshop on Speech Coding for Telecommunications, 1989, pp. 49-50;
J. P. Campbell, T. E. Tremain, V. C. Welch, "The DOD 4.8 kbps Standard (Proposed Federal Standard 1016)", Advances in Speech Coding, B. Atal, V. Cuperman, A. Gersho, editors, Ch. 4.1, Kluwer Academic Publishers, 1990;
Federal Standard 1016, Telecommunications: Analog to Digital Conversion of radio voice by 4,800 bit/second Code Excited Linear Prediction (CELP). February, 1991.
Despite the foregoing prior techniques, reducing the time for the code book search and the effective size of the code book remains the most important problem for a real time implementation. U.S. Pat. No. 4,817,157 to Gerson describes a "vector sum" code book. The "vector sum" code book generation approach gives a faster implementation of the code book search, but still requires approximately 2,600,000 multiply-accumulate (MAC) operations per second. This value does make possible a practical real time implementation using a single Digital Signal Processor (DSP).
A second concern is the storage requirements for the code book. The size of the code book is the product of the number of code words and the number of samples per code word.
The typical code book size is $V_s = 1024$ code words of length $L = 40$ samples. In U.S. Pat. No. 4,817,157 a code book storing system based on keeping $\log_2 V_s$ basis vectors of length $L$ is proposed. Such a "vector sum" system requires $L \cdot \log_2 V_s = 40 \cdot 10 = 400$ ternary (+1, -1, 0) memory cells and is useful for search simplification.
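The storage saving can be illustrated with a small Python sketch. It follows the general idea of a vector-sum code book, in which each code word is formed as a signed sum of the stored basis vectors. The mapping used here (bit m of the index set means +1, clear means -1) is an assumption made for illustration and is not quoted from U.S. Pat. No. 4,817,157; the function names are hypothetical.

```python
def vector_sum_code_word(index, basis):
    """Build one code word as a signed sum of the basis vectors.

    Assumed convention: bit m of `index` selects the sign of basis[m]
    (+1 if the bit is set, -1 otherwise).
    """
    length = len(basis[0])
    word = [0] * length
    for m, vector in enumerate(basis):
        sign = 1 if (index >> m) & 1 else -1
        for n in range(length):
            word[n] += sign * vector[n]
    return word


def storage_cells(num_code_words, length):
    """Return (explicit, vector_sum) storage, in cells of one sample each.

    Assumes num_code_words is a power of two, so that
    (num_code_words - 1).bit_length() equals log2(num_code_words).
    """
    num_basis = (num_code_words - 1).bit_length()
    return num_code_words * length, num_basis * length
```

For the figures quoted above, `storage_cells(1024, 40)` gives 40960 cells for an explicitly stored code book against 400 cells for the ten basis vectors. Under the assumed sign convention, complementing every bit of the index negates the code word, so half of the code book is the mirror image of the other half.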
The reduction of storage requirements and complexity for code excited linear prediction systems remains a key problem in the practical implementation of digital speech coding. The principal object of the present invention is to provide high quality speech coding at data rates of approximately 4800-9600 bits per second that satisfies the time and memory requirements of a real time hardware implementation.