1. Field of the Invention
The present invention generally relates to digital speech coding for efficient modeling, quantization, and error minimization of waveform signal components and speech prediction residual signals at low bit rates, and more particularly to improved methods for coding the excitation information for code-excited linear predictive speech coders.
2. Description of the Related Art
In low rate coding applications such as digital speech, linear predictive coding (LPC) or similar techniques are typically used to model the spectra of short term speech signals. Systems employing LPC techniques provide prediction residual signals for corrections to the short term model characteristics.
A speech coding technique known as code-excited linear prediction (CELP) produces high quality synthesized speech at low bit rates, i.e., 4.8 to 9.6 kilobits-per-second (kbps). This class of speech coding is also known as vector-excited linear prediction or stochastic coding, which is used in numerous speech communications and speech synthesis applications. CELP is also particularly applicable to digital speech encryption and digital radiotelephone communication systems wherein speech quality, data rate, size, and cost are significant issues.
The LPC system of a CELP speech coder typically employs long term (“pitch”) and short term (“formant”) predictors that model the characteristics of the input speech signal and are incorporated in a set of time-varying linear filters. An excitation signal for the filters is chosen from a codebook of stored innovation sequences, or codevectors. For each frame of speech, the speech coder applies each individual codevector to the filters to generate a reconstructed speech signal, and compares the original input speech signal to the reconstructed signal to create an error signal. The error signal is then weighted by passing it through a weighting filter having a response based on human auditory perception. The optimum excitation signal is determined by selecting the codevector that produces the weighted error signal with the minimum energy for the current frame. For each block of speech, a set of linear predictive coding (LPC) parameters is produced in accordance with prior art techniques. The short term predictor parameters (STP), long term predictor parameters (LTP), and excitation gain factor may be sent over the channel for use by the speech synthesizer. See, e.g., “Predictive Coding of Speech at Low Bit Rates,” IEEE Trans. Commun., Vol. COM-30, pp. 600-14, April 1982, by B. S. Atal, for representative methods of generating these parameters.
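The exhaustive search described above can be sketched as follows. This is a minimal illustration, not the method of any cited reference: it assumes the target signal has already been perceptually weighted, uses only a zero-state all-pole short term filter (no long term predictor), and computes the optimal gain per codevector. The names synthesize and search_codebook are hypothetical.

```python
def synthesize(excitation, lpc_coeffs):
    """Zero-state all-pole filter: s[n] = e[n] + sum_k a[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc_coeffs):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc_coeffs):
    """Return (index, gain, error_energy) for the minimum-error codevector.

    Assumption: `target` is the (already weighted) signal to be matched.
    """
    best = None
    for index, codevector in enumerate(codebook):
        y = synthesize(codevector, lpc_coeffs)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero output cannot match anything
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy  # least-squares optimal gain
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (index, gain, err)
    return best


# Toy example: the target is codevector 1 scaled by 2 and filtered.
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 0, 0]]
lpc = [0.5]
target = synthesize([0, 2, 0, 0], lpc)
print(search_codebook(target, codebook, lpc))
```

Every codevector is filtered and compared in turn, which is why the cost of the search grows in direct proportion to the codebook size.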
The stored excitation codevectors generally include independent random white Gaussian sequences. One codevector from the codebook is used to represent each block of N excitation samples. Each stored codevector is represented by a codeword, i.e., the address of the codevector memory location. It is this codeword that is subsequently sent over a communications channel to the speech synthesizer to reconstruct the speech frame at the receiver. See M. R. Schroeder and B. S. Atal, “Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates”, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 3, pp. 937-40, March 1985, for a detailed explanation of CELP.
The difficulty of the CELP speech coding technique lies in the extremely high computational complexity of performing an exhaustive search of all the excitation codevectors in the codebook. Moreover, the memory allocation requirement to store the codebook of independent random vectors is also exorbitant. For example, a 640 kilobit read-only-memory (ROM) would be required to store a codebook of 1024 codevectors, each having 40 samples, with each sample represented by a 16-bit word. Thus, substantial computational effort is required to search the entire codebook, e.g., 1024 vectors, for the best fit, which is an unreasonable task for real-time implementation with today's digital signal processing technology.
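The 640 kilobit figure can be checked with a few lines of arithmetic. The sketch below assumes that one kilobit means 1024 bits, which is the convention that reproduces the stated number; the function name codebook_rom_bits is hypothetical.

```python
def codebook_rom_bits(num_codevectors, samples_per_vector, bits_per_sample):
    """Storage for a codebook of independent, fully stored codevectors."""
    return num_codevectors * samples_per_vector * bits_per_sample


bits = codebook_rom_bits(1024, 40, 16)
print(bits)          # total bits
print(bits // 1024)  # kilobits, assuming 1 kilobit = 1024 bits
```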
An alternative for reducing the computational complexity of this codevector search process is to implement the search calculations in a transform domain. Refer to I. M. Trancoso and B. S. Atal, “Efficient Procedures for Finding the Optimum Innovation in Stochastic Coders”, Proc. ICASSP, Vol. 4, pp. 2375-8, April 1986, as an example of such a procedure. Using this approach, discrete Fourier transforms (DFTs) or other transforms may be used to express the filter response in the transform domain such that the filter computations are reduced to a single multiply-accumulate (MAC) operation per sample per codevector.
Another alternative for reducing the computational complexity is to structure the excitation codebook such that the codevectors are no longer independent of each other. In this manner, the filtered version of a codevector can be computed from the filtered version of the previous codevector, again using only a single filter computation MAC per sample. Examples of these types of codebooks are given in the article entitled “Speech Coding Using Efficient Pseudo-Stochastic Block Codes”, Proc. ICASSP, Vol. 3, pp. 1354-7, April 1987, by D. Lin. Nevertheless, 24,000,000 MACs per second would still be required to do the search. Moreover, the ROM size is based on 2^M × n bits/word, where M is the number of bits in the codeword such that the codebook contains 2^M codevectors. Therefore, the memory requirements still increase exponentially with the number of bits used to encode the frame of excitation information. For example, the ROM requirements increase to 64 kilobits when using 12 bit codewords.
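The exponential growth in ROM size can be illustrated numerically. The sketch below assumes one stored word per codevector and 16 bits per word; these assumptions are not stated in the passage, but they reproduce the 64 kilobit figure for 12 bit codewords. The function name rom_kilobits is hypothetical, and one kilobit is taken as 1024 bits.

```python
def rom_kilobits(codeword_bits, bits_per_word=16):
    """ROM size for 2**M stored words (assumed: one word per codevector)."""
    return (2 ** codeword_bits) * bits_per_word // 1024


# Each extra codeword bit doubles the storage requirement.
for m in (10, 12, 16, 20):
    print(m, rom_kilobits(m))
```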
Another example of a structured excitation codebook is a Vector Sum Excited Linear Prediction (VSELP) codebook, disclosed in U.S. Pat. No. 4,817,157 issued Mar. 28, 1989 to Ira Gerson for “Digital Speech Coder Having Improved Vector Excitation Source” assigned to applicant's assignee, and hereby incorporated by reference. According to one implementation of a VSELP excitation codebook, all 2^M excitation codevectors may be generated as a linear combination of M basis vectors, where codeword I specifies the polarity of each of the M basis vectors in the linear combination. The entire codebook can be searched using only M+3 multiply-accumulate operations per codevector evaluation. Other advantages of a VSELP codebook are efficient codebook storage (only the M basis vectors need to be stored, instead of 2^M codevectors), resilience to channel errors, and an ability to optimize the VSELP basis vectors utilizing an off-line codebook training procedure.
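The construction of VSELP codevectors from basis vectors can be sketched as follows. This is an illustrative sketch only: the mapping of a set bit to +1 and a cleared bit to -1 is an assumed convention, and the function names are hypothetical. It does not show the efficient M+3 MAC search, only how the 2^M codevectors arise from M stored vectors.

```python
def vselp_codevector(codeword, basis_vectors):
    """Sum the basis vectors, each with polarity set by one codeword bit.

    Assumed convention: bit m set -> +1, bit m clear -> -1.
    """
    length = len(basis_vectors[0])
    out = [0.0] * length
    for m, basis in enumerate(basis_vectors):
        sign = 1.0 if (codeword >> m) & 1 else -1.0
        for n in range(length):
            out[n] += sign * basis[n]
    return out


def vselp_codebook(basis_vectors):
    """Enumerate all 2**M codevectors from the M stored basis vectors."""
    return [vselp_codevector(i, basis_vectors)
            for i in range(1 << len(basis_vectors))]


basis = [[1.0, 0.0], [0.0, 2.0]]  # M = 2 toy basis vectors
for i, vec in enumerate(vselp_codebook(basis)):
    print(i, vec)
```

Under this convention, complementing every bit of a codeword negates the codevector, so only half of the codebook has a distinct shape.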
Since the complexity of performing an exhaustive search of an excitation codebook is a function of the type of excitation codebook used and the value of M, one approach to managing the complexity of searching an excitation codebook is to limit the value of M. From a coding efficiency perspective, however, it may be advantageous to make M large, because that would allow the speech coder designer the freedom to utilize a longer excitation codevector length and simultaneously lower the rate at which the gain factor for scaling the selected codevector needs to be encoded.
The Sparse Algebraic Codebook (SAC) of Jean-Pierre Adoul, University of Sherbrooke, offers one formulation of an excitation codebook that has the ability to be defined by a large number of bits (M). The Algebraic Codebook itself need not be stored. Instead, a compact set of rules defines, for a given codeword, how a codevector is to be constructed via a placement of unity amplitude pulses (+/−1) within the initially zero valued codevector. This set of rules is stored both at the encoder and at the decoder. Search complexity is typically kept manageable by not searching the codebook exhaustively. Although this approach allows reasonable search complexity, low codebook storage space, and long codevector lengths, the requirement that the excitation codevector be constructed from unity amplitude pulses prevents an off-line codebook training procedure from being applied to optimize the relative amplitudes of the samples in the excitation codebook.
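The idea of building a codevector from rules rather than from stored samples can be sketched as follows. The rule set here is purely hypothetical and is not the actual Sparse Algebraic Codebook: it assumes a 40 sample codevector, five interleaved tracks of eight positions each, and one pulse per track encoded with three position bits and one sign bit, giving a 20 bit codeword.

```python
def algebraic_codevector(codeword, length=40, num_tracks=5):
    """Place one +/-1 pulse per track in an initially zero codevector.

    Hypothetical rules: track t owns positions t, t + num_tracks, ...;
    each pulse uses 1 sign bit (set -> +1) plus position bits.
    Assumes positions per track is a power of two.
    """
    positions_per_track = length // num_tracks           # 8 by default
    pos_bits = (positions_per_track - 1).bit_length()    # 3 by default
    vec = [0] * length
    for track in range(num_tracks):
        field = codeword & ((1 << (pos_bits + 1)) - 1)   # this pulse's bits
        codeword >>= pos_bits + 1
        sign = 1 if field & 1 else -1
        slot = field >> 1
        vec[track + num_tracks * slot] = sign
    return vec


vec = algebraic_codevector(0b0011)
print([(n, v) for n, v in enumerate(vec) if v != 0])
```

Only the rules need to be shared by encoder and decoder, so a 20 bit codebook requires no sample storage at all. The fixed unit amplitudes are also what leave nothing for a training procedure to adjust.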
A need therefore exists for an improved speech coding technique that addresses both the extremely high computational complexity of codebook searching for large values of M and the vast memory requirements for storing the excitation codevectors, thereby making codebooks with long codevector lengths practical.