This invention relates to a vector excitation coder which efficiently compresses vectors of digital voice or audio for transmission or for storage, such as on magnetic tape or disc.
In recent developments of digital transmission of voice, it has become common practice to sample at 8 kHz and to group the samples into blocks of samples. Each block is commonlY referred to as a "vector" for a type of coding processing called Vector Excitation Coding (VXC). It is a powerful new technique for encoding analog speech or audio into a digital representation. Decoding and reconstruction of the original analog signal permits quality reproduction of the original signal.
Briefly, the prior art VXC is based on a new and general source-filter modeling technique in which the excitation signal for a speech production model is encoded at very low bit rates using vector quantization. Various architectures for speech coders which fall into this class have recently been shown to reproduce speech with very high perceptual quality.
In a generic VXC coder, a vocal-tract model is used in conjunction with a set of excitation vectors (codevectors) and a perceptually-based error criterion to synthesize natural-sounding speech. One example of such a coder is Code Excited Linear Prediction (CELP), which uses Gaussian random variables for the codevector components. M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates," Proceedings Int'l. Conference on Acoustics, Speech, and Signal Processing, Tampa, March, 1985 and M. Copperi and D. Sereno, "CELP Coding for High-Quality Speech at 8 kbits/s," Proceedings Int'l. Conference on Acoustics, Speech, and Signal Processing, Tokyo, April, 1986. CELP achieves very high reconstructed speech quality, but at the cost of astronomic computational complexity (around 440 million multiply/add operations per second for real-time selection of the optimal codevector for each speech block).
In the present invention, VXC is employed with a sparse vector excitation to achieve the same high reconstructed speech quality as comparable schemes, but with significantly less computation. This new coder is denoted Pulse Vector Excitation Coding (PVXC). A variety of novel complexity reduction methods have been developed and combined, reducing optimal codevector selection computation to only 0.55 million multiply/adds per second, which is well within the capabilities of present data processors. This important characteristic makes the hardware implementation of a real-time PVXC coder possible using only one programmable digital signal processor chip, such as the AT&T DSP32. Implementation of similar speech coding algorithms using either programmable processors or high-speed, special-purpose devices is feasible but very impractical due to the large hardware complexity required.
Although PVXC of the present invention employs some characteristics of multipulse linear predictive coding (MPLPC) where excitation pulse amplitudes and locations are determined from the input speech, and some characteristics of CELP, where Gaussian excitation vectors are selected from a fixed codebook, there are several important differences between them. PVXC is distinguished from other excitation coders by the use of a precomputed and stored set of pulse-like (sparse) codevectors. This form of vocal-tract model excitation is used together with an efficient error minimization scheme in the Sparse Vector Fast Search (SVFS) and Enhanced SVFS complexity reduction methods. Finally, PVXC incorporates an excitation codebook which has been optimized to minimize the perceptually-weighted error between original and reconstructed speech waveforms. The optimization procedure is based on a centroid derivation. In addition, a complexity reduction scheme called Spectral Classification (SPC) is disclosed for excitation coders using a conventional codebook (fully-populated codevector components). There is currently a high demand for speech coding techniques which produce high-quality reconstructed speech at rates around 4.8 kb/s Such coders are needed to close the gap which exists between vocoders with an "electronic-accent" operating at 2.4 kb/s and newer, more sophisticated hybrid techniques which produce near toll-quality speech at 9.6 kb/s.
For real-time implementations, the promise of VXC has been thwarted somewhat by the associated high computational complexity. Recent research has shown that the dominant computation (excitation codebook search) can be reduced to around 40 M Flops without compromising speech quality However, this operation count is still too high to implement a practical real-time version using only a few current-generation DSP chips. The PVXC coder described herein produces natural-sounding speech at 4.8 kb/s and requires a total computation of only 1.2 M Flops.