1. Field of Invention
The present invention generally relates to speech coding at low bit rates (in the range of 2.4 to 4.8 kb/s). In particular, the present invention relates to improvements in excitation generation and linear prediction coefficient coding that reduce the number of data bits needed for coded speech.
2. Description of Related Art
Digital speech communication systems, including voice storage and voice response facilities, utilize signal compression to reduce the bit rate needed for storage and/or transmission. As is well known in the art, a speech pattern contains redundancies that are not essential to its apparent quality. Removal of redundant components of the speech pattern significantly lowers the number of bits required to synthesize the speech signal. A goal of effective digital speech coding is to provide an acceptable subjective quality of synthesized speech at low bit rates. However, the coding must also be fast enough to allow for real-time implementation.
One method used to partially achieve these goals is based on the standard Linear Prediction (LP) technique. The characteristic features of this technique are the following. The sampled and quantized speech signal is partitioned into successive intervals (frames), then a set of parameters representative of the interval speech is generated. The parameter set includes linear prediction coefficients (LPCs) which determine an LP filter, and the best excitation signal. The best LPCs and excitation are then used to produce a synthesized signal close to the original speech signal. This is done on a per frame basis.
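The per-frame estimation of the LPCs can be illustrated with a minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion, which is a common choice but is not specified in the text above; the function names and the convention A(z) = 1 + a[1] z^-1 + ... are illustrative assumptions.

```python
def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations by the Levinson-Durbin recursion.

    Returns (a, err), where A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order
    and err is the remaining prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. all zeros): stop early
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err  # reflection coefficient of stage m
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a, err


if __name__ == "__main__":
    # Hypothetical short frame, for illustration only.
    frame = [0.0, 0.5, 0.9, 0.7, 0.1, -0.4, -0.8, -0.6, -0.1, 0.3]
    coeffs, energy = levinson_durbin(autocorrelation(frame, 2), 2)
    print(coeffs, energy)
```

A practical coder would normally window the frame and use a higher filter order; both are omitted here to keep the sketch short.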
The best excitation is typically found through a look-up in a table, or codebook. The codebook includes vectors whose components are consecutive excitation samples. Each vector contains the same number of excitation samples as there are speech samples in a frame.
One of the most effective approaches of this type is the Code Excited Linear Prediction (CELP) method which was disclosed in "Predictive Coding of Speech at Low Bit Rates", Atal B.S., IEEE Transactions on Communications, vol. COM-30, No. 4, (April, 1982), 600-614.
FIG. 1 illustrates how a CELP implementation generates the best excitation for an LP filter such that the output of the filter closely approximates input speech.
In each frame the input speech signal is pre-filtered by a fixed digital pre-filter 100. Next, the pre-filtered speech is processed by linear prediction analyzer 101 to estimate the linear predictive filter $A(z)$ of a prescribed order. Each frame is broken into a predetermined number of subframes. This allows excitations to be generated for each subframe. Each speech vector, for a given subframe, is passed through the ringing removal and perceptual weighting module 102. The speech signal is perceptually predistorted by a linear filter with the transfer function $W(z) = A(z)/A(\gamma z)$ for some $\gamma$. The output $w$ of module 102 is analyzed by the long-term prediction analyzer 103 to obtain a periodic (pitch) component $p$ relating to the excitation. The best pitch excitation is found by searching the index (code word number) $I_A$ in an adaptive codebook (ACB) and computing the optimal gain factor $g_A$. These jointly minimize the squared norm $\|d\|^2$ of the vector $d = w - b g_A$, where $b$ denotes the response of the synthesis filter $1/A(\gamma z)$ 104 excited by $p$. For this purpose, an exhaustive search in an ACB is performed to find the maximal value of the match function:

$$M = \frac{(w, b)^2}{(b, b)}.$$
The optimal gain value is determined as follows:

$$g_A = \frac{(w, b)}{(b, b)}.$$
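The search described above can be sketched as follows. For a fixed candidate response b, the gain that minimizes the squared error between the target and the scaled response is (w,b)/(b,b), and the remaining error equals (w,w) minus the match function, so the entry with the largest match value gives the smallest error. The sketch assumes the filtered responses of all codebook entries are already available as plain lists; the function names are hypothetical.

```python
def dot(x, y):
    """Inner product (x, y) of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def search_codebook(target, filtered_codebook):
    """Exhaustive search over filtered codebook entries.

    Returns (index, gain, match) for the entry b that maximizes
    (target, b)^2 / (b, b); gain is (target, b) / (b, b).
    """
    best_index, best_gain, best_match = -1, 0.0, -1.0
    for index, b in enumerate(filtered_codebook):
        energy = dot(b, b)
        if energy == 0.0:
            continue  # an all-zero response cannot match anything
        corr = dot(target, b)
        match = corr * corr / energy
        if match > best_match:
            best_index, best_gain, best_match = index, corr / energy, match
    return best_index, best_gain, best_match


if __name__ == "__main__":
    # Hypothetical target vector and filtered entries, for illustration only.
    w = [1.0, 2.0, 0.0]
    responses = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 4.0, 0.0]]
    print(search_codebook(w, responses))
```

Because the match function is invariant to the scale of b, the search picks the entry whose direction best matches the target, and the gain then absorbs the difference in amplitude.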
The residual vector $u = w - b g_A$ from the output of adder 105 enters the stochastic codebook analyzer 108. Here the best residual excitation index $I_S$ and the optimal gain factor $g_s$ are found. These jointly minimize the squared norm $\|d\|^2$ of the error vector $d = u - r g_s$, where $r$ denotes the response of the stochastic codebook analyzer 108's synthesis filter excited by the code word $c$ from the precomputed stochastic codebook 109. Using the multiplier 106, multiplier 110, and adder 107, we obtain the resulting excitation vector $e$ for a given subframe as the following sum:

$$e = p g_A + c g_s.$$
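The two-stage procedure for one subframe can be sketched as below. This is a simplified illustration, not the coder of FIG. 1: it assumes a zero-state all-pole filter, uses the same filter coefficients (the hypothetical argument `a_weighted`) in both stages, omits ringing removal, and assumes each codebook holds at least one nonzero entry. All names are illustrative.

```python
def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def zero_state_response(a, excitation):
    """Filter excitation through 1/A(z), A(z) = 1 + a[1] z^-1 + ..., from rest."""
    out = []
    for n, x in enumerate(excitation):
        feedback = sum(a[k] * out[n - k] for k in range(1, len(a)) if n - k >= 0)
        out.append(x - feedback)
    return out


def best_entry(target, codebook, a):
    """Return (index, gain, response) of the entry maximizing the match function."""
    best, best_match = (-1, 0.0, None), -1.0
    for index, code in enumerate(codebook):
        resp = zero_state_response(a, code)
        energy = dot(resp, resp)
        if energy == 0.0:
            continue
        corr = dot(target, resp)
        match = corr * corr / energy
        if match > best_match:
            best, best_match = (index, corr / energy, resp), match
    return best


def encode_subframe(w, adaptive_cb, stochastic_cb, a_weighted):
    """Adaptive search, residual formation, stochastic search, excitation sum."""
    i_a, g_a, b = best_entry(w, adaptive_cb, a_weighted)
    u = [wi - g_a * bi for wi, bi in zip(w, b)]  # residual target
    i_s, g_s, _ = best_entry(u, stochastic_cb, a_weighted)
    p, c = adaptive_cb[i_a], stochastic_cb[i_s]
    e = [g_a * pi + g_s * ci for pi, ci in zip(p, c)]  # combined excitation
    return i_a, g_a, i_s, g_s, e


if __name__ == "__main__":
    # Hypothetical toy codebooks and target, for illustration only.
    acb = [[1.0, 0.0, 0.0, 0.0], [1.0, 2.0, 0.0, 0.0]]
    scb = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 2.0]]
    print(encode_subframe([1.0, 2.0, 0.0, 1.0], acb, scb, [1.0]))
```

The search is sequential rather than joint: the stochastic stage only sees what the adaptive stage left unexplained, which keeps the cost at one pass over each codebook.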
For the CELP speech coding technique, the synthesized speech quality rapidly degrades as data rates are reduced. For example, at 4.8 kb/s, a 10-bit codebook is generally used. However, at 2.4 kb/s, the codebook must be reduced to 5 bits. Since a 5-bit codebook is too small to cover many types of speech signals, the speech quality degrades abruptly at bit rates lower than 4.8 kb/s.
Various improvements of the CELP technique exist. These techniques attempt to provide acceptable speech compression at data rates below 4800 bps. Such techniques are reported in the following references:
Zinser R. L., Koch S. R. "CELP coding at 4.0 kb/sec and below: improvements to FS-1016." Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. I-313 through I-316, March 1992;
Wang S., Gersho A. "Improved phonetically-segmented vector excitation coding at 3.4 kb/s." Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. I-349 through I-352, March 1992;
J. Haagen, H. Nielsen, S. D. Hansen "Improvements in 2.4 kb/s high-quality speech coding." Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. II-145 through II-148, March 1992;
R. L. Zinser "Hybrid switched multi-pulse/stochastic speech coding technique." U.S. Pat. No. 5,060,269;
Z. Xiongwei and Chen Xianzhi "A new excitation model for LPC vocoder at 2.4 Kb/s." Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. I-65 through I-68, March 1992;
Federal Standard 1016, "Telecommunications: Analog to Digital Conversion of Radio Voice by 4,800 bit/second Code Excited Linear Prediction (CELP)." February, 1991.
These CELP-based systems reduce the bit rate by: 1) reducing the number of bits for excitation coding by using simpler excitations than in CELP; or 2) reducing the number of bits for LPC coding through more complicated vector quantization, with a corresponding loss in subjective quality.
The use of excitation classes other than CELP that require fewer bits was investigated, for example, in "On reducing the bit rate of a CELP-based speech coder", Y. J. Liu, Proceedings of the 1992 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. I-49 through I-52, March 1992. It was shown there that the signal-to-noise ratio (SNR) of the half-rate CELP-based system is 3-4 dB lower than the SNR of the Federal 4800 bps CELP Standard.
To decrease the number of bits for LPC coding, a number of methods have been proposed in the prior art, for example in U.S. Pat. Nos. 5,255,339 and 5,233,659. The most effective approaches of this type are split-vector quantization, disclosed in "Efficient Vector Quantization of LPC Parameters at 24 bits/frame," K. K. Paliwal and B. S. Atal, Proceedings of the 1991 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 661-664, May 1991, and finite-state vector quantization, described in "Finite-state Vector Quantization over Noisy Channels and its Application to LSP Parameters", Y. Hussain and N. Farvardin, Proceedings of the 1992 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. II-133 through II-136, March 1992. For these methods, 24-26 bits/frame are needed for quantization with a quality close to that of CELP. However, a further decrease in the number of bits leads to a loss in quality. Also, these quantization schemes are much more complicated than the 34-bit scalar quantizer in the CELP Standard.
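The idea behind split-vector quantization can be illustrated with a toy sketch: the parameter vector is cut into sub-vectors, each sub-vector is quantized against its own small codebook, and the frame cost is the sum of the index sizes. The codebooks, split point, and plain squared-error distance below are hypothetical and are not taken from the cited paper.

```python
import math


def nearest(vector, codebook):
    """Index of the codebook entry with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])))


def split_vq_encode(params, split_points, codebooks):
    """Quantize each sub-vector of params with its own codebook."""
    bounds = [0] + list(split_points) + [len(params)]
    return [nearest(params[lo:hi], cb)
            for lo, hi, cb in zip(bounds, bounds[1:], codebooks)]


def split_vq_decode(indices, codebooks):
    """Rebuild the full parameter vector from the per-split indices."""
    out = []
    for index, cb in zip(indices, codebooks):
        out.extend(cb[index])
    return out


def bits_per_frame(codebooks):
    """Total index bits: one index of ceil(log2(size)) bits per codebook."""
    return sum(math.ceil(math.log2(len(cb))) for cb in codebooks)


if __name__ == "__main__":
    # Hypothetical 4-dimensional parameter vector split into two halves.
    codebooks = [
        [[0.0, 0.0], [0.1, 0.25]],
        [[0.5, 1.0], [0.9, 0.9], [0.6, 0.8], [0.0, 0.0]],
    ]
    indices = split_vq_encode([0.1, 0.2, 0.6, 0.9], [2], codebooks)
    print(indices, split_vq_decode(indices, codebooks), bits_per_frame(codebooks))
```

Splitting keeps each search small: two codebooks of 2^12 entries cost 24 bits per frame, while a single unsplit codebook at the same bit count would need 2^24 entries.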
Effective speech compression at rates in the range of 2.4 to 4.8 kb/s, with an acceptable quality of synthesized speech and a practical real-time implementation, remains a key problem.
An improved method and apparatus for compressing speech is desired.