The present invention relates to the improved method and system for digital encoding of speech signals, more particularly to Linear Predictive Analysis-by-Synthesis (LPAS) based speech coding.
LPAS coders have given new dimension to medium-bit rate (8-16 Kbps) and low-bit rate (2-8 Kbps) speech coding research. Various forms of LPAS coders are being used in applications like secure telephones, cellular phones, answering machines, voice mail, digital memo recorders, etc. The reason is that LPAS coders exhibit good speech quality at low bit rates. LPAS coders are based on a speech production model 39 (illustrated in FIG. 1) and fall into a category between waveform coders and parametric coders (Vocoder); hence they are referred to as hybrid coders.
Referring to FIG. 1, the speech production model 39 parallels basic human speech activity and starts with the excitation source 41 (i.e., the breathing of air in the lungs). Next the working amount of air is vibrated through a vocal chord 43. Lastly, the resulting pulsed vibrations travel through the vocal tract 45 (from vocal chords to voice box) and produce audible sound waves, i.e., speech 47.
Correspondingly, there are three major components in LPAS coders. These are (i) a short-term synthesis filter 49, (ii) a long-term synthesis filter 51, and (iii) an excitation codebook 53. The short-term synthesis filter includes a short-term predictor in its feed-back loop. The short-term synthesis filter 49 models the short-term spectrum of a subject speech signal at the vocal tract stage 45. The short-term predictor of 49 is used for removing the near-sample redundancies (due to the resonance produced by the vocal tract 45) from the speech signal. The long-term synthesis filter 51 employs an adaptive codebook 55 or pitch predictor in its feedback loop. The pitch predictor 55 is used for removing far-sample redundancies (due to pitch periodicity produced by a vibrating vocal chord 43) in the speech signal. The source excitation 41 is modeled by a so-called xe2x80x9cfixed codebookxe2x80x9d (the excitation code book) 53.
In turn, the parameter set of a conventional LPAS based coder consists of short-term parameters (short-term predictor), long-term parameters and fixed codebook 53 parameters. Typically short-term parameters are estimated using standard 10-12th order LPC (Linear predictive coding) analysis.
The foregoing parameter sets are encoded into a bit-stream for transmission or storage. Usually, short-term parameters are updated on a frame-by-frame basis (every 20-30 msec or 160-240 samples) and long-term and fixed codebook parameters are updated on a subframe basis (every 5-7.5 msec or 40-60 samples). Ultimately, a decoder (not shown) receives the encoded parameter sets, appropriately decodes them and digitally reproduces the subject speech signal (audible speech) 47.
Most of the state-of-the art LPAS coders differ in fixed codebook 53 implementation and pitch predictor or adaptive codebook implementation 55. Examples of LPAS coders are Code Excited Linear Predictive (CELP) coder, Multi-Pulse Excited Linear Predictive (MPLPC) coder, Regular Pulse Linear Predictive (RPLPC) coder, Algebraic CELP (ACELP) coder, etc. Further, the parameters of the pitch predictor or adaptive codebook 55 and fixed codebook 53 are typically optimized in a closed-loop using an analysis-by-synthesis method with perceptually-weighted minimum (mean squared) error criterion. See Manfred R. Schroeder and B. S. Atal, xe2x80x9cCode-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates,xe2x80x9d IEEE Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Tampa, Fla., pp. 937-940, 1985.
The major attributes of speech-coders are:
1. Speech Quality
2. Bit-rate
3. Time and Space complexity
4. Delay
Due to the closed-loop parameter optimization of the pitch-predictor 55 and fixed codebook 53, the complexity of the LPAS coder is enormously high as compared to a waveform coder. The LPAS coder produces considerably good speech quality around 8-16 kbps. Further improvement in the speech quality of LPAS based coders can be obtained by using sophisticated algorithms, one of which is the multi-tap pitch predictor (MTPP). Increasing the number of taps in the pitch predictor increases the prediction gain, hence improving the coding efficiency. On the other hand, estimating and quantizing MTPP parameters increases the computational complexity and memory requirements of the coder.
Another very computationally expensive algorithm in an LPAS based coder is the fixed codebook search. This is due to the analysis-by-synthesis based parameter optimization procedure.
Today, speech coders are often implemented on Digital Signal Processors (DSP). The cost of a DSP is governed by the utilization of processor resources (MIPS/RAM/ROM) required by the speech coder.
One object of the present invention is to provide a method for reducing the computational complexity and memory requirements (MIPS/RAM/ROM) of an LPAS coder while maintaining the speech quality. This reduction in complexity allows a high quality LPAS coder to run in real-time on an inexpensive general purpose fixed point DSP or other similar digital processor.
Accordingly, the present invention method provides (i) an LPAS speech encoder reduced in computational complexity and memory requirements, and (ii) a method for reducing the computational complexity and memory requirements of an LPAS speech encoder, and in particular a multi-tap pitch predictor and the source excitation codebook in such an encoder. The invention employs fast structured product code vector quantization (PCVQ) for quantizing the parameters of the multi-tap pitch predictor within the analysis-by-synthesis search loop. The present invention also provides a fast procedure for searching the best code-vector in the fixed-code book. To achieve this, the fixed codebook is preferably formed of ternary values (1,xe2x88x921,0).
In a preferred embodiment, the multi-tap pitch predictor has a first vector codebook and a second (or more) vector codebook. The invention method sequentially searches the first and second vector codebooks.
Further, the invention includes forming the source excitation codebook by using non-contiguous positions for each pulse.