A major objective in designing digital speech coders is to optimize tradeoffs between minimizing the bit rate of the encoded speech and maximizing the speech quality. Other practical criteria, such as complexity, delay and robustness, also impose constraints on coder design. Optimization of the tradeoffs must be tailored to the particular application to which the coder is to be applied.
Waveform approximating coders and decoders rely on relatively simple speech models and on limitations of the human hearing system to encode and reconstruct waveforms which are perceived to be very similar to the original speech signal prior to encoding. Over the past decade, the performance of Generalized Linear Prediction Analysis-by-Synthesis (GLPAS) speech coders providing coded speech at 2 kbps to 16 kbps has improved considerably. Nevertheless, further effort is devoted to increasing the speech quality of such coders and/or reducing the bit rate for equivalent speech quality.
A GLPAS coder commonly operates on successive frames of a speech signal in a closed-loop fashion, each frame comprising a plurality of successive subframes. Processing at the subframe level provides better modelling of signal changes while meeting practical constraints on processing complexity and memory usage, and the closed-loop nature of the processing further improves the efficiency of the coding.
Typical GLPAS coding techniques comprise:
Linear Predictive Coding (LPC) analysis to model the spectral envelope of the speech signal, providing partial short term prediction of speech signal parameters;
Pitch Delay prediction or Adaptive CodeBook (ACB) alignment to model pitch harmonics of the speech signal;
Pitch or ACB Gain determination to model the energy of harmonic components of the speech signal;
Fixed CodeBook (FCB) alignment to model excitation parameters of the speech signal;
FCB Gain determination to model the energy of wide spectrum components of the speech signal; and
pre- and post-processing of the speech signal.
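By way of illustration only, the LPC analysis step listed above may be sketched as an autocorrelation computation followed by the Levinson-Durbin recursion, which is one common way of obtaining short term prediction coefficients. The function names, the predictor order and the test signal below are assumptions made for this example and are not taken from any particular coder or standard.

```python
def autocorrelation(x, order):
    """Autocorrelation of x at lags 0..order (no windowing, for brevity)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a short term predictor.

    Returns (a, err) where a[k-1] is the coefficient a_k in the prediction
    x_hat[n] = sum_k a_k * x[n-k], and err is the residual energy.
    """
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:        # silent or degenerate frame: stop early
            break
        # Reflection coefficient for stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err


# Hypothetical test signal: a decaying exponential, which a first order
# predictor with coefficient 0.9 models almost exactly.
signal = [0.9 ** n for n in range(200)]
r = autocorrelation(signal, 2)
coeffs, residual = levinson_durbin(r, 2)
print([round(c, 4) for c in coeffs])
```

For this test signal the first coefficient comes out close to 0.9 and the second close to zero. In a practical coder the resulting coefficients would typically be converted to another representation, such as Line Spectral Pairs, before quantization.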
GLPAS techniques code the pitch more efficiently than LPAS techniques by modifying the input signal so that the pitch can be updated infrequently without degrading performance. This speech signal modification may be considered part of the pre-processing, with the modified signal being the input to the modelling and quantization process. In this specification, LPAS is considered to be a special case of GLPAS in which the modification of the signal to simplify pitch encoding is omitted.
One example of a GLPAS coder is the "North American Enhanced Variable Rate Codec" specified by Standard IS-127. This codec uses 20 msec frames, each frame comprising 3 successive subframes. The bit budget for each 20 msec frame when this codec is operating in "half rate mode" allows 22 bits per frame for Line Spectral Pairs (LSP) derived by LPC analysis, 7 bits per frame for Pitch Delay or ACB index, 3 bits per subframe (i.e. 9 bits per frame) for ACB Gain, 10 bits per subframe (i.e. 30 bits per frame) for FCB index, and 4 bits per subframe (i.e. 12 bits per frame) for FCB Gain, for a total of 80 bits per frame. The Pitch Gain or ACB Gain is determined for each subframe and converted into a 3 bit code for each subframe using scalar quantization. The FCB gain is also determined for each subframe and converted into a 4 bit code for each subframe using scalar quantization.
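The half rate bit budget recited above can be checked with a short tally. The following is a minimal sketch using only the figures given in the text; the table layout and the names `ALLOCATION` and `bits_per_frame` are assumptions made for this example.

```python
FRAME_MS = 20      # frame duration from the text
SUBFRAMES = 3      # subframes per frame from the text

# (parameter, bits, unit), where unit says whether the bits are spent
# once per frame or once per subframe.
ALLOCATION = [
    ("LSP", 22, "frame"),
    ("Pitch Delay / ACB index", 7, "frame"),
    ("ACB Gain", 3, "subframe"),
    ("FCB index", 10, "subframe"),
    ("FCB Gain", 4, "subframe"),
]


def bits_per_frame(allocation, subframes):
    """Total bits per frame, expanding per-subframe entries."""
    return sum(bits * (subframes if unit == "subframe" else 1)
               for _, bits, unit in allocation)


total_bits = bits_per_frame(ALLOCATION, SUBFRAMES)
bit_rate_bps = total_bits * 1000 // FRAME_MS
print(total_bits, bit_rate_bps)   # prints "80 4000"
```

The tally reproduces the stated total of 80 bits per frame, which at one frame every 20 msec corresponds to 4 kbps. Of these 80 bits, 21 are spent on the two scalar quantized gains.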
An example of a recent LPAS coder is the "Enhanced Full Rate Speech Codec for North American Cellular" defined by Standard IS-641. This codec uses 20 msec frames, each frame comprising 4 successive subframes. The bit budget for each 20 msec frame allows 26 bits per frame for Line Spectral Pairs (LSP) derived by LPC analysis, 26 bits per frame for Pitch Delay or ACB index, 17 bits per subframe (i.e. 68 bits per frame) for FCB index, and 7 bits per subframe (i.e. 28 bits per frame) for FCB and Pitch or ACB Gain, for a total of 148 bits per frame. The 26 bits per frame for Pitch Delay or ACB index are provided as 8 bits for each of the first and third subframes of each frame, and 5 bits for each of the second and fourth subframes of each frame. The Pitch Gain or ACB Gain and the FCB gain are determined for each subframe and jointly converted into a 7 bit code for that subframe using two dimensional vector quantization, one component of the two dimensional gain vector corresponding to the pitch gain for the subframe and the other component corresponding to the FCB gain for the subframe.
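By way of illustration only, two dimensional vector quantization of a (pitch gain, FCB gain) pair into a 7 bit index may be sketched as a nearest neighbour search over a 128 entry codebook. The codebook below is filled with random values purely as a placeholder and is not the codebook of IS-641; the gain ranges and the plain squared-error distance are likewise assumptions, and a practical coder would typically select the entry that minimises a perceptually weighted error in the synthesized speech.

```python
import random

BITS = 7
CODEBOOK_SIZE = 1 << BITS   # 128 entries addressed by a 7 bit index


def make_codebook(seed=0):
    """Placeholder codebook of (pitch_gain, fcb_gain) pairs.

    Hypothetical values only; a real codebook is trained on speech.
    """
    rng = random.Random(seed)
    return [(rng.uniform(0.0, 1.2), rng.uniform(0.0, 4.0))
            for _ in range(CODEBOOK_SIZE)]


def encode_gains(pitch_gain, fcb_gain, codebook):
    """Return the index of the codebook entry nearest to the gain pair."""
    def squared_error(index):
        entry_pitch, entry_fcb = codebook[index]
        return (entry_pitch - pitch_gain) ** 2 + (entry_fcb - fcb_gain) ** 2
    return min(range(len(codebook)), key=squared_error)


def decode_gains(index, codebook):
    """Look up the quantized (pitch_gain, fcb_gain) pair for an index."""
    return codebook[index]


codebook = make_codebook()
index = encode_gains(0.8, 1.5, codebook)
quantized_pitch_gain, quantized_fcb_gain = decode_gains(index, codebook)
print(index, quantized_pitch_gain, quantized_fcb_gain)
```

Because both gains share a single index, one 7 bit code per subframe conveys the pair, which accounts for the 28 gain bits per four subframe frame recited above.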
The coders defined by IS-127 and IS-641 represent recent standards in GLPAS and LPAS speech coding techniques.