Demand for efficient digital narrowband and wideband speech coding techniques with a good trade-off between subjective quality and bit rate is increasing in application areas such as teleconferencing, multimedia, and wireless communications. Until recently, speech coding applications have mainly used the telephone bandwidth, which is limited to the range 200-3400 Hz. However, wideband speech applications provide increased intelligibility and naturalness in communication compared to the conventional telephone bandwidth. A bandwidth in the range 50-7000 Hz has been found sufficient for delivering good quality that gives an impression of face-to-face communication. For general audio signals, this bandwidth gives an acceptable subjective quality, but one that is still lower than the quality of FM radio or CD, which operate in the ranges 20-16000 Hz and 20-20000 Hz, respectively.
A speech encoder converts a speech signal into a digital bit stream that is transmitted over a communication channel or stored in a storage medium. The speech signal is digitized, that is, sampled and quantized, usually with 16 bits per sample. The speech encoder has the role of representing these digital samples with a smaller number of bits while maintaining good subjective speech quality. The speech decoder or synthesizer operates on the transmitted or stored bit stream and converts it back to a sound signal.
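The size of the reduction the encoder has to achieve can be illustrated with a little arithmetic. The sketch below is only an illustration: the sampling rates (8000 Hz for narrowband, 16000 Hz for wideband) and the coded bit rate in the usage note are assumptions that the text does not state, and the function names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample=16):
    """Bit rate in bit/s of the uncompressed digitized signal."""
    return sample_rate_hz * bits_per_sample


def compression_ratio(sample_rate_hz, coded_bit_rate, bits_per_sample=16):
    """How many times smaller the coded stream is than the raw samples."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample) / coded_bit_rate


# Assumed sampling rates, not taken from the text.
narrowband_rate = pcm_bit_rate(8000)   # 16-bit samples at 8 kHz
wideband_rate = pcm_bit_rate(16000)    # 16-bit samples at 16 kHz
```

With these assumed rates, an encoder running at a hypothetical 8000 bit/s on narrowband input would call `compression_ratio(8000, 8000)` and obtain a factor of 16.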
Code-Excited Linear Prediction (CELP) coding is one of the best techniques for achieving a good compromise between subjective quality and bit rate. This coding technique is the basis of several speech coding standards in both wireless and wireline applications. In CELP coding, the sampled speech signal is processed in successive blocks of N samples usually called frames, where N is a predetermined number corresponding typically to 10-30 ms. A linear prediction (LP) filter is computed and transmitted every frame. The computation of the LP filter typically needs a lookahead, i.e., a 5-10 ms speech segment from the subsequent frame. The N-sample frame is divided into smaller blocks called subframes. Usually the number of subframes is three or four, resulting in 4-10 ms subframes. In each subframe, an excitation signal is usually obtained from two components: a past excitation and an innovative, fixed-codebook excitation. The component formed from the past excitation is often referred to as the adaptive codebook or pitch excitation. The parameters characterizing the excitation signal are coded and transmitted to the decoder, where the reconstructed excitation signal is used as the input of the LP filter.
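The frame structure and the two-component excitation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular standard: the function names, the integer delay, the recursive use of the excitation history, and the sign convention A(z) = 1 + sum of a_i z^-i for the LP filter are all choices made for this example.

```python
def frame_layout(sample_rate_hz, frame_ms, num_subframes, lookahead_ms):
    """Return (frame_len, subframe_len, lookahead_len) in samples."""
    frame_len = sample_rate_hz * frame_ms // 1000
    if frame_len % num_subframes:
        raise ValueError("frame length must divide evenly into subframes")
    lookahead_len = sample_rate_hz * lookahead_ms // 1000
    return frame_len, frame_len // num_subframes, lookahead_len


def build_excitation(past_exc, delay, pitch_gain, code_vector, code_gain):
    """Excitation for one subframe: the scaled past excitation (adaptive
    codebook) plus the scaled fixed-codebook vector.

    Assumption: `delay` is an integer no larger than len(past_exc).
    """
    history = list(past_exc)
    excitation = []
    for code_sample in code_vector:
        adaptive = history[len(history) - delay]  # sample `delay` steps back
        sample = pitch_gain * adaptive + code_gain * code_sample
        excitation.append(sample)
        history.append(sample)  # new samples become part of the past
    return excitation


def lp_synthesis(excitation, lp_coeffs, memory):
    """All-pole LP synthesis filter 1/A(z).

    Assumption: `memory` holds at least len(lp_coeffs) past output
    samples, most recent last.
    """
    out = list(memory)
    for e in excitation:
        s = e - sum(a * out[-i] for i, a in enumerate(lp_coeffs, start=1))
        out.append(s)
    return out[len(memory):]
```

For example, with an assumed 16000 Hz sampling rate, `frame_layout(16000, 20, 4, 5)` gives a 320-sample frame, four 80-sample subframes, and an 80-sample lookahead, all within the ranges quoted in the text.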
In conventional CELP coding, long term prediction for mapping the past excitation to the present is usually performed on a subframe basis. Long term prediction is characterized by a delay parameter and a pitch gain that are usually computed, coded and transmitted to the decoder for every subframe. At low bit rates, these parameters consume a substantial proportion of the available bit budget. Signal modification techniques [1-7] improve the performance of long term prediction at low bit rates by adjusting the signal to be coded. This is done by adapting the evolution of the pitch cycles in the speech signal to fit the long term prediction delay, which makes it possible to transmit only one delay parameter per frame. Signal modification is based on the premise that it is possible to render the difference between the modified speech signal and the original speech signal inaudible. The CELP coders utilizing signal modification are often referred to as generalized analysis-by-synthesis or relaxed CELP (RCELP) coders.
[1] W. B. Kleijn, P. Kroon, and D. Nahumi, “The RCELP speech-coding algorithm,” European Transactions on Telecommunications, Vol. 4, No. 5, pp. 573-582, 1994.
[2] W. B. Kleijn, R. P. Ramachandran, and P. Kroon, “Interpolation of the pitch-predictor parameters in analysis-by-synthesis speech coders,” IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 1, pp. 42-54, 1994.
[3] Y. Gao, A. Benyassine, J. Thyssen, H. Su, and E. Shlomot, “EX-CELP: A speech coding paradigm,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Salt Lake City, Utah, U.S.A., pp. 689-692, 7-11 May 2001.
[4] U.S. Pat. No. 5,704,003, “RCELP coder,” Lucent Technologies Inc., (W. B. Kleijn and D. Nahumi), Filing Date: 19 Sep. 1995.
[5] European Patent Application 0 602 826 A2, “Time shifting for analysis-by-synthesis coding,” AT&T Corp., (B. Kleijn), Filing Date: 1 Dec. 1993.
[6] Patent Application WO 00/11653, “Speech encoder with continuous warping combined with long term prediction,” Conexant Systems Inc., (Y. Gao), Filing Date: 24 Aug. 1999.
[7] Patent Application WO 00/11654, “Speech encoder adaptively applying pitch preprocessing with continuous warping,” Conexant Systems Inc., (H. Su and Y. Gao), Filing Date: 24 Aug. 1999.
Signal modification techniques adjust the pitch of the signal to a predetermined delay contour. Long term prediction then maps the past excitation signal to the present subframe using this delay contour and scaling by a gain parameter. The delay contour is obtained straightforwardly by interpolating between two open-loop pitch estimates, the first obtained in the previous frame and the second in the current frame. Interpolation gives a delay value for every time instant of the frame. Once the delay contour is available, the pitch in the subframe currently being coded is adjusted to follow this artificial contour by warping, i.e., changing the time scale of the signal.
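The delay contour and the long term prediction that follows it can be sketched as below. The text says only that the contour is obtained by interpolation, so the linear interpolation used here is an assumption, as are the function names and the two-point linear interpolation used to read the past excitation at fractional delays (practical coders typically use longer interpolation filters). The fixed-codebook contribution is left out to keep the example short.

```python
import math


def delay_contour(prev_delay, curr_delay, frame_len):
    """One delay value per sample of the frame, interpolated linearly
    (an assumption) from the previous frame's open-loop pitch estimate
    to the current one, which is reached at the last sample."""
    step = curr_delay - prev_delay
    return [prev_delay + step * (n + 1) / frame_len for n in range(frame_len)]


def predict_from_contour(past_exc, contour, pitch_gain):
    """Map the past excitation to the present along the delay contour,
    scaled by the pitch gain. Fractional delays are handled by linear
    interpolation between neighbouring samples (an assumption)."""
    history = list(past_exc)
    predicted = []
    for delay in contour:
        pos = len(history) - delay  # fractional read position in the past
        if delay < 1 or pos < 0:
            raise ValueError("delay outside the available history")
        i = int(math.floor(pos))
        frac = pos - i
        lower = history[i]
        upper = history[i + 1] if frac > 0 else lower
        value = pitch_gain * ((1 - frac) * lower + frac * upper)
        predicted.append(value)
        history.append(value)  # predicted samples extend the history
    return predicted
```

For instance, `delay_contour(40, 44, 4)` steps from just above the previous estimate to the current one, and only the two end points need to be known to rebuild every delay value in the frame.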
In discontinuous warping [1, 4, 5], a signal segment is shifted in time without altering the segment length. Discontinuous warping requires a procedure for handling the resulting overlapping or missing signal portions. Continuous warping [2, 3, 6, 7] either contracts or expands a signal segment. This is done using a time continuous approximation for the signal segment and re-sampling it to a desired length with unequal sampling intervals determined based on the delay contour. For reducing artifacts in these operations, the tolerated change in the time scale is kept small. Moreover, warping is typically done using the LP residual signal or the weighted speech signal to reduce the resulting distortions. The use of these signals instead of the speech signal also facilitates detection of pitch pulses and low-power regions in between them, and thus the determination of the signal segments for warping. The actual modified speech signal is generated by inverse filtering.
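The re-sampling step of continuous warping can be sketched as follows. This is a simplified illustration: the time continuous approximation is taken to be piecewise linear, the sampling positions are supplied by the caller rather than derived from a delay contour, and the function names and the definition of the time scale change by sample count are assumptions made for the example.

```python
import math


def warp_segment(segment, positions):
    """Re-sample `segment` at the given positions, which may be unequally
    spaced. The continuous approximation between samples is piecewise
    linear (an assumption)."""
    last = len(segment) - 1
    warped = []
    for pos in positions:
        if pos < 0 or pos > last:
            raise ValueError("position outside the segment")
        i = int(math.floor(pos))
        frac = pos - i
        upper = segment[i + 1] if frac > 0 else segment[i]
        warped.append((1 - frac) * segment[i] + frac * upper)
    return warped


def time_scale_change(segment, positions):
    """Relative length change caused by the warp: positive values mean
    expansion, negative values mean contraction."""
    return len(positions) / len(segment) - 1.0
```

Calling `warp_segment([0.0, 2.0, 4.0, 6.0], [0.0, 0.75, 1.5, 2.25, 3.0])` expands a four-sample segment to five samples. A caller could compare `time_scale_change` against a small tolerance before accepting a warp, in line with the remark that the tolerated change in the time scale is kept small.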
After the signal modification is done for the current subframe, the coding can proceed in any conventional manner, except that the adaptive codebook excitation is generated using the predetermined delay contour. Essentially the same signal modification techniques can be used in both narrowband and wideband CELP coding.
Signal modification techniques can also be applied in other types of speech coding methods, such as waveform interpolation coding and sinusoidal coding, for instance in accordance with [8].
[8] U.S. Pat. No. 6,223,151, “Method and apparatus for pre-processing speech signals prior to coding by transform-based speech coders,” Telefonaktiebolaget L M Ericsson, (W. B. Kleijn and T. Eriksson), Filing Date: 10 Feb. 1999.