1. Field of the Invention
The present invention relates to code-excited linear prediction (CELP) speech processing. Specifically, the present invention relates to translating digital speech packets from one CELP format to another CELP format.
2. Related Art
Transmission of voice by digital techniques has become widespread, particularly in long distance and digital radio telephone applications. This, in turn, has created interest in determining the least amount of information that can be sent over the channel while maintaining the perceived quality of the reconstructed speech. If speech is transmitted by simply sampling and digitizing, a data rate on the order of 64 kilobits per second (kbps) is required to achieve the speech quality of a conventional analog telephone. However, through the use of speech analysis, followed by the appropriate coding, transmission, and resynthesis at the receiver, a significant reduction in the data rate can be achieved.
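As a rough check on the 64 kbps figure, the arithmetic can be sketched as follows. The sampling rate and sample width are assumptions taken from common telephone-band pulse code modulation practice; they are not stated in the text above.

```python
# Assumed telephone-band PCM parameters (not stated in the text above).
SAMPLE_RATE_HZ = 8000   # samples per second
BITS_PER_SAMPLE = 8     # bits per companded sample

pcm_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE   # bits per second
pcm_rate_kbps = pcm_rate_bps / 1000               # kilobits per second
```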
Devices which employ techniques to compress voiced speech by extracting parameters that relate to a model of human speech generation are typically called vocoders. Such devices are composed of an encoder, which analyzes the incoming speech to extract the relevant parameters, and a decoder, which resynthesizes the speech using the parameters which it receives over a channel, such as a transmission channel. The speech is divided into blocks of time, or analysis subframes, during which the parameters are calculated. The parameters are then updated for each new subframe.
Linear-prediction-based time domain coders are by far the most popular type of speech coder in use today. These techniques extract the correlation from the input speech samples over a number of past samples and encode only the uncorrelated part of the signal. The basic linear predictive filter used in this technique predicts the current sample as a linear combination of the past samples. An example of a coding algorithm of this particular class is described in the paper "A 4.8 kbps Code Excited Linear Predictive Coder" by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988.
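The prediction step described above can be illustrated with a minimal sketch. The function name `lp_residual` and the test signal are hypothetical, chosen only for illustration; an actual coder would also estimate the coefficients from the speech itself, which is omitted here.

```python
def lp_residual(samples, coeffs):
    """Return the prediction error e[n] = s[n] - sum_k coeffs[k-1] * s[n-k]."""
    residual = []
    for n, s in enumerate(samples):
        # Predict the current sample as a linear combination of past samples.
        prediction = sum(
            a * samples[n - k]
            for k, a in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        # Only the part that could not be predicted is kept.
        residual.append(s - prediction)
    return residual


# A decaying exponential s[n] = 0.9 * s[n-1] is fully predictable from one
# past sample, so the residual is (numerically) zero after the first sample.
signal = [0.9 ** n for n in range(8)]
error = lp_residual(signal, [0.9])
```

In this sketch only the first sample carries information the predictor cannot supply, which is the sense in which a linear predictive coder encodes only the uncorrelated part of the signal.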
The function of the vocoder is to compress the digitized speech signal into a low bit rate signal by removing all of the natural redundancies inherent in speech. Speech typically has short-term redundancies due primarily to the filtering operation of the lips and tongue, and long-term redundancies due to the vibration of the vocal cords. In a CELP coder, these operations are modeled by two filters, a short-term formant filter and a long-term pitch filter. Once these redundancies are removed, the resulting residual signal can be modeled as white Gaussian noise, which is also encoded.
The basis of this technique is to compute the parameters of two digital filters. One filter, called the formant filter (also known as the "LPC (linear prediction coefficients) filter"), performs short-term prediction of the speech waveform. The other filter, called the pitch filter, performs long-term prediction of the speech waveform. Finally, these filters must be excited, and this is done by determining which one of a number of random excitation waveforms in a codebook results in the closest approximation to the original speech when the waveform excites the two filters mentioned above. Thus the transmitted parameters relate to three items: (1) the LPC filter, (2) the pitch filter, and (3) the codebook excitation.
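The codebook search described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of any particular CELP format: the filter parameters are taken as given, filter memory from earlier subframes is ignored, no perceptual weighting is applied, and all function names, the codebook size, and the subframe length are hypothetical.

```python
import random


def pitch_synthesis(excitation, lag, gain):
    """Long-term (pitch) filter: y[n] = x[n] + gain * y[n - lag]."""
    out = []
    for n, x in enumerate(excitation):
        past = out[n - lag] if n >= lag else 0.0
        out.append(x + gain * past)
    return out


def formant_synthesis(signal, lpc):
    """Short-term (formant/LPC) filter: y[n] = x[n] + sum_k lpc[k-1] * y[n-k]."""
    out = []
    for n, x in enumerate(signal):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if n >= k:
                acc += a * out[n - k]
        out.append(acc)
    return out


def synthesize(code_vector, lpc, lag, pitch_gain):
    """Excite the pitch filter, then the formant filter, with one codebook entry."""
    return formant_synthesis(pitch_synthesis(code_vector, lag, pitch_gain), lpc)


def search_codebook(target, codebook, lpc, lag, pitch_gain):
    """Return (index, gain, error) of the entry that best matches the target."""
    best = None
    for index, vector in enumerate(codebook):
        synth = synthesize(vector, lpc, lag, pitch_gain)
        energy = sum(y * y for y in synth)
        if energy == 0.0:
            continue
        # Least-squares optimal gain for this entry.
        gain = sum(t * y for t, y in zip(target, synth)) / energy
        error = sum((t - gain * y) ** 2 for t, y in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Hypothetical setup: 16 random Gaussian code vectors of 40 samples each.
rng = random.Random(0)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(16)]
lpc, lag, pitch_gain = [0.8, -0.2], 20, 0.5

# Build a target from entry 5 with gain 2.0, then check the search recovers it.
target = [2.0 * y for y in synthesize(codebook[5], lpc, lag, pitch_gain)]
index, gain, error = search_codebook(target, codebook, lpc, lag, pitch_gain)
```

In this sketch the values that would be transmitted for the subframe correspond to the three items listed above: the LPC coefficients, the pitch lag and gain, and the selected codebook index with its gain.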
Digital speech coding can be broken into two parts: encoding and decoding, sometimes known as analysis and synthesis. FIG. 1 is a block diagram of a system 100 for digitally encoding, transmitting and decoding speech. The system includes a coder 102, a channel 104, and a decoder 106. Channel 104 can be a communications channel, storage medium, or the like. Coder 102 receives digitized input speech, extracts the parameters describing the features of the speech, and quantizes these parameters into a source bit stream that is sent to channel 104. Decoder 106 receives the bit stream from channel 104 and reconstructs the output speech waveform using the quantized features in the received bit stream.
Many different formats of CELP coding are in use today. In order to successfully decode a CELP-coded speech signal, the decoder 106 must employ the same CELP coding model (also referred to as "format") as the encoder 102 that produced the signal. When communications systems employing different CELP formats must share speech data, it is often desirable to convert the speech signal from one CELP coding format to another.
One conventional approach to this conversion is known as "tandem coding." FIG. 2 is a block diagram of a tandem coding system 200 for converting from an input CELP format to an output CELP format. The system includes an input CELP format decoder 206 and an output CELP format encoder 202. Input CELP format decoder 206 receives a speech signal (referred to hereinafter as the "input" signal) that has been encoded using one CELP format (referred to hereinafter as the "input" format). Decoder 206 decodes the input signal to produce a speech signal. Output CELP format encoder 202 receives the decoded speech signal and encodes it using the output CELP format (referred to hereinafter as the "output" format) to produce an output signal in the output format. The primary disadvantage of this approach is the perceptual degradation experienced by the speech signal in passing through multiple encoders and decoders.
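The accumulation of coding loss in a tandem arrangement can be illustrated with a toy sketch. Here two uniform scalar quantizers with different step sizes stand in for the input and output formats; this is an assumption made purely for illustration, since real CELP coders are far more elaborate and their degradation is perceptual rather than a simple squared error. The step sizes and names are hypothetical.

```python
import random


def quantize(value, step):
    """Toy 'codec': encode and decode a value with a uniform quantizer."""
    return round(value / step) * step


def mean_squared_error(reference, processed):
    """Average squared difference between two equal-length sequences."""
    return sum((r - p) ** 2 for r, p in zip(reference, processed)) / len(reference)


INPUT_STEP = 0.3    # hypothetical resolution of the input format
OUTPUT_STEP = 0.2   # hypothetical resolution of the output format

rng = random.Random(1)
speech = [rng.uniform(-1.0, 1.0) for _ in range(1000)]

# Direct path: the original signal is coded once, in the output format.
direct = [quantize(s, OUTPUT_STEP) for s in speech]

# Tandem path: decode the input format, then re-encode in the output format.
decoded_input = [quantize(s, INPUT_STEP) for s in speech]
tandem = [quantize(d, OUTPUT_STEP) for d in decoded_input]

direct_mse = mean_squared_error(speech, direct)
tandem_mse = mean_squared_error(speech, tandem)
```

In this toy setup `tandem_mse` comes out larger than `direct_mse`, because the second coding pass adds its own loss on top of the loss already introduced by the first.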