The field of the present invention relates generally to the encoding and decoding of speech in voice communication systems and, more particularly, to a method and apparatus for handling erroneous or lost frames.
To model basic speech sounds, speech signals are sampled over time and stored in frames as a discrete waveform to be digitally processed. However, to use the communication bandwidth efficiently, speech is coded before being transmitted, especially when it must be transmitted under limited bandwidth constraints. Numerous algorithms have been proposed for the various aspects of speech coding. For example, an analysis-by-synthesis coding approach may be performed on a speech signal. In coding speech, the speech coding algorithm tries to represent characteristics of the speech signal in a manner which requires less bandwidth. For example, the speech coding algorithm seeks to remove redundancies in the speech signal. A first step is to remove short-term correlations. One type of signal coding technique is linear predictive coding (LPC). In using an LPC approach, the speech signal value at any particular time is modeled as a linear function of previous values. By using an LPC approach, short-term correlations can be reduced, and efficient speech signal representations can be determined by estimating and applying certain prediction parameters to represent the signal. The LPC spectrum, which is an envelope of the short-term correlations in the speech signal, may be represented, for example, by line spectral frequencies (LSFs). After the removal of short-term correlations in a speech signal, an LPC residual signal remains. This residual signal contains periodicity information that needs to be modeled. Thus, the second step in removing redundancies in speech is to model the periodicity information, which may be done using pitch prediction. Certain portions of speech have periodicity while other portions do not. For example, the sound “aah” has periodicity information while the sound “shhh” has none.
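As an illustration of the linear prediction described above, the following minimal sketch computes an LPC residual. The hand-picked prediction coefficients are an illustrative assumption; in practice they would be estimated from the signal (e.g., by the Levinson-Durbin recursion):

```python
# Minimal sketch of short-term linear prediction (LPC). The
# coefficients here are hand-picked for illustration, not estimated
# from the signal as a real coder would do.

def lpc_residual(samples, coeffs):
    """Return the prediction residual: each sample minus a linear
    combination of the previous len(coeffs) samples."""
    order = len(coeffs)
    residual = []
    for n in range(len(samples)):
        predicted = sum(coeffs[k] * samples[n - 1 - k]
                        for k in range(order) if n - 1 - k >= 0)
        residual.append(samples[n] - predicted)
    return residual

# A perfectly predictable signal (each sample equal to the previous
# one) leaves a residual of zero after the first sample.
signal = [1.0, 1.0, 1.0, 1.0]
print(lpc_residual(signal, [1.0]))  # [1.0, 0.0, 0.0, 0.0]
```

Because the residual of a well-predicted signal is small, it can be coded with fewer bits than the original samples, which is the source of the bandwidth savings described above.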
In applying the LPC technique, a conventional source encoder operates on speech signals to extract modeling and parameter information to be coded for communication to a conventional source decoder via a communication channel. One way to code modeling and parameter information into a smaller amount of information is to use quantization. Quantization of a parameter involves selecting the closest entry in a table or codebook to represent the parameter. Thus, for example, a parameter of 0.125 may be represented by 0.1 if the codebook contains 0, 0.1, 0.2, 0.3, etc. Quantization includes scalar quantization and vector quantization. In scalar quantization, one selects the entry in the table or codebook that is the closest approximation to the parameter, as described above. By contrast, vector quantization combines two or more parameters and selects the entry in the table or codebook which is closest to the combined parameters. For example, vector quantization may select the entry in the codebook that is the closest to the difference between the parameters. A codebook used to vector quantize two parameters at once is often referred to as a two-dimensional codebook. An n-dimensional codebook quantizes n parameters at once.
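The scalar and vector quantization described above can be sketched as follows, using tiny hypothetical codebooks for illustration; real coders use trained codebooks with many entries and transmit the index of the selected entry rather than the entry itself:

```python
# Sketch of scalar vs. vector quantization. The codebooks below are
# tiny hypothetical examples for illustration only.

def scalar_quantize(value, codebook):
    """Pick the single codebook entry closest to one parameter."""
    return min(codebook, key=lambda entry: abs(entry - value))

def vector_quantize(params, codebook):
    """Pick the codebook vector closest (in squared error) to the
    combined parameters."""
    def dist(entry):
        return sum((e - p) ** 2 for e, p in zip(entry, params))
    return min(codebook, key=dist)

# Scalar: 0.125 is represented by 0.1, as in the example above.
print(scalar_quantize(0.125, [0.0, 0.1, 0.2, 0.3]))  # 0.1

# Vector: a two-dimensional codebook quantizes two parameters at once.
two_d_codebook = [(0.0, 0.0), (0.1, 0.5), (0.2, 0.9)]
print(vector_quantize((0.12, 0.55), two_d_codebook))  # (0.1, 0.5)
```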
Quantized parameters may be packaged into packets of data which are transmitted from the encoder to the decoder. In other words, once coded, the parameters representing the input speech signal are transmitted to a transceiver. Thus, for example, the LSFs may be quantized, and the index into a codebook may be converted into bits and transmitted from the encoder to the decoder. Depending on the embodiment, each packet may represent a portion of a frame of the speech signal, a frame of speech, or more than a frame of speech. At the transceiver, a decoder receives the coded information. Because the decoder is configured to know the manner in which speech signals are encoded, the decoder decodes the coded information to reconstruct a signal for playback that sounds to the human ear like the original speech. In practice, however, it is nearly inevitable that at least one packet of data will be lost during transmission, so that the decoder does not receive all of the information sent by the encoder. For instance, when speech is being transmitted from one cell phone to another, data may be lost when reception is poor or noisy. Therefore, the decoder needs a way to correct or adjust for lost packets of data. While the prior art describes certain ways of adjusting for lost packets, such as extrapolation to estimate the information that was in the lost packet, these methods are limited, and improved methods are needed.
Besides LSF information, other parameters transmitted to the decoder may be lost. In CELP (Code Excited Linear Prediction) speech coding, for example, there are two types of gain which are also quantized and transmitted to the decoder. The first type of gain is the pitch gain Gp, also known as the adaptive codebook gain. The adaptive codebook gain is sometimes referred to, including herein, with the subscript “a” instead of the subscript “p”. The second type of gain is the fixed codebook gain Gc. Speech coding algorithms have quantized parameters including the adaptive codebook gain and the fixed codebook gain. Other parameters may, for example, include pitch lags, which represent the periodicity of voiced speech. If the speech encoder classifies speech signals, the classification information about the speech signal may also be transmitted to the decoder. For an improved speech encoder/decoder that classifies speech and operates in different modes, see U.S. patent application Ser. No. 09/574,396 titled “A New Speech Gain Quantization Strategy,” filed May 19, 2000, which was previously incorporated herein by reference.
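A CELP decoder typically forms its excitation by scaling the adaptive codebook vector by the pitch gain and the fixed codebook vector by the fixed codebook gain, then summing the two. The sketch below illustrates this combination; the excitation vectors and gain values are made up for illustration:

```python
# Sketch of how a CELP decoder typically combines the two gains.
# The vectors and gain values below are illustrative, not taken
# from any real codec.

def celp_excitation(adaptive_vec, fixed_vec, gain_p, gain_c):
    """Total excitation: Gp * adaptive codebook vector
    plus Gc * fixed codebook vector."""
    return [gain_p * a + gain_c * f
            for a, f in zip(adaptive_vec, fixed_vec)]

adaptive = [0.5, -0.5, 0.25]   # periodicity (pitch) contribution
fixed = [1.0, 0.0, -1.0]       # innovation contribution
print(celp_excitation(adaptive, fixed, gain_p=0.8, gain_c=0.5))
# approximately [0.9, -0.4, -0.3]
```

If either gain is lost in transmission, the decoder cannot reconstruct this excitation correctly, which is why gain concealment matters.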
Because this and other parameter information is sent over imperfect transmission means to the decoder, some of these parameters may be lost and never received by the decoder. For speech communication systems that transmit a packet of information per frame of speech, a lost packet results in a lost frame of information. In order to reconstruct or estimate the lost information, prior art systems have tried different approaches, depending on which parameter is lost. Some approaches simply reuse the parameter from the previous frame that actually was received by the decoder. These prior art approaches have their disadvantages, inaccuracies, and problems. Thus, there is a need for an improved way to correct or adjust for lost information so as to recreate a speech signal as close as possible to the original.
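The simple prior-art approach mentioned above, reusing the previous frame's parameter, is often combined with attenuation so that repeated losses fade out rather than sustain a stale value. It can be sketched as follows; the 0.9 attenuation factor is an illustrative assumption:

```python
# Sketch of simple prior-art concealment: when a frame is lost,
# reuse the last received parameter value, attenuated so that
# repeated losses fade out. The 0.9 factor is an assumption
# for illustration.

def conceal(frames, attenuation=0.9):
    """frames: parameter values per frame, with None marking a
    lost frame. Returns the values the decoder actually uses."""
    output = []
    last = 0.0
    for value in frames:
        if value is None:           # lost frame: fall back on history
            last = last * attenuation
        else:                       # received frame: use it directly
            last = value
        output.append(last)
    return output

received = [1.0, None, None, 0.5]
print([round(v, 2) for v in conceal(received)])  # [1.0, 0.9, 0.81, 0.5]
```

The weakness of this approach is visible in the sketch itself: the concealed values track only the past, so any genuine change in the signal during the lost frames is missed.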
Certain prior art speech communication systems do not transmit a fixed codebook excitation from the encoder to the decoder in order to save bandwidth. Instead, these systems have a local Gaussian time series generator that uses an initial fixed seed to generate a random excitation value and then updates that seed every time the system encounters a frame containing silence or background noise. Thus, the seed changes for every noise frame. Because the encoder and decoder have the same Gaussian time series generator that uses the same seeds in the same sequence, they generate the same random excitation value for noise frames. However, if a noise frame is lost and not received by the decoder, the encoder and decoder use different seeds for the same noise frame, thereby losing their synchronicity. Thus, there is a need for a speech communication system that does not transmit fixed codebook excitation values to the decoder, but which maintains synchronicity between the encoder and decoder when a frame is lost during transmission.
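The synchronization problem described above can be illustrated as follows. The seed-update rule shown is an assumption for illustration only; the point is that the encoder and decoder must apply the same rule in lockstep, and a single lost noise frame breaks that lockstep:

```python
import random

# Sketch of the seed-synchronization problem described above.
# The seed-update rule here is an illustrative assumption; what
# matters is only that encoder and decoder apply the same rule.

def noise_excitation(seed, length=3):
    """Generate a pseudo-random excitation from a given seed and
    return it along with the updated seed for the next noise frame."""
    rng = random.Random(seed)
    excitation = [rng.gauss(0.0, 1.0) for _ in range(length)]
    next_seed = rng.randrange(2**31)   # hypothetical update rule
    return excitation, next_seed

# Both sides start from the same initial seed ...
enc_seed = dec_seed = 12345
enc_exc, enc_seed = noise_excitation(enc_seed)
dec_exc, dec_seed = noise_excitation(dec_seed)
assert enc_exc == dec_exc              # in sync

# ... but if the decoder misses one noise frame, it skips a seed
# update and the generators diverge on the next frame.
enc_exc, enc_seed = noise_excitation(enc_seed)  # frame lost in transit
enc_exc, enc_seed = noise_excitation(enc_seed)
dec_exc, dec_seed = noise_excitation(dec_seed)
assert enc_exc != dec_exc              # out of sync
```

Once the generators diverge, every subsequent noise frame is reconstructed differently at the decoder, which is why resynchronization after a lost frame is needed.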