The field of the present invention relates generally to the encoding and decoding of speech in voice communication systems and, more particularly, to a method and apparatus for handling erroneous or lost frames.
To model basic speech sounds, speech signals are sampled over time and stored in frames as a discrete waveform to be digitally processed. However, in order to use the communication bandwidth for speech more efficiently, speech is coded before being transmitted, especially when speech is intended to be transmitted under limited bandwidth constraints. Numerous algorithms have been proposed for the various aspects of speech coding. For example, an analysis-by-synthesis coding approach may be performed on a speech signal. In coding speech, the speech coding algorithm tries to represent characteristics of the speech signal in a manner which requires less bandwidth. For example, the speech coding algorithm seeks to remove redundancies in the speech signal. A first step is to remove short-term correlations. One type of signal coding technique is linear predictive coding (LPC). In using an LPC approach, the speech signal value at any particular time is modeled as a linear function of previous values. By using an LPC approach, short-term correlations can be reduced and efficient speech signal representations can be determined by estimating and applying certain prediction parameters to represent the signal. The LPC spectrum, which is an envelope of short-term correlations in the speech signal, may be represented, for example, by LSFs (line spectral frequencies). After the removal of short-term correlations in a speech signal, an LPC residual signal remains. This residual signal contains periodicity information that needs to be modeled. The second step in removing redundancies in speech is to model the periodicity information. Periodicity information may be modeled by using pitch prediction. Certain portions of speech have periodicity while other portions do not. For example, the sound "aah" has periodicity information while the sound "shhh" has no periodicity information.
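The short-term prediction step described above can be illustrated with a minimal sketch: estimate predictor coefficients from a signal's autocorrelation via the Levinson-Durbin recursion, then form the prediction residual. This is a generic textbook formulation, not the specific analysis of any particular codec; the signal, prediction order, and lack of windowing are illustrative simplifications.

```python
# Minimal LPC sketch: autocorrelation + Levinson-Durbin recursion, then the
# short-term prediction residual. Real codecs add windowing, bandwidth
# expansion, and conversion of the coefficients to LSFs for quantization.
import math

def autocorr(x, max_lag):
    # r[k] = sum_n x[n] * x[n-k] over the available samples
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    # Solve the normal equations for prediction coefficients a[1..order].
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

# A synthetic periodic ("aah"-like) signal: a pure sinusoid.
x = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]
order = 2
a, err = levinson_durbin(autocorr(x, order), order)

# Residual: e[n] = x[n] - sum_k a[k] * x[n-k]. For a periodic signal the
# residual energy is far smaller than the signal energy.
residual = [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, order + 1))
            for n in range(order, len(x))]
```

A sinusoid is exactly predictable from its two previous samples, so an order-2 predictor removes nearly all of the signal energy here; real speech requires higher orders (typically 10 or more).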
In applying the LPC technique, a conventional source encoder operates on speech signals to extract modeling and parameter information to be coded for communication to a conventional source decoder via a communication channel. One way to code modeling and parameter information into a smaller amount of information is to use quantization. Quantization of a parameter involves selecting the closest entry in a table or codebook to represent the parameter. Thus, for example, a parameter of 0.125 may be represented by 0.1 if the codebook contains 0, 0.1, 0.2, 0.3, etc. Quantization includes scalar quantization and vector quantization. In scalar quantization, one selects the entry in the table or codebook that is the closest approximation to the parameter, as described above. By contrast, vector quantization combines two or more parameters and selects the entry in the table or codebook which is closest to the combined parameters. For example, vector quantization may select the entry in the codebook that is the closest to the difference between the parameters. A codebook used to vector quantize two parameters at once is often referred to as a two-dimensional codebook. An n-dimensional codebook quantizes n parameters at once.
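The scalar and vector quantization described above can be sketched as a nearest-entry search. The codebook contents below are made-up illustrative values, not entries from any actual codec.

```python
# Sketch of scalar vs. two-dimensional vector quantization: in both cases
# the quantizer transmits only the index of the closest codebook entry.
def scalar_quantize(value, codebook):
    # Index of the closest scalar entry.
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))

def vector_quantize(vec, codebook):
    # Index of the closest 2-D entry by squared Euclidean distance.
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

scalar_cb = [0.0, 0.1, 0.2, 0.3]
idx = scalar_quantize(0.125, scalar_cb)   # 0.125 is closest to entry 0.1

vector_cb = [(0.0, 0.0), (0.1, 0.2), (0.3, 0.1)]
vidx = vector_quantize((0.12, 0.18), vector_cb)
```

Only `idx` or `vidx` is converted to bits and transmitted; the decoder holds the same codebook and looks the value back up.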
Quantized parameters may be packaged into packets of data which are transmitted from the encoder to the decoder. In other words, once coded, the parameters representing the input speech signal are transmitted to a transceiver. Thus, for example, the LSFs may be quantized and the index into a codebook may be converted into bits and transmitted from the encoder to the decoder. Depending on the embodiment, each packet may represent a portion of a frame of the speech signal, a frame of speech, or more than a frame of speech. At the transceiver, a decoder receives the coded information. Because the decoder is configured to know the manner in which speech signals are encoded, the decoder decodes the coded information to reconstruct a signal for playback that sounds to the human ear like the original speech. However, it is practically inevitable that at least one packet of data will be lost during transmission, so that the decoder does not receive all of the information sent by the encoder. For instance, when speech is being transmitted from one cell phone to another, data may be lost when reception is poor or noisy. Therefore, transmitting the coded modeling and parameter information to the decoder requires a way for the decoder to correct or adjust for lost packets of data. While the prior art describes certain ways of adjusting for lost packets of data, such as extrapolation to estimate the information in the lost packet, these methods are limited, and improved methods are needed.
Besides LSF information, other parameters transmitted to the decoder may be lost. In CELP (Code Excited Linear Prediction) speech coding, for example, there are two types of gain which are also quantized and transmitted to the decoder. The first type of gain is the pitch gain Gp, also known as the adaptive codebook gain. The adaptive codebook gain is sometimes referred to, including herein, with the subscript "a" instead of the subscript "p". The second type of gain is the fixed codebook gain Gc. Speech coding algorithms have quantized parameters including the adaptive codebook gain and the fixed codebook gain. Other parameters may, for example, include pitch lags which represent the periodicity of voiced speech. If the speech encoder classifies speech signals, the classification information about the speech signal may also be transmitted to the decoder. For an improved speech encoder/decoder that classifies speech and operates in different modes, see U.S. patent application Ser. No. 09/574,396 titled "A New Speech Gain Quantization Strategy," filed May 19, 2000, which was previously incorporated herein by reference.
Because this and other parameter information is sent to the decoder over imperfect transmission means, some of these parameters are lost or never received by the decoder. For speech communication systems that transmit a packet of information per frame of speech, a lost packet results in a lost frame of information. In order to reconstruct or estimate the lost information, prior art systems have tried different approaches, depending on the parameter lost. Some approaches simply reuse the parameter from the previous frame that actually was received by the decoder. These prior art approaches have their disadvantages, inaccuracies and problems. Thus, there is a need for an improved way to correct or adjust for lost information so as to recreate a speech signal as close as possible to the original speech signal.
Certain prior art speech communication systems do not transmit a fixed codebook excitation from the encoder to the decoder in order to save bandwidth. Instead, these systems have a local Gaussian time series generator that uses an initial fixed seed to generate a random excitation value and then updates that seed every time the system encounters a frame containing silence or background noise. Thus, the seed changes for every noise frame. Because the encoder and decoder have the same Gaussian time series generator that uses the same seeds in the same sequence, they generate the same random excitation value for noise frames. However, if a noise frame is lost and not received by the decoder, the encoder and decoder use different seeds for the same noise frame, thereby losing their synchronicity. Thus, there is a need for a speech communication system that does not transmit fixed codebook excitation values to the decoder, but which maintains synchronicity between the encoder and decoder when a frame is lost during transmission.
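The synchronization problem described above can be sketched as follows. The particular generator, seed-update rule, and frame size are illustrative assumptions; the point is only that a running seed sequence desynchronizes after a single lost frame.

```python
# Sketch of the desynchronization problem: encoder and decoder each run the
# same seeded random excitation generator and advance the seed once per
# noise frame. If one frame is lost, the decoder falls one step behind.
import random

def excitation(seed, n=4):
    # Deterministic pseudo-random Gaussian excitation for one noise frame.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def next_seed(seed):
    # Any deterministic update shared by encoder and decoder (illustrative).
    return (seed * 1103515245 + 12345) % (2 ** 31)

enc_seed = 42                       # shared initial seed
enc_frames = []
for _ in range(3):                  # encoder generates three noise frames
    enc_frames.append(excitation(enc_seed))
    enc_seed = next_seed(enc_seed)

# The decoder misses frame 1, so it advances its seed only once before
# decoding frame 2: it ends up using the seed the encoder used for frame 1,
# and the two sides have lost synchronicity.
dec_frame2 = excitation(next_seed(42))
```

Here `dec_frame2` reproduces the encoder's frame 1 excitation rather than its frame 2 excitation, which is exactly the failure the invention addresses.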
Various separate aspects of the present invention can be found in a speech communication system and method that has an improved way of handling information lost during transmission from the encoder to the decoder. In particular, the improved speech communication system is able to generate more accurate estimates for the information lost in a lost packet of data. For example, the improved speech communication system is able to handle lost information such as LSF, pitch lag (or adaptive codebook excitation), fixed codebook excitation and/or gain information more accurately. In an embodiment of a speech communication system that does not transmit fixed codebook excitation values to the decoder, the improved encoder and decoder are able to generate the same random excitation values for a given noise frame even if a previous noise frame was lost during transmission.
A first, separate aspect of the present invention is a speech communication system that handles lost LSF information by setting the minimum spacing between LSFs to an increased value and then decreasing the value for subsequent frames in a controlled, adaptive manner.
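A minimal sketch of this first aspect follows. The specific gap values, decay factor, and the way the gap is enforced (pushing each LSF up from its predecessor) are illustrative assumptions; the patent specifies only an increased minimum spacing that is decreased adaptively over subsequent frames.

```python
# Sketch: after a lost frame, enforce a larger minimum spacing between
# adjacent (ordered) LSFs, then relax the spacing over subsequent frames.
def enforce_min_spacing(lsfs, min_gap):
    # Push each LSF up so it stays at least min_gap above its predecessor.
    out = [lsfs[0]]
    for f in lsfs[1:]:
        out.append(max(f, out[-1] + min_gap))
    return out

def spacing_schedule(initial_gap=0.05, normal_gap=0.01, decay=0.5):
    # Start from an increased gap and decrease it toward the normal value.
    gap = initial_gap
    while True:
        yield gap
        gap = max(normal_gap, gap * decay)

sched = spacing_schedule()
# Two closely spaced LSFs (0.10, 0.11) get pushed apart more aggressively
# right after the loss, and less so on the following frame.
frame0 = enforce_min_spacing([0.10, 0.11, 0.30], next(sched))  # gap 0.05
frame1 = enforce_min_spacing([0.10, 0.11, 0.30], next(sched))  # gap 0.025
```

Widening the spacing immediately after a loss guards against unstable synthesis filters built from poorly estimated LSFs, while the decay restores normal resolution once good frames resume.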
A second, separate aspect of the present invention is a speech communication system that estimates a lost pitch lag by extrapolating from the pitch lags of a plurality of the preceding received frames.
A third, separate aspect of the present invention is a speech communication system that receives the pitch lag of the succeeding received frame and uses curve fitting between the pitch lag of the preceding received frame and the pitch lag of the succeeding received frame to fine tune its estimation of the pitch lag for the lost frame so as to adjust or correct the adaptive codebook buffer prior to its use by subsequent frames.
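The second and third aspects can be sketched together: extrapolate the lost frame's pitch lag from preceding frames, then refine the estimate once the succeeding frame's lag is known. Linear extrapolation and linear interpolation are illustrative choices; the patent does not fix a particular curve.

```python
# Sketch of pitch-lag concealment: extrapolate from the received history,
# then fit between the neighbors once the next good frame arrives, so the
# adaptive codebook buffer can be corrected before subsequent frames use it.
def extrapolate_lag(prev_lags):
    # Continue the recent linear trend of the received pitch lags.
    trend = prev_lags[-1] - prev_lags[-2]
    return prev_lags[-1] + trend

def refine_lag(prev_lag, next_lag):
    # With the succeeding frame's lag available, place the lost frame's lag
    # on the line between its two neighbors (simple curve fitting).
    return (prev_lag + next_lag) / 2.0

received = [40, 42, 44]                   # lags of preceding received frames
estimate = extrapolate_lag(received)      # -> 46
refined = refine_lag(received[-1], 50)    # succeeding frame's lag is 50 -> 47.0
```

The refined value arrives too late to change what was played out for the lost frame, but it allows the decoder to repair the adaptive codebook buffer so the error does not propagate into later frames.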
A fourth, separate aspect of the present invention is a speech communication system that estimates a lost gain parameter for periodic-like speech differently than it estimates a lost gain parameter for non-periodic-like speech.
A fifth, separate aspect of the present invention is a speech communication system that estimates a lost adaptive codebook gain parameter differently than it estimates a lost fixed codebook gain parameter.
A sixth, separate aspect of the present invention is a speech communication system that determines a lost adaptive codebook gain parameter for a lost frame of non-periodic-like speech based on the average adaptive codebook gain parameter of the subframes of an adaptive number of previously received frames.
A seventh, separate aspect of the present invention is a speech communication system that determines a lost adaptive codebook gain parameter for a lost frame of non-periodic-like speech based on the average adaptive codebook gain parameter of the subframes of an adaptive number of previously received frames and the ratio of the adaptive codebook excitation energy to the total excitation energy.
An eighth, separate aspect of the present invention is a speech communication system that determines a lost adaptive codebook gain parameter for a lost frame of non-periodic-like speech based on the average adaptive codebook gain parameter of the subframes of an adaptive number of previously received frames, the ratio of the adaptive codebook excitation energy to the total excitation energy, the spectral tilt of the previously received frame and/or the energy of the previously received frame.
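The sixth and seventh aspects can be sketched as a mean subframe gain scaled by the adaptive-to-total excitation energy ratio. The multiplicative combination below is an illustrative assumption, not the patent's formula; the eighth aspect would further fold in spectral tilt and frame energy.

```python
# Sketch: conceal a lost adaptive codebook gain for non-periodic-like
# speech from the mean subframe gain of recent good frames, weighted by
# how much of the total excitation energy was adaptive.
def conceal_adaptive_gain(subframe_gains, adaptive_energy, total_energy):
    # subframe_gains: adaptive codebook gains from the subframes of an
    # adaptive number of previously received frames.
    mean_gain = sum(subframe_gains) / len(subframe_gains)
    energy_ratio = adaptive_energy / total_energy
    return mean_gain * energy_ratio

# Example: mean gain 0.6, and 3/4 of the excitation energy was adaptive.
g = conceal_adaptive_gain([0.5, 0.6, 0.7, 0.6],
                          adaptive_energy=3.0, total_energy=4.0)
```

Weighting by the energy ratio tempers the estimate when the fixed codebook was carrying most of the excitation, in which case a large adaptive gain would be unwarranted.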
A ninth, separate aspect of the present invention is a speech communication system that sets a lost adaptive codebook gain parameter for a lost frame of non-periodic-like speech to an arbitrarily high number.
A tenth, separate aspect of the present invention is a speech communication system that sets a lost fixed codebook gain parameter to zero for all subframes of a lost frame of non-periodic-like speech.
An eleventh, separate aspect of the present invention is a speech communication system that determines a lost fixed codebook gain parameter for the current subframe of the lost frame of non-periodic-like speech based on the ratio of the energy of the previously received frame to the energy of the lost frame.
A twelfth, separate aspect of the present invention is a speech communication system that determines a lost fixed codebook gain parameter for the current subframe of the lost frame based on the ratio of the energy of the previously received frame to the energy of the lost frame and then attenuates that parameter to set the lost fixed codebook gain parameters for the remaining subframes of the lost frame.
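The eleventh and twelfth aspects can be sketched as an energy-ratio estimate for the first subframe followed by progressive attenuation. The square-root mapping from energy ratio to gain, the subframe count, and the 0.8 attenuation factor are illustrative assumptions, as is the availability of an energy estimate for the lost frame.

```python
# Sketch: estimate the lost fixed codebook gain for the first subframe from
# an energy ratio, then attenuate it for the remaining subframes.
def conceal_fixed_gains(prev_gain, prev_energy, est_lost_energy,
                        n_subframes=4, attenuation=0.8):
    # First subframe: scale the previous gain by the square root of the
    # frame energy ratio (gains scale with amplitude, energies with power).
    g = prev_gain * (prev_energy / est_lost_energy) ** 0.5
    gains = [g]
    # Remaining subframes: attenuate progressively so a bad estimate
    # cannot ring on through the whole lost frame.
    for _ in range(n_subframes - 1):
        g *= attenuation
        gains.append(g)
    return gains

gains = conceal_fixed_gains(prev_gain=1.0, prev_energy=4.0, est_lost_energy=4.0)
```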
A thirteenth, separate aspect of the present invention is a speech communication system that sets a lost adaptive codebook gain parameter for the first frame of periodic-like speech to be lost after a received frame to an arbitrarily high number.
A fourteenth, separate aspect of the present invention is a speech communication system that sets a lost adaptive codebook gain parameter for the first frame of periodic-like speech to be lost after a received frame to an arbitrarily high number and then attenuates that parameter to set the lost adaptive codebook gain parameters for the remaining subframes of the lost frame.
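A sketch of the thirteenth and fourteenth aspects follows. The starting value 0.95, the 0.9 attenuation factor, and the subframe count are illustrative assumptions; the patent says only "an arbitrarily high number" followed by attenuation.

```python
# Sketch: for the first lost frame of periodic-like speech, pin the
# adaptive codebook gain near a high value so the established pitch
# periodicity carries through the gap, then attenuate it across the
# frame's remaining subframes.
def conceal_periodic_adaptive_gains(n_subframes=4, high=0.95, attenuation=0.9):
    gains = [high]
    for _ in range(n_subframes - 1):
        gains.append(gains[-1] * attenuation)
    return gains

gains = conceal_periodic_adaptive_gains()
```

For voiced speech the adaptive codebook contribution dominates, so holding its gain high lets the decoder repeat the recent pitch cycle, which is usually a far better guess than silence or noise.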
A fifteenth, separate aspect of the present invention is a speech communication system that sets a lost fixed codebook gain parameter for a lost frame of periodic-like speech to zero if the average adaptive codebook gain parameter of a plurality of the previously received frames exceeds a threshold.
A sixteenth, separate aspect of the present invention is a speech communication system that determines a lost fixed codebook gain parameter for the current subframe of a lost frame of periodic-like speech based on the ratio of the energy of the previously received frame to the energy of the lost frame if the average adaptive codebook gain parameter of a plurality of the previously received frames does not exceed a threshold.
A seventeenth, separate aspect of the present invention is a speech communication system that determines a lost fixed codebook gain parameter for the current subframe of a lost frame based on the ratio of the energy of the previously received frame to the energy of the lost frame and then attenuates that parameter to set the lost fixed codebook gain parameters for the remaining subframes of the lost frame if the average adaptive codebook gain parameter of a plurality of the previously received frames does not exceed a threshold.
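The threshold logic of the fifteenth and sixteenth aspects can be sketched as follows. The threshold value, gain values, and energies are illustrative assumptions.

```python
# Sketch: for a lost frame of periodic-like speech, zero the fixed codebook
# gain when the recent average adaptive codebook gain is high (strong
# periodicity), otherwise fall back to an energy-ratio estimate.
def conceal_periodic_fixed_gain(avg_adaptive_gain, prev_gain,
                                prev_energy, est_lost_energy, threshold=0.7):
    if avg_adaptive_gain > threshold:
        # Strong periodicity: the adaptive contribution dominates, so the
        # fixed codebook contribution can be dropped entirely.
        return 0.0
    return prev_gain * (prev_energy / est_lost_energy) ** 0.5

strong = conceal_periodic_fixed_gain(0.9, prev_gain=0.5,
                                     prev_energy=1.0, est_lost_energy=1.0)
weak = conceal_periodic_fixed_gain(0.4, prev_gain=0.5,
                                   prev_energy=1.0, est_lost_energy=1.0)
```

The rationale is that injecting fixed codebook noise into strongly voiced speech is audible, whereas when periodicity is weak the fixed codebook contribution still matters and must be estimated.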
An eighteenth, separate aspect of the present invention is a speech communication system that randomly generates a fixed codebook excitation for a given frame by using a seed whose value is determined by information in that frame.
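The eighteenth aspect, which resolves the seed-synchronization problem described in the background, can be sketched by deriving the seed from the frame's own contents. The particular way the frame bits are folded into a seed below is an illustrative choice.

```python
# Sketch: derive the random-excitation seed from the bits of the frame
# itself rather than from a running seed sequence, so encoder and decoder
# produce identical excitation for a frame even if earlier frames were lost.
import random

def frame_seed(frame_bits):
    # Fold the frame's transmitted bits into a deterministic 31-bit seed.
    seed = 0
    for b in frame_bits:
        seed = (seed * 31 + b) % (2 ** 31)
    return seed

def fixed_excitation(frame_bits, n=4):
    rng = random.Random(frame_seed(frame_bits))
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

bits = [1, 0, 1, 1, 0, 0, 1, 0]
# Both sides compute the same excitation from the same received frame,
# with no dependence on the history of prior frames.
enc = fixed_excitation(bits)
dec = fixed_excitation(bits)
```

Because the seed depends only on the current frame's information, a lost noise frame cannot shift the decoder's seed sequence relative to the encoder's.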
A nineteenth, separate aspect of the present invention is a speech communication decoder that after estimating lost parameters in a lost frame and synthesizing the speech, matches the energy of the synthesized speech to the energy of the previously received frame.
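The nineteenth aspect can be sketched as a simple energy match applied after synthesis. Using a uniform RMS scale factor over the concealed frame is an illustrative choice.

```python
# Sketch: after synthesizing speech for a concealed frame, rescale it so
# its energy matches that of the previously received frame.
def match_energy(synth, target_energy):
    energy = sum(s * s for s in synth)
    if energy == 0.0:
        return synth
    scale = (target_energy / energy) ** 0.5
    return [s * scale for s in synth]

synth = [0.5, -0.5, 0.5, -0.5]                    # energy 1.0
matched = match_energy(synth, target_energy=4.0)  # rescaled to energy 4.0
```

This final step bounds the audible loudness discontinuity at the concealed frame, whatever errors the individual parameter estimates introduced.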
A twentieth, separate aspect of the present invention is any of the above separate aspects, either individually or in some combination.
Further separate aspects of the present invention can also be found in a method of encoding and/or decoding a speech signal that practices any of the above separate aspects, either individually or in some combination.