1. Field of the Invention
The present invention relates to the field of speech coding, and more particularly, to a robust, fast search scheme for a two-dimensional gain vector quantizer table.
2. Description of Related Art
A prior art speech coding system 200 is illustrated in FIG. 1. One of the techniques for coding and decoding a signal 100 is to use an analysis-by-synthesis coding system, which is well known to those skilled in the art. An analysis-by-synthesis system 200 for coding and decoding signal 100 utilizes an analysis unit 204 along with a corresponding synthesis unit 222. The analysis unit 204 represents an analysis-by-synthesis type of speech coder, such as a code excited linear prediction (CELP) coder. A code excited linear prediction coder is one way of coding signal 100 at a medium or low bit rate in order to meet the constraints of communication networks and storage capacities. An example of a CELP based speech coder is the recently adopted International Telecommunication Union (ITU) G.729 standard, herein incorporated by reference.
In order to code speech, the microphone 206 of the analysis unit 204 receives the analog sound waves 100 as an input signal. The microphone 206 outputs the received analog sound waves 100 to the analog-to-digital (A/D) sampler circuit 208. The A/D sampler 208 converts the analog sound waves 100 into a sampled digital speech signal (sampled over discrete time periods). This signal is output to the linear prediction coefficient (LPC) extractor 210 and to the pitch extractor 212, which retrieve the formant structure (or spectral envelope) and the harmonic structure of the speech signal, respectively.
The formant structure corresponds to short-term correlation, and the harmonic structure corresponds to long-term correlation. The short-term correlation can be described by time-varying filters whose coefficients are the obtained linear prediction coefficients (LPC). The long-term correlation can likewise be described by time-varying filters whose coefficients are obtained from the pitch extractor. Filtering the incoming speech signal with the LPC filter removes the short-term correlation and generates an LPC residual signal. The pitch filter then processes this LPC residual signal to remove the remaining long-term correlation. The resulting signal is the total residual signal. If this residual signal is passed through the inverse pitch and LPC filters (also called synthesis filters), the original speech signal is retrieved, or synthesized. In speech coding, this residual signal has to be quantized (coded) in order to reduce the bit rate. The quantized residual signal is called the excitation signal. It is passed through both the quantized pitch and LPC synthesis filters to produce a close replica of the original speech signal. In analysis-by-synthesis CELP coding of speech, the quantized residual signal is obtained from a code book 214, normally called the fixed code book. This method is described in detail in the ITU G.729 document.
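The analysis and synthesis filtering described above can be illustrated with a minimal Python sketch. The sketch makes several assumptions:

- The short-term filter uses the sign convention A(z) = 1 - sum a_k z^-k.
- The long-term filter is a single-tap filter P(z) = 1 - b z^-T with an integer lag T.
- The function names and example coefficients are hypothetical and are not taken from G.729.

Passing the total residual back through the two inverse (synthesis) filters recovers the original samples.

```python
import math


def lpc_analysis(s, a):
    """Short-term analysis filter A(z) = 1 - sum_k a[k] z^-(k+1) (assumed sign convention)."""
    e = []
    for n in range(len(s)):
        # Linear prediction of s[n] from the previous len(a) samples (zero before n = 0)
        pred = sum(a[k] * s[n - k - 1] for k in range(len(a)) if n - k - 1 >= 0)
        e.append(s[n] - pred)
    return e


def lpc_synthesis(e, a):
    """Inverse (synthesis) filter 1/A(z): s[n] = e[n] + sum_k a[k] s[n-k-1]."""
    s = []
    for n in range(len(e)):
        pred = sum(a[k] * s[n - k - 1] for k in range(len(a)) if n - k - 1 >= 0)
        s.append(e[n] + pred)
    return s


def pitch_analysis(e, b, T):
    """Single-tap long-term analysis filter P(z) = 1 - b z^-T with integer lag T >= 1."""
    return [e[n] - (b * e[n - T] if n >= T else 0.0) for n in range(len(e))]


def pitch_synthesis(r, b, T):
    """Inverse long-term (pitch synthesis) filter 1/P(z): e[n] = r[n] + b e[n-T]."""
    e = []
    for n in range(len(r)):
        e.append(r[n] + (b * e[n - T] if n >= T else 0.0))
    return e


# Toy example: hypothetical 2nd-order predictor and pitch parameters
speech = [math.sin(0.3 * n) + 0.5 * math.sin(0.05 * n) for n in range(80)]
a = [1.2, -0.5]   # poles of 1/A(z) lie inside the unit circle
b, T = 0.6, 20
residual = pitch_analysis(lpc_analysis(speech, a), b, T)   # total residual
rebuilt = lpc_synthesis(pitch_synthesis(residual, b, T), a)
```

For simplicity, every filter in this sketch starts from zero state. Because the synthesis filters are recursive, a practical coder would also carry filter memory across frames.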
The fixed code book 214 of FIG. 1 contains a specific number of stored digital patterns, which are referred to as code vectors. The fixed code book 214 is normally searched to find the code vector that best represents the residual signal in a perceptually weighted sense, as known to those skilled in the art. The selected code vector is typically called the fixed excitation signal. After determining the code vector that best represents the residual signal, the fixed code book unit 214 also computes the gain factor of the fixed excitation signal. The next step is to pass the fixed excitation signal through the pitch synthesis filter. This is normally implemented using the adaptive code book search approach, which determines the optimum pitch gain and pitch lag in a "closed-loop" fashion, as known to those skilled in the art. The "closed-loop" method, or analysis-by-synthesis, means that the signals to be matched are filtered through the synthesis filters before they are compared.
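A closed-loop fixed code book search can be roughly illustrated as follows. Each candidate code vector is filtered through a hypothetical impulse response h of the weighted synthesis filter. The search keeps the vector that maximizes <x, y>²/<y, y>, where x is the target signal and y is the filtered code vector. The associated optimal gain is <x, y>/<y, y>. The exhaustive loop, the toy code book, and the function names are assumptions for illustration. G.729 itself uses an algebraic code book with a much faster structured search.

```python
def filter_zero_state(c, h):
    """Truncated convolution y = h * c over the subframe length (zero initial filter state)."""
    return [sum(h[k] * c[n - k] for k in range(min(len(h), n + 1))) for n in range(len(c))]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def search_fixed_codebook(x, codebook, h):
    """Return (index, gain) of the code vector whose filtered version best matches target x."""
    best_index, best_score, best_gain = -1, float("-inf"), 0.0
    for i, c in enumerate(codebook):
        y = filter_zero_state(c, h)       # closed loop: match the *filtered* candidate
        energy = dot(y, y)
        if energy <= 0.0:
            continue
        corr = dot(x, y)
        # corr^2 / energy equals ||x||^2 - ||x - g*y||^2 at the optimal gain g = corr / energy
        score = corr * corr / energy
        if score > best_score:
            best_index, best_score, best_gain = i, score, corr / energy
    return best_index, best_gain


# Toy example with a hypothetical impulse response and single-pulse code vectors
h = [1.0, 0.5, 0.25]
codebook = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, -1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
]
target = [2.5 * v for v in filter_zero_state(codebook[2], h)]
index, gain = search_fixed_codebook(target, codebook, h)
```

Here the target is built from the third code vector scaled by 2.5, so the search recovers index 2 with a gain of 2.5.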
The optimum pitch gain and lag enable the generation of a so-called adaptive excitation signal. The gain quantizer 216 then quantizes the determined gain factors for both the adaptive and fixed code book excitations in a "closed-loop" fashion. It uses a look-up table with an index, a quantization scheme well known to those of ordinary skill in the art. The index of the best fixed excitation from the fixed code book 214 is then passed to the storage/transmitter unit 218, along with the indices of the quantized gains, pitch lag, and LPC coefficients.
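A closed-loop gain look-up can be sketched as an exhaustive search over a table of (gp, gc) pairs. The search minimizes ||x - gp*y - gc*z||², where y and z are the filtered adaptive and fixed excitations. The table contents and function names are hypothetical. The sketch also quantizes gc directly, rather than through the predicted-gain correction factor that G.729 actually uses.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def quantize_gains(x, y, z, table):
    """Return the index of the (gp, gc) entry minimizing ||x - gp*y - gc*z||^2."""
    # The five correlations are computed once per subframe, not once per table entry
    xy, xz = dot(x, y), dot(x, z)
    yy, zz, yz = dot(y, y), dot(z, z), dot(y, z)
    best_index, best_err = -1, float("inf")
    for i, (gp, gc) in enumerate(table):
        # Expanded squared error, omitting the constant ||x||^2 term
        err = (gp * gp * yy + gc * gc * zz + 2.0 * gp * gc * yz
               - 2.0 * gp * xy - 2.0 * gc * xz)
        if err < best_err:
            best_index, best_err = i, err
    return best_index


# Toy example: hypothetical gain table and filtered excitations
gain_table = [(0.2, 0.5), (0.8, 1.0), (0.8, 2.0), (1.2, 1.5)]
y = [1.0, 0.5, 0.0, -0.3]    # filtered adaptive excitation
z = [0.0, 1.0, -1.0, 0.2]    # filtered fixed excitation
x = [0.8 * yi + 2.0 * zi for yi, zi in zip(y, z)]
index = quantize_gains(x, y, z, gain_table)
```

Because the correlations are precomputed, each table entry costs only a few multiplications. Even so, the loop still visits every entry, so its cost grows linearly with the table size.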
The storage/transmitter 218 of the analysis unit 204 then transmits the index values of the following parameters to the synthesis unit 222 via the communication network 220: the pitch lag, the pitch gain, the linear prediction coefficients, the fixed excitation code vector, and the fixed excitation code vector gain. Together, these parameters represent the received analog sound wave signal 100. The synthesis unit 222 decodes the parameters it receives from the storage/transmitter 218 to obtain a synthesized speech signal. To enable people to hear the synthesized speech signal, the synthesis unit 222 outputs it to a speaker 224.
The analysis-by-synthesis system 200 described above with reference to FIG. 1 has been successfully employed to realize high-quality speech coders. As can be appreciated by those skilled in the art, natural speech can be coded at very low bit rates with high quality.
FIG. 2 is a block diagram illustrating more generally how a speech signal is coded. A digitized input speech signal is input to an LP analysis block 300. The LP analysis block 300 removes the short-term correlation (i.e., extracts the formant structure of the speech signal). As a result of the LP analysis, LPC coefficients are generated and quantized (not shown). The signal output by the LP analysis block 300 is known as a residual signal. The quantizer 302 quantizes this residual signal using a fixed excitation codebook and an adaptive excitation codebook. At block 304, a fixed excitation gain gc and an adaptive excitation gain gp are determined. Gains gc and gp are then quantized at block 306. The indices for the quantized LPC coefficients, the optimal fixed and adaptive excitation vectors, and the quantized gains are then transmitted over the communications channel.
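One way to determine gp and gc at block 304 is to minimize the squared error ||x - gp*y - gc*z||² jointly. Here x is the target, and y and z are the filtered adaptive and fixed excitations. This leads to a 2-by-2 linear system in the two gains, which the sketch below solves with Cramer's rule. The joint solution and the function name are assumptions for illustration. Many CELP coders instead compute the two gains sequentially, one after each code book search.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def joint_optimal_gains(x, y, z):
    """Unquantized (gp, gc) minimizing ||x - gp*y - gc*z||^2 via the normal equations."""
    xy, xz = dot(x, y), dot(x, z)
    yy, zz, yz = dot(y, y), dot(z, z), dot(y, z)
    # Normal equations:  yy*gp + yz*gc = xy
    #                    yz*gp + zz*gc = xz
    det = yy * zz - yz * yz
    if abs(det) < 1e-12:
        raise ValueError("filtered excitations are (nearly) collinear")
    gp = (xy * zz - xz * yz) / det
    gc = (xz * yy - xy * yz) / det
    return gp, gc


# Toy example: the target is an exact combination of the two filtered excitations
y = [1.0, 0.5, 0.0, -0.3]
z = [0.0, 1.0, -1.0, 0.2]
x = [0.7 * yi + 1.3 * zi for yi, zi in zip(y, z)]
gp, gc = joint_optimal_gains(x, y, z)
```

The resulting pair (gp, gc) is the input gain vector that block 306 would then quantize.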
In CELP-based speech coders, the adaptive excitation gain and the fixed excitation gain are often jointly quantized using a two-dimensional vector quantizer for efficient coding. This quantization process requires a search of a codebook whose size may range from 64 (6 bits) to 512 (9 bits) entries in order to find the best possible match for the input gain vector. A full search of such a codebook, however, is too computationally complex for many applications. Thus, there is a need for a fast algorithm to search a gain quantizer table. Moreover, it is desirable to have a robust quantizer table, that is, a quantizer table designed to minimize the effect of bit errors introduced by poor-quality transmission channels.
A vector quantizer (VQ) table is arranged in increasing order with regard to a gc gain value (as may be represented by a prediction error energy En). The single-stage VQ table is then organized into two-dimensional bins, with each bin arranged in increasing order of a gp gain value. A one-dimensional auxiliary scalar quantizer is constructed from the largest prediction error energy value in each bin. The prediction error energy values in the auxiliary scalar quantizer are arranged in increasing order of magnitude. To quantize input gain values, the auxiliary scalar table is searched for the best prediction error energy match. The VQ table bin corresponding to the best match in the auxiliary table is then searched for the best En and gp match. Nearby bins may also be searched for a better combination. The selected best match is used to quantize the input gain values. A VQ constructed in this way results in a robust and fast search scheme.
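The table organization and two-step search described above can be sketched in Python. Several details are assumptions not stated in the text:

- Bins are formed from consecutive groups of bin_size entries of the En-sorted table.
- The auxiliary search picks the entry nearest to the input energy.
- The bin search uses a weighted squared distance with a hypothetical weight w.
- One neighboring bin on each side of the selected bin is also searched.

The function names are hypothetical. The sketch addresses only the fast search; it does not model the index assignment that makes the table robust to channel errors.

```python
import bisect


def build_tables(entries, bin_size):
    """Organize (En, gp) entries into gp-sorted bins plus an auxiliary scalar table."""
    by_energy = sorted(entries)                          # increasing En (ties broken by gp)
    bins = [sorted(by_energy[i:i + bin_size], key=lambda e: e[1])   # each bin: increasing gp
            for i in range(0, len(by_energy), bin_size)]
    aux = [max(e[0] for e in b) for b in bins]           # largest En per bin, non-decreasing
    return bins, aux


def fast_search(en_in, gp_in, bins, aux, neighbors=1, w=1.0):
    """Return (bin, position) of the best (En, gp) match for the input gains."""
    # Step 1: scalar search of the auxiliary table for the closest energy value
    j = bisect.bisect_left(aux, en_in)
    candidates = [k for k in (j - 1, j) if 0 <= k < len(aux)]
    best_bin = min(candidates, key=lambda k: abs(aux[k] - en_in))
    # Step 2: search the selected bin and its neighbors for the best (En, gp) pair
    best = None
    for b in range(max(0, best_bin - neighbors), min(len(bins), best_bin + neighbors + 1)):
        for p, (en, gp) in enumerate(bins[b]):
            d = (en - en_in) ** 2 + w * (gp - gp_in) ** 2
            if best is None or d < best[0]:
                best = (d, b, p)
    return best[1], best[2]


# Toy example: a hypothetical 16-entry table split into 4 bins of 4 entries
table = [(float(en), gp) for en in range(8) for gp in (0.2, 0.9)]
bins, aux = build_tables(table, bin_size=4)
b, p = fast_search(4.9, 0.85, bins, aux)
```

Suppose there are B bins of size M. A full search examines all B*M entries. This search instead examines about log2(B) auxiliary entries plus at most (2*neighbors + 1)*M table entries.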