1. Field of the Invention
The present invention relates to a process for the vector quantization of low bit rate vocoders.
It applies in particular to linear-prediction vocoders similar to those described for example in the THOMSON-CSF Technical Journal, Volume 14, No. 3, Sep. 1982, pages 715 to 731, according to which the speech signal is identified at the output of a digital filter whose input receives either a periodic waveform, corresponding to the voiced sounds, viz. vowels, or a random waveform, corresponding to the unvoiced sounds, viz. the majority of consonants.
2. Discussion of the Background
It is known that the auditory quality of linear-prediction vocoders depends in large part on the accuracy with which their predictive filter is quantized, and that this quality generally decreases as the digital bit rate between vocoders decreases, since the accuracy of quantization of the filter then becomes insufficient. Numerous quantization processes, such as those described in Patent Application EP 0504485 A2 or in U.S. Pat. No. 4,907,276, have been developed in order to solve this problem. In general, the speech signal is segmented into independent frames of constant duration and the filter is renewed at each frame. Thus, to arrive at a bit rate of around 1820 bits per second, the filter must, according to a standard implementation, be represented by a packet of 41 bits transmitted every 22.5 milliseconds. For non-standard links with a lower bit rate, of the order of 800 bits per second, fewer than 800 bits per second have to be transmitted in order to represent the filter, which constitutes a bit rate ratio of approximately 3 as compared with standard implementations. On average, 30 bits are used to quantize one filter out of two. These 30 bits are composed of 3 bits defining a quantization scheme and 27 bits for quantizing 10 quantities obtained from the LAR (Log Area Ratio) coefficients by displacement and rotation in the 10-dimensional space thus defined. As a result, the quantization is now only approximately transparent, and this artefact has to be compensated for auditorily by coarse quantization of the filters located in the transitions of the speech signal and fine quantization of those corresponding to stable zones.
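The bit rate figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes that "one filter out of two" means 30 bits sent once every two 22.5 millisecond frames, which the text does not state explicitly.

```python
FRAME_S = 0.0225  # frame duration quoted in the text: 22.5 ms


def filter_bit_rate(bits, frames_per_packet=1, frame_s=FRAME_S):
    """Bits per second spent on the filter when `bits` are sent
    once every `frames_per_packet` frames (hypothetical helper)."""
    return bits / (frames_per_packet * frame_s)


standard = filter_bit_rate(41)      # 41 bits every frame
low_rate = filter_bit_rate(30, 2)   # assumption: 30 bits every two frames
ratio = standard / low_rate

print(round(standard))   # 1822
print(round(low_rate))   # 667
print(round(ratio, 1))   # 2.7
```

Under these assumptions the ratio comes out at about 2.7, consistent with the "approximately 3" stated above.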
To obtain sufficient accuracy of quantization of the predictive filter despite everything, the conventional approach consists in employing a vector quantization scheme, which is intrinsically more efficient than that used in standard systems, where the 41 bits employed serve for the scalar quantization of the P=10 coefficients of the prediction filter. The method relies on a dictionary containing a specified number of standard filters obtained by learning. It consists in transmitting only the page or the index at which the standard filter closest to the filter to be coded is located. The advantage lies in the bit rate reduction which is obtained, only 10 to 15 bits per filter being transmitted instead of the 41 bits required in scalar quantization mode. However, this bit rate reduction is obtained at the cost of a very large increase in the memory size required to store the elements of the dictionary and of a considerable computational burden attributable to the complexity of the filter search algorithm.
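The dictionary lookup described above can be sketched as an exhaustive nearest-neighbor search. This is a minimal illustration, not the search used in any particular vocoder: the function names are hypothetical, and the squared Euclidean distance is an assumption (practical systems often use a perceptually motivated spectral distance).

```python
import math


def nearest_index(codebook, vector):
    """Return the index of the codebook entry closest to `vector`.

    Assumption: squared Euclidean distance is the distortion measure.
    """
    best_index, best_dist = 0, math.inf
    for i, entry in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(entry, vector))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def index_bits(codebook_size):
    """Number of bits needed to transmit one index into the codebook."""
    return (codebook_size - 1).bit_length()


# Toy dictionary of three 2-dimensional "standard filters".
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(nearest_index(codebook, [0.9, 1.2]))  # 1
print(index_bits(1024))                     # 10
print(index_bits(32768))                    # 15
```

Only the index is transmitted, so a dictionary of 1,024 to 32,768 entries corresponds to the 10 to 15 bits per filter mentioned above. The loop visits every entry, which illustrates why both memory and search cost grow with the dictionary size.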
When this approach is also applied to low bit rate vocoders of 800 bits/s and less, it is commonly supposed that 24 bits are sufficient for a composite dictionary produced from two dictionaries of 4,096 elements each, accounting for the first four and the last six LSP (Line Spectral Pair) coefficients respectively. The major drawback of this type of quantization again resides in the need to compile this dictionary, to store it and to perform the quantization proper.
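As an illustration of the composite dictionary described above, the following sketch quantizes the first four and the last six coefficients separately and packs the two indices into one code word. All names are hypothetical, the distance measure is assumed to be squared Euclidean, and the toy dictionaries stand in for the trained 4,096-element ones.

```python
def nearest_index(book, vector):
    """Index of the entry of `book` closest to `vector` (squared Euclidean, assumed)."""
    return min(range(len(book)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(book[i], vector)))


def split_quantize(lsp, low_book, high_book, split=4):
    """Quantize lsp[:split] and lsp[split:] separately and pack both indices."""
    i_low = nearest_index(low_book, lsp[:split])
    i_high = nearest_index(high_book, lsp[split:])
    high_bits = (len(high_book) - 1).bit_length()
    return (i_low << high_bits) | i_high


def storage_values(books):
    """Total number of stored coefficients for a list of (size, dimension) pairs."""
    return sum(size * dim for size, dim in books)


# Toy dictionaries: 2 entries of dimension 4, 4 entries of dimension 6.
low_book = [[0.0] * 4, [1.0] * 4]
high_book = [[0.0] * 6, [1.0] * 6, [2.0] * 6, [3.0] * 6]
print(split_quantize([0.9] * 4 + [2.2] * 6, low_book, high_book))  # 6

print(storage_values([(4096, 4), (4096, 6)]))  # 40960
print(storage_values([(2 ** 24, 10)]))         # 167772160
```

Two 12-bit indices give the 24 bits quoted above while storing 40,960 coefficients, whereas a single unstructured 24-bit dictionary of 10-dimensional vectors would need 167,772,160. The split reduces the burden but does not remove the need to train, store and search the dictionaries.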
Alternatives to the vector quantization scheme have also been proposed in order to reduce the number of elements stored in the dictionary. Thus, a technique of pyramidal vector quantization is in particular known, a description of which may be found in the article by Thomas R. Fischer entitled "A pyramid vector quantizer", IEEE Transactions on Information Theory, Vol. IT-32, No. 4, July 1986, pages 568 to 582. According to this technique, the multidimensional input data are distributed over the vertices of a regular grid included within a multidimensional pyramid. This quantization technique is applied mainly to data with a Laplacian distribution. However, the resulting reduction in bit rate is not always sufficiently appreciable. This is due in particular to the fact that, in practice, the overall shape of the multidimensional data to be processed is inscribed within an ellipsoid, especially when using a prediction/extrapolation computation system, which always yields data with a Gaussian characteristic shape. Moreover, the pyramid which circumscribes this ellipsoid leads to the coding of points which lie outside the ellipsoid surrounding the scatter of points to be coded, thereby making it necessary to dimension code words with a number of bits which exceeds what is strictly necessary.
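To give a sense of how the code word length of a pyramid quantizer is dimensioned, the sketch below counts the integer grid points lying on the surface of a pyramid, that is, the vectors whose absolute values sum to a given radius, using the counting recurrence commonly given for pyramid vector quantization. The function names are hypothetical, and restricting the count to the pyramid surface rather than its interior is an assumption made for simplicity.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def pyramid_points(dim, radius):
    """Number of integer vectors of dimension `dim` whose absolute values sum to `radius`."""
    if radius == 0:
        return 1  # only the all-zero vector
    if dim == 0:
        return 0  # no coordinates left to carry a nonzero radius
    return (pyramid_points(dim - 1, radius)
            + pyramid_points(dim - 1, radius - 1)
            + pyramid_points(dim, radius - 1))


def index_bits(count):
    """Bits needed to index `count` distinct points."""
    return (count - 1).bit_length()


print(pyramid_points(2, 2))              # 8
print(pyramid_points(3, 3))              # 38
print(index_bits(pyramid_points(3, 3)))  # 6
```

Every grid point of the pyramid receives an index, including those that fall outside the ellipsoid actually occupied by the data, which is the source of the excess code word length noted above.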