A very widespread solution in the compression of digital signals is vector quantization. A first incentive to use vector quantization may be found in the block coding theory developed by Shannon, according to which better performance may be achieved by increasing the dimension of the vectors to be coded. Vector quantization consists in representing an input vector by a vector of like dimension chosen from a finite set. Thus, providing a quantizer with M levels (or codevectors) amounts to creating a non-bijective mapping from the set of input vectors (generally the Euclidean real space with n dimensions $\mathbb{R}^n$, or else a subset of $\mathbb{R}^n$) into a finite subset Y of $\mathbb{R}^n$. The subset Y then comprises M distinct elements: $Y = \{y_1, y_2, \ldots, y_M\}$.
Y is called the reproduction alphabet, or else dictionary, or else directory. The elements of Y are called “codevectors”, “code words”, “exit points”, or else “representatives”.
The rate per dimension (r) of the quantizer (or else its “resolution”) is defined by:
$$r = \frac{1}{n} \log_2 M$$
In vector quantization, a block of n samples is processed as a vector of dimension n. The vector is coded by choosing, from a dictionary of M codevectors, the codevector which most “resembles” it. In general, an exhaustive search is made among all the elements of the dictionary to select the element which minimizes a measure of distance between it and the input vector.
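The exhaustive search and the rate per dimension defined above can be illustrated by the following minimal sketch. It assumes the squared Euclidean distance as the measure of distance; the function names and the toy dictionary `Y` are hypothetical and serve only as an illustration.

```python
import math

def squared_distance(x, y):
    """Squared Euclidean distance between two vectors of the same dimension."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def encode(x, dictionary):
    """Exhaustive search: index of the codevector nearest to the input vector x."""
    return min(range(len(dictionary)),
               key=lambda i: squared_distance(x, dictionary[i]))

def rate_per_dimension(dictionary):
    """Resolution r = (1/n) * log2(M) for M codevectors of dimension n."""
    n = len(dictionary[0])
    return math.log2(len(dictionary)) / n

# Hypothetical toy dictionary: M = 4 codevectors of dimension n = 2.
Y = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
index = encode((0.9, 0.2), Y)   # index of the selected codevector
r = rate_per_dimension(Y)       # log2(4) / 2 bits per dimension
```

The decoder only needs the index: it reads the codevector `Y[index]` from the same dictionary.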
According to the theory of source coding, when the dimension becomes very large, the performance of vector quantization approaches a limit termed the “rate-distortion bound of the source”. Apart from the dimensionality of the space, vector quantization may also utilize the properties of the source to be coded, for example nonlinear and/or linear dependencies, or else the shape of the probability distribution. In general, the dictionaries of vector quantizers are designed on the basis of statistical procedures such as the generalized Lloyd algorithm (denoted GLA). This well-known algorithm is based on the necessary conditions of optimality of a vector quantizer. On the basis of a training sequence representative of the source to be coded and of an initial dictionary, the dictionary is constructed iteratively. Each iteration comprises two steps:
        the construction of the regions of quantization by quantization of the training sequence according to the rule of the nearest neighbour, and
        the improvement of the dictionary by replacing the old codevectors by the centroids of the regions (according to the rule of centroids).
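The two steps of an iteration may be sketched as follows. This is only an illustrative sketch under stated assumptions: squared Euclidean distance, a fixed number of iterations instead of a convergence test, and an empty region keeping its old codevector. The names and the toy training sequence are hypothetical.

```python
def nearest(x, codebook):
    """Index of the codevector nearest to x (rule of the nearest neighbour)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))

def lloyd_iteration(training, codebook):
    # Step 1: build the quantization regions by quantizing the training sequence.
    regions = [[] for _ in codebook]
    for x in training:
        regions[nearest(x, codebook)].append(x)
    # Step 2: replace each codevector by the centroid of its region.
    new_codebook = []
    for old, region in zip(codebook, regions):
        if region:
            dim = len(region[0])
            new_codebook.append(tuple(sum(v[k] for v in region) / len(region)
                                      for k in range(dim)))
        else:
            new_codebook.append(old)  # assumption: an empty region keeps its codevector
    return new_codebook

def generalized_lloyd(training, codebook, iterations=20):
    """Repeat the two steps a fixed number of times (assumed stopping rule)."""
    for _ in range(iterations):
        codebook = lloyd_iteration(training, codebook)
    return codebook

# Hypothetical training sequence and initial dictionary (M = 2, n = 2).
training = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
initial = [(1.0, 1.0), (9.0, 9.0)]
final = generalized_lloyd(training, initial)
```

Because the procedure is deterministic, the result depends on the initial dictionary, which is why it may converge to a local minimum.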
To avoid the convergence of this deterministic iterative algorithm to a local minimum, variants termed “stochastic relaxation” (denoted SKA standing for “Stochastic K-means algorithm”), inspired by the technique of simulated annealing, have been proposed; they introduce randomness into the step of constructing the centroids and/or into that of constructing the classes. The statistical vector quantizers thus obtained do not possess any structure, thereby rendering their exploration expensive in terms of calculations and of memory. Specifically, the complexity both of the coding and of the storage is proportional to $n \cdot 2^{nr}$. This exponential increase as a function of the dimension of the vectors and of the rate limits the use of unstructured vector quantizers to small dimensions and/or low rates so that they can be implemented in real time.
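To give an order of magnitude of this exponential growth, the short sketch below evaluates the quantity n·2^(nr) for a few dimensions. The function name and the chosen values of n and r are hypothetical; the actual cost is only proportional to this quantity.

```python
def unstructured_vq_cost(n, r):
    """Number of stored values for a dictionary of 2**(n*r) codevectors of dimension n.

    The cost of an exhaustive search is proportional to the same quantity.
    Assumes n*r is an integer.
    """
    return n * 2 ** (n * r)

# Hypothetical example: resolution r = 2 bits per dimension.
for n in (4, 8, 16):
    print(n, unstructured_vq_cost(n, 2))
```

Doubling the dimension at a fixed resolution squares the number of codevectors, which is why unstructured dictionaries quickly become impractical.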
Scalar quantization, which quantizes the samples individually, is not as effective as vector quantization since it can utilize only the shape of the probability distribution of the source and the linear dependency. However, scalar quantization is less expensive in terms of calculations and memory than vector quantization. Moreover, scalar quantization associated with entropy coding can achieve good performance even at moderate resolutions.
To circumvent the constraints of size and of dimension, several variants of the basic vector quantization have been studied. They attempt to remedy the absence of structure of the dictionary and thus succeed in reducing the complexity, to the detriment of quality. However, the performance/complexity compromise is improved, thereby making it possible to increase the span of resolutions and/or of dimensions to which vector quantization may be applied effectively in terms of cost of calculations or of memory.
Numerous schemes of structured vector quantizers have been proposed in the literature. The main ones are the following:
        the tree vector quantizer, which imposes a hierarchical tree structure on the dictionary: the search procedure is simplified but the quantizer requires more storage memory,
        the multi-stage vector quantizer, which cascades vector quantizers of lesser levels: the dictionaries are of reduced sizes and the same goes as regards the calculation time and the memory cost,
        the vector quantizer termed the “Cartesian product” of N classical vector quantizers of smaller dimensions and sizes: the input vector is decomposed into N subvectors, each subvector being quantized independently of the others,
        the “gain/orientation” vector quantizer, which constitutes a particular case of the “Cartesian product” vector quantizer: two quantizers are provided, one a scalar quantizer and the other a vector quantizer, which code separately, independently or otherwise, the gain (or the norm) of the vector and its orientation (by considering the normalized input vector). This type of vector quantization is also called “spherical” vector quantization or “polar” vector quantization,
        the “permutation code” vector quantizer, whose codevectors are obtained by permutations of the components of a leader vector, and its generalization to the composite (or the union) of permutation codes.
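By way of illustration, the “gain/orientation” structure listed above can be sketched as follows. The sketch assumes independent coding of the gain and of the orientation, with the Euclidean norm as the gain; the function names, the gain levels and the orientation dictionary are hypothetical.

```python
import math

def gain_shape_encode(x, gain_levels, shape_codebook):
    """Code the gain (norm) with a scalar quantizer and the orientation
    (normalized vector) with a vector quantizer, independently of each other."""
    gain = math.sqrt(sum(a * a for a in x))
    shape = [a / gain for a in x] if gain > 0 else list(x)
    gain_index = min(range(len(gain_levels)),
                     key=lambda i: abs(gain - gain_levels[i]))
    shape_index = min(range(len(shape_codebook)),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(shape, shape_codebook[i])))
    return gain_index, shape_index

def gain_shape_decode(gain_index, shape_index, gain_levels, shape_codebook):
    """Reconstruct the vector as quantized gain times quantized orientation."""
    g = gain_levels[gain_index]
    return tuple(g * c for c in shape_codebook[shape_index])

# Hypothetical dictionaries: 4 gain levels, 4 unit-norm orientations in dimension 2.
gain_levels = [0.5, 1.0, 2.0, 4.0]
shape_codebook = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
indices = gain_shape_encode((0.1, 1.9), gain_levels, shape_codebook)
decoded = gain_shape_decode(*indices, gain_levels, shape_codebook)
```

Here 4 + 4 stored entries produce 16 distinct reproduction vectors, which shows how a product structure reduces storage compared with an unstructured dictionary of the same size.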
The techniques described above all come within a statistical approach.
Another radically different approach has also been proposed. This is algebraic vector quantization, which uses highly structured dictionaries arising from regular lattices of points or from error-correcting codes. By virtue of the algebraic properties of their dictionaries, algebraic vector quantizers are simple to implement, and their dictionaries do not have to be stored in memory. The utilization of the regular structure of these dictionaries actually allows the development of optimal and fast search algorithms and of mechanisms for associating in particular an index with a corresponding codevector (for example through a formula). Algebraic vector quantizers are thus less complex to implement and require less memory. However, they are optimal only for a uniform distribution of the source (either in space, or on the surface of a hypersphere). Although it is a generalization of the uniform scalar quantizer, the algebraic vector quantizer is more difficult than the latter to tailor to the distribution of the source through the so-called “companding” technique. It is also recalled that the indexation (or numbering) of the codevectors and the inverse operation (decoding) require more calculations than in the case of statistical vector quantizers, for which these operations are performed by simple readings from a table.
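As an illustration of a fast search that requires no stored dictionary, the sketch below finds the nearest point of the lattice D_n (the points with integer coordinates whose sum is even), using the classical rounding rule for that lattice. The choice of D_n is an assumption made purely for illustration; the text above does not single out a particular lattice, and the function names are hypothetical.

```python
import math

def round_nearest(v):
    """Round to the nearest integer (halves rounded up)."""
    return math.floor(v + 0.5)

def nearest_dn_point(x):
    """Nearest point of the lattice D_n = {integer vectors with an even coordinate sum}.

    Round every coordinate; if the sum is odd, re-round the coordinate with the
    largest rounding error in the other direction.
    """
    f = [round_nearest(v) for v in x]
    if sum(f) % 2 == 0:
        return f
    k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    f[k] += 1 if x[k] > f[k] else -1
    return f

even_case = nearest_dn_point((1.7, -0.4, 2.2, 0.1))  # rounding already gives an even sum
odd_case = nearest_dn_point((1.7, -0.4, 2.2, 0.9))   # one coordinate is re-rounded
```

The search cost grows linearly with the dimension and no codevector is stored, in contrast with the exhaustive search over an unstructured dictionary.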
Certain aspects of a variable-dimension quantization and the problems encountered are presented hereinbelow.
It is firstly indicated that vector quantization is a well known and effective technique for coding blocks of samples of fixed length. However, in numerous applications of digital signal compression, the signal to be coded is modelled by a sequence of parameters of variable length. Effective compression of these vectors of variable dimension is crucial for the design of many multimedia coders such as speech or audio coders (“MBE” coder, harmonic coder, sinusoidal coder, transform based coder, coder based on interpolation of prototype waveforms).
In sinusoidal coders, the number of sinusoids extracted depends on the number of sinusoidal peaks detected in the signal, which number varies in the course of time as a function of the nature of the audio signal.
Furthermore, numerous techniques of speech compression utilize the long-term periodicity of the signal. Such is the case for harmonic coders where the spectral components of a set of frequencies, which are the harmonics of the fundamental frequency of the talker, are coded. The number of spectral harmonic peaks is inversely proportional to the fundamental frequency. As this fundamental frequency varies according to the talker (typically, children having a higher frequency of vibration of the vocal cords than men) and over time, the number of components to be quantized also changes from frame to frame.
Such is also the case for PWI coders (standing for “Prototype Waveform Interpolation”) where the prototype waveforms are extracted over segments of length equal to the period of the pitch, hence also temporally variable. In PWI coders, the quantization of these waveforms of variable length is effected by separately coding the gain (or “RMS” standing for “Root-Mean-Square”) and the normalized waveform which is itself decomposed into two waveforms of the same variable length: the REW waveform (“Rapidly Evolving Waveform”) and the SEW waveform (“Slowly Evolving Waveform”). For a frame of fixed length, the number of prototypes is variable, hence the number of gains, of REW and SEW is likewise variable, as is the dimension of the REW and SEW waveforms.
In other types of coders, such as transform-based audio coders, the number of transform coefficients obtained over fixed-length frames is imposed, but it is usual to group these coefficients into frequency bands for their quantization. Conventionally, this splitting is performed into bands of unequal widths so as to utilize the psychoacoustic properties of human hearing by following the critical bands of the ear. The dimension of these vectors of transform coefficients typically ranges from 3 (for the lower frequency bands) to 15 (for the high frequency bands) in a wideband coder (50 Hz-7000 Hz), and even up to 24 in an FM band coder (covering the 20 Hz-16000 Hz audible range).
Theoretically, an optimal vector quantizer of variable dimension would utilize a set of dictionaries of fixed dimension, one for each possible dimension of the input vector. For example, in harmonic coders, for a pitch frequency of 60 to 450 Hz, the number of harmonic peaks in the telephone band varying from 7 for high-pitched voices (children) to 52 for low-pitched voices (men), it would be necessary to construct, place in memory and implement 46 (46=52−7+1) vector quantizers. The design of each dictionary requires a learning sequence long enough to correctly represent the statistics of the input vectors. Moreover, the storage of all the dictionaries turns out to be impractical or very expensive in memory. It is therefore seen that, in the case of variable dimension, it is difficult to harness the advantages of vector quantization while complying with the constraints of memory storage and also of training sequences.
Presented hereinbelow are certain aspects of a quantization with variable resolution and the problems encountered.
It is pointed out firstly that the variability of the input signal is not manifested solely through the variation in the number of parameters to be coded but also through the variation in the quantity of binary information to be transmitted for a given quality. For example in speech, onsets, voiced sounds and unvoiced sounds do not require the same rate for one and the same quality. Relatively unpredictable onsets require a higher rate than voiced sounds that are more stable and whose stationarity may be exploited by “predictors” which make it possible to reduce the rate. Finally, the unvoiced sounds do not require high coding precision and hence need little rate.
To utilize the temporal variation of the characteristics of multimedia signals such as voice or video, it is judicious to design variable rate coders. These variable rate coders are especially suited to communications over packet-switched networks, such as the Internet, ATM, or others.
Specifically, packet switching makes it possible to handle and process the information bits in a more flexible manner and hence to increase the capacity of the channel by reducing the mean rate. The use of variable rate coders is also an effective means of combating congestion of the system and/or of accommodating the diversity of access conditions.
In multimedia communications, variable rate quantizers also make it possible to optimize the distributing of the rate between:
        the source and channel codings: as in the concept of AMR (“Adaptive Multi Rate”), the rate can be switched on each 20-ms frame so as to be adapted dynamically to the traffic and channel error conditions. The overall quality of the speech is thus improved by ensuring good protection against errors, while reducing the rate for the coding of the source if the channel degrades;
        the various types of media signals (such as voice and video in video conferencing applications);
        the various parameters of one and the same signal: in transform based audio coders, for example, it is usual to distribute the bits dynamically between the spectral envelope and the various bands of coefficients. Often, an entropy coding of the envelope is firstly performed and its objective is to utilize the nonuniform distribution of the code words by assigning variable length codes to the code words, the most probable ones having a shorter length than the least probable ones, thereby leading to the minimization of the mean length of the code words. Moreover, to utilize the psychoacoustic properties of the human ear, the remaining (variable) rate is allotted dynamically to the frequency bands of the coefficients as a function of their perceptual significance.
New applications of multimedia coding (such as audio and video) require highly flexible quantizations both as regards dimension and rate. Since the range of rates must moreover make it possible to achieve high quality, these multidimensional and multiresolution quantizers must be aimed at high resolutions. The complexity barrier posed by these vector quantizers remains, per se, a challenge to be overcome, despite the increase in processing powers and memory capacities of the new technologies.
As will be seen hereinbelow, most of the source coding techniques proposed are aimed either at solving the problems related to a variable dimension, or the problems related to a variable resolution. Few techniques proposed today make it possible to solve these two problems jointly.
As regards known techniques of vector quantization with variable dimension, the variability of the dimension of the parameters to be coded constitutes per se an obstacle to the use of vector quantization. Thus, the first versions of the transform based coder employ Lloyd-Max scalar quantizers. A coder of this type, termed “TDAC”, developed by the Applicant, is described in particular in:
        “High Quality Audio Transform Coding at 64 kbit/s”, by Y. Mahieux, J. P. Petit, in IEEE Trans. Commun., Vol. 42, No. 11, pp. 3010-3019, November 1994.
Other solutions have been proposed to solve this problem of variable dimension vector quantization. The “IMBE” coder uses a complicated coding scheme with variable binary allocations and a scalar/vector hybrid quantization.
An approach very commonly used to quantize vectors of variable dimension consists in pre-processing the vector of variable dimension so as to convert it into another vector of fixed dimension before quantization. There are several variants of this vector quantization technique associated with dimension conversion (this type of vector quantization being denoted DCVQ standing for “Dimension Conversion Vector Quantization”).
Among the various dimension conversion procedures proposed, mention may in particular be made of: truncation, subsampling, interpolation, “length warping”.
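As an illustration of one of these procedures, the sketch below converts a vector of variable dimension L into a vector of fixed dimension K by linear interpolation on a uniform grid. The uniform grid and the function name are assumptions; the text above does not specify how the interpolation is carried out.

```python
def convert_dimension(x, K):
    """Resample a vector x of dimension L to dimension K by linear interpolation.

    Assumption: both L and K are at least 2 and the samples lie on uniform grids
    whose end points coincide.
    """
    L = len(x)
    if L < 2 or K < 2:
        raise ValueError("both dimensions must be at least 2")
    out = []
    for k in range(K):
        pos = k * (L - 1) / (K - 1)   # position of output sample k on the input grid
        i = min(int(pos), L - 2)      # left neighbour (clamped at the last interval)
        frac = pos - i
        out.append((1 - frac) * x[i] + frac * x[i + 1])
    return out

# A vector of dimension L = 3 converted to the fixed dimension K = 5.
converted = convert_dimension([0.0, 2.0, 4.0], 5)
```

The converted vector can then be coded with a single dictionary of dimension K; the error introduced by the conversion itself adds to the quantization error.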
For sinusoidal speech coders or MBEs, it has been proposed that the spectral coefficients be approximated by an all-pole model of fixed order and then a vector quantization of fixed dimension of the parameters of the model be performed. Another technique of vector quantization by nonsquare matrix transform solves the problem of vector quantization of variable dimension L by combining a vector quantization of fixed dimension K (K<L) with a nonsquare matrix linear transform (L×K).
There is also another type of vector quantization associated with a dimension conversion which still uses a vector quantizer of fixed dimension K but the dimension conversion is applied to the codevectors to obtain codevectors having the same dimension as the input vector.
The drawback of vector quantization associated with a dimension conversion is that the total distortion has two components: one due to the quantization, the other to the dimension conversion. To avoid this distortion due to dimension conversion, another approach to vector quantization of variable dimension consists in considering each input vector of variable dimension L as formed of a subset of components of an “underlying” vector of dimension K (L<K), and in designing and using just a single “universal” dictionary of fixed dimension K which nevertheless covers the entire span of dimensions of the input vectors, the correspondence between the input vector and the components of the underlying vector being effected by a selector. However, this “universal” dictionary encompassing all the other dictionaries of lower dimensions does not appear to be optimal for the lowest dimensions. In particular, the maximum resolution $r_{\max}$ per dimension is limited by the storage constraint and by the rate per vector of parameters. For a dictionary of size $2^{K r_{\max}}$, the quantity of memory required to store this dictionary is $K \cdot 2^{K r_{\max}}$ values and its rate per vector of parameters is $K r_{\max}$. Thus, for one and the same size of dictionary (and hence one and the same rate per vector of parameters and per frame), a vector of dimension L (L<K) could have a resolution (or a rate per dimension) K/L times larger, and this for K/L times smaller a volume of information to be stored.
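The comparison made at the end of the preceding paragraph can be written out as follows. The numerical values K = 8, L = 4 and a maximum resolution of 1 bit per dimension are hypothetical and chosen only to make the arithmetic concrete; M denotes the size of the dictionary.

```latex
\begin{align*}
\text{universal dictionary (dimension } K\text{):}\quad
  & M = 2^{K r_{\max}}, \quad
    \text{storage} = K \cdot 2^{K r_{\max}}, \quad
    \text{rate per vector} = K r_{\max} \\
\text{dedicated dictionary (dimension } L < K\text{, same } M\text{):}\quad
  & r_L = \frac{K r_{\max}}{L} = \frac{K}{L}\, r_{\max}, \quad
    \text{storage} = L \cdot 2^{K r_{\max}} = \frac{L}{K} \left( K \cdot 2^{K r_{\max}} \right) \\
\text{e.g. } K = 8,\ L = 4,\ r_{\max} = 1:\quad
  & M = 256, \quad r_L = 2, \quad
    \text{storage of } 1024 \text{ values instead of } 2048
\end{align*}
```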
As regards known techniques of vector quantization with variable resolution, a simple solution consists, as in the case of vector quantization with variable dimension, in using a scalar quantization, as for example in the first versions of the TDAC transform based coder.
However, the use of an integer resolution per sample entails a coarse granularity of resolution per band of coefficients which hinders the effectiveness of the dynamic binary allocation procedure. Thus the use has been proposed of scalar quantizers with an odd integer number of reconstruction levels, in combination with a procedure for arranging the coded indices as a joint binary train. The finer granularity of the resolution afforded, more propitious for the binary allocation procedure, has made it possible to improve the quality, at the price of the complexity of the algorithm for combining the indices, this algorithm being necessary for the arrangement into a binary train to be effective in terms of rate. Nevertheless, for elevated frequency bands having a larger number of coefficients, the constraint of an integer number of levels per sample, due to the scalar quantization, is still manifested through too coarse a granularity of the resolutions per band.
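The benefit of combining the indices of scalar quantizers having an odd number of levels can be illustrated with a simple mixed-radix packing. This sketch is not the combination procedure referred to above, whose details are not given here; it is a generic illustration, and all names are hypothetical.

```python
def pack_indices(indices, levels):
    """Combine indices, each in range(levels), into one integer (base-`levels` number)."""
    code = 0
    for i in indices:
        if not 0 <= i < levels:
            raise ValueError("index out of range")
        code = code * levels + i
    return code

def unpack_indices(code, levels, count):
    """Inverse operation: recover `count` indices from the combined integer."""
    indices = []
    for _ in range(count):
        code, i = divmod(code, levels)
        indices.append(i)
    return indices[::-1]

def joint_bits(levels, count):
    """Bits needed to transmit `count` jointly packed indices."""
    return (levels ** count - 1).bit_length()

# Five samples, each quantized with 3 reconstruction levels.
indices = [2, 0, 1, 1, 2]
code = pack_indices(indices, 3)
bits = joint_bits(3, 5)   # compare with 5 * 2 = 10 bits when each index is coded separately
```

With five ternary indices, the joint code fits in 8 bits, that is 1.6 bits per sample, a resolution that cannot be reached when each sample is assigned an integer number of bits.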
Vector quantization makes it possible to circumvent this constraint of an integer number of levels per sample and permits fine granularity of the resolutions available. On the other hand, the complexity of the vector quantization often limits the number of available rates. For example, the AMR-NB multirate speech coder, based on the well known ACELP technique, comprises eight fixed rates ranging from 12.2 kbit/s to 4.75 kbit/s, each having a different level of protection against errors by virtue of a different distribution of the rate between the source coding and the channel coding. For each of the parameters of the ACELP coder (LSP, LTP delays, excitation gains, fixed excitation), dictionaries of different resolution have been constructed. However, the number of available rates for each of these parameters is limited by the complexity of storage of the nonalgebraic vector quantizers. Moreover, in the AMR-WB multirate coder comprising nine rates ranging from 6.60 to 23.85 kbit/s, the variation in the rates is essentially ensured by the algebraic excitation dictionaries which require no storage. There are eight dictionaries and therefore eight rates for the fixed excitation while the other parameters which use stochastic dictionaries (LSP, gains, absolute and differential delays) have only two possible rates.
It is indicated that the stochastic vector quantizers used in AMR multirate coders are vector quantizers with constrained structure (Cartesian product and multiple stages). A large family of variable rate quantizers can in fact be based on constrained structure vector quantizers such as the quantizers already mentioned having multiple stages or Cartesian products, but also on tree-based vector quantizers. The use of these tree-based vector quantizers for variable rate coding has formed the subject of numerous studies. The binary tree-based vector quantizer was the first to be introduced. It derives naturally from the LBG algorithm for designing a vector quantizer by successive splittings of the centroids, starting from the “root” node, which is the barycentre of the training sequence. Variant tree-type vector quantizers have been proposed, based on pruning or, on the contrary, on ramifying certain nodes of the tree according to their attributes such as their distortion or their population, leading to nonbinary and/or unbalanced tree-based vector quantizers.
FIGS. 1a and 1b represent tree-structured vector quantizers. More particularly, FIG. 1a represents a balanced binary tree, whereas FIG. 1b represents a nonbinary and unbalanced tree.
Multi-resolution vector quantizers are easily constructed on the basis of a tree-type vector quantizer, by selecting the number of nodes corresponding to the various resolutions desired. The tree-type hierarchical structure is appealing and simplifies the search procedure. On the other hand, it involves a suboptimal search and a significant increase in the memory required, since all the nodes of the tree, from the root node to the terminal nodes via all the nodes of the intermediate levels, must be stored. Moreover, as the set of nodes of a dictionary of lower resolution is not included in the dictionaries of higher resolution, the decrease in the quantization error as a function of the increase in the rate of the vector quantizer is not guaranteed locally.
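The search in a tree-type vector quantizer, and the way the depth of descent sets the resolution, may be sketched as follows. The `Node` class, the function names and the toy one-dimensional tree are hypothetical; the squared Euclidean distance is assumed.

```python
class Node:
    """A node of the tree: a codevector and its child nodes (empty for a leaf)."""
    def __init__(self, vector, children=()):
        self.vector = vector
        self.children = list(children)

def tree_search(x, root, depth):
    """Descend `depth` levels, choosing at each level the child nearest to x.

    Returns the path (one branch index per level) and the codevector reached.
    """
    node, path = root, []
    for _ in range(depth):
        if not node.children:
            break
        k = min(range(len(node.children)),
                key=lambda i: sum((a - b) ** 2
                                  for a, b in zip(x, node.children[i].vector)))
        path.append(k)
        node = node.children[k]
    return path, node.vector

def count_nodes(node):
    """Number of nodes that must be stored, intermediate levels included."""
    return 1 + sum(count_nodes(c) for c in node.children)

# Hypothetical balanced binary tree in dimension 1 with two levels below the root.
root = Node((0.0,), [
    Node((-2.0,), [Node((-3.0,)), Node((-1.0,))]),
    Node((2.0,), [Node((1.0,)), Node((3.0,))]),
])
coarse = tree_search((0.8,), root, 1)   # 1 bit: stop at the first level
fine = tree_search((0.8,), root, 2)     # 2 bits: descend to a terminal node
stored = count_nodes(root)
```

Only two distances are computed per level instead of one per codevector, but seven nodes are stored for a finest dictionary of four codevectors.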
Moreover it is known how to construct variable resolution quantizers on the basis of algebraic codes, in particular EAVQ embedded algebraic vector quantizers which use subsets of spherical codes of the regular Gosset lattice in dimension 8.
In the document:
        “A 16, 24, 32 kbit/s wideband speech codec based on ACELP” by P. Combescure, J. Schnitzler, K. Fischer, R. Kircherr, C. Lamblin, A. Le Guyader, D. Massaloux, C. Quinquis, J. Stegmann, P. Vary, in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 1, pp 5-8, 1999,
this embedded algebraic vector quantization approach has been extended to variable dimension quantization using algebraic codes of various dimensions. Even though this generalization of EAVQ quantization makes it possible to quantize vectors of variable dimension at variable resolutions, it has drawbacks.
The distribution of the input vectors must be uniform. However, to adapt the distribution of the source to this constraint is a very difficult task. The design of algebraic quantizers based on regular lattices also poses the problem of truncating and tailoring the regions of the various regular lattices to obtain the various resolutions desired, doing so for the various dimensions.
The present invention aims to improve the situation.