The present invention relates to an encoding method applied to a video sequence subdivided into groups of frames (GOFs) and based on an on-line procedure that comprises the steps of:
an initializing step, provided for defining for the encoding process its input parameters;
a motion estimation/compensation step, provided for performing a motion estimation and compensation between pairs of successive frames;
a spatio-temporal decomposition step, provided for carrying out a three-dimensional (3D) wavelet decomposition using a biorthogonal filter bank in a so-called lifting scheme itself using weighting constants defined in the initialization step;
an encoding step, provided for encoding the transform coefficients thus obtained, by means of an encoding method adapted to 3D decompositions.
In video compression schemes, the reduction of temporal redundancy is mainly achieved by two types of approaches. In the first one, the so-called "hybrid" or predictive approach, a prediction of the current frame is computed based on the previously transmitted frames, and only the prediction error is intra-coded and transmitted. In the second one, the 3D (or 2D+t) approach, the temporal redundancy is exploited by means of a temporal transform, in a manner similar to the spatial techniques for removing redundancies. According to this last approach, the sequence of frames is processed as a 3D volume, and the classical subband decomposition often used in image coding can thus be extended to 3D spatio-temporal data by using separable transforms (for example, wavelet or wavelet packet transforms implemented by means of filter banks). The 3D structure is, however, anisotropic, which can be taken into account by using different filter banks in the temporal and spatial directions (usually, Haar filters are used for temporal filtering since the added delay of using longer filters is undesirable; furthermore, they are two-tap filters and are the only perfect reconstruction orthogonal filters which do not present boundary effects).
The coding efficiency of this 3D coding scheme can be improved by performing motion estimation/compensation in the low temporal subbands, at each level of the temporal decomposition. The 3D subband decomposition is applied on the compensated group of frames (the number of frames in this group must be a power of two, usually 16), and, at the last temporal decomposition level, there are two frames in the lowest temporal subband. In each frame of the temporal subbands, a spatial decomposition is performed.
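The temporal part of this decomposition can be sketched as follows. This is a minimal illustration only: it assumes normalized Haar filters, omits motion compensation entirely, and represents each frame as a flat list of pixel values. The names `haar_temporal_step` and `temporal_decomposition` are hypothetical.

```python
import math

def haar_temporal_step(frames):
    """One temporal Haar level: each pair of frames gives one low and one high frame."""
    scale = 1.0 / math.sqrt(2.0)
    pairs = list(zip(frames[0::2], frames[1::2]))
    low = [[scale * (a + b) for a, b in zip(f0, f1)] for f0, f1 in pairs]
    high = [[scale * (b - a) for a, b in zip(f0, f1)] for f0, f1 in pairs]
    return low, high

def temporal_decomposition(gof, levels):
    """Repeatedly split the low temporal subband; collect the high subbands."""
    low, highs = gof, []
    for _ in range(levels):
        low, high = haar_temporal_step(low)
        highs.append(high)
    return low, highs

# Hypothetical GOF: 16 frames of 4 pixels each (no motion compensation).
gof = [[float(t + p) for p in range(4)] for t in range(16)]
low, highs = temporal_decomposition(gof, levels=3)
print(len(low), [len(h) for h in highs])  # 2 [8, 4, 2]
```

With 16 frames, three temporal levels leave two frames in the lowest temporal subband, which matches the figure quoted above.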
Subband coding of the three-dimensional structure of data can be realized as an extension of the spatial subband coding techniques. One of the most effective wavelet-based schemes for image compression is based on the 2D SPIHT algorithm, described in a detailed manner in "A new, fast, and efficient image codec based on set partitioning in hierarchical trees (=SPIHT)", by A. Said and W. A. Pearlman, IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, June 1996, pp. 243-250, and recently extended to 3D structures. The basic concepts used in the 3D coding technique are the following: spatio-temporal trees corresponding to the same location are formed in the wavelet domain, then the wavelet transform coefficients in these trees are partitioned into sets defined by the level of the highest significant bit in a bit-plane representation of their magnitudes, and, finally, the highest remaining bit planes are coded and the resulting bits are transmitted.
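The partitioning criterion mentioned above, the level of the highest significant bit of each coefficient magnitude, can be illustrated in isolation. The sketch below is not the SPIHT algorithm: it ignores the tree structure and the coding passes, and only groups hypothetical integer coefficients by the bit plane at which they first become significant. The function names are assumptions.

```python
def msb_level(coefficient):
    """Index of the highest non-zero bit of |coefficient| (-1 for a zero coefficient)."""
    return abs(coefficient).bit_length() - 1

def partition_by_bit_plane(coefficients):
    """Group coefficients by the bit plane at which they first become significant."""
    sets = {}
    for c in coefficients:
        sets.setdefault(msb_level(c), []).append(c)
    return sets

coefficients = [34, -20, 3, 0, 7, -1, 16, 9]  # hypothetical quantized values
sets = partition_by_bit_plane(coefficients)
for n in sorted(sets, reverse=True):
    print(n, sets[n])
```

A bit-plane coder would visit these sets from the highest level downwards, so the coefficients with the largest magnitudes are described first.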
The original SPIHT algorithm is based on the hypothesis of an orthogonal decomposition, according to which the reconstruction error is equal to the quantization error measured as the sum of subband distortions (that is why it is possible to distribute the bit budget based on the energy of each subband). It has also been shown that the best results in image and video coding are not achieved using orthogonal filters, but biorthogonal ones (this is due to the fact that the symmetry of biorthogonal filters yields wavelet coefficients at the same spatial location as their parents). However, as biorthogonal filters do not preserve the L2 norm of the quantization error, the bit allocation deduced for an orthogonal transform would not lead to the minimum reconstruction error.
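The effect of a transform that does not preserve the L2 norm can be seen on a two-sample example. The sketch below uses an unnormalized Haar lifting step as a stand-in for a non-norm-preserving filter bank (an assumption made for brevity; it is not one of the biorthogonal filters discussed here). The weights `k_low` and `k_high` are hypothetical parameters, and the values sqrt(2) and 1/sqrt(2) are simply those that make this particular transform orthonormal.

```python
import math

def forward(a, b, k_low=1.0, k_high=1.0):
    """Unnormalized Haar lifting followed by optional subband weights."""
    d = b - a          # predict step: detail coefficient
    s = a + d / 2.0    # update step: average of the pair
    return k_low * s, k_high * d

def inverse(s, d, k_low=1.0, k_high=1.0):
    """Undo the weights, then the update and predict steps."""
    s, d = s / k_low, d / k_high
    a = s - d / 2.0
    return a, a + d

def reconstruction_error_energy(a, b, e_low, e_high, k_low, k_high):
    """Energy of the output error when the subbands are perturbed by (e_low, e_high)."""
    s, d = forward(a, b, k_low, k_high)
    a2, b2 = inverse(s + e_low, d + e_high, k_low, k_high)
    return (a2 - a) ** 2 + (b2 - b) ** 2

e_low, e_high = 0.3, -0.2                      # hypothetical quantization errors
subband_energy = e_low ** 2 + e_high ** 2
unweighted = reconstruction_error_energy(1.0, 4.0, e_low, e_high, 1.0, 1.0)
weighted = reconstruction_error_energy(1.0, 4.0, e_low, e_high,
                                       math.sqrt(2.0), 1.0 / math.sqrt(2.0))
print(subband_energy, unweighted, weighted)
```

Without weights, the reconstruction error energy differs from the energy injected in the subbands; with the weights that restore orthonormality, the two agree up to rounding.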
Moreover, biorthogonal filters are determined up to a multiplicative constant and therefore an infinity of filter banks may be designed when only the perfect reconstruction condition is imposed. If a lifting implementation of the filter bank is used, it is known that the polyphase matrix P(z) can always be decomposed into factors
\[
P(z) = \prod_{i=1}^{m} \begin{bmatrix} 1 & s_i(z) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ t_i(z) & 1 \end{bmatrix} \begin{bmatrix} K & 0 \\ 0 & 1/K \end{bmatrix},
\]
where $t_i$ and $s_i$ are Laurent polynomials and K is a real constant. The dual polyphase matrix, corresponding to the synthesis part, is given in this case by:
\[
\tilde{P}(z) = \prod_{i=1}^{m} \begin{bmatrix} 1 & 0 \\ -s_i(z^{-1}) & 1 \end{bmatrix} \begin{bmatrix} 1 & -t_i(z^{-1}) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1/K & 0 \\ 0 & K \end{bmatrix}.
\]
Perfect reconstruction is ensured for all K, since:
\[
P(z)\,\tilde{P}(z^{-1})^{t} = I
\]
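The independence of perfect reconstruction from the constant K can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a single predict/update pair with the well-known 5/3 lifting coefficients, periodic signal extension, and a scaling of K on the low band and 1/K on the high band. The function names are hypothetical.

```python
import math

def lifting_forward(x, K):
    """One lifting level (predict, update), then scaling by K and 1/K."""
    even, odd = x[0::2], x[1::2]
    n = len(even)
    # Predict: detail = odd sample minus the mean of its even neighbours.
    d = [odd[i] - 0.5 * (even[i] + even[(i + 1) % n]) for i in range(n)]
    # Update: smooth the even samples (d[i - 1] wraps around for i = 0).
    s = [even[i] + 0.25 * (d[i - 1] + d[i]) for i in range(n)]
    return [K * v for v in s], [v / K for v in d]

def lifting_inverse(s, d, K):
    """Undo the scaling, then the update and predict steps in reverse order."""
    s = [v / K for v in s]
    d = [v * K for v in d]
    n = len(s)
    even = [s[i] - 0.25 * (d[i - 1] + d[i]) for i in range(n)]
    odd = [d[i] + 0.5 * (even[i] + even[(i + 1) % n]) for i in range(n)]
    x = [0.0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x

signal = [3.0, 7.0, 1.0, 8.0, 2.0, 9.0, 4.0, 6.0]  # even length assumed
for K in (0.5, 1.0, math.sqrt(2.0), 3.0):
    low, high = lifting_forward(signal, K)
    restored = lifting_inverse(low, high, K)
    print(K, max(abs(u - v) for u, v in zip(signal, restored)))
```

Every value of K reconstructs the signal up to floating-point rounding, while the subband coefficients themselves scale with K, which is why the choice of K matters for coding but not for invertibility.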
This decomposition is implemented through the dual schemes illustrated in FIGS. 1 and 2. There exists therefore an infinity of implementations, corresponding to different factorizations and different values of the constant K. From the coding point of view, these filter banks are not all equivalent. A usual criterion for choosing this multiplicative constant is to require the determinant of the polyphase matrix to be equal to 1, but this choice may not be the best for coding performance (that is why most of the usual algorithms are based on biorthogonal transforms that are nearly orthogonal).
It is therefore an object of the invention to propose a technique for improving the applicability of highly non-orthogonal wavelet transforms for video coding schemes, based on a 3D wavelet decomposition and an automatic bit allocation mechanism.
To this end, the invention relates to an encoding method such as defined in the introductory part of the description and which is moreover characterized in that it also comprises an off-line procedure that consists of off-line computations including the sub-steps of:
defining for these off-line computations the same type of input parameters, with the exception of the weighting constants;
carrying out a random GOF generation sub-step, the generated GOF containing a white, Gaussian noise of mean and standard deviation adapted to the representation of the original sequence;
implementing a 3D wavelet decomposition based on said lifting scheme and using the same filter bank without any weighting constants;
computing the standard deviations of the spatio-temporal sub-bands resulting from said 3D decomposition;
dividing said standard deviations by the standard deviation of the noise, optimal weighting constants being available at the output of said division sub-step and sent towards said filter bank so that the outputs of the spatio-temporal sub-bands can be weighted.
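The off-line sub-steps listed above can be sketched as follows. This is a simplified illustration, not the claimed implementation: it assumes a single decomposition level per axis, uses an unweighted Haar lifting step in place of the actual biorthogonal filter bank, and picks a hypothetical noise mean of 128 and standard deviation of 20. All function names are assumptions.

```python
import random
import statistics

def lift_pairs(a, b):
    """Unweighted Haar lifting applied element-wise to two equal-shape nested lists."""
    if isinstance(a, list):
        parts = [lift_pairs(x, y) for x, y in zip(a, b)]
        return [p[0] for p in parts], [p[1] for p in parts]
    d = b - a
    return a + d / 2.0, d

def split_axis(vol, axis):
    """Split a nested-list volume into low and high subbands along one axis."""
    if axis == 0:
        return lift_pairs(vol[0::2], vol[1::2])
    parts = [split_axis(sub, axis - 1) for sub in vol]
    return [p[0] for p in parts], [p[1] for p in parts]

def decompose_3d(gof):
    """One-level separable decomposition: time, then rows, then columns."""
    bands = {"": gof}
    for axis in range(3):
        split = {}
        for name, vol in bands.items():
            split[name + "L"], split[name + "H"] = split_axis(vol, axis)
        bands = split
    return bands

def flatten(vol):
    """Yield every scalar of a nested-list volume."""
    if isinstance(vol, list):
        for sub in vol:
            yield from flatten(sub)
    else:
        yield vol

def subband_ratios(frames=16, height=32, width=32, mean=128.0, sigma=20.0, seed=0):
    """Subband standard deviations divided by the standard deviation of the noise."""
    rng = random.Random(seed)
    gof = [[[rng.gauss(mean, sigma) for _ in range(width)]
            for _ in range(height)] for _ in range(frames)]
    return {name: statistics.pstdev(list(flatten(vol))) / sigma
            for name, vol in decompose_3d(gof).items()}

ratios = subband_ratios()
for name in sorted(ratios):
    print(name, round(ratios[name], 3))
```

For this stand-in filter, each low-pass split is expected to divide the noise standard deviation by about sqrt(2) and each high-pass split to multiply it by about sqrt(2), so the eight ratios cluster around powers of sqrt(2). A real filter bank and a multi-level decomposition would yield different constants.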
Based on the energy distribution in the subbands, the optimal weights for the biorthogonal filters in a 3D decomposition are thus determined. Some complicated methods based on relaxation algorithms are already used in the literature for choosing these constants in the framework of image coding, for instance the method described in "A multiscale relaxation algorithm for SNR maximization in non-orthogonal subband coding", by P. Moulin, IEEE Transactions on Image Processing, vol. 4, no. 9, September 1995, pp. 1269-1281. Techniques exploiting the human visual system in order to make the reconstruction error subjectively more acceptable also exist, such as the one described in "Signal-adapted multiresolution transform for image coding", by P. Desarte et al., IEEE Transactions on Information Theory, vol. 38, no. 2, March 1992, pp. 897-904. However, all these existing algorithms refer to image coding, while the method proposed here, an extension of previous works to the case of a 3D wavelet decomposition, is dedicated to video coding. This method may be applied to any type of linear filter, in particular in the framework of a motion compensated temporal filtering.