The invention relates to a video coding method applied to a sequence of frames and based on a three-dimensional (3D) wavelet decomposition with motion estimation and compensation on couples of frames, said decomposition being a wavelet transform that leads from the original set of picture elements (pixels) of the frames to transform coefficients constituting a hierarchical pyramid, and a spatio-temporal orientation tree (in which the roots are formed with the pixels of the approximation subband resulting from the 3D wavelet transform and the offspring of each of these pixels is formed with the pixels of the higher subbands corresponding to the image volume defined by these root pixels) defining the spatio-temporal relationship inside said hierarchical pyramid.
The main current research directions in video compression, especially in the field of multimedia, concern scalability and progressive transmission. With such functionalities, a transmission process can send only a subset of an original signal in order to achieve a desired level of resolution and/or fidelity. The most important information is sent first, then it is refined as much as the bandwidth of the receiver allows. Embedding the bitstream is the other important feature to achieve: the coding and decoding process can then be used on networks on which interruptions during the transmission or loss of information may occur, since the data effectively transmitted are used efficiently to reconstruct as much information as possible. Moreover, the information needed to decode a shorter part of the bitstream has to be self-sufficient.
Some of the above-mentioned points may be achieved by applying well-known techniques such as bit-plane encoding: the most significant bit-plane is encoded first, and, at each pass, the following bit-plane is transmitted. In such a progressive transmission scheme, the highest bit-planes may contain many zeros and would be very well compressed by an entropy encoder. If one further develops this analysis in the case of still pictures, using a wavelet decomposition leads to a good correlation of the coefficients, and therefore to good compression ratios. For video (moving pictures) compression schemes, a temporal multiresolution analysis can be used to reduce the redundancy, but it has to be combined with motion estimation (ME) and motion compensation (MC) techniques, in order to take into account large displacements and to improve coding efficiency.
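The bit-plane principle described above can be illustrated with a minimal sketch. It assumes non-negative integer coefficient magnitudes (signs would be handled separately), and the function names `bit_planes` and `reconstruct` are hypothetical, chosen only for this illustration.

```python
def bit_planes(coeffs):
    """Split integer magnitudes into bit-planes, most significant plane first."""
    mags = [abs(c) for c in coeffs]
    num_planes = max(max(mags).bit_length(), 1)
    return [[(m >> b) & 1 for m in mags] for b in range(num_planes - 1, -1, -1)]


def reconstruct(planes, total_planes):
    """Rebuild magnitudes from the first len(planes) planes.

    Planes that were not received are treated as zero, which yields a
    coarse approximation that is refined with every additional plane.
    """
    mags = [0] * len(planes[0])
    for i, plane in enumerate(planes):
        shift = total_planes - 1 - i
        for k, bit in enumerate(plane):
            mags[k] |= bit << shift
    return mags


coeffs = [37, 2, 0, 5, 1, 0, 0, 18]
planes = bit_planes(coeffs)                   # 6 planes, since 37 needs 6 bits
coarse = reconstruct(planes[:2], len(planes)) # only the two highest planes
exact = reconstruct(planes, len(planes))      # all planes
```

In this example the highest plane contains a single 1 among eight bits, which is the kind of sparsity an entropy encoder exploits, and truncating the stream after two planes still gives a usable approximation of the two largest coefficients.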
The decomposition process can be represented by a binary tree, as illustrated in FIG. 1, which shows a temporal subband decomposition of the video information. The illustrated 3D wavelet decomposition with motion compensation is applied to a group of frames (GOF), referenced F1 to F8. In this 3D subband decomposition scheme, each GOF of the input video is first motion-compensated (MC), which makes it possible to process sequences with large motion, and then temporally filtered (TF) using Haar wavelets (the dotted arrows correspond to a high-pass temporal filtering, while the other ones correspond to a low-pass temporal filtering). In FIG. 1, three stages of decomposition are shown (L and H=first stage; LL and LH=second stage; LLL and LLH=third stage).
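The three-stage temporal filtering of a GOF can be sketched as follows. This is a simplified illustration only: it omits the motion compensation step entirely, represents each frame as a flat list of pixel values, and assumes an orthonormal Haar filter with the sign convention high = (a - b)/sqrt(2). The names `haar_pair` and `temporal_decompose` are hypothetical.

```python
import math


def haar_pair(a, b):
    """Haar low-pass and high-pass outputs for one pair of frames."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x + y) * s for x, y in zip(a, b)]
    high = [(x - y) * s for x, y in zip(a, b)]
    return low, high


def temporal_decompose(frames, levels=3):
    """Return temporal subbands named as in FIG. 1 (H, LH, LLH, LLL).

    len(frames) is assumed to be divisible by 2**levels.
    """
    subbands = {}
    current, name = frames, ""
    for _ in range(levels):
        lows, highs = [], []
        for i in range(0, len(current), 2):
            lo, hi = haar_pair(current[i], current[i + 1])
            lows.append(lo)
            highs.append(hi)
        subbands[name + "H"] = highs  # high-pass frames of this stage
        name += "L"
        current = lows                # only the low-pass frames are split again
    subbands[name] = current          # final approximation (LLL for 3 levels)
    return subbands


# A toy GOF of 8 frames (F1 to F8) with 2 pixels each
frames = [[float(t), float(2 * t)] for t in range(8)]
subbands = temporal_decompose(frames)
```

With eight input frames this yields four H frames, two LH frames, one LLH frame and one LLL frame, that is, eight temporal subband frames in total. Because the filter is orthonormal, the total energy of the coefficients equals that of the input frames.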
After this spatio-temporal decomposition process, the data contained in the low-frequency subbands usually have high absolute values. The values tend to decrease when scanning the coefficients toward the highest frequencies. Progressive coding is more efficient if the wavelet coefficients are reordered to obtain groups of coefficients having approximately the same magnitude. By applying this principle, longer runs of zeros and a better compression ratio can be obtained.
Efficient algorithms for creating such groups of coefficients already exist. For instance, the so-called "Embedded Zero-tree Wavelet (EZW)" method provides trees of coefficients with strong correlation at several resolutions. It exploits the fact that, if a wavelet coefficient at a particular spatial resolution and location (called the "parent" coefficient) has a magnitude below a given threshold, its descendants or offspring (higher resolutions and same spatial location) are very likely to have a magnitude below this threshold as well.
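The parent-offspring test underlying this idea can be sketched as below. The sketch assumes a single orientation of a 2D dyadic decomposition stored as a list of levels (coarsest first, each level twice as large in each dimension as the previous one) and the usual rule that the coefficient at (i, j) has four children at (2i, 2j), (2i, 2j+1), (2i+1, 2j) and (2i+1, 2j+1). The data layout and the names `children` and `is_zerotree_root` are hypothetical; this is not the EZW coder itself, only the significance check it relies on.

```python
def children(i, j):
    """Positions, at the next finer level, covering the same spatial location."""
    return [(2 * i, 2 * j), (2 * i, 2 * j + 1),
            (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]


def is_zerotree_root(levels, k, i, j, threshold):
    """True if levels[k][i][j] and all of its descendants are below threshold."""
    if abs(levels[k][i][j]) >= threshold:
        return False
    if k + 1 == len(levels):          # finest level: no offspring left
        return True
    return all(is_zerotree_root(levels, k + 1, ci, cj, threshold)
               for ci, cj in children(i, j))


# Toy pyramid: 1x1, 2x2 and 4x4 levels of one orientation
levels = [
    [[3]],
    [[1, 2],
     [0, 1]],
    [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 9, 0],
     [0, 0, 0, 1]],
]

whole_tree = is_zerotree_root(levels, 0, 0, 0, 4)    # the 9 breaks the tree
top_left = is_zerotree_root(levels, 1, 0, 0, 4)      # this branch is all small
bottom_right = is_zerotree_root(levels, 1, 1, 1, 4)  # this branch contains the 9
```

When the test succeeds, a whole branch of insignificant coefficients can be signalled with a single symbol at the current threshold, which is where the coding gain comes from.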
Another grouping technique, directly based on the EZW method but using a different grouping process, is presented in "A new, fast and efficient image codec based on set partitioning in hierarchical trees (SPIHT)", by A. Said and W. A. Pearlman, IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, June 1996, pp. 243-250. This method is very efficient in clustering zero-valued coefficients at any particular bit-plane and, coupled with arithmetic coding, is one of the most efficient image compression algorithms currently known. A three-dimensional (3D) application of this algorithm to video sequences is described in "An embedded wavelet video coder using three-dimensional set partitioning in hierarchical tree (SPIHT)", Proceedings of the Data Compression Conference, Mar. 25-27, 1997, Snowbird, Utah, USA, pp. 251-260. Although extremely efficient (an efficiency that stems from the analysis of the data to be coded), this technique has one noticeable drawback: the computational complexity of its implementations may be quite restrictive. So much time and so many resources are needed that it would be difficult to use said technique directly for real-time applications or for implementations on small, low-cost systems.
Less efficient but with a lower computational complexity, the coding process presented in "The Z-coder adaptive coder", by L. Bottou et al., Proceedings of the Data Compression Conference, Snowbird, Utah, USA, Mar. 30-Apr. 1, 1998, pp. 18-32, is another approach to bit-plane encoding. Instead of using trees to exploit parent-offspring relationships and encode a significance map, it uses a simple neighborhood relationship in the spatio-temporal domain. The neighbors, according to the data, are then classified into four different "types". These types or groups of coefficients are encoded through a Golomb-code-based run-length encoder. It can be noticed that entropy coders such as the run-length coder are efficient at coding long runs of zeros. Such runs can be generated in a progressive coding process working bit-plane by bit-plane, since two consecutive high-magnitude coefficients may be separated by several low-magnitude coefficients. However, almost all wavelet coefficients at low frequencies have a high magnitude, since most of the energy is grouped there. Instead of simply subtracting the mean of the subbands before the coding, the European patent application already cited proposes a more efficient computation scheme that introduces differential pulse code modulation (DPCM) to code the subband presenting such a magnitude characteristic.
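To make the run-length idea concrete, the sketch below extracts the zero runs of one bit-plane and codes each run length with a Golomb-Rice code, that is, a Golomb code whose parameter is a power of two, 2^k. The choice of the Rice variant, the fixed parameter k and the names `zero_runs`, `rice_encode` and `rice_decode` are assumptions made for illustration; the cited coder adapts its parameters and is not reproduced here.

```python
def zero_runs(bits):
    """Lengths of the zero runs that precede each 1, plus the trailing run."""
    runs, run = [], 0
    for b in bits:
        if b == 0:
            run += 1
        else:
            runs.append(run)
            run = 0
    return runs, run


def rice_encode(n, k):
    """Golomb-Rice code of n: quotient in unary (1s ended by a 0), then k remainder bits."""
    q, r = n >> k, n & ((1 << k) - 1)
    remainder = format(r, "0{}b".format(k)) if k else ""
    return "1" * q + "0" + remainder


def rice_decode(code, k):
    """Inverse of rice_encode for a single codeword."""
    q = code.index("0")
    r = int(code[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r


plane = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0]
runs, trailing = zero_runs(plane)            # runs == [5, 2, 0], trailing == 3
codes = [rice_encode(n, 2) for n in runs]    # ["1001", "010", "000"]
decoded = [rice_decode(c, 2) for c in codes]
```

Short codewords are spent on short runs and the code length grows only linearly with the quotient, so a suitably chosen k keeps long runs of zeros cheap to transmit.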
The efficiency of the techniques introduced above stems from the analysis of the data to be coded. However, the complexity of the corresponding implementations may sometimes be considered restrictive.
It is the object of the invention to propose another kind of approach, according to which the coding process is implemented regardless of the data.
To this end, the invention relates to an encoding method as defined in the introductory part of the description and which is moreover characterized in that, for obtaining an encoded bitstream scalable in SNR (signal-to-noise ratio), spatial and temporal resolutions, it comprises the steps of:
(A) organizing the transform coefficients of the spatio-temporal orientation tree in a structure of 3D macroblocks, separated by resolution flags respectively associated with the beginning of each macroblock, and blocks, the size of each block fitting the lowest approximation sub-band which contains all the transform coefficients at the coarsest resolution, and all the blocks within each 3D macroblock being themselves organized in successive two-dimensional (2D) macroblocks belonging to a specific spatial decomposition level and grouped for all the frames of a specific temporal decomposition level;
(B) scanning the coefficients of each 3D macroblock in a predetermined order defined, inside each block, by the spatial orientation of said block and, inside a macroblock, by an association of blocks having the same location in all the frames of a temporal decomposition level;
(C) encoding said scanned coefficients bitplane by bitplane.
The proposed structure, a progressive wavelet three-dimensional encoder, appears to be a satisfactory approach for obtaining an embedded scalable video coding scheme, the main functionalities of which will be described hereinafter in more detail.