One of the most effective video compression algorithms is based on a three-dimensional (2D+t) processing of the video sequence concerned: the redundancy in the video information is reduced by performing a separable 3D wavelet transform (the main difference from a predictive approach being that the temporal axis is processed in the same way as the spatial axes). The efficiency of this approach is improved when motion compensation is applied to the group of frames (GOF) considered in the temporal filtering, prior to the filtering. Unlike the spatial decomposition, which can benefit from long filters, the best choice for temporal filtering turns out to be the so-called Haar multiresolution analysis, because it introduces no boundary problems and only a minimal delay.
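As an illustration of why the Haar filter raises no boundary problems, the following is a minimal sketch of one level of temporal Haar analysis and synthesis on a GOF. It is not taken from the source: the function names `haar_temporal_step` and `haar_temporal_inverse`, the orthonormal scaling by 1/√2, and the representation of a frame as a flat list of pixel values are assumptions made for the example, and motion compensation is deliberately omitted.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling (an assumption of this sketch)


def haar_temporal_step(frames):
    """One level of temporal Haar analysis on an even-length list of frames.

    Each frame is a flat list of pixel values. Returns (low, high), two lists
    of frames, each half as long as the input. No motion compensation is done.
    """
    if len(frames) % 2:
        raise ValueError("the GOF must contain an even number of frames")
    low, high = [], []
    # Every output pair depends only on its own two input frames, so nothing
    # has to be extended or mirrored at the GOF boundaries.
    for a, b in zip(frames[0::2], frames[1::2]):
        low.append([(x + y) * SCALE for x, y in zip(a, b)])
        high.append([(x - y) * SCALE for x, y in zip(a, b)])
    return low, high


def haar_temporal_inverse(low, high):
    """Rebuild the original frames from the low-pass and high-pass subbands."""
    frames = []
    for l, h in zip(low, high):
        frames.append([(x + y) * SCALE for x, y in zip(l, h)])
        frames.append([(x - y) * SCALE for x, y in zip(l, h)])
    return frames


if __name__ == "__main__":
    gof = [[1.0, 2.0], [3.0, 4.0], [5.0, 5.0], [5.0, 5.0]]
    low, high = haar_temporal_step(gof)
    print(high[1])  # two identical frames leave no high-pass energy: [0.0, 0.0]
```

Because each low-pass/high-pass pair is computed from exactly two consecutive frames, the decoder can reconstruct a pair as soon as its two subband frames are available, which is consistent with the minimal delay mentioned above.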
The 3D wavelet decomposition is therefore applied to each GOF in the sequence. The size of the group is chosen so as to trade off the delay in reconstruction (important in real-time applications such as videoconferencing) against the efficiency of the subsequent coding algorithm. When a 3D-SPIHT algorithm such as the one described in “An embedded wavelet video coder using three-dimensional set partitioning in hierarchical trees (SPIHT)” by B. J. Kim and W. A. Pearlman, Proceedings of Data Compression Conference, Snowbird, Utah, USA, 1997, pp.251-257, is applied to the decomposed GOF, a sufficient number of decomposition levels must exist in order to construct the spatio-temporal trees on which the algorithm is based. In practice, a GOF of 16 frames is a good choice for most sequences.
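The link between GOF size and the number of available temporal decomposition levels can be sketched as follows. The helper names `temporal_subband_sizes` and `max_temporal_levels` are hypothetical, and the sketch only counts how many dyadic splits a GOF admits; the source does not state how many of these levels 3D-SPIHT actually requires.

```python
def temporal_subband_sizes(gof_size):
    """Frame counts of the temporal subbands obtained by repeated dyadic splits.

    Returns [h1, h2, ..., hN, lN]: the high-pass subband of each level,
    followed by the final low-pass subband. Splitting stops as soon as the
    remaining low-pass subband has an odd number of frames.
    """
    if gof_size < 1:
        raise ValueError("a GOF needs at least one frame")
    sizes = []
    n = gof_size
    while n > 1 and n % 2 == 0:
        n //= 2
        sizes.append(n)  # high-pass subband created at this level
    sizes.append(n)      # low-pass subband left at the coarsest level
    return sizes


def max_temporal_levels(gof_size):
    """Maximum number of dyadic temporal decomposition levels for a GOF."""
    return len(temporal_subband_sizes(gof_size)) - 1


if __name__ == "__main__":
    print(temporal_subband_sizes(16))  # [8, 4, 2, 1, 1]
    print(max_temporal_levels(16))     # 4
    print(max_temporal_levels(12))     # 2
```

A 16-frame GOF therefore admits up to four temporal levels, whereas a GOF whose size is not a power of two runs out of levels sooner; a larger GOF admits more levels but increases the reconstruction delay.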
This 3D-SPIHT algorithm may be applied to greyscale video sequences, but the processing of color sequences raises the problem of embedding the color in the same bitstream. Considering a tri-stimulus color space such as YUV, with luminance plane Y and chrominance planes U and V in the 4:2:0 format, a simple way to deal with the coding of color video would be to code each color plane separately, as is done by a conventional color video coder. This technique, however, fails to provide an embedded bitstream, since it requires a bit-allocation strategy among the color planes. Moreover, the bitstreams of the color planes are concatenated, and the receiver has to wait until the entire bitstream has arrived before it can reconstruct and display the video.
According to another solution, all color planes may be treated as one unit at the coding stage and a single mixed bitstream may then be generated, so that the decoder can stop at any point in the bitstream and still reconstruct and display the color video at the corresponding bit-rate. This solution proceeds by separately performing a 3D wavelet decomposition with the same number of levels on each color plane. Then, to code all planes together, the list of insignificant pixels (LIP) and the list of insignificant sets (LIS) defined in SPIHT are initialized with the appropriate coordinates of the top level in all three planes.
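The joint initialization described above might look roughly like the following sketch. Everything specific in it is an assumption made for illustration: the helper names `root_coords` and `init_lists`, the `(plane, t, y, x)` coordinate tuples, the choice of 2 temporal and 2 spatial levels, and the rule that in each 2×2×2 group of root coefficients the one with all-even coordinates has no descendants and is therefore left out of the LIS.

```python
def root_coords(plane, frames, height, width, t_levels, s_levels):
    """Coordinates (plane, t, y, x) of the coarsest spatio-temporal subband."""
    t_n = frames >> t_levels   # frames left after t_levels temporal splits
    h = height >> s_levels     # rows left after s_levels spatial splits
    w = width >> s_levels      # columns left after s_levels spatial splits
    return [(plane, t, y, x)
            for t in range(t_n) for y in range(h) for x in range(w)]


def init_lists(planes, t_levels, s_levels):
    """Build one LIP and one LIS covering every color plane.

    `planes` maps a plane name to (frames, height, width). The same number of
    levels is applied to every plane, as in the approach described above.
    """
    lip, lis = [], []
    for name, (frames, height, width) in planes.items():
        roots = root_coords(name, frames, height, width, t_levels, s_levels)
        lip.extend(roots)
        # Assumption: the all-even coefficient of each 2x2x2 root group has no
        # descendants, so it does not enter the LIS. The remaining roots would
        # be stored as type-A entries in a full SPIHT implementation.
        lis.extend(c for c in roots if c[1] % 2 or c[2] % 2 or c[3] % 2)
    return lip, lis


if __name__ == "__main__":
    # QCIF 4:2:0 GOF of 16 frames; the level counts are illustrative only.
    qcif = {"Y": (16, 144, 176), "U": (16, 72, 88), "V": (16, 72, 88)}
    lip, lis = init_lists(qcif, t_levels=2, s_levels=2)
    print(len(lip), len(lis))  # 9504 8316
```

Since both lists hold roots from Y, U and V, each sorting and refinement pass visits all three planes, which is what allows a single bitstream to be truncated at any point while still carrying color information.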
In the approach described above, each color plane has its own spatio-temporal orientation tree, but the Y-, U- and V-trees are mutually exclusive. Moreover, in the 4:2:0 format, the difference in size between the Y-plane and the U- and V-planes strongly affects both the possibility of performing the same multiresolution analysis on every plane and the coding efficiency of the subsequent SPIHT algorithm. Indeed, even if the original format of the video (CIF or QCIF) allows a certain number of resolution levels for the luminance plane (for example, the QCIF format, 176×144, allows 4 resolution levels), the chrominance planes, which are already subsampled, allow one level fewer. Furthermore, since SPIHT encoding only works well with subbands of even size, only 3 levels are actually possible for the luminance plane in this example. As regards the acceptable number of decomposition levels for the chrominance planes, two strategies are possible:
the same number of resolution levels is considered for the chrominance multiresolution analysis, which leads to odd-sized subbands at the lowest resolution level (therefore the original SPIHT algorithm cannot cope with this strategy without any adaptation);
the appropriate number of decomposition levels is chosen for each color plane, such that the SPIHT algorithm applies directly.
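The level counts quoted for QCIF, and the consequence of each of the two strategies, can be checked with a short calculation. This is a sketch under stated assumptions: the helper name `max_levels` and its `even_subbands` flag are hypothetical, a level is taken to be possible while both dimensions are divisible by two, and the chrominance planes are taken to be 88×72 (QCIF luminance halved in each direction, as in 4:2:0).

```python
def max_levels(width, height, even_subbands=False):
    """Count dyadic spatial decomposition levels for one plane.

    A level is possible while both dimensions are even. With
    even_subbands=True, the coarsest subband must itself have even
    dimensions. Returns (levels, (coarsest_width, coarsest_height)).
    """
    levels = 0
    w, h = width, height
    while w % 2 == 0 and h % 2 == 0:
        if even_subbands and ((w // 2) % 2 or (h // 2) % 2):
            break  # one more split would give an odd-sized subband
        w, h = w // 2, h // 2
        levels += 1
    return levels, (w, h)


if __name__ == "__main__":
    luma = (176, 144)    # QCIF luminance plane
    chroma = (88, 72)    # 4:2:0 chrominance plane (assumed size)

    print(max_levels(*luma))                        # (4, (11, 9))
    print(max_levels(*chroma))                      # (3, (11, 9))
    print(max_levels(*luma, even_subbands=True))    # (3, (22, 18))
    print(max_levels(*chroma, even_subbands=True))  # (2, (22, 18))
```

Under the first strategy, the chrominance planes receive the same 3 levels as the luminance plane and end with an odd-sized 11×9 subband; under the second, each plane keeps even-sized subbands, which gives 3 levels for Y and 2 for U and V.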