1. Field of the Invention
The present invention relates to a prediction apparatus and method for improving coding efficiency in scalable video coding. It is particularly useful in multiple layer video coding, especially spatial scalability video coding, where information from the base band is always available for coding the enhancement band signal.
2. Description of the Prior Art
In single layer video coding, motion estimation and compensation are used to reduce the redundancy between temporally adjacent frames. The existing standards MPEG-1, MPEG-2, and H.263 provide forward, backward, interpolated, and direct modes of motion estimation and compensation. Regardless of which mode the mode decision method selects, the final objective is to obtain the most accurate prediction that can be represented by the minimum number of bits. In single layer coding, no information is available for prediction other than the adjacent frames in the same sequence. In multiple layer video coding, the prediction in the enhancement band can use information from both the enhancement band and its base band.
In the development of the MPEG-4 standard, spatial scalability video coding was introduced; see MPEG-4 Video Verification Model Version 4.0 [1], which sets up the structure of spatial scalability shown in FIG. 1.
As shown in FIG. 1, a P-VOP (video object plane) in the base layer is predicted from its immediately previous VOP, and a P-VOP in the enhance layer is predicted from the I-VOP in the base layer. A B-VOP in the enhance layer is predicted in one of three ways: from its immediately previous P-VOP or B-VOP, called forward mode; from the P-VOP with the same time reference in the base layer, called backward mode; or from the average of these two, called interpolated mode.
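The three B-VOP prediction modes described above can be sketched for a single block as follows. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the two predictors are assumed to be already motion compensated (forward) or up-sampled from the base layer (backward), and the mode is chosen by the smallest sum of absolute differences (SAD), a criterion assumed here for illustration and not taken from the Verification Model.

```python
def sad(block, pred):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block, pred)
               for a, b in zip(row_a, row_b))


def interpolate(fwd, bwd):
    """Interpolated predictor: rounded average of forward and backward."""
    return [[(f + b + 1) // 2 for f, b in zip(row_f, row_b)]
            for row_f, row_b in zip(fwd, bwd)]


def choose_mode(block, fwd, bwd):
    """Return the mode with the smallest SAD and the cost of every mode.

    fwd: predictor from the previous P-VOP or B-VOP in the enhance layer.
    bwd: predictor from the up-sampled base layer P-VOP (same time reference).
    The SAD criterion is an assumption made for this sketch.
    """
    candidates = {
        "forward": fwd,
        "backward": bwd,
        "interpolated": interpolate(fwd, bwd),
    }
    costs = {mode: sad(block, pred) for mode, pred in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    block = [[10, 12], [14, 16]]
    fwd = [[10, 12], [14, 17]]
    bwd = [[8, 10], [12, 14]]
    print(choose_mode(block, fwd, bwd))
```

A real encoder would also account for the bits needed to code the motion vectors of each mode, which this sketch omits.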
As is known, the prediction from the base layer to its enhance layer is affected by an up-sampling filter. No matter how well the filter is designed, the up-sampled P-VOP always lacks some information compared with the previous P-VOP in the enhance layer. Our experiments also showed that when the forward, backward, and interpolated modes are used for the prediction of a B-VOP in the enhance layer, forward mode was almost always chosen for moving image parts and interpolated mode was almost always chosen for still image parts. That is to say, backward mode accounts for only a very small percentage, as shown in FIG. 2. FIG. 2 shows a prediction result for a typical test sequence (container ship) according to the prior art; in FIG. 2, "F" represents the forward mode, "I" represents the interpolated mode, and "B" represents the backward mode. The total number of bits used is 4096, SNRY is 30.94, SNRU is 38.70, and SNRV is 38.36.
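The information loss caused by up-sampling, noted above, can be illustrated with a minimal one-dimensional sketch. The filters used here (pair averaging for down-sampling and sample repetition for up-sampling) are assumptions chosen for simplicity and are not the filters of the Verification Model; the function names are likewise hypothetical.

```python
def downsample(row):
    """Halve the resolution by averaging each pair of neighbouring samples."""
    return [(row[i] + row[i + 1]) // 2 for i in range(0, len(row) - 1, 2)]


def upsample(row):
    """Double the resolution by repeating each sample (a crude filter)."""
    return [value for value in row for _ in range(2)]


def round_trip_error(row):
    """Sum of absolute differences between a row and its restored version."""
    restored = upsample(downsample(row))
    return sum(abs(a - b) for a, b in zip(row, restored))


if __name__ == "__main__":
    flat = [50, 50, 50, 50]        # no detail: survives the round trip
    detailed = [0, 100, 0, 100]    # fine detail: lost in the base layer
    print(round_trip_error(flat), round_trip_error(detailed))
```

In this sketch the flat row is restored exactly, while the row with fine detail is not, which is the kind of loss that makes the up-sampled base layer a weaker predictor for detailed content than the previous VOP in the enhance layer.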
Considering the coding efficiency of this scheme, the interpolated mode requires both forward and backward motion vectors to be coded and transmitted, which consumes a lot of bits, especially in low bit rate coding. For still image parts, on the other hand, nothing would need to be coded or transmitted if the decoder could be told which parts are still.
From the above, it is clear that the prior art does not provide an efficient coding for B-VOP in the enhance layer. The up-sampled P-VOP is not very reliable as the predictor for B-VOP in backward mode. This also means that the interpolated mode resulting from backward and forward mode is also unreliable. It is therefore necessary to introduce or design another prediction mode besides the forward mode for prediction of moving parts of the image.
The new prediction mode disclosed in this invention handles the coding of non-moving parts of the image or VOP. Furthermore, the new mode improves the coding efficiency of the overall multiple layer coding scheme by reducing the amount of residual information that is coded and by inferring as much information as possible from the base layer.
In this invention, one more prediction mode is added to the existing three modes: forward, backward, and interpolated. The forward mode addresses the coding of moving parts or VOPs, the new mode addresses the coding of non-moving parts or VOPs, and the interpolated and backward modes address the uncertain parts that cannot be predicted well by either the forward mode or the new mode.