Very low bit rate (VLBR) video coding requires the fullest possible exploitation of the correlations implicit in image sequences in order to reach target bit rates as low as 8 kilobits per second (kbps). One of the most obvious correlations, and one that all of today's codecs exploit, is the correlation between the intensity fields of two consecutive frames, i.e., the smooth motion of objects from one frame to the next. The two most popular approaches to motion estimation (ME) and compensation are block matching (BM) and spatio-temporal gradient techniques. BM methods assume that all pixels in a given block move in the same way, whereas spatio-temporal gradient ME algorithms minimize the displaced frame difference (DFD) at each pixel, starting from an initial estimate of the displacement vector field (DVF). One of the main differences between the two classes of ME algorithms is therefore that spatio-temporal gradient algorithms provide a dense vector field, i.e., one vector per pixel, whereas BM algorithms provide a coarse vector field, i.e., one vector per block of pixels.
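To make the distinction between one vector per block and one vector per pixel concrete, the following is a minimal sketch of full-search block matching in which each candidate displacement is scored by summing the absolute DFD over the whole block. It assumes integer displacements, a sum-of-absolute-differences cost, and the convention that the pixel at (x, y) in the current frame came from (x - dx, y - dy) in the previous frame; the function names `dfd` and `block_match` are hypothetical and not taken from any particular codec. A spatio-temporal gradient method would instead refine a separate vector for every pixel, typically with sub-pixel precision.

```python
def dfd(cur, prev, x, y, dx, dy):
    """Displaced frame difference at pixel (x, y) for integer displacement (dx, dy).

    Assumed convention: the pixel at (x, y) in the current frame originated
    at (x - dx, y - dy) in the previous frame.
    """
    return cur[y][x] - prev[y - dy][x - dx]


def block_match(cur, prev, bx, by, bsize, search):
    """Full-search block matching: returns ONE vector for the whole block.

    (bx, by) is the top-left corner of a bsize x bsize block in the current
    frame; candidates range over [-search, search] in each direction.
    """
    h, w = len(cur), len(cur[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block falls outside the previous frame.
            if not (0 <= bx - dx and bx - dx + bsize <= w and
                    0 <= by - dy and by - dy + bsize <= h):
                continue
            # Sum of absolute DFD values over every pixel in the block.
            cost = sum(abs(dfd(cur, prev, x, y, dx, dy))
                       for y in range(by, by + bsize)
                       for x in range(bx, bx + bsize))
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def make_frame(w, h, square_x, square_y):
    """Black frame with a bright 2 x 2 square whose top-left corner is given."""
    frame = [[0] * w for _ in range(h)]
    for y in range(square_y, square_y + 2):
        for x in range(square_x, square_x + 2):
            frame[y][x] = 255
    return frame


prev = make_frame(8, 8, 2, 2)
cur = make_frame(8, 8, 3, 2)  # the square moved one pixel to the right
vector, cost = block_match(cur, prev, bx=2, by=2, bsize=4, search=2)
print(vector, cost)  # (1, 0) 0
```

Every pixel in the 4 x 4 block, including the background pixels around the square, is assigned the same vector (1, 0); this is the coarse field described above.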
An important open challenge for any type of motion compensation is the accurate representation and compensation of information in the occluded areas of the image. Occluded areas are defined as the areas a) where the background part of the scene is covered by a moving object, and b) where the background part of the scene is uncovered by a moving object. The two problems associated with occluded areas are the detection of these areas and the efficient transmission of the information they contain. Most existing codecs perform poorly in these areas because of either inaccurate detection or inefficient encoding of this information. For instance, motion-compensated discrete cosine transform (DCT) coding schemes, such as those specified by H.261, MPEG-1, and MPEG-2, take a brute-force approach in which the occluded areas are simply encoded as part of the DFD. However, since the characteristics of these areas are quite different from those of typical motion compensation failures, such as the failures around moving edges, the occluded areas are transmitted inefficiently in terms of both bit rate and reconstructed image quality. Object-based coding schemes, on the other hand, detect and encode the information in the occluded areas in a process separate from DFD encoding. These schemes, however, also encode this information inefficiently, since they must transmit position overhead for these areas, and this overhead can be very costly at the target bit rates.
Currently, few approaches for the detection and transmission of occluded-area information in the context of very low bit rate coding have been reported. Most existing techniques segment the scene into four regions: stationary background, moving object, covered background, and uncovered background. In these approaches, a change detection mask is first obtained from two consecutive frames. Motion estimation is then performed, yielding a motion model for each identified object. Based on the change detection mask and the motion models, the covered and uncovered regions are separated from the moving objects. These approaches, however, have several disadvantages. First, either the position information of the detected occluded areas or the change detection mask together with the motion vectors must be transmitted over the channel so that the decoder can regenerate the occlusion information according to the proposed algorithm. The position information can be very costly, depending on the amount of motion from one frame to the next. Second, the proposed algorithms fail to detect uncovered areas in the case of sub-pixel motion, since they simply consider the reverse motion to detect these areas. To get around this problem, backward motion estimation, i.e., from the current frame to the previous frame, has been proposed; however, this substantially increases the computational expense. Third, these algorithms do not take into account the temporal continuity of motion between successive frames. Fourth, such techniques tend to break down considerably in the presence of noise. Finally, these approaches operate on the intensity fields instead of the DVFs. This choice can be the limiting factor in their performance: because occluded areas are inherently caused by inefficiencies in motion estimation and compensation, their characteristics are captured more efficiently by the DVFs than by the intensity fields.
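The four-region segmentation described above can be illustrated with a minimal sketch. It assumes a single object undergoing a known integer translation d = (dx, dy) against a static background, a change detection mask obtained by thresholding the absolute frame difference, and a simple projection rule chosen here for illustration, in the spirit of the reverse-motion test, rather than taken from any specific reported algorithm. Under this rule, a changed pixel of the current frame is labeled moving object if its backward projection x - d falls inside the changed region, and uncovered background otherwise. A changed pixel of the previous frame is labeled covered background if its forward projection x + d leaves the changed region. The names `change_mask` and `segment_four_regions` and the label encoding are hypothetical.

```python
# Hypothetical label encoding for current-frame pixels.
STATIC, MOVING, UNCOVERED = 0, 1, 2


def change_mask(cur, prev, threshold):
    """Change detection mask: True where |cur - prev| exceeds the threshold."""
    return [[abs(c - p) > threshold for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(cur, prev)]


def in_mask(mask, x, y):
    """True if (x, y) lies inside the frame AND inside the changed region."""
    return 0 <= y < len(mask) and 0 <= x < len(mask[0]) and mask[y][x]


def segment_four_regions(cur, prev, d, threshold):
    """Illustrative segmentation assuming one object with integer motion d.

    Returns (labels, covered): labels holds STATIC / MOVING / UNCOVERED for
    each current-frame pixel; covered is the set of previous-frame pixels
    (x, y) classified as covered background.
    """
    dx, dy = d
    mask = change_mask(cur, prev, threshold)
    h, w = len(mask), len(mask[0])
    labels = [[STATIC] * w for _ in range(h)]
    covered = set()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # unchanged pixel: stationary background
            # Current frame: does the backward projection land in the changed region?
            labels[y][x] = MOVING if in_mask(mask, x - dx, y - dy) else UNCOVERED
            # Previous frame: does the forward projection leave the changed region?
            if not in_mask(mask, x + dx, y + dy):
                covered.add((x, y))
    return labels, covered


prev = [[0, 0, 7, 8, 9, 0, 0, 0, 0, 0]]  # textured object at x = 2..4
cur = [[0, 0, 0, 0, 7, 8, 9, 0, 0, 0]]   # same object moved by d = (2, 0)
labels, covered = segment_four_regions(cur, prev, d=(2, 0), threshold=0)
print(labels[0])        # [0, 0, 2, 2, 1, 1, 1, 0, 0, 0]
print(sorted(covered))  # [(5, 0), (6, 0)]
```

Because the rule depends on exact integer projections of the motion, a fractional displacement has no such projection, which reflects the sub-pixel limitation noted above. The resulting `labels` and `covered` sets are also precisely the kind of position information that would have to reach the decoder unless it can reproduce the mask and motion itself.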
Thus, there is a need for a method and apparatus for accurately detecting occluded areas based on the DVF, in order to achieve the desired reconstructed image quality at the targeted very low bit rates.