This application relates to U.S. Pat. No. 5,710,829 filed by the same inventors on Apr. 25, 1995, which is hereby incorporated by reference as if repeated herein in its entirety.
The present invention relates generally to video coding and more particularly to video coding in which the image is decomposed into objects prior to coding. Each of the individual objects is then coded separately.
For many image transmission and storage applications, significant data compression may be achieved if the trajectories of moving objects in the images are successfully estimated. Traditionally, block-oriented motion estimation has been widely investigated due to its simplicity and effectiveness. However, block boundaries and object boundaries in a scene normally do not coincide because the blocks are not adapted to the image contents. This can lead to visible distortions in low-bit-rate coders, known as blurring and mosquito effects.
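By way of illustration only, block-oriented motion estimation of the kind referred to above may be sketched as an exhaustive block-matching search. The sketch assumes grayscale frames stored as lists of rows and a sum-of-absolute-differences (SAD) matching cost; the function names `sad` and `block_motion`, the block size, and the search range are hypothetical choices made for this example and are not taken from any particular coder.

```python
def sad(cur, ref, by, bx, dy, dx, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (by, bx) and the block of `ref` displaced by (dy, dx)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def block_motion(cur, ref, by, bx, n=4, search=2):
    """Exhaustively search displacements within +/- `search` pixels and
    return (dy, dx, cost) for the candidate with the smallest SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = sad(cur, ref, by, bx, dy, dx, n)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2], best[0]


# Illustrative frames (assumed data): the current frame is the reference
# frame shifted one pixel to the right.
ref = [[(7 * y + 13 * x) % 31 for x in range(8)] for y in range(8)]
cur = [[ref[y][x - 1] if x >= 1 else 0 for x in range(8)] for y in range(8)]

print(block_motion(cur, ref, 2, 2))  # (0, -1, 0): best match lies one pixel to the left
```

The block grid in such a search is fixed in advance and is independent of the scene, which is why a block that straddles an object boundary has no single displacement that fits all of its pixels.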
Object-oriented coding techniques were developed to overcome the disadvantages of block-oriented coding. In one type of object-oriented coding, the image sequence is segmented into moving objects. Large regions with homogeneous motion can be extracted, resulting in higher compression and fewer visible distortions at motion boundaries. Because the foreground objects carry more new information than the slowly changing background, the background can be transmitted less frequently than the foreground. Consequently, the foreground objects must be correctly identified to achieve the desired compression levels without adding undue distortion.
As a result, segmentation is an important intermediate step in object-oriented image processing. For this reason, many approaches to segmentation have been attempted, such as motion-based, focus-based, intensity-based, and disparity-based segmentation. The problem with each of these approaches is its feature specificity, which limits the scenes to which it can be successfully applied. For example, the scene must contain motion for motion-based segmentation to be applicable. The scene must contain significant contrast to support intensity-based segmentation. Similar features are required for the other approaches. In addition, the motion-based approach fails for scenes containing both foreground and background motion, such as moving foreground shadows cast onto the background. The focus-based approach fails when the foreground is blurred. The intensity-based approach fails for textured objects because a single object is erroneously segmented into multiple objects. Finally, the measurement of disparity in the disparity-based approach is complex and error-prone.
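The feature specificity of the motion-based approach may be illustrated with a minimal frame-differencing sketch. This is an assumed, simplified stand-in for motion-based segmentation in general and not the method of any particular reference; the function name `motion_mask`, the threshold value, and the pixel intensities are hypothetical.

```python
def motion_mask(prev, cur, threshold=10):
    """Mark a pixel as moving (1) when its intensity changes by more than
    `threshold` between two frames, and as static (0) otherwise."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(prev, cur)
    ]


# One-row scene (assumed data): a bright object moves from column 1 to
# column 2, and its shadow darkens background column 4.
prev_frame = [[100, 200, 100, 100, 100, 100]]
cur_frame = [[100, 100, 200, 100, 60, 100]]

print(motion_mask(prev_frame, cur_frame))   # [[0, 1, 1, 0, 1, 0]]
print(motion_mask(prev_frame, prev_frame))  # [[0, 0, 0, 0, 0, 0]]
```

In the first result, column 4 belongs to the background but is marked as moving because of the shadow, so it would be classified with the foreground. In the second result, a scene without motion yields an empty mask, leaving nothing to segment.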
One technique that overcomes this problem is to use a priori knowledge about the images to select the coding method. However, this makes image coding inconvenient in that processing must include a determination of the type of image and then a selection of the most appropriate coding type for that image. This significantly increases the cost of preprocessing the images prior to coding. Alternatively, a lower quality coding must be employed. Unfortunately, neither of these alternatives is acceptable, as bandwidth remains limited for image transmission and consumers expect higher quality imagery as technology advances.
The issue then becomes how to accentuate the strengths of these methods and attenuate their failings in foreground and background segmentation. Several possibilities have been examined. One approach combines motion and brightness information into a single segmentation procedure that determines the boundaries of moving objects. Again, this approach will not work well because any moving background will be segmented with the moving foreground and therefore classified and coded as foreground.
Another approach uses defocus measurement and motion detection to segment a foreground portion of the image from a background portion of the image. This process is shown in FIGS. 7-9. FIG. 7 shows the process, FIG. 8 shows the segmentation results over several frames, and FIG. 9 shows the results of the defocus measurement. However, this approach requires a filling step. Filling is a non-trivial problem, especially where the foreground image segment output by this process results in objects without closed boundaries. In this case, significant complexity is added to the overall process. Given the complexity inherent in video coding, the elimination of any complex step is significant in and of itself.
The present invention is therefore directed to the problem of developing a method and apparatus for segmenting foreground from background in an image sequence prior to coding the image, which method and apparatus requires no a priori knowledge regarding the image to be segmented and yet is relatively simple to implement.