This invention relates to mosaic generation and sprite-based coding, and more particularly, to sprite-based coding with automatic foreground and background segmentation. Throughout the document, the terms "sprite" and "mosaic" will be used interchangeably.
Dynamic sprite-based coding can use object shape information to distinguish objects moving with respect to the dominant motion in the image from the rest of the objects in the image. Object segmentation may or may not be available before the video is encoded. Sprite-based coding with a priori object segmentation increases coding efficiency at sufficiently high bit rates, where segmentation information can be transmitted via shape coding.
When object segmentation is available and transmitted, sprite reconstruction uses the dominant motion of an object (typically, a background object) in every video frame to initialize and update the content of the sprite in the encoder and decoder. Coding efficiency improvements come from scene re-visitation, uncovering of background, and global motion estimation. Coding gains also come from smaller transmitted residuals, as global motion parameters offer better prediction than local motion vectors in background areas. Less data is transmitted when a scene is revisited or background is uncovered because the uncovered object texture has already been observed and incorporated into the mosaic at some time in the past. The encoder selects the mosaic content to predict uncovered background regions or other re-visited areas. Further coding gains come from the bits saved by not having to transmit local motion vectors for sprite-predicted macroblocks.
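By way of illustration only, the sprite update and sprite-based prediction described above can be sketched as follows. The sketch assumes a purely translational global motion (an integer offset `dx`, `dy`), a running-average blend, and a supplied background mask; these simplifications, and the names `update_sprite` and `predict_from_sprite`, are hypothetical and are not taken from any particular coder.

```python
def update_sprite(sprite, weights, frame, background_mask, dx, dy):
    """Blend the background pixels of `frame` into `sprite` at offset (dx, dy)."""
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if not background_mask[y][x]:
                continue  # foreground pixel: excluded from the sprite
            sy, sx = y + dy, x + dx
            if 0 <= sy < len(sprite) and 0 <= sx < len(sprite[0]):
                w = weights[sy][sx]
                # Running average over all observations of this sprite pixel.
                sprite[sy][sx] = (sprite[sy][sx] * w + value) / (w + 1)
                weights[sy][sx] = w + 1


def predict_from_sprite(sprite, weights, height, width, dx, dy):
    """Read back a frame-sized prediction; None marks unobserved pixels."""
    prediction = []
    for y in range(height):
        row = []
        for x in range(width):
            sy, sx = y + dy, x + dx
            inside = 0 <= sy < len(sprite) and 0 <= sx < len(sprite[0])
            seen = inside and weights[sy][sx] > 0
            row.append(sprite[sy][sx] if seen else None)
        prediction.append(row)
    return prediction


# Demo: a 2x3 sprite built from two 2x2 frames of a hypothetical panning shot.
sprite = [[0.0] * 3 for _ in range(2)]
weights = [[0] * 3 for _ in range(2)]

# Frame 1 at offset 0; row 0, column 1 is covered by a foreground object.
update_sprite(sprite, weights, [[10, 99], [30, 40]],
              [[True, False], [True, True]], dx=0, dy=0)
# Frame 2 after a one-pixel pan; the object has moved, uncovering background.
update_sprite(sprite, weights, [[22, 50], [40, 60]],
              [[True, True], [True, True]], dx=1, dy=0)

# Revisiting the first view: the uncovered pixel is predicted from the sprite.
prediction = predict_from_sprite(sprite, weights, 2, 2, dx=0, dy=0)
```

In this toy case the pixel hidden by the foreground object in the first frame is filled in by the second frame, so a later return to the first view can be predicted entirely from the sprite using only the global offset.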
However, the segmentation information may not be available beforehand. Even when available, it may not be possible to transmit segmentation information when the communication channel operates at low bit rates. Shape information is frequently not available since only a small amount of video material is produced with blue screen overlay modes. In these situations, it is not possible to distinguish among the various objects in each video frame. Reconstruction of a sprite from a sequence of frames made of several video objects becomes less meaningful when each object in the sequence exhibits distinct motion dynamics. Nevertheless, it is desirable to use dynamic sprite-based coding to take advantage of its coding efficiency at high bit rates and, if possible, to extend its performance to low bit rates as well. Shape information takes a relatively larger portion of the bandwidth at low bit rates. Thus, automatic segmentation provides a relatively larger improvement in coding efficiency at low bit rates.
Current sprite-based coding in MPEG-4 assumes that object segmentation is provided. With the help of segmentation maps, foreground objects are excluded from the process of building a background panoramic image. However, the disadvantage of this approach is that object segmentation must be performed beforehand. Object segmentation is a complex task and typically requires both spatial and temporal processing of the video to get reliable results.
Temporal linear or non-linear filtering is described in U.S. Pat. No. 5,109,435, issued Apr. 28, 1992, entitled Segmentation Method for Use Against Moving Objects, to Lo, et al. Temporal filtering is used for segmenting foreground objects from background objects for the purpose of reconstructing image mosaics. This approach has two disadvantages. First, it requires that several frames be pre-acquired and stored so that temporal filtering can be performed. Second, it does not explicitly produce a segmentation map, which could otherwise be used to refine motion estimates.
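For illustration, one common non-linear temporal filter is a per-pixel median over a stack of frames. The following minimal sketch is not the method of the cited patent; it assumes the frames are already aligned, and the name `temporal_median_background` is hypothetical. It makes both disadvantages visible: every frame must be held in memory before filtering, and the result is a background image rather than a per-frame segmentation map.

```python
from statistics import median


def temporal_median_background(frames):
    """Per-pixel temporal median over a list of equally sized, aligned frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


# A bright object (200) crosses a uniform background (10) over three frames.
frames = [
    [[200, 10, 10]],
    [[10, 200, 10]],
    [[10, 10, 200]],
]
background = temporal_median_background(frames)
```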
Analysis of motion residuals is described in U.S. Pat. No. 5,649,032, issued Jul. 15, 1997, entitled System for Automatically Aligning Images to Form a Mosaic Image, to Burt, et al. This method separates foreground objects from background objects in a mosaic but does not reconstruct a mosaic representative of the background object only (see description in the Real time transmission section). Post-processing must be used to eliminate the foreground objects.
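As a rough sketch of what residual analysis can look like, a pixel may be labeled foreground when the absolute difference between the current frame and its motion-compensated prediction exceeds a threshold. This is a generic illustration, not the procedure of the cited patent; the function name `segment_by_residual` and the threshold value are assumptions chosen for the example.

```python
def segment_by_residual(frame, prediction, threshold):
    """Return a map that is True where the prediction residual is large."""
    return [
        [abs(actual - predicted) > threshold
         for actual, predicted in zip(frame_row, prediction_row)]
        for frame_row, prediction_row in zip(frame, prediction)
    ]


# The middle pixel does not follow the dominant motion, so its residual is large.
current_frame = [[10, 200, 12]]
compensated_prediction = [[11, 10, 10]]
foreground_map = segment_by_residual(current_frame, compensated_prediction,
                                     threshold=20)
```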
Accordingly, a need remains for automatically performing on-line segmentation and sprite building of a background image (the object undergoing dominant motion) when prior segmentation information is either not available or not used due to bandwidth limitations.