This invention relates to mosaic generation and sprite-based coding, and more particularly, to sprite-based coding with automatic foreground and background segmentation. Throughout the document, the terms “sprite” and “mosaic” will be used interchangeably.
Dynamic sprite-based coding can use object shape information to distinguish objects moving with respect to the dominant motion in the image from the rest of the objects in the image. Object segmentation may or may not be available before the video is encoded. Sprite-based coding with a priori object segmentation increases coding efficiency at sufficiently high bit rates, where segmentation information can be transmitted via shape coding.
When object segmentation is available and transmitted, sprite reconstruction uses the dominant motion of an object (typically, a background object) in every video frame to initialize and update the content of the sprite in the encoder and decoder. Coding efficiency improvements come from scene re-visitation, uncovering of background, and global motion estimation. Coding gains also come from smaller transmitted residuals, as global motion parameters offer better prediction than local motion vectors in background areas. Less data is transmitted when a scene is revisited or background is uncovered because the uncovered object texture has already been observed and incorporated into the mosaic sometime in the past. The encoder selects the mosaic content to predict uncovered background regions or other re-visited areas. Coding gains come from the bits saved in not having to transmit local motion vectors for sprite-predicted macroblocks.
However, the segmentation information may not be available beforehand. Even when available, it may not be possible to transmit segmentation information when the communication channel operates at low bit rates. Shape information is frequently not available since only a small amount of video material is produced with blue screen overlay modes. In these situations, it is not possible to distinguish among the various objects in each video frame. Reconstruction of a sprite from a sequence of frames made of several video objects becomes less meaningful when each object in the sequence exhibits distinct motion dynamics. However, it is desirable to use dynamic sprite-based coding to take advantage of the coding efficiency at high bit rates and, if possible, extend its performance to low bit rates as well. Shape information takes a relatively larger portion of the bandwidth at low bit rates. Thus, automatic segmentation provides a relatively larger improvement in coding efficiency at low bit rates.
Current sprite-based coding in MPEG-4 assumes that object segmentation is provided. With the help of segmentation maps, foreground objects are excluded from the process of building a background panoramic image. However, the disadvantage of this approach is that object segmentation must be performed beforehand. Object segmentation is a complex task and typically requires both spatial and temporal processing of the video to get reliable results.
Temporal linear or non-linear filtering is described in U.S. Pat. No. 5,109,435, issued Apr. 28, 1992, entitled Segmentation Method for Use Against Moving Objects to Lo, et al. Temporal filtering is used for segmenting foreground objects from background objects for the purpose of reconstructing image mosaics. This approach has two disadvantages: First, it requires that several frames be pre-acquired and stored so temporal filtering can be performed. Second, it does not explicitly produce a segmentation map, which can be used to refine motion estimates.
Analysis of motion residuals is described in U.S. Pat. No. 5,649,032, issued Jul. 15, 1997, entitled System for Automatically Aligning Images to Form a Mosaic Image, to Burt, et al. This method separates foreground objects from background objects in a mosaic but does not reconstruct a mosaic representative of the background object only (see description in the Real time transmission section). Post-processing must be used to eliminate the foreground objects.
Accordingly, a need remains for automatically performing on-line segmentation and sprite building of a background image (object undergoing dominant motion) when prior segmentation information is neither available nor used due to bandwidth limitations.
Automatic object segmentation generates high quality mosaic (panoramic) images and operates with the assumption that each of the objects present in the video scene exhibits dynamical modes which are distinct from the global motion induced by the camera. Image segmentation, generation of a background mosaic and coding are all intricately linked. Image segmentation is progressively achieved in time and is based on the quality of the prediction signal produced by the background mosaic. Consequently, object segmentation is embedded in the coder/decoder (codec) as opposed to being a separate pre- or post-processing module, reducing the overall complexity and memory requirements of the system.
In the encoder, foreground and background objects are segmented by first encoding and decoding a first image at a first time reference. The method used to encode and decode this first image does not need to be specified for the purpose of this invention. The second image at a second time reference is divided into non-overlapping macroblocks (tiles). The macroblocks are matched to image sample arrays in the decoded first image or in the mosaic. In the first case, the encoder uses local motion vectors to align an individual macroblock with one or several corresponding image sample arrays in the previous decoded image. In the second case, the encoder uses parameters of a global motion model to align an individual macroblock with a corresponding mosaic sample array. The encoder evaluates the various residuals and selects the proper prediction signal to use according to a pre-specified policy. This decision is captured in the macroblock type. The macroblock types, the global motion parameters, the local motion vectors and the residual signals are transmitted to the decoder.
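The tiling and prediction-selection steps can be illustrated with a minimal Python sketch. Several things here are assumptions made for illustration and are not specified in the text: the 16-sample default macroblock size, the use of the sum of absolute differences (SAD) as the residual measure, the "minimum residual wins" selection policy, and the function names and the "MOSAIC" type label used in the usage note.

```python
def split_into_macroblocks(frame, size=16):
    """Yield (block_row, block_col, block) for each non-overlapping tile.

    `frame` is a list of rows of pixel intensities. Its dimensions are
    assumed to be multiples of `size` (a simplification of this sketch).
    """
    height, width = len(frame), len(frame[0])
    for top in range(0, height, size):
        for left in range(0, width, size):
            block = [row[left:left + size] for row in frame[top:top + size]]
            yield top // size, left // size, block


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def select_macroblock_type(block, candidates):
    """Return the name of the candidate prediction with the smallest SAD.

    `candidates` maps a macroblock type name to its predicted block.
    Choosing the minimum residual is an assumed stand-in for the
    pre-specified policy, which the text leaves open.
    """
    return min(candidates, key=lambda name: sad(block, candidates[name]))
```

In use, the encoder would build one candidate per prediction source (for example a locally motion-compensated block from the previous decoded image and a globally motion-compensated block from the mosaic) and record the winning name as the macroblock type.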
Frame residuals represent the difference between the macroblocks and corresponding image arrays in the previously decoded image matched by using local motion vectors. Macroblocks having a single local motion vector are identified as INTER1V-type macroblocks. Macroblocks having multiple (4) local motion vectors are identified as INTER4V-type macroblocks. INTER4V macroblocks are always labeled as foreground. INTER1V macroblocks can either be labeled foreground or background.
A global motion model representing camera motion between the first and second image is applied to the macroblocks in the second image. The global motion model maps the macroblocks to a corresponding second image sample array in the first decoded image. Global residuals between the macroblocks and the second image array are derived. When the global residuals are greater than the INTER1V frame residuals, the macroblocks are classified as foreground. When the INTER1V frame residuals are greater than the global residuals, the macroblocks are classified as background. By comparing the global residuals to the INTER1V frame residuals derived from the previously decoded image, the mosaic can be automatically updated with the image content of macroblocks likely to be background.
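As a rough illustration of how a global residual might be derived for one macroblock, the following Python sketch warps macroblock coordinates into the previously decoded image and accumulates absolute differences. The six-parameter affine form of the motion model, nearest-neighbour sampling, border clamping, the SAD measure and all function names are assumptions of this sketch; the text does not fix any of them.

```python
def warp_block(reference, top, left, size, params):
    """Fetch the globally motion-compensated prediction for one macroblock.

    params = (a, b, c, d, e, f) define an assumed affine model:
        x' = a*x + b*y + c,   y' = d*x + e*y + f
    Samples are read by nearest-neighbour lookup and clamped to the
    borders of the reference image.
    """
    a, b, c, d, e, f = params
    height, width = len(reference), len(reference[0])
    predicted = []
    for y in range(top, top + size):
        row = []
        for x in range(left, left + size):
            src_x = min(max(int(round(a * x + b * y + c)), 0), width - 1)
            src_y = min(max(int(round(d * x + e * y + f)), 0), height - 1)
            row.append(reference[src_y][src_x])
        predicted.append(row)
    return predicted


def global_residual(current, reference, top, left, size, params):
    """SAD between a macroblock of `current` and its global prediction."""
    predicted = warp_block(reference, top, left, size, params)
    return sum(abs(current[top + j][left + i] - predicted[j][i])
               for j in range(size)
               for i in range(size))
```

A macroblock that follows the camera motion should yield a small global residual under the correct parameters, while a macroblock covering an independently moving object should not.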
Mosaic residuals represent the difference between the macroblocks and corresponding global motion compensated mosaic arrays. Any macroblocks tagged as mosaic prediction type are classified as background.
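The labeling rules described above (INTER4V is always foreground, mosaic-predicted macroblocks are background, and INTER1V macroblocks are decided by comparing residuals) can be summarized in a short Python sketch. The string labels, the function name and the treatment of equal residuals are assumptions for illustration; the text does not say how a tie is resolved.

```python
def classify_macroblock(mb_type, frame_residual=None, global_residual=None):
    """Label one macroblock as "foreground" or "background".

    mb_type is one of "INTER4V", "INTER1V" or "MOSAIC" (hypothetical
    labels). The two residuals are only needed for INTER1V macroblocks.
    """
    if mb_type == "INTER4V":
        # Multiple local motion vectors: always foreground.
        return "foreground"
    if mb_type == "MOSAIC":
        # Predicted from the mosaic: background.
        return "background"
    if mb_type == "INTER1V":
        if frame_residual > global_residual:
            # The global model predicts better than the local vector.
            return "background"
        # Global residual is larger, or equal. Treating a tie as
        # foreground is an assumption of this sketch.
        return "foreground"
    raise ValueError("unknown macroblock type: %r" % (mb_type,))
```

Resolving ties toward foreground is a conservative choice in this sketch, since a macroblock wrongly labeled background would be written into the mosaic.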
A segmentation map can be used to classify the macroblocks as either foreground or background. A smoothing process is applied to the segmentation map to make foreground and background regions more homogeneous. The mosaic is then updated with the contents of macroblocks identified as background in the smoothed segmentation map.
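One possible form of the smoothing and update steps is sketched below in Python. The 3x3 majority vote over macroblock labels is an assumed smoothing rule, not one named in the text, and the mosaic update assumes the frame is already aligned to the mosaic up to an integer offset and simply overwrites mosaic samples; a real system would apply the global motion model and might blend instead. All names are hypothetical.

```python
def smooth_segmentation(seg_map):
    """Majority-vote smoothing of a macroblock-level segmentation map.

    seg_map[r][c] is True for foreground, False for background. Each
    label is replaced by the majority label in its 3x3 neighbourhood
    (clipped at the borders); on a tie the original label is kept.
    """
    rows, cols = len(seg_map), len(seg_map[0])
    smoothed = [row[:] for row in seg_map]
    for r in range(rows):
        for c in range(cols):
            window = [seg_map[rr][cc]
                      for rr in range(max(r - 1, 0), min(r + 2, rows))
                      for cc in range(max(c - 1, 0), min(c + 2, cols))]
            foreground = sum(window)
            if 2 * foreground > len(window):
                smoothed[r][c] = True
            elif 2 * foreground < len(window):
                smoothed[r][c] = False
    return smoothed


def update_mosaic(mosaic, frame, seg_map, size=16, offset=(0, 0)):
    """Copy background macroblocks of `frame` into `mosaic` in place.

    `offset` = (rows, cols) is an assumed integer alignment between
    frame and mosaic coordinates. Foreground macroblocks are skipped.
    """
    off_y, off_x = offset
    for r, row in enumerate(seg_map):
        for c, is_foreground in enumerate(row):
            if is_foreground:
                continue
            for y in range(r * size, (r + 1) * size):
                for x in range(c * size, (c + 1) * size):
                    mosaic[y + off_y][x + off_x] = frame[y][x]
    return mosaic
```

Smoothing before the update removes isolated labels, so a single misclassified macroblock is less likely to leak foreground content into the mosaic or leave a hole in the background.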
Automatic segmentation does not require any additional frame storage and works in a coding and in a non-coding environment. In a non-coding environment, the invention operates as an automatic segmentation-based mosaic image reconstruction encoder. Automatic object segmentation builds a mosaic for an object exhibiting the most dominant motion in the video sequence by isolating the object from the others in the video sequence and reconstructing a sprite for that object only. The sprite becomes more useable since it is related to only one object. The results of the auto-segmentation can be used to obtain more accurate estimates of the dominant motion and prevent the motion of other objects in the video sequence from interfering with the dominant motion estimation process.
Automatic object segmentation can be integrated into any block-based codec, in particular into MPEG-4, and is based on macroblock types and motion compensated residuals. Dominant motion compensation is used with respect to the most recently decoded VO plane. A spatial coherency constraint is enforced to maintain the uniformity of segmentation. Automatic segmentation can also be used in a non-coding environment, for example in the context of building a mosaic of the background image only (or of the region undergoing dominant motion) in the presence of foreground objects. Thus, automatic sprite-based segmentation is not only useful for on-line dynamic sprites but can also be used in generating an off-line (e.g., background) sprite that can be subsequently used in static sprite coding.
The foregoing and other objects, features and advantages of the invention will become more readily apparent from the following detailed description of a preferred embodiment of the invention, which proceeds with reference to the accompanying drawings.