1. Field of the Invention
The present invention relates to a method and apparatus for predictive coding, storage, and transfer of digital images, and to an apparatus for signal decoding and image synthesizing, and relates particularly to a method and apparatus for predictive coding, storage, and transfer of digital images obtained by synthesizing plural object-based image layers, and to an apparatus for signal decoding and resynthesizing said object-based images.
2. Description of the Prior Art
J. Wang and E. Adelson have proposed a method for decomposing moving images into object-based layers (a different layer for each object in the image) for coding as a means of efficiently transmitting and recording moving images. Their method is described in "Layered Representation for Image Sequence Coding," J. Wang and E. Adelson, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 1993, pp. V221-V224; and in "Layered Representation for Motion Analysis," J. Wang and E. Adelson, Proc. Computer Vision and Pattern Recognition, pp. 361-366, 1993.
This method is described below, assuming a motion picture containing fish, seaweed, and a background. If each of the three elements composing this motion picture is recorded separately as a chromakey image, the elements can be synthesized into a single composite image; alternatively, a sequence of plural image frames bracketing a particular image can be analyzed and separated into the three component parts. The method proposed by Wang et al. analyzes the motion picture sequence to extract the fish, seaweed, and background, and separates each of these objects into a discrete layer. Each layer is then separately compression coded. The coded data is then multiplexed by a multiplexer, together with information identifying the vertical relationship between the layers (i.e., which layer overlays which layer), for storage or transmission. When it is also necessary to identify the relative transparency of the pixels or shapes in each frame layer, a transmittance signal is also coded and transmitted or stored with the object layers.
Plural decoders are similarly required on the reproduction side. The multiplexed data is first demultiplexed into the separate layers, and the data in the separate layers is then simultaneously decoded by the respective decoders. The objects in the reproduced layers are then overlaid by the image synthesizer, based on the vertical layer relationship data, to generate the reproduction (synthesized) image, which is then displayed on the display apparatus.
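The overlay step performed by the image synthesizer can be illustrated with a minimal sketch. It assumes each decoded layer is a flat list of pixel values paired with a per-pixel opacity derived from the transmittance signal (1.0 for an opaque object pixel, 0.0 for a fully transparent one), and that the vertical relationship data is given as a list of layer names from rearmost to frontmost. The function name `composite`, the data layout, and the sample values are hypothetical and are not taken from Wang et al.

```python
def composite(layers, order):
    """Overlay decoded layers back to front.

    layers: dict mapping a layer name to (pixels, opacity), two equal-length
            lists of floats (assumed layout; opacity is in [0, 1]).
    order:  layer names from rearmost to frontmost (the vertical relationship).
    """
    rear_pixels, _ = layers[order[0]]
    out = list(rear_pixels)  # the rearmost layer is treated as fully opaque
    for name in order[1:]:
        pixels, opacity = layers[name]
        # Blend each nearer layer over what has been accumulated so far.
        out = [a * p + (1.0 - a) * o for p, a, o in zip(pixels, opacity, out)]
    return out


# Four-pixel toy frame: background, seaweed, and fish layers.
layers = {
    "background": ([10.0, 10.0, 10.0, 10.0], [1.0, 1.0, 1.0, 1.0]),
    "seaweed": ([50.0, 50.0, 50.0, 50.0], [0.0, 1.0, 1.0, 0.0]),
    "fish": ([200.0, 200.0, 200.0, 200.0], [0.0, 0.0, 1.0, 0.5]),
}

fish_in_front = composite(layers, ["background", "seaweed", "fish"])
fish_behind = composite(layers, ["background", "fish", "seaweed"])
```

Changing only the `order` list changes which object hides which, so in this sketch the result depends entirely on the vertical relationship data supplied with the layers.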
To improve the compression efficiency, an image representative of each layer (e.g., the fish, seaweed, or background scene) is determined and used as a reference image, or template, for that layer. Note that the template for each layer is selected to be most representative of the object sequence in that layer. These templates may be selected manually or automatically, and Wang et al. also describe a method for automatically generating these templates.
The defined templates are compression coded first, and each object in the layer is coded by predictive approximation based on the displacement or deformation of the generated template. By separating the motion picture into component objects, those objects can be more accurately approximated without being influenced by other nearby objects. Objects at a sufficient distance from the camera can also be treated as still objects, making it possible to describe the deformation and displacement (change and movement) in such objects with few parameters. More specifically, an affine transformation is used to approximate a single object using six parameters, thereby requiring few bits and achieving an extremely high compression rate.
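As an illustration of why six parameters suffice, the following minimal sketch applies a two-dimensional affine transformation to points of a template. The parameter ordering (a, b, c, d, e, f), the function name `affine_warp`, and the sample coordinates are assumptions made for this example only.

```python
def affine_warp(points, params):
    """Map template coordinates with a six-parameter affine transformation.

    x' = a*x + b*y + c
    y' = d*x + e*y + f
    """
    a, b, c, d, e, f = params
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in points]


# Three corner points of a hypothetical template region.
template_points = [(0, 0), (4, 0), (0, 2)]

# Pure displacement: move the object 3 pixels right and 1 pixel up.
shifted = affine_warp(template_points, (1, 0, 3, 0, 1, -1))

# Pure deformation: enlarge the object to twice its size.
scaled = affine_warp(template_points, (2, 0, 0, 0, 2, 0))
```

In this sketch only the six numbers in `params` would need to be coded for each frame to describe the displacement or deformation of the object relative to its template, which is why so few bits are required.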
The method of approximating objects using templates as described above, however, results in increased error when there is a large change in object shape or luminance, and the compression coding efficiency is degraded accordingly. In addition to approximation using templates, it is therefore necessary to predict such changes using the images displayed chronologically before and after the image to be coded, and to adaptively select the optimum predictive image.
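Adaptive selection of the predictive image can be sketched as follows. The example assumes that the optimum candidate is the one with the smallest sum of absolute differences (SAD) from the block being coded; that criterion, the candidate names, and the pixel values are assumptions for illustration and are not specified in the description above.

```python
def sad(block, candidate):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(p - q) for p, q in zip(block, candidate))


def select_prediction(target, candidates):
    """Return the name and error of the candidate that best predicts target."""
    best = min(candidates, key=lambda name: sad(target, candidates[name]))
    return best, sad(target, candidates[best])


# Block to be coded and three hypothetical predictive images for it.
target = [12, 14, 200, 198]
candidates = {
    "template": [10, 10, 10, 10],     # prediction from the layer template
    "previous": [12, 15, 190, 198],   # prediction from the preceding image
    "next": [20, 20, 205, 205],       # prediction from the following image
}

choice, error = select_prediction(target, candidates)
```

Here the template predicts the block poorly because the luminance has changed substantially, so the preceding image is selected instead.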
Discrete predictive coding of each object also commonly results in a mismatch between the contours of the coded object and the predictive object. Contour mismatches increase the difference value, and prevent the efficient coding of the contour luminance and color difference signals.
Furthermore, the reproduction side must have three decoders to decode the objects in three layers as described above. As a result, the number of reproducible layers is limited by the number of decoders available to the reproduction side. A frame memory with sufficient capacity to store the decoder output is also needed to synthesize the reproduced objects in each layer, and the number of frame memory units is proportional to the number of layers. As the number of image layers increases, the overall size and cost of the decoder increase greatly.
Synthesis of the output image according to the vertical relationship data of the layers also prevents selective display of the layers, and prevents a selected layer from being moved from the coded position to a position behind or in front of another layer. In other words, interactivity is impaired.