1. Field of the Invention
The present invention relates to a method and apparatus for predictive coding, storage, and transfer of digital images, and to an apparatus for signal decoding and image synthesizing, and relates particularly to a method and apparatus for predictive coding, storage, and transfer of digital images obtained by synthesizing plural object-based image layers, and to an apparatus for signal decoding and resynthesizing said object-based images.
2. Description of the Prior Art
J. Wang and E. Adelson have proposed a method for decomposing moving images into object-based layers (a different layer for each object in the image) for coding as a means of efficiently transmitting and recording moving images. Their method is described in "Layered Representation of Image Sequence Coding," J. Wang and E. Adelson, Proc. IEEE Int. Conf. Acoustic Speech Signal Processing, 1993, pp. V221-V224; and in "Layered Representation for Motion Analysis," J. Wang and E. Adelson, Proc. Computer Vision and Pattern Recognition, pp. 361-366, 1993.
This method is described below, assuming a motion picture containing fish, seaweed, and background. After each of the three elements composing this motion picture is separately recorded as a chromakey image, the elements can be synthesized into a single composite image; alternatively, a sequence of plural image frames bracketing a particular image can be analyzed and separated into the three component parts. The method proposed by Wang et al. analyzes the motion picture sequence to extract the fish, seaweed, and background, and separates each of these objects into a discrete layer. Each layer is then separately compression coded. The coded data is then multiplexed by a multiplexer, which adds information identifying the vertical relationship between the layers (i.e., which layer overlays which layer), for storage or transmission. When it is also necessary to identify the relative transparency of the pixels or shapes in each layer, a transmittance signal is also coded and transmitted or stored with the object layers.
Plural decoders are similarly required on the reproduction side. The multiplexed data is first demultiplexed into the separate layers, and the data in the separate layers is then simultaneously decoded by the respective decoders. The objects in the reproduced layers are then overlaid by the image synthesizer based on the vertical layer relationship data to generate the reproduced (synthesized) image, which is then displayed on the display apparatus.
To improve the compression efficiency, an image representative of each layer (e.g., the fish, seaweed, or background scene) is determined and used as a reference image, or template, for that layer. The template for each layer is selected to be most representative of the object sequence in that layer. Templates may be selected manually or automatically, and Wang et al. also describe a method for automatically generating them.
The defined templates are compression coded first, and each object in the layer is coded by predictive approximation based on the displacement or deformation of the generated template. By separating the motion picture into component objects, those objects can be more accurately approximated without being influenced by other nearby objects. Objects at a sufficient distance from the camera can also be treated as still objects, making it possible to describe the deformation and displacement (change and movement) in such objects with few parameters. More specifically, an affine transformation is used to approximate a single object using six parameters, thereby requiring few bits and achieving an extremely high compression rate.
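The six-parameter affine approximation can be illustrated with a small sketch. It assumes, hypothetically, that the parameters (a0, a1, a2, a3, a4, a5) map each pixel (x, y) of the predicted object back to the template position x' = a0 + a1*x + a2*y, y' = a3 + a4*x + a5*y, with nearest-pixel sampling and zero fill outside the template. The function name `affine_predict` and this parameter ordering are illustrative only and are not taken from Wang et al.

```python
from typing import List, Sequence

Image = List[List[int]]


def affine_predict(template: Image, params: Sequence[float]) -> Image:
    """Hypothetical sketch: predict an object by affinely warping its template.

    params = (a0, a1, a2, a3, a4, a5); output pixel (x, y) is copied from the
    template position (a0 + a1*x + a2*y, a3 + a4*x + a5*y), rounded to the
    nearest pixel. Positions that fall outside the template are filled with 0.
    """
    a0, a1, a2, a3, a4, a5 = params
    h, w = len(template), len(template[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = round(a0 + a1 * x + a2 * y)
            sy = round(a3 + a4 * x + a5 * y)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = template[sy][sx]
    return out


# Usage: shift a 2x3 template one pixel to the right (a pure translation).
template = [[1, 2, 3], [4, 5, 6]]
print(affine_predict(template, (-1, 1, 0, 0, 0, 1)))  # [[0, 1, 2], [0, 4, 5]]
```

Only six numbers are needed per object, regardless of its size, which is the source of the low bit cost noted above.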
The method approximating objects using templates as described above, however, results in increased error when there is a large change in object shape or luminance, and degrades the compression coding efficiency. In addition to approximation using templates, it is therefore necessary to predict such changes using the images displayed chronologically before and after the image to be coded, and to adaptively select the optimum predictive image.
Discrete predictive coding of each object also commonly results in a mismatch between the contours of the coded object and the predictive object. Contour mismatches increase the difference value, and prevent the efficient coding of the contour luminance and color difference signals.
Furthermore, the reproduction side must have three decoders to decode the objects in three layers as described above. As a result, the number of reproducible layers is limited by the number of decoders available to the reproduction side. A frame memory with sufficient capacity to store the decoder output is also needed to synthesize the reproduced objects in each layer, and the number of frame memory units is proportional to the number of layers. As the number of image layers increases, the overall size and cost of the decoder increase greatly.
Synthesizing the output image according to the vertical relationship data of the layers also prevents selective display of the layers, and prevents a selected layer from being moved from its coded position to a position behind or in front of another layer. In short, interactivity is impaired.
Therefore, an object of the present invention is to prevent deterioration of the predictive image as a result of large changes in the shape or luminance of an object in a predictive coding method using templates, and to reduce the accumulation of prediction error over time.
To achieve this, the present invention generates predictive images by transforming at least one template and the image displayed chronologically before or after the image to be coded using a particular transformation method, and uses the predictive image with the least difference from the image to be coded as the optimum predictive image for that image.
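Selecting the optimum predictive image reduces to a minimum search over the candidates. The sketch below assumes the sum of absolute differences (SAD) as the difference measure; the text only says "least difference", so SAD, the helper names, and the candidate labels are hypothetical choices.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]


def sad(a: Image, b: Image) -> int:
    """Sum of absolute differences between two equally sized images."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def select_optimum(target: Image, candidates: Dict[str, Image]) -> Tuple[str, Image]:
    """Return the candidate predictive image with the least SAD to the target."""
    best = min(candidates, key=lambda name: sad(target, candidates[name]))
    return best, candidates[best]


target = [[10, 10], [10, 10]]
candidates = {
    "template": [[0, 0], [0, 0]],       # transformed template
    "previous": [[9, 10], [10, 11]],    # transformed preceding image
    "next": [[12, 12], [12, 12]],       # transformed following image
}
print(select_optimum(target, candidates)[0])  # previous
```

On a tie, `min` keeps the first candidate in insertion order, so the order of the dictionary acts as a fixed tie-break rule in this sketch.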
In addition, a new predictive image generated by averaging plural predictive images is added to the candidate predictive images from which the predictive image with the least difference is selected as the optimum predictive image for the image to be coded.
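Adding an averaged candidate can be sketched as follows. It assumes, as a hypothetical simplification, that all existing candidates are averaged pixel by pixel with integer rounding and that non-negative pixel values and SAD are used; the text does not specify which predictive images are averaged or how.

```python
from typing import Dict, List

Image = List[List[int]]


def average_images(images: List[Image]) -> Image:
    """Pixel-wise rounded integer average of equally sized images."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[(sum(img[y][x] for img in images) + n // 2) // n for x in range(w)]
            for y in range(h)]


def sad(a: Image, b: Image) -> int:
    """Sum of absolute differences (assumed difference measure)."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def select_with_average(target: Image, candidates: Dict[str, Image]) -> str:
    """Add the averaged candidate to the pool, then return the best candidate's name."""
    pool = dict(candidates)
    pool["average"] = average_images(list(candidates.values()))
    return min(pool, key=lambda name: sad(target, pool[name]))


# A target halfway between its neighbours is best predicted by their average.
target = [[5, 5]]
print(select_with_average(target, {"previous": [[0, 0]], "next": [[10, 10]]}))  # average
```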
Furthermore, the optimum predictive image is divided into plural predictive subareas, and the image to be coded is divided into plural target subareas. For each target subarea that contains at least one pixel value that should not be coded, the pixel values to be coded in the corresponding predictive subarea are operated on using a known function to calculate a substitute pixel value. This substitute pixel value then replaces each pixel value that should not be coded in both the target subarea and the corresponding predictive subarea. The difference signal is then obtained from the target and predictive subareas containing the substitute pixel value.
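The first step, dividing the image into subareas and finding the target subareas that need substitution, can be sketched with a binary shape mask. The sketch assumes square subareas of a fixed size and a mask that is `True` where a pixel is to be coded; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Mask = List[List[bool]]  # True where the pixel is to be coded


def split_blocks(image, size: int) -> Dict[Tuple[int, int], list]:
    """Divide a 2-D array into size x size subareas keyed by their top-left corner."""
    h, w = len(image), len(image[0])
    return {
        (y, x): [row[x:x + size] for row in image[y:y + size]]
        for y in range(0, h, size)
        for x in range(0, w, size)
    }


def subareas_needing_substitution(target_mask: Mask, size: int) -> List[Tuple[int, int]]:
    """Corners of target subareas containing at least one pixel not to be coded."""
    blocks = split_blocks(target_mask, size)
    return sorted(k for k, b in blocks.items() if not all(all(row) for row in b))


# The object occupies the left three columns of a 4x4 target; 2x2 subareas.
mask = [[True, True, True, False] for _ in range(4)]
print(subareas_needing_substitution(mask, 2))  # [(0, 2), (2, 2)]
```

Subareas lying entirely inside the object need no substitution, so only the boundary column of subareas is returned here.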
The second object of the present invention is to suppress an increase in the difference value caused by a mismatch between the contours of the target object to be coded and the predictive object.
To achieve this object, the predictive image and the target image to be coded are divided into plural predictive subareas and target subareas, respectively. Before the difference between corresponding predictive and target subareas is obtained, the pixel values to be coded in the predictive subarea are operated on using a known function to calculate a substitute pixel value for the corresponding target subarea in which there is at least one pixel value that should not be coded. This calculated substitute pixel value is then substituted into the target subarea and the corresponding predictive subarea for each pixel value therein not to be coded. The difference between the target subarea and predictive subarea is then calculated after making this pixel value substitution.
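The per-subarea substitution and its effect on a contour mismatch can be sketched as below. The "known function" is assumed here to be the rounded mean of the pixels to be coded in the predictive subarea, with a hypothetical fallback of 0 when there are none; the text does not name the function, and all helper names are illustrative.

```python
from typing import List

Image = List[List[int]]
Mask = List[List[bool]]  # True where the pixel is to be coded


def substitute_value(pred: Image, p_mask: Mask) -> int:
    """Assumed 'known function': rounded mean of the predictive pixels to be coded."""
    coded = [v for row, mrow in zip(pred, p_mask) for v, m in zip(row, mrow) if m]
    if not coded:
        return 0  # hypothetical fallback when no predictive pixel is to be coded
    return (sum(coded) + len(coded) // 2) // len(coded)


def fill(block: Image, mask: Mask, value: int) -> Image:
    """Replace every pixel not to be coded with the substitute value."""
    return [[v if m else value for v, m in zip(row, mrow)] for row, mrow in zip(block, mask)]


def subarea_difference(target: Image, t_mask: Mask, pred: Image, p_mask: Mask) -> Image:
    """Difference signal for one subarea, substituting only when the target needs it."""
    if not all(all(row) for row in t_mask):
        s = substitute_value(pred, p_mask)
        target, pred = fill(target, t_mask, s), fill(pred, p_mask, s)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(target, pred)]


# Contour mismatch: the target object covers one column, the predicted object two.
target = [[100, 0], [100, 0]]
t_mask = [[True, False], [True, False]]
pred = [[98, 102], [98, 102]]
p_mask = [[True, True], [True, True]]
raw = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(target, pred)]
print(raw)                                               # [[2, -102], [2, -102]]
print(subarea_difference(target, t_mask, pred, p_mask))  # [[2, -2], [2, -2]]
```

Without substitution, the mismatched contour column produces large residuals (-102); after substitution, the same column contributes only -2, which is the suppression of the difference value described above.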
The third object of the present invention is to achieve a system whereby an image can be reproduced by means of a single decoder irrespective of the number of object layers composing the image, the image can be synthesized using a frame memory that is not dependent upon the number of object layers, and layers can be selectively reproduced, thus enabling high interactivity with the user.
To achieve this object, the present invention provides an apparatus for decoding and synthesizing digital images composed of plural superimposed image layers, where said digital images are coded by separately compression coding each of the plural image layers and then multiplexing the layers in a predetermined order. Preferably, this coded data is multiplexed in sequence from either the background layer or the foreground layer.
Said decoding and synthesizing apparatus according to the present invention comprises an external line input terminal, decoder, synthesizer, frame memory, and output means. The coded data is input to the external line input terminal, and each layer is then decoded to a reconstructed image in the sequence in which the coded data was multiplexed. This reconstructed image and the synthesized image supplied from the frame memory are input to the synthesizer, which generates a new synthesized image by merging the synthesized image and the reconstructed image. This new synthesized image is stored to the frame memory, and displayed by the output means.
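The decode-and-merge loop can be sketched with one decoder function and one frame memory reused for every layer. This sketch assumes the layers are multiplexed starting from the background, that each decoded layer yields pixel values plus an opaque-shape mask (a transmittance signal could replace the mask), and that `decode` is a hypothetical stand-in for the real decoder.

```python
from typing import Callable, Iterable, List, Tuple

Image = List[List[int]]
Mask = List[List[bool]]  # True where the layer's object is opaque


def synthesize(recon: Image, mask: Mask, frame: Image) -> Image:
    """Merge a reconstructed layer over the current synthesized image."""
    return [[r if m else f for r, m, f in zip(rr, mr, fr)]
            for rr, mr, fr in zip(recon, mask, frame)]


def decode_and_synthesize(coded_layers: Iterable[object],
                          decode: Callable[[object], Tuple[Image, Mask]],
                          height: int, width: int, fill: int = 0) -> Image:
    """One decoder and one frame memory; layers arrive background first."""
    frame = [[fill] * width for _ in range(height)]  # the single frame memory
    for coded in coded_layers:                  # order of the multiplexed data
        recon, mask = decode(coded)             # the same decoder for every layer
        frame = synthesize(recon, mask, frame)  # result written back to frame memory
    return frame                                # handed to the output means


# Hypothetical "coded" layers that are already decoded, so decode is the identity.
layers = [
    ([[1, 1], [1, 1]], [[True, True], [True, True]]),     # background
    ([[9, 0], [0, 0]], [[True, False], [False, False]]),  # fish
]
print(decode_and_synthesize(layers, lambda c: c, 2, 2))  # [[9, 1], [1, 1]]
```

Memory use is one frame regardless of how many layers are decoded. If the data were multiplexed from the foreground layer instead, the merge would have to write only pixels not yet covered by an earlier layer, which this sketch does not model.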
A first selector switch is also disposed between the external line input terminal and the decoder of the above decoding and synthesizing apparatus, and is controlled to not connect the external line input terminal and decoder when the image of a layer that is not reproduced is input. In addition to the first selector switch, a second selector switch is disposed between the synthesizer and frame memory, and is controlled to not connect the synthesizer and frame memory when the image stored in the frame memory is not to be updated.
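The two selector switches can be modeled as two conditions in the same loop. In this hypothetical sketch, the first switch is a set of layer identifiers to reproduce (other layers never reach the decoder), and the second switch is a single flag deciding whether the merged result is written back to frame memory; the names and the per-call flag are assumptions, since the text does not say how the switches are driven.

```python
from typing import Callable, Iterable, List, Set, Tuple

Image = List[List[int]]
Mask = List[List[bool]]  # True where the layer's object is opaque


def decode_with_switches(coded_layers: Iterable[Tuple[str, object]],
                         decode: Callable[[object], Tuple[Image, Mask]],
                         frame: Image, reproduce: Set[str],
                         update_memory: bool = True) -> Image:
    """Decode and merge only selected layers; optionally freeze the frame memory."""
    for layer_id, coded in coded_layers:
        if layer_id not in reproduce:
            continue  # first switch open: this layer never reaches the decoder
        recon, mask = decode(coded)
        merged = [[r if m else f for r, m, f in zip(rr, mr, fr)]
                  for rr, mr, fr in zip(recon, mask, frame)]
        if update_memory:
            frame = merged  # second switch closed: frame memory is updated
    return frame


layers = [
    ("background", ([[1, 1]], [[True, True]])),
    ("fish", ([[9, 0]], [[True, False]])),
]
print(decode_with_switches(layers, lambda c: c, [[0, 0]], {"fish"}))  # [[9, 0]]
```

Reproducing only the fish layer shows how a single layer can be displayed on its own, which the layer-order-driven synthesis of the prior art does not allow.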