The present invention relates to interframe prediction coding of an image used in teleconferencing and/or video telephony. In particular, the present invention relates to a hybrid coding system which combines a waveform coding prediction system and a model-based coding prediction system which uses a three dimensional shape model (3-D shape model).
Conventional coding systems for a moving image include a prediction coding system, a transform coding system, and an interframe coding system with motion compensated prediction. Those conventional systems are waveform coding systems which use statistical characteristics of an image.
In those systems, the motion of an image is detected for each block element in the image, and the area for detection is restricted to a two dimensional plane. On the other hand, those systems place no restriction on the object to be coded. Each block element has, for instance, 8×8 pixels or 16×16 pixels.
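The block-wise motion detection described above may be illustrated by the following minimal sketch of exhaustive block matching. The function names, the sum-of-absolute-differences cost, and the search range are assumptions made for illustration only and are not taken from the conventional systems themselves. Images are plain lists of rows, and a motion vector (dx, dy) is the displacement from the block position in the current image to the matching block in the reference image.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n=8, search=2):
    """Exhaustive search over displacements in [-search, search] on both axes.
    Displacements that would move the block outside `ref` are skipped."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if bx + dx < 0 or bx + dx + n > width:
                continue
            if by + dy < 0 or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

The search is confined to the two dimensional image plane, which is the restriction noted above; nothing in the sketch depends on what object the block shows.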
On the other hand, a model-based coding prediction system which uses a three dimensional shape model of an object is considered to be promising. Since it uses a three dimensional shape model, the object to be coded is limited, but it has the advantage of providing a higher information compression ratio than said waveform coding prediction system.
In a model-based prediction system, an object is covered with a mesh which has many triangular cells. The opening of an eye, or a lip, for instance, is expressed by modifying the coordinates on said mesh, and is represented by a small number of parameters. In one embodiment, the motion of an eye is represented by eight parameters (16 parameters for a pair of eyes), the motion of a lip is represented by eight parameters, and the motion of the whole head is represented by six parameters, so that the motion of the head as a whole is represented by 30 parameters in total. An image is transmitted by first sending a still image, and then sending said parameters for modifying the image. In a preferred embodiment, 15 frames may be transmitted per second over a transmission line of 2 kbit/s.
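The parameter count and the resulting bit budget can be checked with the short calculation below. It is only a sketch under stated assumptions: the "2 kbits" figure is read as 2000 bits per second, the whole rate is assumed to be spent on the motion parameters once the initial still image has been sent, and the constant and function names are hypothetical.

```python
EYE_PARAMS_PER_EYE = 8   # from the text: eight parameters per eye
LIP_PARAMS = 8           # from the text: eight parameters for a lip
HEAD_PARAMS = 6          # from the text: six parameters for the whole head

TOTAL_PARAMS = 2 * EYE_PARAMS_PER_EYE + LIP_PARAMS + HEAD_PARAMS


def bits_per_frame(bit_rate_bps, frames_per_second):
    """Average number of bits available for each transmitted frame."""
    return bit_rate_bps / frames_per_second


def bits_per_parameter(bit_rate_bps, frames_per_second, n_params=TOTAL_PARAMS):
    """Average number of bits available for each parameter of each frame."""
    return bits_per_frame(bit_rate_bps, frames_per_second) / n_params


# Assumption: 2 kbit/s is taken as 2000 bit/s, at 15 frames per second.
frame_budget = bits_per_frame(2000, 15)
param_budget = bits_per_parameter(2000, 15)
```

Under these assumptions each frame has roughly 133 bits available, or between four and five bits per parameter, which indicates why a parameter-only representation can reach such a low rate.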
A hybrid coding system which uses both of said systems has been proposed (for instance, IE89-82, pages 13-18, November, 1989, in the Institute of Electronics, Information, and Communication Engineers in Japan, entitled "A hybrid coding method of analysis parameters and waveform in image data compression").
That prior system is described in accordance with FIG. 8.
In FIG. 8, the numeral 21 is an image input terminal, 27 is a frame memory, 22 is a subtractor, 26 and 31 are adders, 32 is a region division analyzer, 30 is a motion compensation portion, 28 is an image synthesizer, 23 is a discrete cosine transform (DCT) portion, 24 is a quantizer, 25 is an inverse quantizer/inverse discrete cosine transform portion, and 29 is a signal switch.
As the waveform coding prediction system, it uses motion compensated prediction in a two dimensional plane for each block element, together with a discrete cosine transform. As the model-based coding system, it uses an analysis/synthesis coding system using a three dimensional shape model of an object (which is the head of a person in the embodiment). The latter is described in detail in J72-B-I, No. 3, pp. 200-207, in the Institute of Electronics, Information and Communication Engineers in Japan, March, 1989, entitled "A model-based analysis synthesis image coding scheme". The intensity information on the surface of the shape model may be either the intensity information of a first frame, or the intensity information of a decoded image of a previous frame. The latter case, which uses a decoded image of a previous frame, is now described.
An input image at the input terminal 21 is applied to the region division analyzer 32, which divides the image into a background portion and a moving head portion of a person, and also analyzes the motion and/or the location of the head portion. The image synthesizer 28 synthesizes an image of the head portion by applying intensity information, which is the decoded image of the previous frame stored in the frame memory 27, to the three dimensional shape model of the head portion according to the information provided by the region division analyzer 32.
The switch 29 is connected to the contact A when the synthesized portion is further subjected to motion compensated prediction, or to the contact B when that portion is not subjected to motion compensated prediction.
When the switch is connected to the contact A, the motion compensation portion 30 generates a picture which is the sum of the background portion of the decoded image of the previous frame stored in the frame memory 27 and the head portion obtained in the synthesizer 28, and provides the predicted image by effecting motion compensated prediction for the input image by using said sum.
When the switch is connected to the contact B, the motion compensated prediction is carried out for a background portion, and no prediction is carried out for a synthesized head portion.
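The two switch positions may be contrasted by the following minimal sketch. It is an illustration under assumptions, not the circuit of FIG. 8 itself: images are lists of rows, `head_mask` is a hypothetical per-pixel flag marking the head portion, and `motion_compensate` is a hypothetical callable standing in for the motion compensation portion 30.

```python
def composite(background_source, synthesized_head, head_mask):
    """Take head pixels from the synthesizer output and all other pixels
    from `background_source`."""
    return [
        [s if m else b for b, s, m in zip(brow, srow, mrow)]
        for brow, srow, mrow in zip(background_source, synthesized_head, head_mask)
    ]


def predicted_image(contact, prev_decoded, synthesized_head, head_mask,
                    motion_compensate):
    """Form the predicted image for switch contact "A" or "B"."""
    if contact == "A":
        # Contact A: sum background and synthesized head first,
        # then apply motion compensated prediction to that sum.
        summed = composite(prev_decoded, synthesized_head, head_mask)
        return motion_compensate(summed)
    if contact == "B":
        # Contact B: motion compensated prediction for the background only;
        # the synthesized head is used without further prediction.
        compensated = motion_compensate(prev_decoded)
        return composite(compensated, synthesized_head, head_mask)
    raise ValueError("contact must be 'A' or 'B'")
```

With contact B the head pixels and the background pixels come from two different predictors, so nothing in the sketch forces their intensities to agree where the two regions meet.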
The adder 31 provides the predicted image which is the sum of the background portion and the head portion. The subtractor 22 provides the prediction error which is the difference between the current input image at the terminal 21 and the predicted image. The obtained prediction error is subjected to a discrete cosine transform in the DCT portion 23, and to quantization in the quantizer 24. The quantized result is encoded. The quantized result is also subjected to inverse quantization and an inverse discrete cosine transform in the portion 25, the output of which is added to the predicted image in the adder 26. The output of the adder 26 is stored in the frame memory 27.
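The coding loop just described may be sketched as follows. The sketch makes several simplifying assumptions that are not in the prior system: it operates on a one dimensional block rather than a two dimensional block element, it uses an orthonormal DCT with a uniform quantizer of hypothetical step size `step`, and all function names are illustrative. Reference numerals in the comments indicate which portion of FIG. 8 each step stands for.

```python
import math


def _scale(k, n):
    """Orthonormal DCT scaling factor for coefficient index k."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    """One dimensional orthonormal DCT-II."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                           for i in range(n))
        for k in range(n)
    ]


def idct(c):
    """Inverse of `dct` (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n))
        for i in range(n)
    ]


def code_block(input_block, predicted_block, step=4.0):
    """Return the quantized levels to be encoded and the reconstructed block."""
    error = [a - b for a, b in zip(input_block, predicted_block)]  # subtractor 22
    levels = [round(c / step) for c in dct(error)]                 # DCT 23, quantizer 24
    decoded_error = idct([q * step for q in levels])               # portion 25
    reconstructed = [p + e for p, e in zip(predicted_block, decoded_error)]  # adder 26
    return levels, reconstructed  # `reconstructed` is what frame memory 27 would store
```

Because the reconstruction uses the quantized error rather than the original error, the block stored for the next prediction is the same one a decoder can form from the transmitted levels.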
However, the prior art of FIG. 8 has the following disadvantages.
When the switch 29 is connected to the contact A, the sum of the background portion of the previous frame and the head portion obtained in the synthesizer 28 is obtained first, and the predicted image is then obtained by effecting the motion compensation for said sum image. Therefore, the prediction efficiency of the predicted image for an input image is not high, and the information compression ratio is also not high.
When the switch 29 is connected to the contact B, the prediction efficiency is not high for a similar reason. Further, the intensity of an image is discontinuous at the border between the prediction areas of the waveform coding system and the model-based coding system, as shown in FIG. 9. This results in undesired block noise in a reproduced image.