The technology for digitizing images into digital image data is spreading and developing rapidly because digital data is easy to record, transmit, edit, copy and transfer. One of the advantages of digitization is that it facilitates data compression. Compression coding is an important technology for data recording and data transmission. International standards have been established for compression coding; among them, the MPEG standard has spread as a general digital standard which can process both video and audio.
The compression coding of digital images processes image data comprising a series of digitized still pictures. In general, there are two kinds of compression coding. One is intra-frame coding, which compresses a frame (corresponding to one still picture) by removing redundancy according to the spatial correlation (the correlation within a frame). The other is inter-frame coding, which compresses frames of still pictures which are temporally close to each other, for example, temporally consecutive frames, by removing redundancy according to the temporal correlation (the correlation among frames).
A prior art image coding based on MPEG and the like usually uses the intra-frame coding. If the inter-frame coding is carried out as well, the coded data has a high compression rate. To carry out the inter-frame coding, a decoding process, which is the converse of the coding process, and motion detection and motion compensation processes are carried out to generate a predicted image, and then a difference between an image to be coded and the predicted image is calculated using the predicted image as a reference image. Thus the decoding process and the motion detection and motion compensation processes increase the process load on an apparatus. However, because the difference is small when the predicted image has good precision, coding the difference achieves higher coding efficiency than coding the image to be coded itself.
As the prediction method employed when the inter-frame coding is carried out, there are several methods, namely a forward prediction based on data located at a forward position on time series from the data of an image to be coded in a series of still pictures, a backward prediction based on data located at a backward position, and a bidirectional prediction based on data located at both forward and backward positions. In general, the intra-frame coding is represented as `I`, the forward predictive coding is represented as `P`, and the bidirectional predictive coding (including the backward predictive coding) is represented as `B`.
When only the intra-frame coding is carried out, or when the forward predictive coding as well as the intra-frame coding is carried out, a series of still pictures to be coded can be processed simply according to time series. As opposed to this, when the backward or bidirectional prediction is carried out, the data which is located at a backward position on time series must be coded first. Therefore, in general, when the inter-frame coding is carried out as well, it is determined in advance whether each frame constituting the image data to be coded is an I frame to be subjected to the intra-frame coding, a P frame which can be subjected to the forward predictive coding, or a B frame which can be subjected to the bidirectional predictive coding. If the data to be processed is an I frame, the data is subjected to the intra-frame coding. If the data to be processed is a P frame or a B frame, the data is subjected to the intra-frame coding or the inter-frame coding. When this coding process is carried out, it is possible to predetermine in the coding apparatus the ratio of I frames, P frames and B frames according to the purpose of the coded result and the like.
FIG. 14 is a diagram for explaining the intra- and inter-frame coding processes of the prior art. In the figure, numerals 1400 to 1406 each designate a frame of image data constituting the image data to be coded. Numerals t0 to t6 designate the respective times. The order of the times t0 to t6 indicates the course of time series. Among the frames 1400 to 1406, the frame 1400 is an I frame, the frames 1403 and 1406 are P frames, and the frames 1401, 1402, 1404 and 1405 are B frames.
Arrows shown in the figure designate the reference relationships of each frame in the coding process. The frame 1400 which is an I frame is subjected to the intra-frame coding without referring to any other frame. The frame 1403 which is a P frame can be coded referring to the frame 1400 which is located at a forward position on time series. The frame 1401 which is a B frame can be coded referring to the frame 1400 which is located at a forward position on time series and/or the frame 1403 which is located at a backward position on time series.
For that reason, as described above, the frame 1403 must be coded earlier than the frames 1401 and 1402, which are located at a forward position on time series from the frame 1403, and the I frame and the P frame are given priority and coded earlier than the B frame. Further, no frame is coded referring to a B frame.
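The reordering described above can be illustrated with a minimal sketch. The function name `coding_order` and the rule that each B frame waits for the nearest following I or P frame are assumptions made for illustration; they are not taken from any standard or from the apparatus described here.

```python
def coding_order(frame_types):
    """Return display indices in the order they would be coded.

    frame_types: list of 'I', 'P' or 'B' in display (time) order.
    Assumption: a B frame is coded only after the next I or P frame,
    which serves as its backward reference, has been coded.
    """
    order = []
    pending_b = []  # B frames waiting for their backward reference
    for index, kind in enumerate(frame_types):
        if kind == "B":
            pending_b.append(index)
        else:
            order.append(index)      # the I or P frame is coded first
            order.extend(pending_b)  # then the B frames that precede it
            pending_b = []
    order.extend(pending_b)  # trailing B frames without a backward reference
    return order


# Frames 1400 to 1406 of FIG. 14, in display order t0 to t6.
frames = ["I", "B", "B", "P", "B", "B", "P"]
print(coding_order(frames))  # [0, 3, 1, 2, 6, 4, 5]
```

Under these assumptions, the frame at t3 (frame 1403) is coded before the frames at t1 and t2, and the frame at t6 before those at t4 and t5.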
When the bidirectional predictive coding is additionally carried out in the coding process, the apparatus can decide, for each B frame, whether to carry out the inter-frame coding referring to both a forward frame and a backward frame, to carry out the inter-frame coding referring to only one of them, or to carry out the intra-frame coding.
As described above, the inter-frame coding, particularly when the bidirectional predictive coding is carried out as well, increases the process load and requires a storage means with a large memory capacity for retaining temporally adjacent data. However, prediction with high precision makes the difference between the predicted image and the image to be coded small, whereby coding efficiency can be improved. Thus the coding method is determined according to the performance of the apparatus, the picture quality, the required properties of the coded data and so on.
On the other hand, a method for coding image signals for each object has often been used in recent years. ISO standardizes this method as MPEG4. In November 1996, what is called the video verification model VM5.0 was worked out. The image signal for each object consists of a pixel value signal, called texture, which indicates brightness and color, and a shape signal which represents the shape of the object. An image signal of this form is most widely used in computer graphics technology and in fields where image sources are created, such as program production.
FIG. 15(a) to FIG. 15(c) are diagrams for explaining the coding for each object in the prior art. FIG. 16(a) and FIG. 16(b) are diagrams for explaining a signal processing for the coding for each object. FIG. 15(a) shows an example of objects to be coded, which is an image consisting of a background image and a foreground image (a goldfish swimming in a fish tank). FIG. 15(b) shows the foreground (the goldfish). FIG. 15(c) shows the background (water plants and water in the fish tank).
To composite the foreground image and the background image, information is required for deciding whether each pixel constituting the composite image represents the foreground or the background. For this reason, the foreground image shown in FIG. 15(b) consists of the pixel value signal shown in FIG. 16(a) and the shape signal (a binary alpha signal) shown in FIG. 16(b), the shape signal specifying the image representation. In this case, the pixel value signal indicates the texture of the goldfish and includes the brightness signal and color signal of each pixel. The shape signal indicates the profile of the goldfish, i.e. the contour of the goldfish, and is a two-valued signal having a value `1` inside the contour or a value `0` outside the contour. This shape signal indicates the foreground in the composition of the image. In the figure, the region indicated by the black part has the value `1` and represents the foreground. In general, when the coding is carried out for each object, the pixel value signal and the shape signal are applied to specified objects while only the pixel value signal is applied to the parts other than the specified objects, whereby the coding efficiency is improved. As described above, in this case, the goldfish, i.e. the foreground image, is processed as a specified object.
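The role of the shape signal in composition can be sketched as follows. This is a minimal illustration which assumes that images are small lists of rows of single pixel values; the function name `composite` and the sample values are hypothetical.

```python
def composite(foreground, shape, background):
    """Composite two images using a two-valued shape signal.

    A pixel is taken from the foreground where the shape signal is 1
    and from the background where it is 0.
    Assumption: all three inputs are lists of rows of equal size.
    """
    return [
        [fg if alpha == 1 else bg
         for fg, alpha, bg in zip(fg_row, alpha_row, bg_row)]
        for fg_row, alpha_row, bg_row in zip(foreground, shape, background)
    ]


# Hypothetical 2x2 example: pixel values are brightness only.
foreground = [[200, 210], [220, 230]]  # texture of the object
shape = [[1, 0], [0, 1]]               # 1 inside the contour, 0 outside
background = [[10, 20], [30, 40]]

print(composite(foreground, shape, background))  # [[200, 20], [30, 230]]
```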
The efficiency of coding the pixel value signal shown in FIG. 16(a) is improved because the signal is coded based on the above-mentioned temporal correlation, referring to a signal obtained by decoding a pixel value signal which has already been coded. There is another coding method which makes the coding efficiency still higher by adaptively switching between two images for reference rather than referring to the pixel value signal of a single image. Standards such as ISO MPEG1/2 and ITU-T H.261 provide coding methods which refer to two images.
FIG. 17(a) to FIG. 17(c) and FIG. 18(a) to FIG. 18(c) are diagrams for explaining the coding of pixel value signals which refers to a plurality of pictures. FIG. 17(a) to FIG. 17(c) show the pixel value signals of the input image which constitute the foreground image. FIG. 17(a) is taken at time t0. FIG. 17(b) is taken at time t1. FIG. 17(c) is taken at time t2. As shown in the figures, the three input pixel value signals are arranged on the same time series as in FIG. 14. A signal at time t0 is located at a forward position on time series from a signal at time t1. A signal at time t2 is located at a backward position on time series from a signal at time t1. The pixel value signal of the input image at time t1 shown in FIG. 17(b) has correlation with the pixel value signal at time t0 shown in FIG. 17(a) and the pixel value signal at time t2 shown in FIG. 17(c).
FIG. 18(a) and FIG. 18(c) show decoded pixel value signals which are obtained by decoding the pixel value signals shown in FIG. 17(a) and FIG. 17(c) which have been coded. The predicted image at time t1 shown in FIG. 18(b) is generated with good precision from the pixel value signals of the decoded images at time t0 and time t2 based on the correlation shown in FIG. 17(a) to FIG. 17(c).
The typical method of predicting images can generate a predicted image at time t1 by motion-compensating the already decoded images at time t0 and time t2 and averaging them. As there is a strong correlation between the predicted image at time t1 and the input image at time t1, the input image at time t1 is coded referring to the predicted image at time t1. That is, a difference image between the predicted image generated based on the forward and backward images on time series and the input image is calculated and then the pixel value signal of the difference image is coded.
Thus, when the image to be coded has strong correlation with the images located at forward and backward positions on time series, it can be expected that the prediction has better precision by utilizing both the forward and backward images than by utilizing either of them. If the prediction has good precision, the pixel value signal of the difference image has a small amount of data, whereby the coding with high efficiency can be realized.
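The averaging and difference calculation described above can be sketched for one row of pixel values. This is an illustration under stated assumptions: the reference rows are taken to be already motion-compensated, the average is rounded half up, and the function names and sample values are hypothetical.

```python
def predict_bidirectional(forward_ref, backward_ref):
    """Average two motion-compensated reference rows, pixel by pixel.

    Assumption: rounding half up is used; an actual coder may differ.
    """
    return [(f + b + 1) // 2 for f, b in zip(forward_ref, backward_ref)]


def difference(input_row, predicted_row):
    """Difference signal that would be coded instead of the input itself."""
    return [x - p for x, p in zip(input_row, predicted_row)]


# Hypothetical brightness values for one row of the object.
decoded_t0 = [100, 104, 108, 112]  # forward reference (decoded)
decoded_t2 = [104, 108, 112, 116]  # backward reference (decoded)
input_t1 = [102, 106, 111, 114]    # image to be coded

predicted_t1 = predict_bidirectional(decoded_t0, decoded_t2)
print(predicted_t1)                        # [102, 106, 110, 114]
print(difference(input_t1, predicted_t1))  # [0, 0, 1, 0]
print(difference(input_t1, decoded_t0))    # [2, 2, 3, 2]
```

In this example the difference against the bidirectional prediction is smaller than the difference against the forward reference alone, which is the situation in which the coding efficiency is expected to improve.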
As described above, in the case of coding images for each object, the efficiency of coding the pixel value signal is improved based on the temporal correlation. On the other hand, the shape signal accompanying the pixel value signal is processed similarly to the pixel value signal when only the intra-frame coding is carried out, or when the inter-frame coding accompanied with only the forward prediction is carried out. However, when the inter-frame coding accompanied with the bidirectional prediction is carried out, a problem arises whereby the efficiency of the coding of the shape signal decreases if the shape signal is processed in a similar way to that for the pixel value signal.
As the pixel value signal is a multivalued signal and includes a brightness signal and a color signal, a good predicted image is likely to be obtained by the averaging calculation described above. Therefore, the coding efficiency is improved if the temporally adjacent data are retained and are subjected to a calculation such as obtaining a difference or obtaining an average. As opposed to this, in the case of the two-valued shape signal described above, there is little merit in calculating an average referring to plural pieces of reference information in order to obtain a good predicted image, because one of the two values must still be chosen when the obtained average is neither of the two values. In general, for the two-valued shape signal, even if the temporally adjacent data are retained and are subjected to a process such as obtaining an average, the precision of the prediction is not necessarily improved; rather, the resources of the apparatus are not utilized efficiently, or the coding efficiency is decreased.
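The difficulty with averaging a two-valued signal can be seen in a small sketch. The function name `average_shapes` and the sample rows are hypothetical and serve only to illustrate the argument above.

```python
def average_shapes(forward_shape, backward_shape):
    """Average two rows of a two-valued shape signal, pixel by pixel."""
    return [(f + b) / 2 for f, b in zip(forward_shape, backward_shape)]


# Hypothetical rows: the object has moved one pixel to the right
# between the forward and backward reference images.
forward_shape = [0, 1, 1, 1, 0]
backward_shape = [0, 0, 1, 1, 1]

averaged = average_shapes(forward_shape, backward_shape)
print(averaged)  # [0.0, 0.5, 1.0, 1.0, 0.5]

# Positions where the average is neither 0 nor 1, so that one of the
# two values still has to be chosen by some other rule.
ambiguous = [i for i, value in enumerate(averaged) if value not in (0.0, 1.0)]
print(ambiguous)  # [1, 4]
```

The ambiguous positions fall on the contour of the object, which is where the shape signal carries its information, so the average by itself does not determine the predicted shape there.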
In a prior art image coding, when the shape signal and the pixel value signal are processed in the same way, a problem arises whereby the coding process with the bidirectional prediction decreases the process efficiency as described above. Thus, techniques for improving the efficiency of coding the pixel value signal cannot simply be applied to the coding of the shape signal. For that reason, in some cases in the prior art the shape signal is processed by a method such as the reversible compression coding for two-valued signals used in facsimile and the like; that is, the shape signal is recorded and transmitted separately from the pixel value signal. However, a reversible method is generally less efficient than an irreversible method, so the coding efficiency or the process efficiency is not much improved.