A draft standard proposed by the Moving Picture Experts Group (MPEG) prescribes a high efficiency encoding system for picture signals for a so-called digital storage medium. The principle of the high efficiency encoding system by MPEG is as follows.
That is, with this high efficiency encoding system, a difference is first taken between pictures to lower the redundancy along the time scale. Subsequently, discrete cosine transform (DCT) and variable length coding (VLC) are carried out to lower the redundancy along the spatial axis.
The redundancy along the time scale is first explained.
In general, in consecutive moving pictures, a picture under consideration, that is a picture at a given time point, bears strong similarity to temporally previous and temporally succeeding pictures. Consequently, by taking a difference between a picture now to be encoded and a temporally forward picture, and by transmitting the difference, as shown in FIG. 1, it becomes possible to diminish the redundancy along the time scale and hence the amount of the information to be transmitted. The picture encoded in this manner is termed a predictive-coded picture, P-picture or a P-frame, as later explained.
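The difference taking described above can be sketched as follows. This is only an illustrative sketch: pictures are modelled as lists of rows of integer pixel values, and the function names frame_difference and reconstruct are hypothetical, not taken from the MPEG draft.

```python
def frame_difference(current, reference):
    """Return the pixel-by-pixel difference current - reference."""
    return [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(current, reference)]


def reconstruct(reference, residual):
    """Rebuild the current picture from the reference and the difference."""
    return [[r + d for r, d in zip(ref_row, res_row)]
            for ref_row, res_row in zip(reference, residual)]
```

Where two consecutive pictures are nearly identical, most entries of the difference are zero, which is what makes the difference cheaper to transmit than the picture itself.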
Similarly, by taking a difference between a picture now to be encoded and a temporally forward picture, a temporally backward picture or an interpolated picture produced from the temporally forward and temporally backward pictures, and transmitting the smallest of the differences, it becomes possible to diminish the redundancy along the time scale and hence the amount of the information to be transmitted. The picture encoded in this manner is termed a bidirectionally predictive-coded picture, B-picture or a B-frame, as later explained. In FIG. 1, a picture indicated by I is an intra-coded picture as later explained, while pictures indicated by P and B in the figure are the above-mentioned P-picture and the B-picture, respectively.
For producing prediction pictures, so-called motion compensation is performed. According to the motion compensation, a 16×16 pixel block, referred to herein as a macro-block, made up of unit blocks each consisting of 8×8 pixels, is prepared. One of those macro-blocks of the previous picture which is in the vicinity of the macro-block under consideration and has the smallest of the differences is retrieved, and a difference between the macro-block under consideration and the macro-block thus retrieved is taken to diminish the volume of data to be transmitted. For example, with the above-mentioned predictive-coded picture or P-picture, one of picture data produced by taking a difference between the picture now to be encoded and the motion-compensated prediction picture and picture data produced by not taking such a difference, whichever is smaller in data volume, is selected and encoded on the basis of the 16×16 pixel macro-block as a unit.
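The macro-block retrieval described above may be illustrated by an exhaustive block search. The sketch below makes several assumptions not stated in the text: the "difference" is measured as a sum of absolute differences (SAD), the vicinity is a square window of ±search pixels, and the names sad and best_match are hypothetical.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between the macro-block of `cur` at
    (top, left) and the block of `ref` displaced by (dy, dx)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[top + y][left + x]
                         - ref[top + dy + y][left + dx + x])
    return total


def best_match(cur, ref, top, left, size=16, search=4):
    """Try every displacement within +/-search pixels (assumed window) and
    return (dy, dx, cost) of the reference block with the smallest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            # Skip candidate blocks that fall outside the reference picture.
            if y0 < 0 or x0 < 0 or y0 + size > height or x0 + size > width:
                continue
            cost = sad(cur, ref, top, left, dy, dx, size)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best
```

The returned displacement plays the role of the motion vector, and only the difference from the retrieved macro-block then remains to be encoded. Practical encoders generally use faster search strategies than this exhaustive scan.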
However, in such case, a larger amount of data needs to be transmitted for a picture portion which has appeared from behind a moving object. In view of this, with the above-mentioned bidirectionally predictive-coded picture or B-picture, the one having the smallest data volume among the following four is encoded: picture data corresponding to the difference between the picture data now to be encoded and the decoded and motion-compensated temporally forward picture data; picture data corresponding to the difference between the picture data now to be encoded and the decoded and motion-compensated temporally backward picture data; picture data corresponding to the difference between the picture data now to be encoded and interpolated picture data prepared by adding the decoded and motion-compensated temporally backward and temporally forward picture data; and the picture data for which the difference has not been taken, that is the picture now to be encoded.
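The four-way selection for the B-picture can be sketched as below. The assumptions are labelled: the data volume is approximated by the sum of absolute values of the candidate data, which is only a crude proxy for the encoded size, and the interpolated picture is formed as a rounded average of the forward and backward predictions. The function names are hypothetical.

```python
def energy(block):
    """Crude stand-in for data volume: sum of absolute values (assumption)."""
    return sum(abs(v) for row in block for v in row)


def subtract(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def average(a, b):
    """Interpolated prediction: rounded mean of the two predictions
    (assumed form of the 'adding' mentioned in the text)."""
    return [[(p + q + 1) // 2 for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def select_b_mode(cur, fwd, bwd):
    """Pick, among forward, backward, interpolated and no-difference data,
    the candidate with the smallest approximate data volume."""
    candidates = {
        "forward": subtract(cur, fwd),
        "backward": subtract(cur, bwd),
        "interpolated": subtract(cur, average(fwd, bwd)),
        "intra": cur,  # no difference taken
    }
    mode = min(candidates, key=lambda m: energy(candidates[m]))
    return mode, candidates[mode]
```

A picture portion uncovered by a moving object is typically absent from the forward picture but present in the backward picture, so the backward or interpolated candidate tends to win there.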
The redundancy along the spatial axis is hereinafter explained.
The difference of the picture data is not transmitted directly, but is processed with discrete cosine transform (DCT) from one 8×8 pixel unit block to another. The DCT represents a picture depending on which frequency components of a cosine function are contained in a picture and in which amounts these frequency components are contained, instead of on the pixel level. For example, by two-dimensional DCT, picture data of the 8×8 unit block is transformed into an 8×8 coefficient block of the components of the cosine function. For example, picture signals of a natural scene as imaged by a television camera frequently represent smooth signals. In such case, the picture data volume may be efficiently diminished by processing the picture signals with DCT.
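The two-dimensional DCT of an 8×8 unit block may be sketched directly from its definition. The sketch below uses the common orthonormal DCT-II scaling, which is an assumption since the text does not state a normalization, and the function name dct_2d is hypothetical. It is written for clarity, not for speed.

```python
import math

N = 8  # unit block size


def dct_2d(block):
    """Two-dimensional DCT-II of an 8x8 block (orthonormal scaling assumed).
    out[u][v] is the coefficient for vertical frequency u and horizontal
    frequency v; out[0][0] is the DC coefficient."""
    def c(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out
```

For a perfectly flat block, all of the signal ends up in the single DC coefficient and every other coefficient is zero, which illustrates why smooth picture signals compress well after the transform.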
The data structure handled by the above-mentioned encoding system is shown in FIG. 2. The data structure shown in FIG. 2 includes, from the lower end on, a block layer, a macro-block layer, a slice layer, a picture layer, a group-of-pictures (GOP) layer and a video sequence layer. The data structure is now explained, from the lower layer on, by referring to FIG. 2.
First, as to the block layer, each block of the block layer is composed of 8×8 neighboring pixels, that is 8 pixels of 8 lines, of luminance or color difference. The above-mentioned DCT is performed for each of these unit blocks.
The macro-blocks of the macro-block layer are made up of left and right upper and lower four neighboring luminance blocks or unit luminance blocks Y0, Y1, Y2 and Y3 and color difference blocks or unit color difference blocks Cr, Cb, which are at the same positions on the picture as the luminance blocks, as shown at E in FIG. 2. These blocks are transmitted in the sequence of Y0, Y1, Y2, Y3, Cr and Cb. Which picture is used as a prediction picture, that is a reference picture of difference taking, or whether or not a difference needs to be transmitted, is decided on the macro-block basis.
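The composition of a macro-block may be sketched as follows. Two points are assumptions not spelled out in the text: Y0, Y1, Y2 and Y3 are taken to be the upper-left, upper-right, lower-left and lower-right 8×8 quadrants of the 16×16 luminance area, and each color difference block is taken to be a single 8×8 block covering the same area. The function name split_macroblock is hypothetical.

```python
def split_macroblock(luma16, cr8, cb8):
    """Return the six 8x8 unit blocks of one macro-block in the
    transmission sequence given in the text: Y0, Y1, Y2, Y3, Cr, Cb."""
    def quadrant(top, left):
        return [row[left:left + 8] for row in luma16[top:top + 8]]

    y0 = quadrant(0, 0)   # upper left (assumed ordering)
    y1 = quadrant(0, 8)   # upper right
    y2 = quadrant(8, 0)   # lower left
    y3 = quadrant(8, 8)   # lower right
    return [y0, y1, y2, y3, cr8, cb8]
```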
The above-mentioned slice layer is made up of one or more macro-blocks arrayed in the picture scanning sequence, as shown at D in FIG. 2. At a header of the slice, the difference of the dc component and the motion vector in the picture are reset. On the other hand, the first macro-block has data indicating the position within the picture, such that recovery may be made in case of an error occurrence. Consequently, the length and the starting position of the slice are arbitrary and may be changed depending on the error state on the transmission channel.
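The resetting at the slice header can be illustrated for the dc component alone. In the sketch below, each dc value is sent as a difference from the preceding one, and the predictor is reset at the start of every slice. The reset value of 0 and the function names are hypothetical, chosen only for illustration.

```python
def encode_dc(slices, reset=0):
    """Differentially encode dc values, resetting the predictor at the
    start of each slice (reset value is an illustrative assumption)."""
    encoded = []
    for dc_values in slices:
        predictor = reset
        diffs = []
        for dc in dc_values:
            diffs.append(dc - predictor)
            predictor = dc
        encoded.append(diffs)
    return encoded


def decode_dc(encoded, reset=0):
    """Inverse of encode_dc; each slice is decoded independently."""
    decoded = []
    for diffs in encoded:
        predictor = reset
        values = []
        for d in diffs:
            predictor += d
            values.append(predictor)
        decoded.append(values)
    return decoded
```

Because no slice depends on the dc values of the slice before it, an error in one slice does not propagate into the next, which is what allows decoding to resume at the following slice header.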
As to the picture layer, each picture is made up of one or more slices, as shown at C in FIG. 2. Each picture may be classified into the intra-coded picture (I-picture or I-frame), the predictive-coded picture (P-picture or P-frame), the bidirectionally coded picture (B-picture or B-frame) and the DC intra-coded picture (DC coded (D) picture).
It is noted that, for encoding the intra-coded picture or I-picture, only the information which is closed within each picture is employed. In other words, for decoding the I-picture, only the information contained in the picture concerned is employed. In effect, for encoding a picture by intra-coding, the picture is directly discrete cosine transformed without taking a difference. Although this encoding system usually has a poor efficiency, random accessing or high-speed reproduction may be enabled by inserting this picture at arbitrary places.
As to the predictive-coded picture (P-picture), the I-picture or the P-picture positioned temporally previously at an input and already decoded is employed as a prediction picture (reference picture in difference taking). In effect, encoding the difference between the picture to be encoded and the motion-compensated prediction picture, or directly encoding the picture to be encoded without taking the difference (intra-coding), whichever is more efficient, is selected on the macro-block basis.
As to the bidirectionally coded picture or B-picture, three types of pictures, namely a temporally previously positioned and already decoded I-picture or P-picture, a temporally succeeding and already decoded I-picture or P-picture, and an interpolated picture obtained from these pictures, are employed as a prediction picture. In this manner, encoding of the difference between the picture to be encoded and the motion-compensated prediction picture and the intra-coding, whichever is more efficient, may be selected on the macro-block basis.
The DC intra coded picture is the intra-coded picture which is made up only of DC coefficients in DCT and which cannot exist in the same sequence as the remaining three pictures.
The GOP layer is made up of one or plural I-picture(s) and zero or plural non-I-pictures, as shown at B in FIG. 2. The distance between the I-pictures, such as 9, or the distance between the I-pictures or the P-pictures, such as 3, is arbitrary. Besides, the distance between the I-pictures or between the I-pictures or the P-pictures may be changed within the GOP layer.
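One arrangement consistent with the example distances 9 and 3 can be generated as below. The sketch reads 9 as the spacing of the I-pictures and 3 as the spacing of the pictures usable as references (I- or P-pictures), fills the remaining positions with B-pictures, and lists the pictures in display order starting from an I-picture. These readings, and the function name gop_pattern, are assumptions for illustration.

```python
def gop_pattern(i_distance=9, ref_distance=3):
    """Return picture types in display order for one GOP, assuming an
    I-picture at position 0, a P-picture every `ref_distance` pictures,
    and B-pictures elsewhere."""
    types = []
    for n in range(i_distance):
        if n == 0:
            types.append("I")
        elif n % ref_distance == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)
```

With the default arguments the pattern is IBBPBBPBB. Since both distances are arbitrary, other patterns such as IPPP (reference distance 1, no B-pictures) are equally possible.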
The video sequence layer is made up of one or plural GOP layer(s) having the same picture size or the same picture rate, as shown at A in FIG. 2.
For transmitting the moving picture standardized in accordance with the high efficiency encoding system by MPEG, as described above, picture data produced by compressing a picture in itself are transmitted, and subsequently a difference between the picture and the motion-compensated same picture is transmitted.
However, the following problem has been found to be raised when the picture to be encoded is an interlaced picture resulting from an interlaced scanning.
That is, if a picture resulting from the interlaced scanning is encoded on the field-by-field basis, a difference in the vertical positions is alternately incurred from field to field. Consequently, when transmitting a stationary portion of a moving picture, difference data is produced at a boundary between the fields, notwithstanding that the picture portion remains stationary. Since the difference data needs to be transmitted, the encoding efficiency is lowered at the stationary portion of the moving picture.
Also, if a picture produced by interlaced scanning is encoded on the field-by-field basis, since each block is formed on the field-by-field basis, the interval between pictures becomes broader than if the block is formed on the frame-by-frame basis, with the result that correlation and hence the coding efficiency are lowered.
On the other hand, if the picture resulting from interlaced scanning is encoded on the frame-by-frame basis, the moving portion in the frame is blurred in the shape of a comb. For example, if a moving object, such as a motor vehicle, is present ahead of a stationary object, the motor vehicle, which is the moving portion, becomes blurred when viewed as a frame, as indicated at KS in FIG. 4, because of motion between fields. The result is that high-frequency components, not present in the original picture, are transmitted, thus lowering the encoding efficiency.
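The comb-shaped blurring can be reproduced with a small sketch. A frame is modelled as a list of lines, the first field is taken to hold the even-numbered lines and the second field the odd-numbered lines (an assumed convention), and the line-to-line change is used as a rough indicator of vertical high-frequency content. All function names are hypothetical.

```python
def split_fields(frame):
    """Separate a frame into its two fields (even lines, odd lines)."""
    return frame[0::2], frame[1::2]


def weave(first_field, second_field):
    """Interleave two fields line by line into one frame."""
    frame = []
    for a, b in zip(first_field, second_field):
        frame.append(a)
        frame.append(b)
    return frame


def vertical_activity(lines):
    """Sum of absolute differences between vertically adjacent lines,
    a rough indicator of vertical high-frequency content."""
    return sum(abs(p - q)
               for upper, lower in zip(lines, lines[1:])
               for p, q in zip(upper, lower))
```

If an object occupies different horizontal positions in the two fields, each field taken alone is vertically smooth, while the woven frame alternates between the two positions on every line, so that vertical_activity is large for the frame and zero for each field in this idealized case.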
Besides, with the frame-by-frame encoding, since encoding is made on the basis of two consecutive fields making up a frame as a unit, predictive coding cannot be employed between the two consecutive fields. Thus the minimum distance of predictive coding becomes a frame or two fields. Consequently, as compared to the field-by-field coding with the minimum predictive coding distance of one field, the frame-by-frame encoding is disadvantageous in respect to a picture having a fast or intricate motion.
As discussed in the foregoing, there are occasions wherein the field-by-field encoding is lower in encoding efficiency than the frame-by-frame encoding, and occasions wherein the frame-by-frame encoding is lower in encoding efficiency than the field-by-field encoding.
In view of the above-described status of the art, it is an object of the present invention to provide a picture data encoding method and a picture data encoding device whereby a picture produced by interlaced scanning may be encoded efficiently whether the picture is replete with motion, the picture shows only little motion or the picture replete with motion and the picture showing only little motion co-exist, and a picture data decoding method and a picture data decoding device for decoding picture data encoded by the encoding method and the encoding device.