As a system for efficiently encoding a picture signal, an efficient coding system for a picture signal for use in so-called digital storage media has been standardized, e.g., in the standardization scheme by the MPEG (Moving Picture Experts Group).
Here, the storage media subject to this system are media having a continuous transfer rate (speed) of about 1.5 Mbps (megabits per second), such as a so-called CD (Compact Disk), a DAT (Digital Audio Tape), or a hard disk. These media are not only directly connected to a decoder, but are also assumed to be connected thereto through transmission media such as a computer bus, a LAN (Local Area Network), or a telecommunication link. Further, not only reproduction in the forward direction, but also special functions such as random access, high speed reproduction, and reproduction in the backward direction are taken into consideration.
The principle of the efficient coding system for a picture signal by the MPEG is as follows.
Namely, in this efficient coding system, a difference between pictures (frames) is first taken to reduce the redundancy in the time base direction; thereafter, the redundancy in the spatial base direction is reduced by using so-called Discrete Cosine Transform (DCT) processing and variable length coding.
The redundancy in the time base direction will be first described below.
Generally, in successive moving pictures, a picture of interest (i.e., a picture at a certain time) and the pictures before and after that picture are considerably similar to each other.
For this reason, as shown in FIG. 22, for example, if an approach is employed to take a difference between a picture to be coded from now on and a picture forward in time to transmit that difference, the redundancy in the time base direction can be reduced to lessen a quantity of information transmitted.
A picture coded in this way is called a Predictive-Coded Picture (P picture or P frame) which will be described later.
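The forward difference described above can be illustrated by a minimal sketch, assuming that a picture is held as a small list of pixel rows. The function names `frame_difference` and `reconstruct` are hypothetical and are not taken from the MPEG scheme.

```python
def frame_difference(current, previous):
    """Difference between the picture to be coded and a picture forward in time."""
    return [[c - p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, previous)]


def reconstruct(previous, difference):
    """Decoder side: add the transmitted difference back to the earlier picture."""
    return [[p + d for p, d in zip(prev_row, diff_row)]
            for prev_row, diff_row in zip(previous, difference)]


previous = [[10, 10], [20, 20]]
current = [[10, 11], [20, 20]]
difference = frame_difference(current, previous)
```

When two pictures are similar, most entries of the difference are zero or small, which is what lessens the quantity of information to be transmitted.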
Similarly, an approach may be employed to take a difference between the picture to be coded from now on and a picture forward or backward in time, and a difference between the picture to be coded from now on and an interpolated picture prepared from the pictures forward and backward in time, and to transmit whichever of those differences is smallest. The redundancy in the time base direction can thereby be reduced to lessen the quantity of information transmitted.
A picture coded in this way is called a Bidirectionally Predictive-coded picture (B picture or B frame) which will be described later.
In FIG. 22, a picture represented by I indicates an Intra-coded picture (I picture or I frame) which will be described later, a picture indicated by P indicates the above-mentioned P picture, and a picture indicated by B indicates the above-mentioned B picture.
In order to prepare respective predictive pictures, so called motion compensation is carried out.
Namely, in this motion compensation, a block of 16×16 pixels (hereinafter called a macro block), comprised of unit blocks of 8×8 pixels, is prepared. The former picture is searched, in the vicinity of the position of that macro block, for the macro block giving the minimum difference, and a difference between the macro block and the searched macro block is taken, thereby making it possible to reduce the quantity of data to be transmitted.
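The search for the macro block of minimum difference can be sketched as a full search over a small window. The sum of absolute differences used as the measure, the search range, and the names `sad` and `motion_search` are assumptions made for illustration; the text does not specify them.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block of cur and a block of ref."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def motion_search(cur, ref, cx, cy, size=16, search=4):
    """Search ref near (cx, cy) for the block that differs least from cur.

    Returns (cost, dx, dy) for the best candidate inside the picture.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall partly outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

The returned displacement (dx, dy) plays the role of a motion vector; only the difference between the macro block and the block found at that displacement would then be coded.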
Actually, for example, in the P picture (Predictive-coded picture), two candidates are compared for every macro block of 16×16 pixels: picture data in which a difference between the picture to be coded from now on and a motion-compensated predictive picture is taken, and picture data in which that difference is not taken. Whichever has the lesser quantity of data is selected, and the selected picture data is then coded.
However, in the cases described above, a large quantity of data must be transmitted with respect to, e.g., a portion of the picture that appears from behind a moving object, since that portion is not present in the picture forward in time.
In view of this, for example, in the B picture (Bidirectionally Predictive-coded picture), the picture data having the smallest quantity of data among the following four is coded: a difference between an already decoded, motion-compensated picture forward in time and the picture to be coded from now on; a difference between an already decoded, motion-compensated picture backward in time and the picture to be coded from now on; a difference between an interpolated picture prepared from both the forward and backward pictures and the picture to be coded from now on; and a picture in which no difference is taken, i.e., the picture to be coded from now on itself.
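The selection among the four candidates of a B picture can be sketched as follows. The text only says that the candidate with the smallest quantity of data is chosen; the cost measures used here (sum of absolute residuals, and deviation from the block mean for the picture coded without a difference) are assumed proxies, and all function names are hypothetical.

```python
def abs_sum(block):
    return sum(abs(v) for row in block for v in row)


def residual(cur, pred):
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]


def interpolate(fwd, bwd):
    """Interpolated picture: rounded average of forward and backward predictions."""
    return [[(f + b + 1) // 2 for f, b in zip(fr, br)] for fr, br in zip(fwd, bwd)]


def intra_cost(cur):
    """Assumed proxy for coding the block itself: deviation from its mean."""
    flat = [v for row in cur for v in row]
    mean = sum(flat) / len(flat)
    return sum(abs(v - mean) for v in flat)


def select_b_mode(cur, fwd, bwd):
    """Pick the cheapest of the four candidates for one macro block."""
    costs = {
        "forward": abs_sum(residual(cur, fwd)),
        "backward": abs_sum(residual(cur, bwd)),
        "interpolated": abs_sum(residual(cur, interpolate(fwd, bwd))),
        "intra": intra_cost(cur),
    }
    return min(costs, key=costs.get)
```

The inputs `fwd` and `bwd` are taken to be already motion-compensated blocks. For a P picture the same comparison would involve only the forward and intra candidates.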
The redundancy in the spatial base direction will now be described.
A difference of picture data is not transmitted as it is, but is caused to undergo Discrete Cosine Transform (DCT) processing for every unit block of 8×8 pixels.
In this DCT processing, a picture is represented not by pixel levels, but by the degree to which respective frequency components of a cosine function are included. For example, by the two-dimensional DCT processing, data of unit blocks of 8×8 pixels are transformed into coefficient blocks of 8×8 coefficients of the cosine function components.
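The two-dimensional DCT of an 8×8 unit block can be sketched directly from its defining sum. This is a straightforward, unoptimized illustration using the common orthonormal scaling, which is an assumption here since the text does not state a normalization; the name `dct_2d` is hypothetical.

```python
import math

N = 8  # unit block size


def dct_2d(block):
    """Two-dimensional DCT of an 8x8 block (direct evaluation of the sum)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):          # vertical frequency index
        for u in range(N):      # horizontal frequency index
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = 0.25 * c(u) * c(v) * s
    return out


flat_block = [[8] * N for _ in range(N)]
coefficients = dct_2d(flat_block)
```

For a perfectly flat block, all of the energy lands in the single coefficient at position (0, 0), the DC component, and every other coefficient is zero up to rounding error.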
There are many instances where a picture signal of a natural picture as imaged by a television camera becomes a smooth signal. In this instance, the above-mentioned DCT processing is implemented to the picture signal, thereby making it possible to reduce a quantity of data.
Namely, for example, in the case of a smooth signal such as the picture signal of a natural picture described above, implementing the DCT processing causes large values to concentrate in the vicinity of a certain coefficient.
When these coefficients are quantized, most of the coefficient values of the 8×8 DCT coefficient block become equal to zero, so that only the large coefficient values are left.
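The effect of quantization can be illustrated with a uniform quantizer. The single step size and the rounding rule are assumptions made for this sketch, since the text does not describe the quantizer itself; `quantize` and `dequantize` are hypothetical names.

```python
def quantize(coefficients, step=16):
    """Uniform quantization, rounding half away from zero (assumed rule)."""
    def q(c):
        level = int(abs(c) / step + 0.5)
        return level if c >= 0 else -level
    return [[q(c) for c in row] for row in coefficients]


def dequantize(levels, step=16):
    """Decoder side: scale the transmitted levels back up."""
    return [[level * step for level in row] for row in levels]


levels = quantize([[64.0, 7.9, -3.0], [-24.0, 0.5, 40.0]])
```

Small coefficients collapse to zero while the large ones survive as small integers, which is the property exploited by the subsequent coding step.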
In transmitting data of an 8×8 coefficient block, the data are transmitted in the order of a so-called zigzag scan by using so-called Huffman codes, in which each non-zero coefficient is combined as a set with a so-called zero-run indicating how many zeros continue before that coefficient, thereby making it possible to reduce the quantity of data transmitted.
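The zigzag scan and the pairing of each non-zero coefficient with its zero-run can be sketched as below. The assignment of Huffman codes to the resulting pairs is not shown, and the function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, column) positions of an n x n block in zigzag scan order."""
    coords = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals; the direction alternates from one to the next.
    return sorted(coords,
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def run_level_pairs(block):
    """Sets of (zero-run, non-zero coefficient) in zigzag order."""
    pairs = []
    run = 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    # Zeros after the last non-zero coefficient produce no pair.
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[1][0], block[0][2] = 4, -2, 3
pairs = run_level_pairs(block)
```

A block of 64 values is thus reduced to three sets here, each of which would then be assigned a variable length code.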
On the decoder side, a picture is reconstructed by a procedure opposite to the above.
The structure of data handled by the above-described coding system is shown in FIG. 23.
Namely, the data structure shown in FIG. 23 is comprised of a block layer, a macro block layer, a slice layer, a picture layer, a Group Of Picture (GOP) layer, and a video sequence layer, in order from the bottom. These layers will now be described in order from the lowest layer in FIG. 23.
In the block layer, each block is comprised of 8×8 adjacent luminance or color difference pixels (8 lines × 8 pixels).
The above-described DCT (Discrete Cosine Transform) processing is applied to every unit block.
In the above-mentioned macro block layer, each macro block of the macro block layer is comprised of six blocks in total of four luminance blocks (unit blocks of luminance) Y0, Y1, Y2, Y3 adjacent in left and right directions and in upper and lower directions, and color difference blocks (unit blocks of color difference) Cr, Cb existing correspondingly at the same positions as those of the luminance blocks on a picture.
These blocks are transmitted in order of Y0, Y1, Y2, Y3, Cr and Cb.
In this coding system, what picture is used for a predictive picture (a reference picture for taking a difference), or whether or not there is a need to transmit a difference is judged every macro block.
The above-mentioned slice layer is comprised of one or plural macro blocks successive in order of scanning of a picture.
At the header of this slice layer, the differences of the respective motion vectors and of the DC (Direct Current) component in a picture are reset. The first macro block has data indicating its position in the picture. Accordingly, even in the case where an error takes place, data can be restored to a normal state at every slice.
For this reason, the length and starting position of the slice are arbitrary, and can be changed in dependency upon an error state of a transmission path.
In the above-mentioned picture layer, a picture, i.e., each frame, is comprised of one or plural slices mentioned above. These pictures are classified into four kinds: the Intra-coded picture (I picture or I frame), the Predictive-coded picture (P picture or P frame), the Bidirectionally Predictive-coded picture (B picture or B frame), and the DC intra-coded picture.
In the intra coded picture (I picture), when a picture is coded, information only within that picture is used.
In other words, at the time of decoding, a picture can be reconstructed only by information of the I picture itself. In actual terms, a picture is caused to undergo DCT processing as it is without taking a difference, and the picture thus processed is then coded.
Although this coding system has a poor efficiency in general, if such I pictures are inserted at suitable portions, random access or high speed reproduction can be made.
In the above-mentioned Predictive coded picture (P picture), an I picture or a P picture, which is positioned forward in time in terms of input sequence and has been already decoded, is used as a predictive picture (a picture serving as a reference in taking a difference).
Actually, whichever has the higher efficiency of the method of coding a difference between the picture to be coded from now on and a motion-compensated predictive picture and the method of coding the picture as it is without taking a difference (intra coding) is selected for every macro block.
In the above-mentioned Bidirectionally Predictive-coded picture (B picture), three kinds of pictures are used as a predictive picture: an I picture or a P picture which is positioned forward in time and has been already decoded, an I picture or a P picture which is positioned backward in time and has been already decoded, and an interpolated picture prepared from both of those pictures.
Thus, the one having the highest efficiency of the three kinds of motion-compensated difference coding and the intra coding can be selected for every macro block.
The DC intra-coded picture is an intra coded picture comprised of only DC coefficients of DCT, and therefore cannot exist in the same sequence as that of other three kinds of pictures.
The above-mentioned Group Of Picture (GOP) layer is comprised of only one or plural I pictures, or one or plural I pictures and plural non-I pictures. When the order of inputting to an encoder is assumed to be, e.g., 1I, 2B, 3B, 4P*5B, 6B, 7I, 8B, 9B, 10I, 11B, 12B, 13P, 14B, 15B, 16P*17B, 18B, 19I, 20B, 21B, and 22P, an output of the encoder, i.e., an input of a decoder is, e.g., 1I, 4P, 2B, 3B*7I, 5B, 6B, 10I, 8B, 9B, 13P, 11B, 12B, 16P, 14B, 15B*19I, 17B, 18B, 22P, 20B, and 21B.
The reason why the order is exchanged in the encoder in this way is that, in the case where, e.g., a B picture is coded or decoded, the I picture or the P picture backward in time serving as a predictive picture of the B picture must be coded in advance.
In this case, the interval (e.g., 9) of the I picture, and the interval (e.g., 3) of the P picture or the B picture are arbitrary.
Further, the interval of the I picture or P picture may be changed in the Group Of Picture layer.
It is to be noted that the connecting portion of the Group Of Picture layer is indicated by asterisk (*).
In FIG. 23 reference symbols I, P and B indicate I picture, P picture and B picture, respectively.
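The exchange of order described for the Group Of Picture layer can be sketched as follows, using the example input sequence given above with its first picture taken as 1I. The rule implemented, holding each B picture until the next I picture or P picture has been output, is inferred from that example; the name `coding_order` is hypothetical, and the asterisks marking the connecting portions are omitted.

```python
def coding_order(input_order):
    """Reorder pictures so that each B picture follows both of its references."""
    output, pending_b = [], []
    for picture in input_order:
        if picture.endswith("B"):
            # A B picture waits for the I or P picture backward in time.
            pending_b.append(picture)
        else:
            output.append(picture)
            output.extend(pending_b)
            pending_b = []
    # Assumption: trailing B pictures with no later reference are flushed as-is.
    output.extend(pending_b)
    return output


input_order = ("1I 2B 3B 4P 5B 6B 7I 8B 9B 10I 11B 12B "
               "13P 14B 15B 16P 17B 18B 19I 20B 21B 22P").split()
encoder_output = coding_order(input_order)
```

Applied to the example input, this reproduces the encoder output order listed in the description of the Group Of Picture layer.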
The above-mentioned video sequence layer is comprised of one or plural Group Of Picture layers in which the picture size and the picture rate, etc. are the same.
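The six layers described above can be summarized as nested data types. This is only an illustrative sketch of the containment relationships; the class and field names are hypothetical and do not represent the bit stream syntax.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    samples: List[List[int]]  # 8 lines x 8 pixels of luminance or color difference


@dataclass
class MacroBlock:
    y: List[Block]  # four luminance blocks Y0..Y3
    cr: Block
    cb: Block

    def transmission_order(self) -> List[Block]:
        # Order stated in the text: Y0, Y1, Y2, Y3, Cr, Cb
        return [*self.y, self.cr, self.cb]


@dataclass
class Slice:
    macro_blocks: List[MacroBlock]  # one or plural, successive in scanning order


@dataclass
class Picture:
    kind: str  # "I", "P", "B" or "D" (DC intra-coded); labels are illustrative
    slices: List[Slice]


@dataclass
class GroupOfPictures:
    pictures: List[Picture]


@dataclass
class VideoSequence:
    width: int            # picture size is common to all groups
    height: int
    picture_rate: float   # picture rate is common to all groups
    groups: List[GroupOfPictures]


def empty_block() -> Block:
    return Block([[0] * 8 for _ in range(8)])
```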
As described above, in the case of transmitting a moving picture in accordance with the efficient coding system standardized by the MPEG, picture data obtained by compressing one frame (picture) is first sent, and data of a difference between that picture and a picture obtained by implementing motion compensation thereto is then transmitted.
Meanwhile, in the case of processing, e.g., each field of the above-mentioned one frame (picture) as a picture, the vertical positions of the lines alternate between the two fields. For this reason, difference information has to be transmitted even at the time of transmitting, e.g., a still picture.
Further, for example, in the case of processing a frame as a picture, i.e., with the frame formed from the two fields taken as a unit, a moving portion in the frame is processed as a picture displaced in a so-called comb shape.
Namely, in the case where a moving body CA such as an automotive vehicle exists in front of a still background as shown in FIG. 24, for example, there is motion (movement) between the fields of one frame, so that such a portion becomes a picture in a comb form KS.
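The comb form can be reproduced with a small sketch in which an object moves horizontally between the two fields of one frame. The line-to-line activity measure and the function names are assumptions introduced only to make the effect visible.

```python
def split_fields(frame):
    """Separate a frame into its two fields (alternate lines)."""
    return frame[0::2], frame[1::2]


def weave(first_field, second_field):
    """Interleave two fields line by line into one frame."""
    frame = []
    for a, b in zip(first_field, second_field):
        frame.extend([a, b])
    return frame


def line_activity(lines):
    """Sum of absolute differences between vertically adjacent lines."""
    return sum(abs(p - q)
               for upper, lower in zip(lines, lines[1:])
               for p, q in zip(upper, lower))


# The object (value 9) occupies columns 2-3 in the first field and has moved
# to columns 4-5 by the time the second field is captured.
first_field = [[0, 0, 9, 9, 0, 0], [0, 0, 9, 9, 0, 0]]
second_field = [[0, 0, 0, 0, 9, 9], [0, 0, 0, 0, 9, 9]]
frame = weave(first_field, second_field)
```

In the woven frame, adjacent lines disagree wherever the object moved, whereas each field taken alone is vertically smooth. For a still portion the situation is reversed in the sense that the frame carries no such disagreement, which is why neither field processing nor frame processing alone suits every portion of a picture.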
Further, in the case of processing a picture in which, e.g., still portions and moving portions are mixed, whichever of the method of processing a field as a picture and the method of processing a frame as a picture is employed, a picture portion having a poor compression efficiency would exist in the picture.
This invention has been proposed in view of actual circumstances as described above, and its object is to provide an efficient coding apparatus for a picture signal, which is capable of efficiently carrying out the field processing or the frame processing even if a picture to be coded is a picture having small movement or great movement, or a picture in which those pictures are mixed with respect to a picture of the field structure.