1. Field of the Invention
This invention relates to a picture encoding method and apparatus advantageously employed for encoding a picture by way of data compression.
2. Description of Related Art
FIG. 1 shows a conventional arrangement of a device conveniently employed for encoding a moving picture by way of data compression. With the picture encoding device shown in FIG. 1, digitized picture data of luminance components (Y), chroma components (Cb) and chroma components (Cr), with the numbers of pixels equal to 352(H)×240(V)×30 frames, 174(H)×120(V)×30 frames and 174(H)×120(V)×30 frames, respectively, are fed to an input terminal 1.
The input picture data, entering the input terminal 1, is sent to a motion detection unit 20 and a block division unit 11 via a frame memory 10 configured for transiently storing the input picture data and re-arraying the picture data according to a pre-set sequence.
The block division unit 11 divides luminance components (Y) and chroma components (Cr), (Cb) of each frame supplied from the frame memory 10 into 8×8 pixel blocks, as shown in FIG. 3. The four blocks of the luminance components (Y0, Y1, Y2 and Y3), one chroma block (Cb) and one chroma block (Cr), totalling six blocks (Y0, Y1, Y2, Y3, Cb and Cr), are termed a macro-block.
The macro-block-based data from the block division unit 11 are sent to a subtractive unit 12.
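The block division described above may be sketched as follows. This is a minimal illustration and not the circuit of FIG. 1: the function name split_macroblocks is hypothetical, the frame dimensions are assumed to be multiples of 16, the chroma planes are assumed to be exactly half the luminance size in each direction, and the placement of Y0 to Y3 (top-left, top-right, bottom-left, bottom-right) is an assumption, since FIG. 3 is not reproduced here.

```python
def split_macroblocks(y, cb, cr):
    """Split one frame into macro-blocks of six 8x8 blocks (Y0..Y3, Cb, Cr).

    y is the luminance plane as a list of rows; cb and cr are chroma planes
    assumed to be half the size of y horizontally and vertically.
    """
    def block(plane, top, left):
        # Cut an 8x8 block whose top-left corner is at (top, left).
        return [row[left:left + 8] for row in plane[top:top + 8]]

    height, width = len(y), len(y[0])
    macroblocks = []
    for top in range(0, height, 16):
        for left in range(0, width, 16):
            macroblocks.append({
                # Assumed layout of the four luminance blocks.
                "Y0": block(y, top, left),
                "Y1": block(y, top, left + 8),
                "Y2": block(y, top + 8, left),
                "Y3": block(y, top + 8, left + 8),
                # One 8x8 chroma block covers the same 16x16 picture area.
                "Cb": block(cb, top // 2, left // 2),
                "Cr": block(cr, top // 2, left // 2),
            })
    return macroblocks
```

The dictionary keys follow the block names used in the text; the raster ordering of the macro-blocks themselves is likewise an assumption.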
The subtractive unit 12 finds a difference between data from the block division unit 11 and inter-frame predictively coded picture data as later explained and sends the resulting difference to a fixed terminal b of a changeover switch 13 as data for inter-frame predictive coding, as will be explained subsequently. The data from the block division unit 11 is supplied to the other fixed terminal a of the changeover switch 13 as data of a frame for intra-frame coding, as also will be explained subsequently.
The block-based data from the changeover switch 13 is transformed by DCT by a DCT circuit 14 from which the resulting DCT coefficients are sent to a quantizer 15. The quantizer 15 quantizes the DCT output with a pre-set quantization step width and the resulting quantized coefficients are sent to a zig-zag scan circuit 16.
The zig-zag scan circuit 16 re-arrays the quantized coefficients according to zig-zag scan as shown in FIG. 4 and sends the resulting output to a variable length encoding circuit 17. The variable length encoding circuit 17 variable length encodes output data of the zig-zag scan circuit 16 and sends the resulting output to an output buffer 18 while sending the information specifying the quantity of codes generated by variable length encoding to a quantization step controller 19. The quantization step controller 19 controls the quantization step width of the quantizer 15 based upon the information specifying the quantity of codes from the variable length encoding circuit 17. Output data of the output buffer 18 is outputted as a compressed coded output at an output terminal 2.
An output of the quantizer 15 is de-quantized by a de-quantizer 27 and inverse-transformed by an inverse DCT circuit 26. An output of the inverse DCT circuit 26 is sent to an addition unit 25.
The addition unit 25 is also fed with inter-frame predictively coded picture data from a motion compensation unit 21 via a changeover switch 24 which is turned on for a frame produced by inter-frame predictive coding. Thus the inter-frame predicted picture data is added to the output data of the inverse DCT circuit 26. Output data of the addition unit 25 is temporarily stored in a frame memory 22 and thence supplied to the motion compensation unit 21.
The motion compensation unit 21 effects motion compensation based upon the motion vector detected by the motion detection unit 20 and outputs the resulting inter-frame predictively-coded picture data.
An illustrative sequence of operations of the conventional picture encoder shown in FIG. 1 is explained in detail. For convenience in explanation, the following appellation is used for the respective frames.
The frames arrayed in the display sequence are termed I0, B1, B2, P3, B4, B5, P6, B7, B8, I9, B10, B11, P12, . . . . Of these frames, I, P and B refer to the sorts of the methods for data compression, as later explained, and the numerals next to I, P and B simply indicate the display sequence.
For compressing these pictures, MPEG 1, a standard of the MPEG (Moving Picture Experts Group), which is a working group for international standardization of color moving picture encoding systems, provides the following.
First, the picture I0 is compressed by DCT, quantization and VLC.
Next, the picture P3 is compressed. At this time, it is not the picture P3 itself but difference data between P3 and I0 that is compressed.
Next, the picture B1 is compressed. At this time, it is not the picture B1 itself, but difference data between the pictures B1 and I0, difference data between the pictures B1 and P3 or difference data between the picture B1 and mean values of the pictures I0 and P3, whichever is smaller in the information volume, that is compressed.
Next, the picture B2 is compressed. At this time, it is not the picture B2 itself, but difference data between the pictures B2 and I0, difference data between pictures B2 and P3 or a difference between the picture B2 and the mean values of the pictures I0 and P3, whichever is smaller in the information volume, that is compressed.
Next, the picture P6 is compressed. At this time, it is not the picture P6 itself, but difference data between the pictures P6 and P3, that is compressed.
The following describes the above-described processing in the sequence in which it is executed.
______________________________________
        Pictures to be    Counterpart for
        Processed         Taking Difference
______________________________________
(1)     I0                --
(2)     P3                I0
(3)     B1                I0 or P3
(4)     B2                I0 or P3
(5)     P6                P3
(6)     B4                P3 or P6
(7)     B5                P3 or P6
(8)     P9                P6
(9)     B7                P6 or P9
(10)    B8                P6 or P9
(11)    I9                --
(12)    P12               I0
(13)    B10               I9 or P12
(14)    B11               I9 or P12
______________________________________
In this manner, the encoding sequence is I0, P3, B1, B2, P6, B4, B5, P9, B7, B8, I9, P12, B10, B11, . . . and thus changed from the display sequence. The compressed data, that is encoded data, are arrayed in this encoded sequence.
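The change from the display sequence to the encoding sequence may be illustrated by the following sketch. It assumes only that each B-picture must wait until the next I-picture or P-picture in the display sequence has been encoded; the function name coding_order and the representation of frames as strings such as "B1" are hypothetical.

```python
def coding_order(display):
    """Reorder frames from the display sequence into the encoding sequence.

    Each B-picture is held back until the following I- or P-picture, which
    it may refer to, has been emitted. Returns the reordered list and any
    B-pictures still waiting for a later reference picture.
    """
    ordered, pending_b = [], []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)      # needs a future reference picture
        else:
            ordered.append(frame)        # I- or P-picture goes out first
            ordered.extend(pending_b)    # then the B-pictures it closes off
            pending_b = []
    return ordered, pending_b


order, waiting = coding_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"])
# order is I0, P3, B1, B2, P6, B4, B5 and waiting is empty
```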
The above is explained in further detail along with the operation of the picture encoding device shown in FIG. 1.
In encoding the first picture I0, data of a picture to be compressed first are outputted by the frame memory 10 and blocked by the block division unit 11. The block division unit 11 outputs block-based data in the sequence of Y0, Y1, Y2, Y3, Cb and Cr. The block-based output data is routed via the changeover switch 13 set to the side of the fixed terminal a to the DCT circuit 14. The DCT circuit 14 orthogonally transforms the block-based data with two-dimensional discrete cosine transform. This converts the data from the spatial domain into the frequency domain.
The DCT coefficients from the DCT circuit 14 are sent to the quantizer 15 where they are quantized at a pre-set quantization step width. The quantized coefficients are then re-arrayed in a zig-zag sequence by the zig-zag scan circuit 16 as shown in FIG. 4. If the DCT coefficients are arrayed in a zig-zag sequence, the coefficients towards the back are those of higher frequency components, so that the coefficient values generally become smaller towards the back. Thus, if the coefficient data are quantized at a certain value S, the probability of the result of quantization becoming zero becomes higher towards the back so that higher frequency components are removed.
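The two-dimensional discrete cosine transform applied to each block may be sketched as below. This is a direct, unoptimized evaluation of the orthonormal DCT-II, given for illustration only; the function name dct_2d is hypothetical and a practical DCT circuit would use a fast algorithm.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        # Normalization factor: sqrt(1/n) for the DC term, sqrt(2/n) otherwise.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs
```

For a flat 8×8 block in which every pixel equals 10, only the coefficient at position (0, 0) is non-zero and equals 80, which illustrates how a block with little detail concentrates its energy in the low-frequency coefficients.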
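The effect of quantization followed by zig-zag scanning may be illustrated as follows. The sketch assumes the conventional zig-zag order starting at the top-left coefficient, since FIG. 4 is not reproduced here, and a uniform quantizer that truncates towards zero; the function names and the sample coefficient values are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n x n block in zig-zag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk the anti-diagonals; odd diagonals run downwards (row ascending),
        # even diagonals run upwards (column ascending).
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def quantize(coeffs, step):
    """Uniform quantization with truncation towards zero (an assumption)."""
    return [[int(c / step) for c in row] for row in coeffs]


def zigzag_scan(block):
    """Read a square block out in zig-zag order as a flat list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Sample coefficients that decay towards higher frequencies.
coeffs = [[64 / (1 + r + c) for c in range(8)] for r in range(8)]
scanned = zigzag_scan(quantize(coeffs, 16))
# The non-zero values are gathered at the front, followed by a run of zeros.
```

With these sample values the first ten scanned coefficients are non-zero and the remaining 54 are zero, so that the long run of zeros at the back can be coded compactly by the variable length coding.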
The quantized coefficients are then sent to the variable length coding (VLC) circuit 17 so as to be processed with Huffman coding. The resulting compressed bitstream is temporarily stored in the output buffer 18 and thence outputted at a pre-set bit rate. The output buffer 18 is a buffer memory for outputting an irregularly generated bitstream at a pre-set bit rate.
The above-described compression of a sole picture is termed intra-frame coding and the resulting picture is termed an I-picture.
A decoder receiving the bitstream of the I-picture performs an operation which is the reverse of the above-described operation for completing the first picture.
The second picture, that is the picture P3, is encoded in the following manner.
The second picture P3 may also be compressed as an I-picture to generate a bitstream. However, for improving the compression ratio, the second picture P3 is compressed in the following manner to take advantage of the correlation between the contents of the continuous pictures.
First, the motion detection unit 20 finds, in the first picture I0, a pattern similar to each macro-block constituting the second picture, and represents the pattern in terms of coordinates of relative positions (x, y) termed a motion vector.
If the correlation between the pattern of the first picture represented by the motion vector and the pattern of the block now to be encoded is extremely strong, the difference data becomes extremely small, so that the amount of compressed data becomes smaller when the motion vector and the difference data are encoded than when the block is compressed by the intra-frame coding.
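The search for a motion vector may be sketched as an exhaustive block matching. The matching criterion (sum of absolute differences), the search range and the function names are assumptions made for illustration; the text does not specify how the motion detection unit 20 measures similarity.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between the current block at (top, left)
    and the reference block displaced by (dx, dy)."""
    return sum(
        abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
        for r in range(size)
        for c in range(size)
    )


def find_motion_vector(cur, ref, top, left, size=16, search=7):
    """Full search over displacements within +/- search pixels.

    Returns the motion vector (x, y) of the best matching pattern in the
    reference picture and its matching cost.
    """
    height, width = len(ref), len(ref[0])
    best_cost, best_vector = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= top + dy and top + dy + size <= height
                      and 0 <= left + dx and left + dx + size <= width)
            if not inside:
                continue  # displaced block would leave the picture
            cost = sad(cur, ref, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dx, dy)
    return best_vector, best_cost
```

A small matching cost corresponds to the strong correlation mentioned above, in which case the difference data left to be encoded is small.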
The above-described compression method is termed the inter-frame predictive coding. The difference data is not necessarily smaller and, depending upon the picture pattern, that is the contents of the picture, the compression ratio becomes higher with intra-frame coding than with coding the difference data. In such case, data is compressed by the intra-frame coding. Which of the inter-frame predictive coding or the intra-frame coding is to be employed differs from one macro-block to another.
If the above is to be explained in connection with the picture encoder shown in FIG. 1, the picture which is the same as the picture produced on the decoder side needs to be produced on the encoder side at all times if the inter-frame predictive coding is to be achieved.
To this end, there is provided in the encoder a circuit which is the same as the decoder. This circuit, termed a local decoder, includes the de-quantizer 27, inverse DCT circuit 26, addition unit 25, frame memory 22 and the motion compensation unit 21 shown in FIG. 1. The picture stored in the frame memory 22 is termed a locally decoded picture or locally decoded data. The data of a picture not yet compressed is termed an original picture or original data.
During compression of the first picture, that is the I-picture I0, the first picture decoded by the local decoder is stored in the frame memory 22. Noteworthy is the fact that the picture produced by the local decoder is not the pre-compression data but is the compressed and decoded picture and hence is the same picture as the picture which is to be decoded by the decoder and thus suffers from deterioration in the picture quality ascribable to compression.
It is to the encoder under such condition that data of the second picture P3 (original data) is entered. The motion vector must have been pre-detected by this time. Such data has the motion vector from block to block. Such motion vector is supplied to the motion compensation unit 21. The motion compensation unit 21 outputs data on the locally decoded partial picture specified by the motion vector (motion compensated data or MC data: one macro-block) as the inter-frame predicted picture data.
Pixel-based difference data from the subtractive unit 12 between the second original data and the motion compensated data (inter-frame predicted picture data) enters the DCT circuit 14. The subsequent compression is basically the same as that for the I-picture. The picture compressed by the above-described compressing method is termed the forward predictive coded picture or P-picture.
More specifically, not all macro-blocks of the P-picture are necessarily compressed by the inter-frame predictive coding. If intra-frame coding is judged to give a higher coding efficiency for a given macro-block, the macro-block is encoded by intra-frame coding.
That is, with the P-picture, the intra-frame coding or the inter-frame predictive coding is selected for compression from one macro-block to another. The macro-block coded by intra-frame coding and that coded by inter-frame predictive coding are termed an intra-macro-block and an inter-macro-block, respectively.
With the above local decoder, as described above, an output of the quantizer 15 is de-quantized by the de-quantizer 27 and inverse orthogonal transformed by the inverse DCT circuit 26. During encoding, the motion-compensated data (MC data) is added to the output of the inverse DCT circuit 26 to give an ultimate locally decoded picture.
The third picture, that is the picture B1, is encoded in the following manner.
For encoding the third picture (B1 picture), the motion vector for each of the pictures I0 and P3 is searched. The motion vector for the picture I0 is termed the forward vector MVf(x,y), while the motion vector for the picture P3 is termed the backward vector MVb(x,y).
For this third picture, difference data is similarly compressed. The question is with respect to which data the difference should be taken. It is sufficient if the difference is taken with respect to a picture which will give the smallest amount of the information by difference taking. There are four alternatives for the method for compression, namely (1) employing the difference from data on the picture I0 indicated by the forward vector MVf(x,y); (2) employing the difference from data on the picture P3 indicated by the backward vector MVb(x,y); (3) employing the difference from mean values of data on the picture I0 indicated by the forward vector MVf(x,y) and data on the picture P3 indicated by the backward vector MVb(x,y); and (4) not employing the difference, that is encoding the macro-block of the third picture by intra-frame coding.
One of these four methods for compression is selected on the macro-block basis. With the alternatives (1) to (3), the respective motion vectors are sent to the motion compensation unit 21. The subtractive unit 12 takes the difference between the macro-block data and the motion-compensated data. The resulting difference is sent to the DCT circuit 14. With the alternative (4), the data is directly sent to the DCT circuit 14.
The above-described operation becomes feasible since two pictures, namely the pictures I0 and P3, have been restored by the encoding of the first and second pictures in the frame memory 22 configured for storing the locally decoded picture.
The fourth picture, that is the picture B2, is encoded in the following manner.
The encoding of the fourth picture B2 is carried out in the same manner as for the encoding of the third picture B1 except that "B1" in the previous description of the encoding method for the third picture B1 now reads "B2".
The fifth picture, that is the picture P6, is encoded in the following manner.
The encoding of the fifth picture P6 is carried out in the same manner as for the encoding of the second picture P3 except that "P3" and "I0" in the previous description of the encoding method for the second picture P3 now read "P6" and "P3", respectively.
The encoding of the sixth and subsequent pictures is not explained since it is a repetition of the above-described encoding operations.
The MPEG provides a group-of-pictures (GOP).
That is, a group of plural pictures is termed a group-of-pictures (GOP). The GOP must consist of continuous pictures in terms of the encoded data, that is the compressed data. On the other hand, the GOP takes account of random accessing and, to this end, the first picture in the GOP needs to be the I-picture. In addition, the last picture in the GOP in the display sequence must be an I-picture or a P-picture.
FIG. 5 shows an example in which the first GOP consists of four pictures and the second and subsequent GOPs each consist of six pictures. FIGS. 5A and 5B illustrate the display sequence and the sequence of the encoded data, respectively.
If, in FIG. 5, attention is directed to the GOP2, since the pictures B4 and B5 are formed from the pictures P3 and I6, the pictures B4 and B5 cannot be decoded correctly if the picture I6 is accessed by random access, since the picture P3 is then not available. A GOP which cannot be correctly decoded from the information within itself is said to be not closed.
Conversely, if the pictures B4 and B5 refer only to the picture I6, the pictures B4 and B5 can be decoded correctly when the picture I6 is accessed by random access, since the picture P3 is not required. A GOP which can be fully decoded from the information within itself is said to be closed.
Data compression is performed by the most efficient one of the above-described compressing methods. The quantity of the encoded data thus generated also depends on the input picture and can be known only after actual data compression.
However, control is also needed for keeping the bit rate of the compressed data constant. The parameter for managing such control includes the quantization step, or quantization scale (Q-scale), applied to the quantizer 15 as the information governing the quantity of the generated codes. For the same compression method, the larger the quantization step, the smaller the quantity of generated bits, and the smaller the quantization step, the larger the quantity of generated bits.
The value of the quantization step is controlled in the following manner.
For assuring a constant bit rate of the compressed data, the output buffer 18 is provided at an output of the encoder. The output buffer operates for absorbing, within a certain extent, the difference in the amount of the generated data from picture to picture.
However, if data is generated at a rate surpassing a pre-set rate, the residual data quantity in the output buffer 18 is increased thus producing data overflow. Conversely, if data is generated at a rate lower than the bit rate, the residual data quantity in the output buffer is decreased thus ultimately producing data underflow.
Thus the encoder is configured for feeding back the residual data quantity in the output buffer 18 so that the quantization step controller 19 controls the quantization step of the quantizer 15, in such a manner that, if the residual data quantity in the output buffer 18 becomes smaller, the quantization step is controlled to be smaller to refrain from excessive compression and, if the residual data quantity of the output buffer 18 becomes larger, the quantization step is controlled to be larger to raise the compression ratio.
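The feedback from the residual data quantity in the output buffer to the quantization step may be sketched as follows. The linear mapping, the function name and the step range of 1 to 31 are assumptions made for illustration; the text only states that the step is made smaller as the residual data quantity becomes smaller and larger as it becomes larger.

```python
def next_quant_step(buffer_bits, buffer_size, q_min=1, q_max=31):
    """Map the residual data quantity in the output buffer to a quantization step.

    A nearly empty buffer gives a small step (less compression); a nearly
    full buffer gives a large step (higher compression ratio).
    """
    fullness = buffer_bits / buffer_size
    fullness = min(max(fullness, 0.0), 1.0)  # clamp to the valid range
    return round(q_min + fullness * (q_max - q_min))


# With a buffer of 1000 bits: empty, half full and full.
steps = [next_quant_step(level, 1000) for level in (0, 500, 1000)]
# steps is [1, 16, 31]
```

Since the step depends only on the current residual data quantity, such a controller cannot take account of the quantity of the information of pictures yet to be encoded.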
On the other hand, there exists a considerable difference in the range of the quantity of the encoded data generated by the above-described compressing method (the above-described intra-frame coding and inter-frame coding).
If data compression is performed by the intra-frame coding, a large quantity of data is produced, so that, if the vacant capacity of the output buffer 18 is small, the quantization step size needs to be increased. Even if the quantization step size is maximized, data overflow in the output buffer 18 may occasionally be produced. Even if the generated data can be stored in the buffer 18, the intra-frame encoded picture produced with a larger quantization step affects the picture quality of subsequently produced inter-frame predictively-coded pictures. In this consideration, a sufficient vacant capacity of the output buffer 18 is required before proceeding to data compression by intra-frame coding.
Thus the compression method of a pre-set sequence is provided, while the quantization step controller 19 effects feedback control of the quantization step size for assuring a sufficient vacant capacity of the output buffer 18 prior to the intra-frame coding.
The above renders it possible to suppress the bit rate of the encoded data to a pre-set value.
The above-described conventional method has a drawback that high picture quality cannot be achieved for the following reason.
That is, for compressing input pictures, the information quantity of which changes from moment to moment, at a pre-set bit rate with a high mean picture quality, it is necessary to allow a larger quantity of the compressed data for a picture having a larger quantity of the information and a smaller quantity of the compressed data for a picture having a smaller quantity of the information, in order to assure uniform picture quality insofar as a low bit rate may be maintained by an output buffer. However, this cannot be achieved with the conventional method in the following cases.
Assuming that there occur pictures of smaller quantity of the information in succession, followed by a picture having a larger quantity of the information, the quantization step should not be decreased excessively, while the residual data quantity in the output buffer should be kept small until the encoding of the next oncoming picture with the larger quantity of the information. However, with the above-described system of feeding back the residual data quantity of the output buffer, the residual data quantity in the output buffer is increased while the pictures with the smaller quantity of the information occur in succession.
Conversely, if a picture with a smaller quantity of the information occurs next to a picture with a larger quantity of the information, overflow is not likely to be produced, because the quantity of the information of the next following picture is small, so that there is no necessity of encoding the previously supplied picture with the larger quantity of the information with a larger quantization step for decreasing the data quantity in the output buffer. However, with the above-described system of feeding back the data quantity of the output buffer, since the quantity of the information of the succeeding picture is not known, the data quantity in the output buffer is controlled to be decreased, that is the quantization step is controlled to be increased, thus lowering the picture quality.
In this consideration, it may be envisaged to evaluate the quantity of the information of the input picture and to control the quantization step based upon this evaluation.
However, if an input picture is compressed with a picture encoder having means for evaluating the quantity of the information of the input picture, the number of bits to be allocated to data produced after compressing the input picture is set depending upon the quantity of the information (difficulty) of the input picture. In such case, the quantization step of the quantizer needs to be predicted highly accurately in dependence upon the quantity of bit allocation.
If the predicted quantization step is not proper, the quantity of the post-compression data falls short of or surpasses the quantity of bit allocation, thus affecting the bit allocation for compressing the remaining pictures.
Thus, with a frame with a reduced quantity of bit allocation, for example, the quantization step becomes rough thus lowering the picture quality. Consequently, the frames with uniform picture quality do not last long thus giving the impression of poor picture quality to the viewer. If the prediction is upset to a larger extent, buffer overflow or underflow will be incurred in the worst case.
If the quantization step is controlled depending upon the ratio of the progress of compression within the picture, the predicted amount of bit allocation and the quantity of the post-compression information, the quantization step undergoes significant variation within the picture if the basic prediction of the quantization step is upset. Since the compression is performed in the raster scan order, any significant variation in the quantization step within the picture renders striped portions of nonuniform picture quality apparent in the picture, thus lowering the picture quality.