An international standard for compressing image signals and converting them into a digital code is ISO/IEC 13818-2, also known as MPEG2. A typical method of digitally encoding image signals conforming to that format is shown in Test Model 3 of ISO-IEC/JTC/SC29/WG11 No. 328. FIG. 1 of the present drawings is a block diagram of a typical MPEG2 video encoding apparatus. This MPEG2 video encoding apparatus comprises a frame converter 101 for rearranging the input video signal into encoding sequence, a block converter 102 for converting picture data into encoding units called macro blocks, a subtractor 103 for determining the difference between the image data of an input macro block and a predicted value, a DCT (discrete cosine transform) 104, a quantizer 105, a variable length encoder 106, an inverse quantizer 107, an IDCT (inverse discrete cosine transform) 108, a motion compensation block 109, a mode discriminator 110, a motion detector 111, a quantizer control block 112, an encoder buffer 113, and an adder 114.
Prior to explaining the operation of the above-described MPEG2 video encoding apparatus, a data structure for image encoding is described with reference to FIG. 2.
Each picture of an image to be encoded is divided into macro blocks and encoded. The picture is an image of, for example, a frame or field unit, and unless otherwise noted in the following description, a frame is referred to as a picture. The macro block is a data area of 16×16 pixels, and the luminance and color difference signals are respectively encoded in blocks of 8×8 pixels each.
One data unit called a slice is composed of a plurality of macro blocks, and one picture is composed of a plurality of slices. Pictures are of three types: an I picture encoded using information only from itself, a P picture predicted from a past picture in time, and a B picture predicted from both past and future pictures in time. The picture configuration in FIG. 2 is a typical example in which a P picture, three pictures ahead, is predicted by using a first I picture, and B pictures are arranged on both sides of the P picture. Therefore, when encoding, it is necessary to first encode the I picture, then the P picture, and then the B pictures, which requires rearranging the images from their original temporal order.
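The rearrangement described above can be sketched as follows. This is a minimal illustration, not the frame converter 101 itself: the function name `display_to_coding_order` and the rule "each I or P picture is emitted ahead of the B pictures that precede it in display order" are assumptions chosen to match the I, P, B encoding order stated in the text.

```python
def display_to_coding_order(picture_types):
    """Reorder pictures from display order to coding order.

    picture_types: list of 'I', 'P' or 'B' given in display order.
    Returns a list of (display_index, type) tuples in coding order.
    """
    coding_order = []
    pending_b = []  # B pictures waiting for their future reference picture
    for index, ptype in enumerate(picture_types):
        if ptype == "B":
            pending_b.append((index, ptype))
        else:
            # An I or P picture serves as a reference: code it first,
            # then the B pictures that precede it in display order.
            coding_order.append((index, ptype))
            coding_order.extend(pending_b)
            pending_b = []
    # B pictures with no later reference in this list are emitted last.
    coding_order.extend(pending_b)
    return coding_order


# Display order I B B P B B P, as in a configuration where the P picture
# lies three pictures ahead of the I picture.
print(display_to_coding_order(list("IBBPBBP")))
```

In this sketch the P picture at display position 3 is coded directly after the I picture, and the two B pictures between them follow.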
Furthermore, a group of pictures (GOP) is composed of a plurality of pictures starting from an I picture, and one video sequence is composed of an arbitrary number of GOPs. Thus the macro block may be defined as an image segment, and the slice, picture, and GOP, each composed of a plurality of macro blocks, may be defined as image segment groups. Supposing the GOP to be an image segment group, a picture, being a smaller image segment group, may for example be defined as a sub-set of the GOP.
With the understanding of the above-described data structure, the operation of the MPEG2 video encoding apparatus of FIG. 1 is described below.
An input signal is fed into the frame converter 101, where the sequence of pictures of the input image is rearranged. The output of the frame converter 101 is supplied to the block converter 102, which divides the entered image into macro blocks of 16×16 pixels each and supplies those macro blocks to the subtractor 103. In the subtractor 103, the predicted value obtained from the motion compensation block 109 is subtracted from the signal supplied from the block converter 102, and a prediction error is determined. That prediction error is transformed in the DCT 104 in blocks of 8×8 pixels, and each resulting transform coefficient is quantized in the quantizer 105, thereby creating quantized data. The quantized data is variable length encoded in the variable length encoder 106, and compressed encoded data is thereby created. The compressed encoded data is stored in the encoder buffer 113 so that it can be transmitted at a desired transmission rate, and is thereafter output.
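The subtract, transform, and quantize steps for a single 8×8 block can be sketched as below. This is a simplified illustration under stated assumptions: the helper names `dct_8x8` and `encode_block` are hypothetical, the DCT is a plain orthonormal DCT-II, and the quantizer divides every coefficient by one uniform `step`, which is a simplification that does not reproduce the quantization rules of the standard.

```python
import math

N = 8  # block size used for the DCT in the text


def dct_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (list of 8 lists of 8 numbers)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def encode_block(source, predicted, step):
    """Prediction error -> DCT -> uniform quantization (assumed, simplified)."""
    # Subtractor: difference between the input block and the predicted value.
    error = [[source[x][y] - predicted[x][y] for y in range(N)] for x in range(N)]
    # DCT of the prediction error.
    coeffs = dct_8x8(error)
    # Quantizer: a larger step gives coarser data and fewer bits.
    return [[int(round(value / step)) for value in row] for row in coeffs]


# A block whose prediction error is a constant 16 has only a DC coefficient.
source = [[116] * N for _ in range(N)]
predicted = [[100] * N for _ in range(N)]
quantized = encode_block(source, predicted, step=8)
print(quantized[0][0])
```

Increasing `step` in this sketch corresponds to widening the quantizing width that the quantizer control block 112 adjusts.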
The data quantized in the quantizer 105 is reproduced in the inverse quantizer 107 and IDCT 108 to produce a predicted image. The reproduced image data is passed to the motion compensation block 109, and a predicted value is calculated and supplied to the subtractor 103. The motion detector 111 calculates the motion vector in every macro block, and the motion vector is supplied to the motion compensation block 109, and is also supplied to the variable length encoder 106. The quantizer control block 112 compares the number of generated bits in the bit stream transmitted from the variable length encoder 106 and a target number of generated bits converted from a target bit rate, and controls the quantizing width of the quantizer 105 so that encoding is finally completed with the target number of bits.
Processing in the quantizer control block 112 is described below. The target number of bits per GOP converted from the target bit rate is denoted G, the number of bits remaining for the current GOP during encoding is R, the numbers of bits generated by the most recently encoded I, P, and B pictures are SI, SP, SB respectively, and the averages of the quantizing parameters for those pictures are QI, QP, QB respectively. The encoding difficulties XI, XP, XB of the respective picture types are then defined as XI = SI × QI, XP = SP × QP, XB = SB × QB, and the target number of bits for encoding each picture is calculated, with respect to I, P and B pictures, as follows: ##EQU1## where Kp, Kb are constants, and NP, NB are the numbers of remaining P pictures and B pictures not yet encoded. The value of R is updated as R = R - S after each picture, S being the number of bits generated in that picture, and as R = R + G at the beginning of each GOP. That is, the number of bits available per GOP is determined, bits are assigned depending on the composition ratio of the picture types, each picture is encoded, the number of bits it generated is subtracted from R, the target number of generated bits is corrected and assigned again for the next picture, and the same procedure repeats. Further, when the number of bits actually required for encoding one GOP differs from the target number of bits assigned to that GOP, the difference is carried over into the target number of generated bits of the next GOP.
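The per-picture allocation and the updating of R can be sketched as follows. The allocation equations themselves are not reproduced in this text, so the sketch assumes they take the form used in the publicly documented MPEG-2 Test Model rate control, which is consistent with the symbols R, XI, XP, XB, NP, NB, Kp, and Kb defined above. The function names and the numeric values are illustrative assumptions only.

```python
def complexity(generated_bits, average_q):
    """Encoding difficulty X = S x Q of the most recently encoded picture."""
    return generated_bits * average_q


def picture_target_bits(picture_type, R, XI, XP, XB, NP, NB, Kp, Kb):
    """Target number of bits for the next picture (assumed Test Model form)."""
    if picture_type == "I":
        return R / (1.0 + NP * XP / (XI * Kp) + NB * XB / (XI * Kb))
    if picture_type == "P":
        return R / (NP + NB * Kp * XB / (Kb * XP))
    if picture_type == "B":
        return R / (NB + NP * Kb * XP / (Kp * XB))
    raise ValueError("picture_type must be 'I', 'P' or 'B'")


def start_gop(R, G):
    """At the beginning of a GOP: R = R + G (any surplus or deficit carries over)."""
    return R + G


def after_picture(R, S):
    """After encoding a picture that generated S bits: R = R - S."""
    return R - S


# Illustrative numbers: equal difficulties and Kp = Kb = 1, so the bits
# are simply shared evenly among 1 I, 4 P and 10 B pictures.
R = start_gop(0, 1_500_000)
TI = picture_target_bits("I", R, XI=1.0, XP=1.0, XB=1.0, NP=4, NB=10, Kp=1.0, Kb=1.0)
print(TI)
R = after_picture(R, 120_000)  # the I picture overshot its target
print(R)
```

Because R is reduced by the bits actually generated, an overshoot on one picture lowers the targets of the pictures that remain in the GOP.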
The method of controlling the quantizing parameters from the target number of generated bits of each picture is described below. First, virtual buffers are assumed for I, P and B pictures, and supposing the target number of generated bits in each macro block is constant when encoding an i-th macro block, the data remainders of the virtual buffers dIi, dPi, dBi are expressed as follows.

$$dI_i = dI_0 + B_{i-1} - TI \times \frac{i-1}{MB\_cnt} \qquad (4)$$

$$dP_i = dP_0 + B_{i-1} - TP \times \frac{i-1}{MB\_cnt} \qquad (5)$$

$$dB_i = dB_0 + B_{i-1} - TB \times \frac{i-1}{MB\_cnt} \qquad (6)$$
where B_i is the number of bits generated by all macro blocks up to and including the i-th, MB_cnt is the number of macro blocks contained in one picture, and dI0, dP0, dB0 are initial values of the buffer remainders at the beginning of each picture. In these formulas, the second term, B_{i-1}, is the number of bits actually required to encode up to the immediately preceding macro block, and the third term, T × (i-1)/MB_cnt, where T is TI, TP or TB according to the picture type, is the target number of bits for encoding up to the immediately preceding macro block. Therefore, the difference between the second term and the third term gives the error between the number of bits actually required for encoding and the target number of bits. By adding this error to the initial value of the buffer remainder, the buffer remainder for encoding the i-th macro block is obtained.
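Equations (4) to (6) share one form and differ only in the picture type, so they can be illustrated with a single helper. The function name `buffer_remainder` and the numeric values in the example are assumptions for illustration.

```python
def buffer_remainder(d0, bits_so_far, target_bits, i, mb_cnt):
    """Virtual buffer remainder before encoding the i-th macro block.

    d0          -- initial buffer remainder at the start of the picture
    bits_so_far -- B_{i-1}, bits generated up to the preceding macro block
    target_bits -- TI, TP or TB, the target number of bits for the picture
    i           -- 1-based index of the macro block about to be encoded
    mb_cnt      -- number of macro blocks in one picture
    """
    # Error between actual and target bits so far, added to the initial value.
    return d0 + bits_so_far - target_bits * (i - 1) / mb_cnt


# Illustrative numbers: 330 macro blocks and a 99,000-bit picture target give
# a uniform 300 bits per macro block. After 10 macro blocks the target is
# 3,000 bits; 3,500 bits were actually generated, so the remainder rises by 500.
print(buffer_remainder(d0=10_000, bits_so_far=3_500, target_bits=99_000, i=11, mb_cnt=330))
```

For the first macro block (i = 1, with no bits generated yet) the helper returns the initial value d0 unchanged.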
Using the buffer remainder calculated in the above formulas, the quantizing parameter Qi in the i-th macro block is obtained as follows. ##EQU2## where r = 2 × (target bit rate)/(picture rate).
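The mapping from buffer remainder to quantizing parameter can be sketched as below. The equation for Qi is not reproduced in this text; the sketch assumes the form Qi = d_i × 31 / r used in the publicly documented MPEG-2 Test Model rate control, and the clamping to the range 1 to 31 is likewise an assumption. Only the definition of r is taken from the text. Function names and numbers are illustrative.

```python
def reaction_parameter(target_bit_rate, picture_rate):
    """r = 2 x (target bit rate) / (picture rate), as defined in the text."""
    return 2.0 * target_bit_rate / picture_rate


def quantizing_parameter(d_i, r):
    """Assumed mapping Qi = d_i x 31 / r, clamped to 1..31 (assumption)."""
    q = round(d_i * 31.0 / r)
    return max(1, min(31, q))


# Illustrative numbers: 4.5 Mbit/s at 30 pictures per second.
r = reaction_parameter(4_500_000, 30)
print(r)
print(quantizing_parameter(100_000, r))  # moderately full buffer
print(quantizing_parameter(400_000, r))  # very full buffer, coarsest setting
```

Under this assumed mapping, a fuller virtual buffer yields a larger quantizing parameter and therefore a wider quantizing width for the next macro block.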
In summary, given a target bit rate, the target number of generated bits is set for each picture, and the number of generated bits is limited to the target number of bits of the GOP in which the picture is included. Then, for each macro block, the quantizing parameter is controlled depending on the error between the actual number of generated bits and the target number of bits, on the assumption that the target number of bits is the same for every macro block. As a result, the image is encoded so that the number of generated bits is close to the target number of generated bits for each picture.
In such an encoding method, however, the target number of generated bits is set for each picture, and the quantizing parameter is controlled so that the number of generated bits coincides with the target number of generated bits of the GOP in which the picture is included. Therefore, when the picture changes suddenly and the number of bits required for encoding increases, the quantizing parameter is controlled to increase the quantizing width so as to hold the number of generated bits down to the target. As a result, the picture quality may deteriorate.
Likewise, if the actual number of generated bits considerably exceeds the target number of generated bits in a certain GOP, the quantizing parameter is controlled so that the target number of generated bits in the next GOP absorbs this error, and the picture quality in that next GOP may deteriorate.
Moreover, when relatively simple images are concentrated in a first half of a certain picture and complicated images are concentrated in a second half thereof, it is ideal to assign more bits to the second half. In the above method, however, the target number of bits is assigned uniformly to each macro block, and the quantizing parameter is set depending on the error from the number of bits actually required in each macro block. Hence more bits than necessary may be assigned in the first half, and more bits than expected may be generated in the second half. Conversely, if complicated images are concentrated in the first half of a picture, the number of bits generated in the first half is limited, and bits are again not ideally assigned.
To overcome the problems described above, it is possible to encode all images once, determine the ideal bit distribution for all those images, and then encode them again. However, it is difficult to perform such two-pass encoding in real time.