FIG. 1 is a block diagram showing a configuration of a conventional typical encoding apparatus for encoding a moving picture signal.
The encoding apparatus shown in FIG. 1 includes a local decoding apparatus, and is provided with frequency conversion unit 101, quantization unit 102, variable length encoding unit 103, dequantization unit 104, inverse frequency conversion unit 105, frame memory 106, intra-frame prediction unit 107, motion compensation unit 108, motion estimation unit 109, buffer 110, and code amount control unit 111. The encoding apparatus further includes subtractor 121, switch 122, and adder 123.
An input picture frame is fed into the encoding apparatus and is divided into a plurality of blocks. A predicted value by an intra-frame prediction or an inter-frame prediction is subtracted from the divided block by subtractor 121. Here, the intra-frame prediction is a technique for predicting a current picture by using a reconstructed area of the current frame to be encoded, and the inter-frame prediction is a technique for predicting a current picture by using a picture frame that is previously reconstructed. The picture block in which the predicted value by the intra-frame prediction or the inter-frame prediction is subtracted is called a prediction error. Incidentally, a picture frame in which all blocks in a frame to be encoded are encoded by only the intra-frame prediction for producing predicted values from adjacent pixels in the same frame to be encoded, is called an I-picture. A picture frame coded by using both the intra-frame prediction and the inter-frame prediction is called a P-picture. Further, in the inter-frame prediction, a picture frame coded by referring to a plurality of picture frames, which are input before and after the current encoded frame, is called a B-picture.
Generally, in the moving picture data to be encoded, I-pictures are set at constant intervals, and a section that is delimited by the I-pictures and includes a plurality of frames is called a GOP (Group of Pictures). Definitions of the I-, P-, and B-pictures and the GOP are stipulated by the MPEG (Moving Picture Experts Group) standards, which are international standards for moving picture encoding.
Then, the prediction error is converted into the frequency domain by frequency conversion unit 101. The prediction error converted into the frequency domain is quantized by quantization unit 102. The quantized prediction error, i.e., a conversion coefficient, is entropy-encoded by variable length encoding unit 103 and is stored in buffer 110. Buffer 110 outputs the stored occurring code, i.e., a bitstream, at a predetermined timing. Also, as a local decoding process, the quantized prediction error is returned to a prediction error in the original spatial domain by dequantization unit 104 and inverse frequency conversion unit 105. Further, the predicted value is added to the prediction error returned to the spatial domain by adder 123, and the sum is stored in frame memory 106 as a reconstructed picture.
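The forward path and the local decoding path described above can be illustrated by the following minimal sketch. It is not taken from the apparatus of FIG. 1: it assumes a one-dimensional orthonormal DCT as the frequency conversion and a single uniform quantization step, and the names dct, idct, encode_block, and qstep are hypothetical. Entropy encoding by variable length encoding unit 103 is omitted.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a 1-D list (stand-in for frequency conversion unit 101)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out

def idct(coeffs):
    """Inverse of dct() (stand-in for inverse frequency conversion unit 105)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        out.append(sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * coeffs[k]
                       * math.cos(math.pi * (i + 0.5) * k / n)
                       for k in range(n)))
    return out

def encode_block(block, predicted, qstep):
    """Return the quantized levels and the locally decoded (reconstructed) block."""
    # Subtractor 121: prediction error in the spatial domain.
    residual = [b - p for b, p in zip(block, predicted)]
    # Units 101 and 102: frequency conversion followed by uniform quantization.
    levels = [round(c / qstep) for c in dct(residual)]
    # Units 104 and 105: dequantization and inverse frequency conversion.
    recon_residual = idct([lv * qstep for lv in levels])
    # Adder 123: the predicted value is added back to form the reconstructed picture.
    reconstructed = [p + r for p, r in zip(predicted, recon_residual)]
    return levels, reconstructed

block = [52, 55, 61, 66, 70, 61, 64, 73]
predicted = [50] * 8
levels, reconstructed = encode_block(block, predicted, qstep=4.0)
```

Because the encoder stores the reconstructed block rather than the input block, its reference data stays identical to what a decoder would reconstruct from the same levels.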
The reconstructed picture that has been stored in frame memory 106 is referred to by intra-frame prediction unit 107, motion compensation unit 108 and motion estimation unit 109 to produce the predicted value. Therefore, the reconstructed picture stored in frame memory 106 is also called a reference frame.
Intra-frame prediction unit 107 performs the intra-frame prediction in accordance with the reconstructed picture in frame memory 106, and outputs the predicted value. Motion estimation unit 109 detects the motion vector between the input block and the reference frame, in accordance with the input picture block and the reference frame read from frame memory 106, so as to minimize the difference between the input block and the predicted value, i.e., the prediction error. Motion compensation unit 108 produces the predicted value from the reference frame stored in frame memory 106 by using the motion vector and the reference frame supplied from motion estimation unit 109. The predicted value by motion compensation unit 108 is based on the inter-frame prediction. Therefore, switch 122 is arranged to switch between the predicted value output from intra-frame prediction unit 107 and the predicted value output from motion compensation unit 108 as the value supplied to subtractor 121.
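As an illustration of the motion estimation and motion compensation described above, the following sketch performs an exhaustive block-matching search. The search strategy and the cost measure are assumptions: the description only requires that the prediction error be minimized, and the sum of absolute differences (SAD) is used here as one common choice. All function names and the search_range parameter are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]

def estimate_motion(cur_block, ref_frame, top, left, search_range):
    """Full search: return the (dy, dx) vector with the smallest SAD, and that SAD."""
    size = len(cur_block)
    height, width = len(ref_frame), len(ref_frame[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(cur_block, get_block(ref_frame, y, x, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost

def motion_compensate(ref_frame, top, left, mv, size):
    """Produce the predicted block by reading the reference frame at the displaced position."""
    return get_block(ref_frame, top + mv[0], left + mv[1], size)

# Synthetic reference frame; the current block is a displaced copy of part of it.
ref = [[(7 * y * y + 13 * x + 3 * x * y) % 251 for x in range(16)] for y in range(16)]
cur_block = get_block(ref, 5, 6, 4)
mv, cost = estimate_motion(cur_block, ref, top=4, left=4, search_range=3)
predicted = motion_compensate(ref, 4, 4, mv, 4)
```

In this synthetic case the current block exists unchanged in the reference frame, so the search finds a vector with zero cost and the predicted block equals the current block.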
The bitstream, which is the moving picture information compressed in the above process, is made of a variable length code, mainly including a conversion coefficient of each block, a quantizing parameter, a motion vector (to minimize the prediction error), and a reference frame (to minimize the prediction error).
The above is the basic principle of operation in the moving picture compression technique.
Generally, in a digital broadcast system or a picture communication service, the occurring code amount of the moving picture signal, i.e., its bit rate, is controlled for transmission or storage. To this end, code amount control unit 111 detects the occurring code amount supplied from variable length encoding unit 103 and performs two processes, described below, to control the occurring code amount.
In the first process, code amount control unit 111 sets a target code amount for each frame in accordance with each picture type. Let R be the code amount assigned to the frames that have not yet been encoded in the GOP, let Np and Nb be the numbers of P-pictures and B-pictures, respectively, that have not yet been encoded in the GOP, let Xi, Xp, Xb be parameters that represent the frame complexity of each picture type, defined by equations (1) to (3), and let Kp and Kb be parameters that take into account the subjective picture quality of each picture type. Target code amounts Ti, Tp, Tb by picture type are then given by equations (4) to (6).
Xi=Qi×Ci  (1),
Xp=Qp×Cp  (2),
Xb=Qb×Cb  (3),
Ti=R/(1+Np×Xp/(Kp×Xi)+Nb×Xb/(Kb×Xi))  (4),
Tp=R/(Np+Nb×Kp×Xb/(Kb×Xp))  (5), and
Tb=R/(Nb+Np×Kb×Xp/(Kp×Xb))  (6).
Here, Ci, Cp, Cb are respectively the occurring code amounts of the I-, P-, and B-pictures that were most recently encoded, and Qi, Qp, Qb are respectively the average quantization step sizes of the I-, P-, and B-pictures that were most recently encoded. Incidentally, in the following explanations, for convenience in writing, one value among Ci, Cp, Cb is expressed as Ci,p,b. Also, the expression Qi,p,b=Xi,p,b/Ti,p,b collectively represents the equation Qi=Xi/Ti for the I-picture, the equation Qp=Xp/Tp for the P-picture, and the equation Qb=Xb/Tb for the B-picture.
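Equations (1) to (6) can be evaluated directly, as in the following sketch. The function names are hypothetical, and all numeric values in the usage lines, including those of Kp and Kb, are illustrative assumptions rather than values given in this description.

```python
def complexity(avg_qstep, bits):
    """Equations (1) to (3): X = Q x C for the most recently encoded picture of a type."""
    return avg_qstep * bits

def target_bits(R, Np, Nb, Xi, Xp, Xb, Kp, Kb):
    """Equations (4) to (6): target code amounts (Ti, Tp, Tb) by picture type."""
    Ti = R / (1 + Np * Xp / (Kp * Xi) + Nb * Xb / (Kb * Xi))
    Tp = R / (Np + Nb * Kp * Xb / (Kb * Xp))
    Tb = R / (Nb + Np * Kb * Xp / (Kp * Xb))
    return Ti, Tp, Tb

# Illustrative values only (assumed, not from the description).
Xi = complexity(avg_qstep=8.0, bits=400_000)
Xp = complexity(avg_qstep=10.0, bits=150_000)
Xb = complexity(avg_qstep=12.0, bits=60_000)
Ti, Tp, Tb = target_bits(R=1_500_000.0, Np=4, Nb=10,
                         Xi=Xi, Xp=Xp, Xb=Xb, Kp=1.0, Kb=1.4)
```

When all three complexities are equal and Kp=Kb=1, equations (5) and (6) both reduce to R/(Np+Nb), that is, the remaining code amount is shared equally among the remaining P- and B-pictures.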
Whenever a frame is encoded in accordance with the first process and the second process, which will be described later, code amount R assigned to the frames that have not yet been encoded in the GOP is updated in accordance with equation (7).
R=R−Ci,p,b  (7).
Further, when the head picture of a GOP is encoded, code amount R is initialized by equation (8).
R=bit_rate×N/frame_rate+R  (8).
Here, bit_rate is the target bit rate, frame_rate is the frame rate, and N is the number of frames in the GOP.
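The bookkeeping of code amount R in equations (7) and (8) can be sketched as follows. The function names and the numeric values are hypothetical and serve only as an example.

```python
def start_gop(R_carry, bit_rate, frame_rate, N):
    """Equation (8): add the budget of a new GOP to whatever R remains from the previous one."""
    return bit_rate * N / frame_rate + R_carry

def after_frame(R, bits_used):
    """Equation (7): subtract the code amount actually produced by the frame just encoded."""
    return R - bits_used

# Illustrative values (assumed): 3 Mbit/s, 30 frames/s, 15 frames per GOP.
R = start_gop(R_carry=0.0, bit_rate=3_000_000, frame_rate=30, N=15)
R = after_frame(R, bits_used=200_000)
```

Because the R on the right-hand side of equation (8) is the value left over from the previous GOP, a surplus or a deficit is carried into the budget of the next GOP.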
In the second process, in order to make the actual occurring code amounts match target code amounts Ti, Tp, Tb assigned to the respective frames in the first process, quantization steps are obtained by feedback control in macroblock units in accordance with the occupation amount of a virtual buffer that is set for each picture type.
First, prior to encoding of the j-th macroblock, the occupation amount in the virtual buffer is obtained for each picture type in accordance with equation (9).
di,p,b(j)=di,p,b(0)+B(j−1)−Ti,p,b×(j−1)/MBcount  (9).
Here, di,p,b(0) is the initial occupation amount in the virtual buffer, B(j) is the occurring code amount from the head of the frame up to and including the j-th macroblock, and MBcount is the number of macroblocks in the frame.
When encoding of each frame is finished, occupation amounts di,p,b(MBcount) in the virtual buffer by picture type are used as initial occupation amounts di,p,b(0) in the virtual buffer for the next picture of the same type.
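The evolution of the virtual buffer occupation amount within one frame, per equation (9), can be traced as in the following sketch. The function names and the per-macroblock code amounts are hypothetical.

```python
def buffer_fullness(d0, bits_before_mb, target, j, mb_count):
    """Equation (9): occupation amount before encoding the j-th macroblock (j starts at 1).

    bits_before_mb corresponds to B(j-1), the code amount produced by macroblocks 1..j-1.
    """
    return d0 + bits_before_mb - target * (j - 1) / mb_count

def trace_fullness(d0, mb_bits, target):
    """Return d(1)..d(MBcount) for a frame whose macroblocks produced mb_bits."""
    mb_count = len(mb_bits)
    trace, produced = [], 0
    for j in range(1, mb_count + 1):
        trace.append(buffer_fullness(d0, produced, target, j, mb_count))
        produced += mb_bits[j - 1]  # B(j) becomes available after macroblock j
    return trace

# Illustrative frame with four macroblocks and a target of 80, i.e. 20 per macroblock.
trace = trace_fullness(d0=100.0, mb_bits=[10, 30, 20, 40], target=80.0)
next_d0 = trace[-1]  # d(MBcount), carried over to the next picture of the same type
```

The occupation amount falls when the macroblocks encoded so far produced less code than their share of the target and rises when they produced more, which is the signal used by the feedback control.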
Then, quantization step size Q(j) for the j-th macroblock is calculated in accordance with equations (10) and (11).
Q(j)=Qi,p,b×di,p,b(j)×31/(10×r)  (10), and
Qi,p,b=Xi,p,b/Ti,p,b  (11).
Here, r is a parameter used to control the response speed of the feedback loop, which is called a reaction parameter, and is represented by equation (12).
r=2×bit_rate/frame_rate  (12).
Incidentally, initial occupation amounts di,p,b(0) in the virtual buffer at the start of encoding are represented by equations (13) to (15).
di(0)=10×r/31  (13),
dp(0)=Kp×di(0)  (14), and
db(0)=Kb×di(0)  (15).
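Equations (10) to (15) can be combined as in the following sketch. The function names and all numeric values, including those of Kp and Kb, are illustrative assumptions.

```python
def reaction_parameter(bit_rate, frame_rate):
    """Equation (12): r = 2 x bit_rate / frame_rate."""
    return 2 * bit_rate / frame_rate

def initial_fullness(r, Kp, Kb):
    """Equations (13) to (15): initial occupation amounts (di(0), dp(0), db(0))."""
    di0 = 10 * r / 31
    return di0, Kp * di0, Kb * di0

def macroblock_qstep(X, T, d_j, r):
    """Equations (10) and (11): quantization step size Q(j) for the j-th macroblock."""
    Q_avg = X / T  # equation (11)
    return Q_avg * d_j * 31 / (10 * r)  # equation (10)

# Illustrative values (assumed): 3 Mbit/s at 30 frames/s.
r = reaction_parameter(bit_rate=3_000_000, frame_rate=30)
di0, dp0, db0 = initial_fullness(r, Kp=1.0, Kb=1.4)
q_start = macroblock_qstep(X=1_600_000.0, T=200_000.0, d_j=di0, r=r)
```

Substituting di(0)=10×r/31 into equation (10) gives Q(j)=Qi, so at the start of encoding the step size equals Xi/Ti. It then scales in proportion to the occupation amount, increasing when the frame is producing more code than targeted.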
On the other hand, a moving picture encoding scheme has been proposed that incorporates multi-frame motion prediction. That is, the P-picture is predicted not only from the I-picture or the P-picture that was encoded immediately before it but also from other frames that have already been encoded, and the B-picture is predicted not only from the I-picture or the P-picture that was encoded immediately before it but also from B-pictures that have already been encoded. In this scheme, since a high-quality frame that was previously encoded can be selected for the motion prediction, the degree of freedom in the motion prediction increases.