1. Field of the Invention
The present invention relates to an encoding apparatus, a method of controlling thereof, and a computer program.
2. Description of the Related Art
With the recent spread of multimedia, various moving image compression encoding methods have been proposed. Typical examples are MPEG-1, MPEG-2, MPEG-4, and H.264. In the compression encoding process, each original image contained in a moving image is divided into predetermined regions called blocks, and motion compensation/prediction and DCT (discrete cosine transform) are executed for each of the divided blocks. For motion compensation/prediction, a reference image is obtained by locally decoding already encoded image data. For this reason, a decoding process is necessary even during encoding.
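As a minimal sketch of the block division step, the Python below enumerates the top-left corners of the blocks in raster order. It assumes 16×16 blocks (the macroblock size used by MPEG-1/2) and picture dimensions that are already multiples of the block size; the function name `macroblock_origins` is hypothetical.

```python
from typing import List, Tuple


def macroblock_origins(width: int, height: int, size: int = 16) -> List[Tuple[int, int]]:
    """Return the top-left (x, y) corner of every size x size block in raster order.

    Assumes width and height are already multiples of `size`
    (encoders typically pad the picture to satisfy this).
    """
    if width % size or height % size:
        raise ValueError("picture dimensions must be multiples of the block size")
    # Raster order: left to right within a block row, then top to bottom.
    return [(x, y) for y in range(0, height, size) for x in range(0, width, size)]


blocks = macroblock_origins(720, 480)
print(len(blocks))  # 1350 (45 x 30 blocks)
print(blocks[:2])   # [(0, 0), (16, 0)]
```

Each block would then go through motion compensation/prediction and DCT; the block count is also the per-picture macroblock count used later in code amount control.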
When an image is compressed and encoded in conformance with MPEG, the amount of generated code often varies greatly depending on the characteristics of the image itself (such as its spatial frequency characteristic), on the scene, and on the quantization scale value. In an encoding apparatus having such encoding characteristics, code amount control is therefore an important technique for obtaining a decoded image with high image quality.
TM5 (Test Model 5) is a commonly used code amount control algorithm. It controls the amount of code in the following three steps so as to maintain a predetermined bit rate in every GOP (Group of Pictures).
(Step 1)
The target amount of code of the picture to be encoded next is determined. Rgop, the usable amount of code in the current GOP, is calculated by

$$R_{gop} = (n_i + n_p + n_b) \times \frac{\mathrm{bits\_rate}}{\mathrm{picture\_rate}} \qquad (1)$$

where ni, np, and nb are the numbers of remaining I-, P-, and B-pictures in the current GOP, bits_rate is the target bit rate, and picture_rate is the picture rate.
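Equation (1) can be sketched in Python as follows. The function name `usable_gop_bits` and the example GOP structure and rates are illustrative assumptions.

```python
def usable_gop_bits(n_i: int, n_p: int, n_b: int,
                    bits_rate: float, picture_rate: float) -> float:
    """Equation (1): Rgop = (ni + np + nb) * (bits_rate / picture_rate)."""
    # Every remaining picture is budgeted the average per-picture share of the bit rate.
    return (n_i + n_p + n_b) * (bits_rate / picture_rate)


# A 15-picture GOP (1 I, 4 P, 10 B) at 4 Mbit/s and 25 pictures/s.
print(usable_gop_bits(1, 4, 10, 4_000_000, 25))  # 2400000.0
```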
Complexities Xi, Xp, and Xb of the I-, P-, and B-pictures are obtained from the encoding results by

$$X_i = R_i Q_i, \qquad X_p = R_p Q_p, \qquad X_b = R_b Q_b \qquad (2)$$

where Ri, Rp, and Rb are the amounts of code generated by encoding the I-, P-, and B-pictures, and Qi, Qp, and Qb are the average Q-scale values over all macroblocks in the I-, P-, and B-pictures, respectively. Based on equations (1) and (2), the target amounts of code Ti, Tp, and Tb of the I-, P-, and B-pictures are obtained by

$$\begin{aligned}
T_i &= \max\left\{ \frac{R_{gop}}{1 + \dfrac{N_p X_p}{X_i K_p} + \dfrac{N_b X_b}{X_i K_b}},\ \frac{\mathrm{bits\_rate}}{8 \times \mathrm{picture\_rate}} \right\} \\
T_p &= \max\left\{ \frac{R_{gop}}{N_p + \dfrac{N_b K_p X_b}{K_b X_p}},\ \frac{\mathrm{bits\_rate}}{8 \times \mathrm{picture\_rate}} \right\} \\
T_b &= \max\left\{ \frac{R_{gop}}{N_b + \dfrac{N_p K_b X_p}{K_p X_b}},\ \frac{\mathrm{bits\_rate}}{8 \times \mathrm{picture\_rate}} \right\}
\end{aligned} \qquad (3)$$

where Np and Nb are the numbers of remaining P- and B-pictures in the current GOP, and the constants are Kp = 1.0 and Kb = 1.4.
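A minimal Python sketch of equations (2) and (3) follows. Each target is assumed to be evaluated just before a picture of that type is encoded, using the Rgop, Np, and Nb that remain at that moment; the function names and the picture statistics are made-up illustrations.

```python
KP, KB = 1.0, 1.4  # constants Kp and Kb given with equations (3)


def complexity(bits: float, avg_qscale: float) -> float:
    """Equation (2): X = R * Q for the most recently encoded picture of a type."""
    return bits * avg_qscale


def target_bits(ptype: str, rgop: float, n_p: int, n_b: int,
                xi: float, xp: float, xb: float,
                bits_rate: float, picture_rate: float) -> float:
    """Equations (3): target amount of code for the next picture of type ptype."""
    floor = bits_rate / (8 * picture_rate)  # lower bound shared by all three targets
    if ptype == "I":
        t = rgop / (1 + (n_p * xp) / (xi * KP) + (n_b * xb) / (xi * KB))
    elif ptype == "P":
        t = rgop / (n_p + (n_b * KP * xb) / (KB * xp))
    elif ptype == "B":
        t = rgop / (n_b + (n_p * KB * xp) / (KP * xb))
    else:
        raise ValueError(f"unknown picture type {ptype!r}")
    return max(t, floor)


# Illustrative (made-up) statistics from previously encoded pictures.
xi = complexity(400_000, 10)  # 4.0e6
xp = complexity(150_000, 10)  # 1.5e6
xb = complexity(60_000, 14)   # 8.4e5

# Start of a GOP: 1 I, 4 P, 10 B remain, so equation (1) gives Rgop = 2_400_000.
ti = target_bits("I", 2_400_000, 4, 10, xi, xp, xb, 4_000_000, 25)
# Next, a P-picture: 0 I, 4 P, 10 B remain, so Rgop = 14 * 160_000 = 2_240_000.
tp = target_bits("P", 2_240_000, 4, 10, xi, xp, xb, 4_000_000, 25)
print(round(ti), round(tp))  # 600000 280000
```

With these numbers the I-picture receives more than twice the target of the following P-picture, which reflects its higher complexity.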
(Step 2)
Three virtual buffers, one each for I-, P-, and B-pictures, are used to manage the differences between the target amounts of code obtained by equations (3) and the amounts of generated code. The data accumulation amount of each virtual buffer is fed back, and the Q-scale reference value for the macroblock to be encoded next is set based on that accumulation amount so that the actual amount of generated code approaches the target amount of code. For example, if the current picture is a P-picture, the difference between the target amount of code and the amount of generated code is obtained by

$$d_{p,j} = d_{p,0} + B_{p,j-1} - \frac{T_p \times (j-1)}{\mathrm{MB\_cnt}} \qquad (4)$$

where the suffix j is the macroblock number in the picture, dp,0 is the initial fullness of the virtual buffer, Bp,j is the total amount of code generated up to the jth macroblock, and MB_cnt is the number of macroblocks in the picture. The Q-scale reference value of the jth macroblock is obtained from dp,j (referred to as "dj" hereinafter) by

$$Q_j = \frac{d_j \times 31}{r} \qquad (5)$$

where

$$r = \frac{2 \times \mathrm{bits\_rate}}{\mathrm{picture\_rate}} \qquad (6)$$
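Equations (4) to (6) reduce to two small functions, sketched here in Python. The function names, the initial fullness, and the bit counts are hypothetical values chosen only to illustrate the feedback direction.

```python
def buffer_fullness(d0: float, bits_so_far: float, target: float,
                    j: int, mb_cnt: int) -> float:
    """Equation (4): d_j = d_0 + B_{j-1} - T * (j - 1) / MB_cnt.

    bits_so_far is B_{j-1}, the code generated by the macroblocks before the jth one.
    """
    return d0 + bits_so_far - target * (j - 1) / mb_cnt


def reference_qscale(dj: float, bits_rate: float, picture_rate: float) -> float:
    """Equations (5) and (6): Q_j = d_j * 31 / r, with r = 2 * bits_rate / picture_rate."""
    r = 2 * bits_rate / picture_rate
    return dj * 31 / r


# P-picture with Tp = 225_000 bits and 1350 macroblocks, halfway through (j = 676).
on_target = buffer_fullness(40_000, 112_500, 225_000, 676, 1350)  # 40000.0
overshoot = buffer_fullness(40_000, 122_500, 225_000, 676, 1350)  # 50000.0
print(reference_qscale(on_target, 4_000_000, 25))  # 3.875
print(reference_qscale(overshoot, 4_000_000, 25))  # 4.84375
```

Generating 10,000 bits more than the pro-rated target raises the buffer fullness and therefore the Q-scale reference value, which pulls later macroblocks back toward the target.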
(Step 3)
Finally, the quantization scale is determined based on the spatial activity of the macroblock to be encoded, so as to obtain a satisfactory visual characteristic, that is, high decoded image quality:

$$\mathrm{ACT}_j = 1 + \min(v_{blk1}, v_{blk2}, \ldots, v_{blk8}) \qquad (7)$$

where vblk1 to vblk4 are the spatial activities of the 8×8 sub-blocks of the macroblock in frame structure, and vblk5 to vblk8 are the spatial activities of the 8×8 sub-blocks of the macroblock in field structure. The spatial activity is calculated by

$$v_{blk} = \sum_{i=1}^{64} \left(P_i - \bar{P}\right)^2 \qquad (8)$$

$$\bar{P} = \frac{1}{64} \sum_{i=1}^{64} P_i \qquad (9)$$

where Pi is the ith pixel value in the 8×8 sub-block. ACTj obtained by equation (7) is normalized by

$$\mathrm{N\_ACT}_j = \frac{2 \times \mathrm{ACT}_j + \mathrm{AVG\_ACT}}{\mathrm{ACT}_j + 2 \times \mathrm{AVG\_ACT}} \qquad (10)$$

where AVG_ACT is the average value of ACTj in the previously encoded picture, and the quantization scale (Q-scale value) MQUANTj is finally calculated by

$$\mathrm{MQUANT}_j = Q_j \times \mathrm{N\_ACT}_j \qquad (11)$$
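A minimal Python sketch of equations (7) to (11) follows. Two assumptions are worth flagging: the frame sub-blocks are taken as the four 8×8 quadrants of the 16×16 macroblock, and the field sub-blocks are built by separating even and odd lines into two 8-line fields and splitting each into left and right halves. The normalization uses the TM5 form, which lies between 0.5 and 2 and equals 1 when ACTj equals AVG_ACT. All function names and the values Qj = 10 and AVG_ACT = 100 are illustrative.

```python
from typing import List, Sequence


def eight_subblocks(mb: Sequence[Sequence[float]]) -> List[List[float]]:
    """Split a 16x16 macroblock into 4 frame and 4 field 8x8 sub-blocks (flattened)."""
    def cut(rows, x0):
        return [v for row in rows for v in row[x0:x0 + 8]]
    frame = [cut(mb[y0:y0 + 8], x0) for y0 in (0, 8) for x0 in (0, 8)]
    fields = (mb[0::2], mb[1::2])  # even-line field, odd-line field (8 rows each)
    field = [cut(f, x0) for f in fields for x0 in (0, 8)]
    return frame + field


def subblock_activity(pixels: Sequence[float]) -> float:
    """Equations (8) and (9): sum of squared deviations of 64 pixels from their mean."""
    if len(pixels) != 64:
        raise ValueError("an 8x8 sub-block has 64 pixels")
    pbar = sum(pixels) / 64
    return sum((p - pbar) ** 2 for p in pixels)


def macroblock_activity(subblocks: Sequence[Sequence[float]]) -> float:
    """Equation (7): ACT_j = 1 + minimum activity over the eight sub-blocks."""
    return 1 + min(subblock_activity(b) for b in subblocks)


def normalized_activity(act: float, avg_act: float) -> float:
    """TM5 normalization: between 0.5 and 2, and 1 when act == avg_act."""
    return (2 * act + avg_act) / (act + 2 * avg_act)


def mquant(qj: float, act: float, avg_act: float) -> float:
    """Equation (11): MQUANT_j = Q_j * N_ACT_j."""
    return qj * normalized_activity(act, avg_act)


flat = [[128] * 16 for _ in range(16)]
stripes = [[0 if x % 2 == 0 else 255 for x in range(16)] for _ in range(16)]
for name, mb in (("flat", flat), ("stripes", stripes)):
    act = macroblock_activity(eight_subblocks(mb))
    print(name, act, round(mquant(10, act, 100), 2))
# flat 1.0 5.07
# stripes 1040401.0 20.0
```

The flat macroblock receives roughly half the reference Q-scale and the busy one roughly double, so more code goes to the flat region where degradation is more visible.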
According to the above-described TM5 algorithm, the process in Step 1 assigns a large amount of code to I-pictures, and the process in Step 3 allocates a large amount of code to flat regions (with low spatial activity), where degradation is visually noticeable in the picture. This enables code amount control and quantization control within a predetermined bit rate while suppressing degradation in image quality.
A method has also been proposed that, like TM5, controls quantization in accordance with the characteristics of an image, thereby improving the visual characteristic (Japanese Patent Laid-Open No. 11-196417).
The above-described TM5 method extracts a characteristic from each macroblock and performs adaptive quantization by changing the quantization parameter based on that characteristic, while controlling quantization so as to achieve a predetermined target amount of code.
In Japanese Patent Laid-Open No. 11-196417, when only a small number of blocks have high complexity and would require a raised quantization parameter, the degradation of those high-complexity blocks becomes conspicuous although the amount of generated code increases. In that case, adaptive quantization is inhibited. The same applies when a large number of blocks have low complexity and would require a lowered quantization parameter.
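The inhibition condition described for Japanese Patent Laid-Open No. 11-196417 can be read as a simple decision rule. The sketch below is hypothetical: the function name and the thresholds `min_high` and `max_low` are assumptions for illustration only, because the text does not state how "small" and "large" are judged.

```python
def adaptive_quantization_enabled(n_high: int, n_low: int,
                                  min_high: int, max_low: int) -> bool:
    """Hypothetical reading of the inhibition rule.

    n_high: blocks whose high complexity would call for a raised quantization parameter.
    n_low:  blocks whose low complexity would call for a lowered quantization parameter.
    min_high / max_low: illustrative thresholds, not values from the cited publication.
    """
    if n_high < min_high:  # too few high-complexity blocks: inhibit
        return False
    if n_low > max_low:    # too many low-complexity blocks: inhibit
        return False
    return True


print(adaptive_quantization_enabled(2, 0, 10, 100))    # False
print(adaptive_quantization_enabled(50, 20, 10, 100))  # True
```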