1. Field of the Invention
The present invention relates to an image encoding apparatus, a method of controlling the same, and a computer program.
2. Description of the Related Art
With the recent expansion of multimedia, various moving image compression encoding methods have been proposed. Typical examples are MPEG (Moving Picture Experts Group)-1, 2, and 4, and H.264. In the compression encoding process, an original picture contained in a moving image is divided into predetermined regions called blocks, and motion compensation/prediction and DCT (Discrete Cosine Transform) are executed for each of the divided blocks. For motion compensation/prediction, a reference picture is obtained by locally decoding already encoded picture data. For this reason, a decoding process is necessary even in encoding.
When a picture is compressed and encoded in conformance with MPEG, the code amount often changes greatly depending on the spatial frequency characteristic, which is a characteristic of the picture itself, on the scene, and on the quantization scale value. Code amount control is therefore an important technique for obtaining a high-quality decoded picture in an encoding apparatus having such encoding characteristics.
TM5 (Test Model 5) is generally used as a code amount control algorithm. The TM5 algorithm controls the amount of code in the three steps described below to ensure a constant bit rate in each GOP (Group Of Pictures).
(Step 1)
The target code amount of a picture to be encoded next is determined. An available code amount Rgop in the current GOP is calculated by
\[ R_{gop} = (n_i + n_p + n_b) \cdot \frac{\mathrm{bit\_rate}}{\mathrm{picture\_rate}} \tag{1} \]
where ni, np, and nb are the numbers of remaining I-, P-, and B-pictures in the current GOP respectively, bit_rate is the target bit rate, and picture_rate is the picture rate.
Complexities Xi, Xp, and Xb of the I-, P-, and B-pictures are obtained based on the encoding results by
\[ X_i = R_i Q_i, \qquad X_p = R_p Q_p, \qquad X_b = R_b Q_b \tag{2} \]
where Ri, Rp, and Rb are amounts of code obtained by encoding the I-, P-, and B-pictures respectively, and Qi, Qp, and Qb are the average values of the Q-scale in all macroblocks in the I-, P-, and B-pictures respectively. Based on equations (1) and (2), target amounts Ti, Tp, and Tb of code of the I-, P-, and B-pictures respectively are obtained by
\[ T_i = \max\left\{ \frac{R_{gop}}{1 + \dfrac{N_p X_p}{X_i K_p} + \dfrac{N_b X_b}{X_i K_b}},\ \frac{\mathrm{bit\_rate}}{8 \cdot \mathrm{picture\_rate}} \right\} \]
\[ T_p = \max\left\{ \frac{R_{gop}}{N_p + \dfrac{N_b K_p X_b}{K_b X_p}},\ \frac{\mathrm{bit\_rate}}{8 \cdot \mathrm{picture\_rate}} \right\} \]
\[ T_b = \max\left\{ \frac{R_{gop}}{N_b + \dfrac{N_p K_b X_p}{K_p X_b}},\ \frac{\mathrm{bit\_rate}}{8 \cdot \mathrm{picture\_rate}} \right\} \tag{3} \]
where Np and Nb are the numbers of remaining P- and B-pictures in the current GOP respectively, and the constants are Kp=1.0 and Kb=1.4.
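The allocation in Step 1 can be sketched in a few lines of Python. This is only an illustrative sketch of equations (1) to (3) as written above, not a reference implementation of TM5; the function names (gop_budget, complexity, target_bits) and the string picture-type argument are assumptions introduced here for illustration.

```python
def gop_budget(n_i, n_p, n_b, bit_rate, picture_rate):
    """Equation (1): code amount available for the remaining pictures of the GOP."""
    return (n_i + n_p + n_b) * (bit_rate / picture_rate)


def complexity(coded_bits, avg_qscale):
    """Equation (2): complexity X = R * Q of a previously encoded picture."""
    return coded_bits * avg_qscale


def target_bits(picture_type, r_gop, n_p, n_b, x_i, x_p, x_b,
                bit_rate, picture_rate, k_p=1.0, k_b=1.4):
    """Equation (3): target code amount for the next picture.

    n_p and n_b are the numbers of remaining P- and B-pictures (Np, Nb).
    The caller is assumed to pass non-zero complexities and, for P- and
    B-pictures, a non-zero denominator; no guard is added in this sketch.
    """
    lower_bound = bit_rate / (8 * picture_rate)
    if picture_type == "I":
        t = r_gop / (1 + (n_p * x_p) / (x_i * k_p) + (n_b * x_b) / (x_i * k_b))
    elif picture_type == "P":
        t = r_gop / (n_p + (n_b * k_p * x_b) / (k_b * x_p))
    elif picture_type == "B":
        t = r_gop / (n_b + (n_p * k_b * x_p) / (k_p * x_b))
    else:
        raise ValueError("picture_type must be 'I', 'P', or 'B'")
    # The max{...} in equation (3) keeps the target from falling below the floor.
    return max(t, lower_bound)
```

For example, with a 4 Mbps target at 30 pictures per second and one I-, four P-, and ten B-pictures remaining, gop_budget returns about 2,000,000 bits, which target_bits then divides among the picture types according to their complexities.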
(Step 2)
Three virtual buffers are used for the I-, P-, and B-pictures, respectively, to manage the differences between the target code amounts obtained by equations (3) and the amounts of generated code. The data accumulation amount of each virtual buffer is fed back, and the Q-scale reference value is set based on the data accumulation amount for a macroblock to be encoded next so that the actual amount of generated code becomes closer to the target code amount. For example, if the current picture type is P-picture, the difference between the target code amount and the amount of generated code can be obtained by an arithmetic process based on
\[ d_{p,j} = d_{p,0} + B_{p,j-1} - \frac{T_p\,(j-1)}{\mathrm{MB\_cnt}} \tag{4} \]
where the suffix j is the macroblock number in the picture, dp,0 is the initial fullness of the virtual buffer, Bp,j is the total code amount up to the jth macroblock, and MB_cnt is the number of macroblocks in the picture.
The Q-scale reference value in the jth macroblock is obtained using dp,j (to be referred to as “dj” hereinafter) by
\[ Q_j = \frac{31\, d_j}{r} \tag{5} \]
for
\[ r = \frac{2 \cdot \mathrm{bit\_rate}}{\mathrm{picture\_rate}} \tag{6} \]
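The feedback in Step 2 can be illustrated as follows. This is a minimal sketch of equations (4) to (6) as written above; the function names and argument names are assumptions introduced for illustration, and the result is not clamped to any legal Q-scale range because the description above does not specify one.

```python
def buffer_fullness(d0, bits_before_j, target_bits, j, mb_count):
    """Equation (4): virtual buffer fullness before encoding macroblock j.

    d0            initial fullness of the virtual buffer (d_{p,0})
    bits_before_j code amount generated up to macroblock j-1 (B_{p,j-1})
    target_bits   target code amount of the picture (Tp for a P-picture)
    j             1-based macroblock number
    mb_count      number of macroblocks in the picture (MB_cnt)
    """
    return d0 + bits_before_j - target_bits * (j - 1) / mb_count


def reaction_parameter(bit_rate, picture_rate):
    """Equation (6): r = 2 * bit_rate / picture_rate."""
    return 2 * bit_rate / picture_rate


def reference_qscale(d_j, r):
    """Equation (5): Q-scale reference value for macroblock j."""
    return d_j * 31 / r
```

When more code than planned has been generated so far, the fullness dj grows and the reference Q-scale rises, so subsequent macroblocks are quantized more coarsely; when less code has been generated, the opposite happens.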
(Step 3)
Finally, the quantization scale is decided based on the spatial activity of the encoding target macroblock to obtain a satisfactory visual characteristic, that is, a high decoded picture quality. The spatial activity ACTj of the jth macroblock is given by
\[ \mathrm{ACT}_j = 1 + \min(\mathrm{vblk}_1, \mathrm{vblk}_2, \ldots, \mathrm{vblk}_8) \tag{7} \]
where vblk1 to vblk4 are spatial activities in 8×8 subblocks in a macroblock with a frame structure, and vblk5 to vblk8 are spatial activities of 8×8 subblocks in a macroblock with a field structure. The spatial activity of each subblock can be calculated by
\[ \mathrm{vblk} = \sum_{i=1}^{64} (P_i - \bar{P})^2 \tag{8} \]
\[ \bar{P} = \frac{1}{64} \sum_{i=1}^{64} P_i \tag{9} \]
where Pi is the value of the ith pixel in the 8×8 subblock, and the sums in equations (8) and (9) run over i=1 to 64. ACTj obtained by equation (7) is normalized by
\[ \mathrm{N\_ACT}_j = \frac{2\,\mathrm{ACT}_j + \mathrm{AVG\_ACT}}{\mathrm{ACT}_j + \mathrm{AVG\_ACT}} \tag{10} \]
where AVG_ACT is a reference value of ACTj in the previously encoded picture, and the quantization scale (Q-scale value) MQUANTj is finally calculated by
\[ \mathrm{MQUANT}_j = Q_j \cdot \mathrm{N\_ACT}_j \tag{11} \]
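The activity-based adjustment in Step 3 can be sketched as below. The sketch follows equations (7) to (11) exactly as written above and assumes each subblock is supplied as a flat list of 64 pixel values; the function names are hypothetical, and extracting the frame-structure and field-structure subblocks from a macroblock is left to the caller.

```python
def block_activity(pixels):
    """Equations (8) and (9): sum of squared deviations from the subblock mean."""
    if len(pixels) != 64:
        raise ValueError("an 8x8 subblock must contain 64 pixel values")
    mean = sum(pixels) / 64
    return sum((p - mean) ** 2 for p in pixels)


def macroblock_activity(subblocks):
    """Equation (7): 1 + minimum activity over the eight subblocks.

    subblocks holds vblk1..vblk4 (frame structure) followed by
    vblk5..vblk8 (field structure), each as 64 pixel values.
    """
    if len(subblocks) != 8:
        raise ValueError("expected four frame and four field subblocks")
    return 1 + min(block_activity(b) for b in subblocks)


def normalized_activity(act, avg_act):
    """Equation (10) as written above."""
    return (2 * act + avg_act) / (act + avg_act)


def mquant(q_ref, n_act):
    """Equation (11): final quantization scale for the macroblock."""
    return q_ref * n_act
```

Because equation (7) takes the minimum over the subblocks, a macroblock containing even one flat subblock receives a low activity and therefore a comparatively small quantization scale.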
According to the above-described TM5 algorithm, the process in Step 1 assigns a large code amount to I-picture, and the process in Step 3 allocates a large code amount to a flat region (with low spatial activity) where degradation is visually noticeable in the picture.
As an encoding method to which TM5 is applied, there is proposed a method of determining a target code amount so that the SN ratio of a picture signal and locally decoded picture takes a constant value (see Japanese Patent Laid-Open No. 02-219388). The proposed method can stabilize the quality of all pictures by setting a target code amount which keeps the SN ratio constant.
As an improvement of the proposed method, a method of setting the code amounts of I-, P-, and B-pictures to optimum values is proposed (see Japanese Patent Laid-Open No. 08-070458). According to this improved method, the code amounts of the respective frames (I-, P-, and B-pictures) are allocated so that the SN ratio of I-picture becomes higher than that of B-picture. That is, the code amounts are controlled so that the encoding error of I-picture is smaller than that of B-picture. This can improve the quality of I-picture serving as the main picture of the GOP.
An encoding method using the difference between frames is also proposed (see Japanese Patent Laid-Open No. 2005-354528). According to the proposed method, a global vector (GV) serving as the motion vector between a global current picture and a global reference picture is obtained. A motion vector is then detected by searching the macroblocks of the current picture within a search region determined based on the GV reliability. According to this method, the correlation between frames is obtained as a reliable value GRV, and the position of the search window in motion search is determined based on the reliable value GRV.
The method proposed in Japanese Patent Laid-Open No. 02-219388 can maintain a certain picture quality by keeping the SN ratio constant between pictures. The method proposed in Japanese Patent Laid-Open No. 08-070458 can also maintain a certain picture quality by considering the SN ratio and the code allocation of each picture.
However, these methods use the SN ratio as information for determining a target code amount, and do not fully consider the degree of quantitative degradation of the picture quality and the human visual characteristic. The SN ratio and picture quality may not always be proportional to each other.
For example, in a picture formed from signals containing few high-frequency components, as in high-speed panning, the SN ratio decreases only slightly even when the picture quality degrades. However, visually conspicuous noise is readily generated in such a picture. As for a static picture, noise stands out even at the same SN ratio as that of other pictures because the picture does not move. Thus, the quality of the static picture cannot be regarded as equal to that of other pictures. Hence, the proposed methods can neither determine a target code amount by keeping the SN ratio constant nor set a code amount which matches the human visual characteristic.
In this way, the proposed methods cannot set a code amount which matches the human visual characteristic, failing to obtain a high-quality decoded picture.
The proposed methods determine the code amount using not a picture before encoding but a picture after encoding. Consider a case in which the picture changes abruptly, for example, when a camera stops panning and a picture containing few high-frequency components is followed by a picture in which high-frequency components greatly increase, or when some object appears in the frame. In this case, the SN ratio greatly decreases upon encoding, and noise such as block noise readily occurs. Even if one tries to keep the SN ratio constant and determine a target code amount, it is difficult to determine an optimum code amount which does not generate noise. That is, when the picture changes, the proposed methods cannot set an optimum code amount which does not generate noise and thus cannot obtain a high-quality decoded picture.