In image compression methods represented by the Moving Picture Experts Group (MPEG) standards, which combine motion compensation with the discrete cosine transform (DCT), rate control is performed so that a bitstream is sent to a transmission line at a desired rate.
FIG. 4 illustrates an MPEG layer structure. MPEG defines a layer structure including a sequence layer, a GOP (Group of Pictures) layer, a picture layer, a slice layer, and a macroblock (MB) layer. The generated code size for a certain time period can be adjusted by changing the quantization step through the data (Q scale) included in the slice layer and the MB layer. A larger Q scale sets a larger quantization step. In the following description, a quantization step is denoted by Q.
FIG. 5 is a graph illustrating a relationship between a generated code size and a quantization step. In the illustrated example, the quantization step takes 31 values, i.e., Q1 to Q31. As illustrated in FIG. 5, the generated code size is substantially inversely proportional to the quantization step. This means that a larger quantization step leads to rougher quantization and consequently makes the generated code size smaller. In contrast, a smaller quantization step leads to finer quantization and consequently makes the generated code size larger.
Rate control is performed for each specific length-equalized unit (a picture group of a specific number of pictures) so that the total of the generated code sizes (total code size) becomes equal to a specified code size.
A known example of such a rate control method is a control method in MPEG-2 Test Model 5 (referred to as “TM5” below).
The rate control in TM5 is described below.
In this control, first, a target code size of a target picture group (a picture group of a single length-equalized unit being a current target of encoding) is allocated to each of the pictures of the target picture group. In the following description, the target code size of a picture group of a single length-equalized unit is denoted by “Sg”, and the target code size of a single picture is denoted by “Sp”.
As illustrated in FIG. 6, larger target code sizes Sp are assigned to an I picture and P pictures, which are used as reference images, whereas target code sizes Sp that are smaller than those assigned to the I picture and the P pictures are assigned to B pictures, which are not used as reference images.
Subsequently, the pictures of the target picture group are encoded by repeating two operations: encoding a target picture, and reassigning the target code sizes Sp to the remaining pictures of the target picture group according to the generated code sizes of the encoded pictures.
A quantization step Q of a target picture is calculated according to Equation (1) using an estimated complexity Xest.
A complexity is an index indicating the difficulty of encoding an encoding target. In this technology, a complexity Xest of a target picture is estimated to be the product of the generated code size of encoded pictures of the same picture type as that of the target picture and the average quantization step.
Q = Xest / Sp  (1)
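As a minimal sketch of Equation (1), the following Python fragment computes a complexity from an already encoded picture and derives a quantization step from it. The function names and the numeric values are hypothetical and chosen only for illustration; they do not come from TM5.

```python
def estimate_complexity(generated_bits, avg_q_step):
    """Complexity of an encoded picture: generated code size x average quantization step."""
    return generated_bits * avg_q_step


def quantization_step(x_est, target_bits):
    """Equation (1): Q = Xest / Sp."""
    if target_bits <= 0:
        raise ValueError("target code size Sp must be positive")
    return x_est / target_bits


# Hypothetical example: the previous picture of the same type produced
# 120,000 bits at an average quantization step of 8.
x_est = estimate_complexity(120_000, 8.0)
q = quantization_step(x_est, 96_000)
print(q)  # 10.0
```

In this example the target code size is smaller than the previously generated code size, so the computed quantization step is larger than the previous average step, which matches the inverse relationship illustrated in FIG. 5.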
A picture is encoded macroblock by macroblock. Specifically, the quantization step Q calculated according to Equation (1) is used for the first macroblock of a target picture. For each of the second and subsequent macroblocks, the quantization step Q is adjusted on the basis of the difference between the total of the generated code sizes of the encoded macroblocks and the target code size Sp so that the total code size of the target picture approaches the target code size Sp.
Since the quantization step takes discrete values as illustrated in FIG. 5, the quantization step Q calculated according to Equation (1) does not necessarily match any of the values Q1 to Q31. For example, when the quantization step Q calculated according to Equation (1) takes a value between Q5 and Q6, Q6, the larger value, is selected as the quantization step Q to make sure that the total code size of the picture concerned does not exceed the target code size Sp. This applies also to the cases to be given below, although not mentioned repeatedly.
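The selection of a discrete quantization step can be sketched as follows. This is an illustration only: it assumes that Q1 to Q31 correspond to the integer values 1 to 31, which the description does not state, and it clamps to the largest step when the calculated value exceeds Q31. The name select_discrete_step is hypothetical.

```python
import bisect

# Assumption for illustration: Q1..Q31 are the integers 1..31.
STEPS = list(range(1, 32))


def select_discrete_step(q, steps=STEPS):
    """Return the smallest available step that is >= q (steps sorted ascending).

    Choosing the larger neighbor keeps the generated code size at or below
    the target. If q exceeds every available step, the largest step is used.
    """
    idx = bisect.bisect_left(steps, q)
    if idx == len(steps):
        return steps[-1]
    return steps[idx]


print(select_discrete_step(5.4))  # 6
print(select_discrete_step(6.0))  # 6
```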
Upon completion of encoding the target picture, the target code sizes Sp are modified. On the basis of the generated code sizes, target code sizes are reallocated to the pictures in the target picture group that are not yet encoded so that the total code size of the target picture group becomes substantially the same as the target code size Sg.
Then, the complexity Xest is estimated for the next target picture. The quantization step Q for that picture is calculated from the estimated complexity Xest and the modified target code size Sp of the picture, and the picture is encoded.
As described above, in this rate control technology, feedback is performed on the current average rate on the basis of the relationship between the past quantization steps and the total code size. As a consequence of this control, the average code size (average rate) per unit time fluctuates over a short span but approaches the target code size over a long span.
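The encode-and-reallocate loop described above can be sketched as follows. This is a simplified illustration and not the TM5 allocation formula: it assumes that the remaining budget is split in proportion to hypothetical per-type weights, that a hypothetical floor min_sp keeps Sp positive, and that the encoder is a toy model in which the generated code size is exactly inversely proportional to Q. Selection of a discrete quantization step is omitted.

```python
def allocate_targets(remaining_bits, picture_types, weights):
    """Split the remaining budget over the not-yet-encoded pictures by type weight."""
    total_weight = sum(weights[t] for t in picture_types)
    return [remaining_bits * weights[t] / total_weight for t in picture_types]


def encode_group(sg, picture_types, weights, complexity, encode_picture, min_sp=1.0):
    """Encode one picture group, reallocating Sp after every picture."""
    complexity = dict(complexity)  # do not modify the caller's estimates
    remaining = float(sg)
    log = []
    for i, ptype in enumerate(picture_types):
        # Target for the current picture, given what is left of Sg.
        sp = max(allocate_targets(remaining, picture_types[i:], weights)[0], min_sp)
        q = complexity[ptype] / sp               # Equation (1)
        bits, avg_q = encode_picture(ptype, q)
        complexity[ptype] = bits * avg_q         # update Xest for this picture type
        remaining -= bits
        log.append({"type": ptype, "Sp": sp, "Q": q, "bits": bits})
    return log


# Toy encoder (assumption): generated code size = true complexity / Q.
TRUE_X = {"I": 800_000.0, "P": 400_000.0, "B": 200_000.0}


def toy_encoder(ptype, q):
    return TRUE_X[ptype] / q, q


WEIGHTS = {"I": 4, "P": 2, "B": 1}  # hypothetical weights: I > P > B
log = encode_group(400_000, list("IBBP"), WEIGHTS, TRUE_X, toy_encoder)
print(sum(entry["bits"] for entry in log))  # 400000.0
```

When the initial complexity estimates are accurate, as in this example, the total code size matches Sg. If the estimate for an early picture is too low, that picture overshoots its target and the later pictures receive smaller targets Sp.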
Applications that anticipate editing or other processing, such as a videotape recorder (VTR), perform operations such as frame-by-frame editing, and therefore require the code size to be controlled in such a manner as to reliably prevent the total code size of a unit of editing from exceeding a certain size. To make sure that the total code size does not exceed the certain size in such applications, it is necessary to leave a margin when setting a target code size for each picture (i.e., to set the target code size at a smaller value).
In the above-described rate control technology, concerning a comparison between macroblocks in a single picture, when the actual generated code size of prior macroblocks is larger than the estimated code size of the prior macroblocks, the code size to be allocated to latter macroblocks is reduced in order to bring the generated code size of the picture close to the estimated code size of the picture (corresponding to the target code size Sp). This consequently deteriorates the image quality of the latter macroblocks. In addition, due to the variation in quantization step between the macroblocks, areas for which rough quantization is performed stand out, which deteriorates subjective image quality.
Concerning a comparison between pictures in a picture group of a single length-equalized unit, when the actual generated code size of prior pictures is larger than the estimated code size of the prior pictures, the code size to be allocated to latter pictures is reduced. This consequently deteriorates the image quality of the latter pictures. The code size to be allocated to the latter pictures is reduced in order to bring the actual generated code size of the picture group close to the target code size Sg of the picture group. In addition, due to the variation in quantization step between the pictures, pictures for which rough quantization is performed stand out, which deteriorates subjective image quality.
A quantization step obtained for a macroblock by the above-described technology is referred to as a “base quantization step” in order to differentiate this quantization step from a quantization step obtained for a macroblock through “adaptive quantization” to be described later. Specifically, a base quantization step is the quantization step Q calculated according to Equation (1) for the first macroblock of a target picture, and is obtained for each of the second and subsequent macroblocks by adjusting the quantization step Q on the basis of the generated code sizes of the encoded macroblocks.
TM5 employs an “adaptive quantization” technology that improves the image quality by weighting the above-described base quantization step according to human visual characteristics and using the weighted base quantization step for quantization of the macroblock. In this technology, the base quantization step is weighted by using an activity as a parameter indicating visual characteristics so that a part likely to stand out if the image quality of the part is deteriorated (e.g., a smooth part) is subjected to finer quantization whereas a part less likely to stand out (e.g., a part corresponding to a complicated pattern) is subjected to rougher quantization. This weighting improves subjective image quality compared to other cases with the same or similar code size. Since the above change in quantization step is made according to visual characteristics, it is possible to prevent deterioration of subjective image quality due to variation in quantization step as in the above-described case.
An activity, which is used as a parameter indicating visual characteristics in adaptive quantization, is known to be calculated, for example, by using the variance of luminance values in a block as a feature quantity, as in TM5. Specifically, in TM5, a quantization step is determined for each macroblock by using the activity as follows.
First, according to Equation (2), an activity act_j of a macroblock (MBj), which consists of 16×16 pixels, is obtained by use of the minimum value of variances vblk_n (1 ≤ n ≤ 4), which are variances of luminance values in the four sub-blocks, each of which consists of 8×8 pixels, in the macroblock.
act_j = 1 + min(vblk_1, vblk_2, vblk_3, vblk_4)  (2)
According to Equation (3), a normalized activity Nact_j of the macroblock MBj is obtained by use of an average value avg_act of the activities act_j of all the macroblocks in the single picture.
Nact_j = (2 × act_j + avg_act) / (act_j + 2 × avg_act)  (3)
According to Equation (4), a quantization step Qact_j, based on adaptive quantization, of the macroblock MBj is calculated. In Equation (4), Qbase denotes the above-described base quantization step, and Nact_j denotes a normalized activity of the macroblock MBj and is calculated according to Equation (3).
Qact_j = Qbase × Nact_j  (4)
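Equations (2) to (4) can be sketched in Python as follows. This is an illustration under stated assumptions: a macroblock is represented as a 16×16 list of luminance rows, the population variance is used for vblk_n (the description does not specify which variance), and a single Qbase is applied to all macroblocks for simplicity. All function names are hypothetical.

```python
from statistics import pvariance


def sub_block_variances(mb):
    """Variances of luminance in the four 8x8 sub-blocks of a 16x16 macroblock."""
    variances = []
    for top in (0, 8):
        for left in (0, 8):
            pixels = [mb[r][c] for r in range(top, top + 8)
                      for c in range(left, left + 8)]
            variances.append(pvariance(pixels))  # assumption: population variance
    return variances


def activity(mb):
    """Equation (2): act_j = 1 + min(vblk_1..vblk_4)."""
    return 1 + min(sub_block_variances(mb))


def normalized_activity(act_j, avg_act):
    """Equation (3): the result lies between 0.5 and 2."""
    return (2 * act_j + avg_act) / (act_j + 2 * avg_act)


def adaptive_steps(macroblocks, q_base):
    """Equation (4): Qact_j = Qbase x Nact_j for every macroblock of a picture."""
    acts = [activity(mb) for mb in macroblocks]
    avg_act = sum(acts) / len(acts)
    return [q_base * normalized_activity(a, avg_act) for a in acts]


# Hypothetical picture with two macroblocks: one flat, one checkerboard.
flat = [[128] * 16 for _ in range(16)]
checker = [[255 * ((r + c) % 2) for c in range(16)] for r in range(16)]
steps = adaptive_steps([flat, checker], q_base=10.0)
print(steps)  # the flat block gets a step below 10, the checkerboard a step above 10
```

The flat macroblock, in which deterioration is likely to stand out, receives a finer quantization step than the base step, whereas the highly textured macroblock receives a rougher one.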
To achieve high subjective image quality according to visual characteristics, it is considered to be preferable to satisfy both of the following conditions: differences among the quantization steps Qact of the respective macroblocks depend on the activities; and the same quantization step is used for the macroblocks having the same activity.
Although an activity is calculated on the basis of the variance of luminance values in the above-described technology, it is also known that an activity may be calculated on the basis of a luminance component and chrominance components as disclosed in PTL 4, for example.
However, in adaptive quantization, a quantization step to be used for a macroblock is changed from Qbase to Qact, and this change increases the divergence between generated code size and estimated code size in comparison with the case not employing adaptive quantization. For this reason, when generated code size is larger than estimated code size, code size to be allocated to the subsequent areas (macroblocks or pictures) is reduced, consequently deteriorating the image quality of the subsequent areas. In contrast, when generated code size is smaller than estimated code size, the image quality of the area is lowered by a degree according to the code size left unused.
PTL 1 discloses a technology for preventing this problem. In this technology, the activity of a target macroblock is weighted according to the difference between a size of generated code stored in a virtual buffer and an estimated code size, at the time of determining a quantization step for the target macroblock. Specifically, a larger difference between the generated code size and the estimated code size sets a smaller weight to the activity of the target macroblock (e.g., second exemplary embodiment). The quantization step for the macroblock is determined by using the weighted activity in Equation (4) instead of Nact_j.
Using this technique brings the total code size closer to the target code size by weighting the activity of the target macroblock according to the difference between the generated code size and the estimated code size. However, when the weight of the activity is changed, macroblocks having the same activity are encoded using different quantization steps. This reduces the effect of adaptive quantization, consequently deteriorating image quality.
PTL 2 discloses a technology for encoding areas having the same activity, by use of the same quantization step. While encoding a single picture, in this technology, the base quantization step Qbase in Equation (4) is not changed, and the same value is used for the macroblocks. In this way, the same quantization step Qact_j is used for the macroblocks having the same activity Nact_j in the single picture.
PTL 2 describes that, since the generated code size does not agree with the target code size in this technology, it is preferable to set a target code size with a margin of 20%.
PTL 3 discloses a technology of increasing the accuracy in estimation of a complexity when performing rate control on the basis of the estimated complexity. In the method according to TM5, the generated code size of a target picture is estimated by assuming, as the complexity Xest of the target picture, the product of a generated code size of a past picture and the average quantization step. Accordingly, when the target picture and the past picture have a low correlation, for example, when the pattern of the target picture is significantly different from that of the past picture, or when a scene is changed to a different scene, the generated code size for the target picture is estimated with a significant error, and distribution of the code size becomes inaccurate, which consequently deteriorates encoding efficiency.
According to the technology disclosed in PTL 3, as described, for example, in paragraphs [0027] to [0032], the complexity Xest is estimated for the I picture to be encoded first after a scene change, on the basis of the variance of the picture, and the complexity Xest is estimated for a picture that is the P picture or the B picture to be encoded first after a scene change, on the basis of an estimation error value obtained from a result of a motion vector search between the picture and a reference picture of the picture. For the estimation of the complexity Xest of each of these pictures, Equation (5) described below is used.
Xest = a × F^2 + b × F + c  (5)
In Equation (5), a, b, and c are fixed values set for each picture type. A feature quantity F is, in the case of an I picture, the variance of the picture, and in the case of a P picture or a B picture, an estimation error value obtained from the result of the motion vector search between the picture and a reference picture of the picture.
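A quadratic estimate of this form, with one coefficient set per picture type, can be sketched as follows. The coefficient values below are placeholders chosen only for illustration and are not taken from PTL 3; the names are likewise hypothetical.

```python
# Hypothetical coefficient sets (a, b, c), one per picture type.
COEFFS = {
    "I": (0.0, 100.0, 5000.0),
    "P": (0.0, 60.0, 3000.0),
    "B": (0.0, 40.0, 2000.0),
}


def estimate_complexity_from_feature(f, coeffs):
    """Quadratic complexity estimate: Xest = a*F^2 + b*F + c.

    f is the variance of the picture for an I picture, or the
    motion-search estimation error for a P or B picture.
    """
    a, b, c = coeffs
    return a * f * f + b * f + c


# First I picture after a scene change, with a hypothetical variance of 250.
x_est = estimate_complexity_from_feature(250.0, COEFFS["I"])
print(x_est)  # 30000.0
```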
For the pictures not corresponding to any of the above-described pictures, a complexity is estimated on the basis of the generated code sizes of the encoded pictures of the same type and the average quantization step, as in the technology according to TM5.
PTL 3 also discloses that the relationship between the complexity estimated according to Equation (5) and the complexity obtained from an encoding result (the product of the generated code size of the encoded pictures and the average quantization step) is stored separately for picture types for past several pictures, and the coefficients in Equation (5) are modified separately for the picture types on the basis of the stored data. This process increases the accuracy in estimation of a complexity.