The present invention relates to an apparatus and a method for image processing, a recording medium, and a program and, particularly, to an apparatus and a method for image processing, a recording medium, and a program that make it possible to realize high-speed processing for converting image information (bit stream) compressed by an orthogonal transform, such as a discrete cosine transform in MPEG (Moving Picture Experts Group) or the like, and motion compensation into image data with a lower bit rate.
In recent years, apparatuses based on MPEG and similar schemes have come into widespread use, both in broadcasting stations and the like for distributing information and in ordinary households for receiving it. Such apparatuses handle image data as digital information and, for the purposes of highly efficient transmission and storage, compress the image data by an orthogonal transform, such as a discrete cosine transform, and by motion compensation, using redundancy specific to the image data.
MPEG2 (ISO/IEC 13818-2), in particular, is defined as a general-purpose image-coding scheme, and is a standard provided for interlaced-scanning images and progressive scanning images as well as standard-resolution images and high-resolution images. MPEG2 is therefore expected to continue to be used in the future in a wide variety of application software for business use by the broadcasting industry and for use by general users. With an MPEG2 compression method, by assigning a code amount (bit rate) of 4 to 8 Mbps to an interlaced-scanning image of standard resolution formed of 720 pixels×480 pixels and a code amount (bit rate) of 18 to 22 Mbps to an interlaced-scanning image of high resolution formed of 1920 pixels×1088 pixels, for example, it is possible to compress the images while maintaining a high compression ratio and good image quality and realize transmission and storage of the compressed images.
However, an image of high resolution has an enormous amount of information. Even with compression using a coding scheme such as MPEG, as described above, a code amount (bit rate) of about 18 to about 22 Mbps or more is required for a 30-Hz interlaced-scanning image of 1920 pixels×1080 pixels, for example, in order to obtain sufficient image quality. Hence, the code amount (bit rate) needs to be further reduced while minimizing degradation in image quality, so as to adjust to the bandwidth of a transmission line when image information compressed by the MPEG method is to be transmitted via a network, such as cable television or satellite broadcasting, for example, and so as to adjust to the capacity of a recording medium when image information compressed by the MPEG method is to be stored (recorded) on a recording medium, such as an optical disk, a magnetic disk, or a magneto-optical disk. Such reduction of the code amount (bit rate) also may be required when compressed image information (bit stream) of an image of standard resolution (for example a 30-Hz interlaced-scanning image of 720 pixels×480 pixels) as well as an image of high resolution is to be transmitted via a network or recorded on a recording medium as described above.
As means for solving such a problem, there are methods such as hierarchical coding (scalability) processing and image information converting (transcoding) processing. In relation to the former method, SNR (Signal to Noise Ratio) scalability is standardized in MPEG2 to thereby enable the hierarchical coding of high-SNR compressed image information (bit stream) and low-SNR compressed image information (bit stream). However, the hierarchical coding requires that the constraint imposed by the bandwidth of the network medium or the storage capacity of the recording medium be known at the time of the coding, whereas such information is unknown in most actual systems. Thus, the latter method (transcoding) may be said to have a higher degree of freedom and to be better suited to an actual system.
The image information converting (transcoding) processing, for example, converts compressed image information compressed by the MPEG2 method into compressed image information with a lower bit rate. In the image information converting processing, information such as a picture coding type, a quantization width in each macroblock, and a quantization matrix is first extracted from the compressed image information compressed by the MPEG2 method. Then, the compressed image information is variable-length-decoded and rearranged into two-dimensional data as quantized discrete cosine transform coefficients. The quantized discrete cosine transform coefficients rearranged in the form of two-dimensional data are then inversely quantized on the basis of the quantization width and the quantization matrix mentioned above. Predetermined high-frequency component coefficients are cut from the inversely quantized discrete cosine transform coefficients. The resulting inversely quantized discrete cosine transform coefficients are requantized with a quantization width (quantization scale code) generated on the basis of a target bit rate (lower than the original bit rate), variable-length-coded again by the MPEG2 method, and then outputted.
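The inverse quantization, high-frequency cut, and requantization steps described above can be sketched as follows. This is a minimal illustration only, not the MPEG2 reconstruction arithmetic: the function name `requantize_block`, the uniform step-size model `weight × scale / 16`, the square low-frequency region selected by `keep`, and the truncating rounding are all assumptions made for the example.

```python
def requantize_block(levels, weights, q_in, q_out, keep=6):
    """Requantize one 8x8 block of quantized DCT levels (simplified model).

    levels  : 8x8 list of quantized coefficient levels from the input stream
    weights : 8x8 quantization matrix
    q_in    : quantization scale used by the input stream
    q_out   : coarser quantization scale chosen for the lower target bit rate
    keep    : hypothetical cut-off; coefficients with row or column index
              >= keep are treated as high-frequency and discarded
    """
    out = [[0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            if u >= keep or v >= keep:
                continue  # high-frequency coefficient is cut (left at zero)
            step_in = weights[v][u] * q_in / 16.0    # assumed step-size model
            step_out = weights[v][u] * q_out / 16.0
            coeff = levels[v][u] * step_in           # inverse quantization
            out[v][u] = int(coeff / step_out)        # requantize, truncating toward zero
    return out
```

In this simplified model, doubling the quantization scale halves each retained level, which is how the requantized stream ends up with fewer bits after variable-length coding.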
The quantization width (quantization scale code) corresponding to the image information compressed by the MPEG2 method is determined by the processing explained below with reference to the flowchart of FIG. 1, and the amount of codes is thereby controlled. The following description takes as an example compressed image information compressed by the MPEG2 Test Model 5 (ISO/IEC JTC1/SC29/WG11 N0400) method. In this code amount control, a target code amount (target bit rate) and a GOP (Group of Pictures) structure are input variables. The GOP in this case is a group made up of the three picture types used in image compression by the MPEG2 method: an I (Intra-coded) picture (a picture coded by itself, without reference to other pictures), a P (Predictive-coded) picture (a picture coded with reference to a temporally previous (past) I-picture or P-picture), and a B (Bidirectionally predictive-coded) picture (a picture coded with reference to a temporally previous or subsequent (past or future) I-picture or P-picture).
At a step S1, a bit amount is allocated to each picture in the GOP on the basis of the amount of bits (hereinafter referred to as an assigned bit amount R) to be assigned to the pictures not coded yet in the GOP, including the picture targeted for the allocation. The allocation is repeated in the coding order of the pictures in the GOP. In this case, an amount of codes is assigned to each picture using the two assumptions described below.
As a first assumption, it is assumed that the product of the average quantization scale code used in coding a picture and the amount of codes generated is constant for each picture type unless the scene changes. Thus, the variables Xi, Xp, and Xb (global complexity measures) indicating the complexity of the screen are updated by the following equations (1) to (3). The relation between the amount of codes generated and the quantization scale code when the next picture is coded is estimated from these parameters.
$$X_i = S_i \cdot Q_i \qquad (1)$$
$$X_p = S_p \cdot Q_p \qquad (2)$$
$$X_b = S_b \cdot Q_b \qquad (3)$$
where Si, Sp, and Sb denote the amount of code bits generated at the time of coding the picture; and Qi, Qp, and Qb denote the average quantization scale code at the time of coding the picture. Initial values are set as expressed by the following equations (4) to (6) using a target code amount (target bit rate) bit_rate (bits/sec).
$$X_i = 160 \times \mathit{bit\_rate} / 115 \qquad (4)$$
$$X_p = 60 \times \mathit{bit\_rate} / 115 \qquad (5)$$
$$X_b = 42 \times \mathit{bit\_rate} / 115 \qquad (6)$$
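The initialization and update of the global complexity measures in equations (1) to (6) can be sketched as below. The function names and the dictionary keyed by picture type ("I", "P", "B") are illustrative assumptions, not part of the described method.

```python
def initial_complexity(bit_rate):
    """Initial global complexity measures, equations (4) to (6)."""
    return {
        "I": 160 * bit_rate / 115,
        "P": 60 * bit_rate / 115,
        "B": 42 * bit_rate / 115,
    }


def update_complexity(X, picture_type, bits_generated, avg_qscale):
    """Return a copy of X with the entry for picture_type updated.

    Equations (1) to (3): X = S * Q, where S is the amount of code bits
    generated for the picture and Q is its average quantization scale code.
    """
    updated = dict(X)
    updated[picture_type] = bits_generated * avg_qscale
    return updated
```

Only the entry for the picture type just coded changes; the other two keep their previous values until a picture of that type is coded.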
As a second assumption, it is assumed that overall picture quality is optimized at all times when the ratios Kp and Kb of the quantization scale codes of a P-picture and a B-picture with respect to the quantization scale code of an I-picture are the values defined by equations (7) and (8).
$$K_p = Q_p / Q_i = 1.0 \qquad (7)$$
$$K_b = Q_b / Q_i = 1.4 \qquad (8)$$
Specifically, the quantization scale code of a B-picture is 1.4 times the quantization scale codes of an I-picture and a P-picture at all times. This assumes that when the B-picture is coded somewhat more roughly than the I-picture and the P-picture and the amount of codes thus saved in the B-picture is added to the I-picture and the P-picture, the picture quality of the I-picture and the P-picture is improved and, in turn, the picture quality of the B-picture using the I-picture and the P-picture as a reference is improved.
On the basis of the above two assumptions, bit amounts (Ti, Tp, and Tb) assigned to the pictures in the GOP are values expressed by equations (9) to (11).
$$T_i = \max\left\{ \frac{R}{1 + \dfrac{N_p \cdot X_p}{X_i \cdot K_p} + \dfrac{N_b \cdot X_b}{X_i \cdot K_b}},\ \frac{\mathit{bit\_rate}}{8 \times \mathit{picture\_rate}} \right\} \qquad (9)$$
$$T_p = \max\left\{ \frac{R}{N_p + \dfrac{N_b \cdot K_p \cdot X_b}{K_b \cdot X_p}},\ \frac{\mathit{bit\_rate}}{8 \times \mathit{picture\_rate}} \right\} \qquad (10)$$
$$T_b = \max\left\{ \frac{R}{N_b + \dfrac{N_p \cdot K_b \cdot X_p}{K_p \cdot X_b}},\ \frac{\mathit{bit\_rate}}{8 \times \mathit{picture\_rate}} \right\} \qquad (11)$$
where Np and Nb denote the numbers of P-pictures and B-pictures not coded yet in the GOP. On the basis of the thus obtained assigned code amounts, the assigned bit amount R assigned to pictures not coded yet in the GOP is updated by the following equation (12) each time a picture is coded.
$$R = R - S_{i,p,b} \qquad (12)$$
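The per-picture bit allocation of equations (9) to (11) can be sketched as follows. The function name `target_bits`, its argument layout, and the dictionary `X` keyed by picture type are assumptions made for illustration; the constants are the ratios Kp and Kb from equations (7) and (8).

```python
K_P = 1.0  # equation (7)
K_B = 1.4  # equation (8)


def target_bits(picture_type, R, Np, Nb, X, bit_rate, picture_rate):
    """Bit amount assigned to the next picture, equations (9) to (11).

    R      : bits remaining for the pictures not coded yet in the GOP
    Np, Nb : numbers of P- and B-pictures not coded yet in the GOP
    X      : global complexity measures, e.g. {"I": Xi, "P": Xp, "B": Xb}
    """
    lower_bound = bit_rate / (8 * picture_rate)  # second argument of max{}
    if picture_type == "I":
        denom = 1 + Np * X["P"] / (X["I"] * K_P) + Nb * X["B"] / (X["I"] * K_B)
    elif picture_type == "P":
        denom = Np + Nb * K_P * X["B"] / (K_B * X["P"])
    elif picture_type == "B":
        denom = Nb + Np * K_B * X["P"] / (K_P * X["B"])
    else:
        raise ValueError("picture_type must be 'I', 'P', or 'B'")
    return max(R / denom, lower_bound)
```

The lower bound keeps the assigned amount from collapsing toward zero when R runs low near the end of a GOP.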
When a first picture in the GOP is coded, the assigned bit amount R is updated by an equation (13).
$$R = \frac{\mathit{bit\_rate} \times N}{\mathit{picture\_rate}} + R \qquad (13)$$
where N denotes the number of pictures in the GOP. The initial value of the assigned bit amount R at the start of the sequence is zero.
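The bookkeeping for the assigned bit amount R in equations (12) and (13) can be sketched as below. The function names `start_gop` and `after_picture` are hypothetical and chosen only for this example.

```python
def start_gop(R, bit_rate, N, picture_rate):
    """Equation (13): add the budget of a new GOP of N pictures to R.

    Any surplus or deficit left in R from the previous GOP carries over.
    """
    return bit_rate * N / picture_rate + R


def after_picture(R, bits_generated):
    """Equation (12): subtract the bits actually generated by the coded picture."""
    return R - bits_generated
```

For instance, at 4 Mbps with 15 pictures per GOP and 30 pictures per second, a GOP starting from R = 0 has a budget of 2,000,000 bits, which then shrinks as each picture is coded.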
At a step S2, in order that the bit amounts (Ti, Tp, and Tb) assigned to the pictures by the equations (9) to (11) in the processing of the step S1 coincide with the amounts of codes actually generated, the quantization scale code is obtained by feedback control in macroblock units on the basis of the occupancy of three virtual buffers set independently for each picture type. In the following description, a macroblock is of a two-dimensional 8×8 formation.
Prior to the coding of a jth macroblock, the occupancy quantity of the virtual buffers is obtained by equations (14) to (16),
$$d_j^i = d_0^i + B_{j-1} - \frac{T_i \times (j-1)}{MB_{cnt}} \qquad (14)$$
$$d_j^p = d_0^p + B_{j-1} - \frac{T_p \times (j-1)}{MB_{cnt}} \qquad (15)$$
$$d_j^b = d_0^b + B_{j-1} - \frac{T_b \times (j-1)}{MB_{cnt}} \qquad (16)$$
where $d_0^i$, $d_0^p$, and $d_0^b$ denote the initial occupancy quantity of the virtual buffers for the I-picture, P-picture, and B-picture, respectively; $B_j$ denotes the amount of bits generated from the head of a picture up to and including the jth macroblock; and $MB_{cnt}$ denotes the number of macroblocks within one picture.
The virtual buffer occupancy quantity at the time of the end of the coding of each picture ($d_{MBcnt}^i$, $d_{MBcnt}^p$, $d_{MBcnt}^b$) is used as the initial value ($d_0^i$, $d_0^p$, $d_0^b$) of the virtual buffer occupancy quantity for the next picture of the same picture type.
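Equations (14) to (16) share one form and differ only in which buffer and which assigned bit amount are used, so a single helper suffices for a sketch. The function name `buffer_occupancy` and its parameter names are assumptions for illustration.

```python
def buffer_occupancy(d0, bits_so_far, target_bits, j, mb_count):
    """Virtual buffer occupancy before coding the j-th macroblock.

    d0          : initial occupancy of the buffer for this picture type
    bits_so_far : B_(j-1), bits generated by macroblocks 1 .. j-1
    target_bits : T_i, T_p, or T_b assigned to the current picture
    j           : 1-based index of the macroblock about to be coded
    mb_count    : MB_cnt, number of macroblocks in the picture
    """
    return d0 + bits_so_far - target_bits * (j - 1) / mb_count
```

The subtracted term is the share of the picture's budget that the first j-1 macroblocks would have consumed if bits were spent uniformly, so the occupancy rises when the picture is running over budget and falls when it is running under.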
Next, the quantization scale code for the jth macroblock is calculated by the following equation (17):
$$Q_j = \frac{d_j \times 31}{r} \qquad (17)$$
where r is a parameter for controlling the response speed of the feedback loop, referred to as a reaction parameter. The parameter r is given by the following equation (18):
$$r = 2 \times \frac{\mathit{bit\_rate}}{\mathit{picture\_rate}} \qquad (18)$$
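Equations (17) and (18) translate directly into code. In the sketch below the function names are illustrative, and `d_j` is taken to be the occupancy from equations (14) to (16) for the picture type being coded; clipping of the result to a legal range is not described in the text and is therefore omitted.

```python
def reaction_parameter(bit_rate, picture_rate):
    """Equation (18): r = 2 * bit_rate / picture_rate."""
    return 2 * bit_rate / picture_rate


def quantization_scale(d_j, r):
    """Equation (17): Q_j = d_j * 31 / r."""
    return d_j * 31 / r
```

A fuller buffer yields a larger quantization scale code and hence coarser quantization, which is the negative feedback that pulls the generated bits back toward the assigned amount.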
Initial values of the virtual buffers at the start of a sequence are given by the following equations (19) to (21):
$$d_0^i = 10 \times \frac{r}{31} \qquad (19)$$
$$d_0^p = K_p \cdot d_0^i \qquad (20)$$
$$d_0^b = K_b \cdot d_0^i \qquad (21)$$
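A short sketch of the initial buffer values in equations (19) to (21) follows; the function name `initial_buffers` and the tuple return order (I, P, B) are assumptions for the example.

```python
def initial_buffers(r, Kp=1.0, Kb=1.4):
    """Initial virtual buffer occupancies at the start of a sequence.

    Returns (d0_i, d0_p, d0_b) per equations (19), (20), and (21).
    """
    d0_i = 10 * r / 31
    return d0_i, Kp * d0_i, Kb * d0_i
```

Substituting equation (19) into equation (17) gives a quantization scale code of 10 for the first macroblock of the first I-picture, and correspondingly 10 × Kp and 10 × Kb for the first P-picture and B-picture.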
At a step S3, the quantization scale code obtained by the processing of the step S2 is changed by a variable referred to as activity for each macroblock such that finer quantization is performed in a flat portion where degradation tends to be visually more noticeable and rougher quantization is performed in a portion of a complex pattern where degradation tends to be less noticeable.
The activity is given by the following equations (22) to (24) using pixel values of the luminance signal of the original image, specifically the pixel values of a total of eight blocks, that is, four blocks in a frame discrete cosine transform mode and four blocks in a field discrete cosine transform mode:
$$act_j = 1 + \min_{sblk = 1, \ldots, 8} \left( var\_sblk \right) \qquad (22)$$
$$var\_sblk = \frac{1}{64} \sum_{k=1}^{64} \left( P_k - \bar{P} \right)^2 \qquad (23)$$
$$\bar{P} = \frac{1}{64} \sum_{k=1}^{64} P_k \qquad (24)$$
where $P_k$ is a pixel value within a block of the luminance signal of the original image. The minimum value is taken in the equation (22) so that quantization is made finer when even a part of the macroblock is flat.
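The activity computation of equations (22) to (24) can be sketched as below. The function names are illustrative, and each sub-block is assumed to be supplied as a flat sequence of 64 luminance values; how the eight frame-mode and field-mode blocks are extracted from the picture is outside this sketch.

```python
def block_variance(pixels):
    """Equations (23) and (24): variance of the 64 pixel values of one block."""
    if len(pixels) != 64:
        raise ValueError("a block must contain 64 pixel values")
    mean = sum(pixels) / 64                              # equation (24)
    return sum((p - mean) ** 2 for p in pixels) / 64     # equation (23)


def activity(sub_blocks):
    """Equation (22): 1 plus the smallest variance among the sub-blocks.

    sub_blocks is expected to hold the eight blocks described in the text
    (four in frame DCT mode and four in field DCT mode).
    """
    return 1 + min(block_variance(block) for block in sub_blocks)
```

Every macroblock requires eight passes of mean and variance over 64 pixels, which gives a sense of the per-picture cost of this step.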
Then, a normalized activity Nactj having a value in a range of 0.5 to 2 is obtained by an equation (25).
$$Nact_j = \frac{2 \times act_j + avg\_act}{act_j + 2 \times avg\_act} \qquad (25)$$
where avg_act is the average value of actj in the immediately preceding coded picture. A quantization scale code mquantj in which visual characteristics are taken into consideration is given by an equation (26) on the basis of the value of the quantization scale code Qj obtained at the step S2.
$$mquant_j = Q_j \times Nact_j \qquad (26)$$
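Equations (25) and (26) can be sketched as follows, with illustrative function names. The range of 0.5 to 2 follows from the form of equation (25): the ratio tends to 2 when act_j is much larger than avg_act and to 0.5 when it is much smaller.

```python
def normalized_activity(act_j, avg_act):
    """Equation (25): normalized activity, lying between 0.5 and 2."""
    return (2 * act_j + avg_act) / (act_j + 2 * avg_act)


def mquant(q_j, act_j, avg_act):
    """Equation (26): quantization scale code adjusted for visual characteristics."""
    return q_j * normalized_activity(act_j, avg_act)
```

A macroblock whose activity equals the picture average keeps its quantization scale code unchanged, a flat macroblock has it reduced by up to half, and a highly textured macroblock has it increased by up to a factor of two.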
With the quantization scale code mquantj thus obtained, the compressed image information compressed by the MPEG2 method is converted into compressed image information with a lower target bit rate.
However, the method described above requires the calculation of an average pixel value for each macroblock in the equations (22) to (24) every time the image conversion processing is performed, thus requiring an enormous amount of processing for the calculation. As a result, the processing takes time, and the cost of the apparatus is increased because hardware capable of the enormous calculations is required.
In addition, while the activity described above is calculated using the pixel values of the luminance signal of the original image, it is not possible to know the pixel values of the luminance signal of the original image in the image conversion processing. Therefore, when the input compressed image information has been subjected to efficient adaptive quantization adapted to the complexity of the image by detection of skin color or detection of red, for example, adaptive quantization using similar normalized activity information cannot be performed at the time of requantization.